1
ChIP-GSM: Inferring active transcription factor modules to predict functional regulatory elements. PLoS Comput Biol 2021; 17:e1009203. [PMID: 34292930 PMCID: PMC8330942 DOI: 10.1371/journal.pcbi.1009203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2020] [Revised: 08/03/2021] [Accepted: 06/20/2021] [Indexed: 11/19/2022]
Abstract
Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, many weak binding events are often missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM-inferred modules form groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.
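The abstract's second stage, logistic regression over module-region binding probabilities, can be illustrated with a minimal sketch. It assumes a small hypothetical feature matrix (rows are regions, columns are the inferred probability that each module binds the region), hypothetical activity labels, and plain batch gradient descent on the log-loss; it is not the ChIP-GSM implementation, and the Gibbs sampling stage is not shown.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: P(module 1 binds region), P(module 2 binds region)
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.1, 0.2], [0.2, 0.1], [0.1, 0.4]]
# Hypothetical labels: 1 = active regulatory element, 0 = inactive
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
print(predict_proba(w, b, [0.85, 0.2]))
```

Each learned weight can then be read as how strongly binding by that module predicts regional activity, which is the sense in which modules act as "feature units".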
2
Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biol 2020; 21:22. [PMID: 32014034 PMCID: PMC6996192 DOI: 10.1186/s13059-020-1929-3] [Citation(s) in RCA: 216] [Impact Index Per Article: 54.0] [Received: 07/22/2019] [Accepted: 01/08/2020] [Indexed: 12/16/2022]
Abstract
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is widely used in studying chromatin biology, but a comprehensive review of the analysis tools has not yet been completed. Here, we discuss the major steps in ATAC-seq data analysis, including pre-analysis (quality check and alignment), core analysis (peak calling), and advanced analysis (peak differential analysis and annotation, motif enrichment, footprinting, and nucleosome position analysis). We also review the reconstruction of transcriptional regulatory networks with multiomics data and highlight the current challenges of each step. Finally, we describe the potential of single-cell ATAC-seq and highlight the necessity of developing ATAC-seq specific analysis tools to obtain biologically meaningful insights.
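One pre-analysis step often applied between alignment and peak calling is shifting read 5' ends so that they mark the Tn5 insertion site. A minimal sketch, assuming 0-based half-open alignment coordinates and the widely used +4 bp (forward strand) / -5 bp (reverse strand) offsets; tools differ in the exact offsets, and the `Read` type here is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    chrom: str
    start: int   # 0-based leftmost aligned position (inclusive)
    end: int     # 0-based rightmost aligned position (exclusive)
    strand: str  # "+" or "-"


def tn5_cut_site(read: Read, plus_shift: int = 4, minus_shift: int = -5) -> int:
    """Return the shifted insertion position for one aligned read."""
    if read.strand == "+":
        # The 5' end of a forward-strand read is its leftmost base.
        return read.start + plus_shift
    # The 5' end of a reverse-strand read is its rightmost base (end - 1).
    return read.end - 1 + minus_shift


reads: List[Read] = [Read("chr1", 100, 150, "+"), Read("chr1", 200, 250, "-")]
print([tn5_cut_site(r) for r in reads])
```

Pileups of these shifted positions, rather than raw read starts, are what downstream footprinting in particular typically relies on.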
Affiliation(s)
- Feng Yan
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia
- David R Powell
- Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
- David J Curtis
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia; Department of Clinical Haematology, Alfred Health, Melbourne, VIC, Australia
- Nicholas C Wong
- Australian Centre for Blood Diseases, Central Clinical School, Monash University, Melbourne, VIC, Australia; Monash Bioinformatics Platform, Monash University, Melbourne, VIC, Australia
3
Essebier A, Lamprecht M, Piper M, Bodén M. Bioinformatics approaches to predict target genes from transcription factor binding data. Methods 2017; 131:111-119. [DOI: 10.1016/j.ymeth.2017.09.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/09/2017] [Revised: 08/29/2017] [Accepted: 09/03/2017] [Indexed: 12/28/2022]
4
Cacho A, Yao W, Cui X. Base-Calling Using a Random Effects Mixture Model on Next-Generation Sequencing Data. Stat Biosci 2017. [DOI: 10.1007/s12561-017-9190-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
5
Hansen P, Hecht J, Ibn-Salem J, Menkuec BS, Roskosch S, Truss M, Robinson PN. Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus. BMC Genomics 2016; 17:873. [PMID: 27814676 PMCID: PMC5097360 DOI: 10.1186/s12864-016-3164-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/26/2016] [Accepted: 10/12/2016] [Indexed: 12/22/2022]
Abstract
Background ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution, requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, and these have not yet been systematically tested and compared on ChIP-nexus data. Results Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal of PCR duplicates and for quality control. Furthermore, we developed bespoke methods to estimate the width of the protected region resulting from protein-DNA binding and to infer binding positions from ChIP-nexus data. Finally, we applied our peak calling method as well as two other methods, MACE and MACS2, to the available ChIP-nexus data. Conclusions The Q-nexus software is efficient and easy to use. Novel statistics about duplication rates that take the random barcodes into account are calculated. Our method for the estimation of the width of the protected region yields unbiased signatures that are highly reproducible for biological replicates and at the same time very specific for the respective factors analyzed. As judged by the irreproducible discovery rate (IDR), our peak calling algorithm shows substantially better reproducibility. An implementation of Q-nexus is available at http://charite.github.io/Q/.
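The reason random barcodes allow *selective* duplicate removal can be shown in a few lines: two reads are treated as PCR duplicates only if they share chromosome, strand, 5' end and barcode. This is a minimal sketch of that principle under the assumption that each read has already been reduced to those four fields; the `MappedRead` type and function name are hypothetical and this is not Q-nexus code.

```python
from typing import Iterable, List, NamedTuple


class MappedRead(NamedTuple):
    chrom: str
    strand: str
    five_prime: int  # coordinate of the alignment's 5' end
    barcode: str     # random barcode recovered from the adapter


def remove_pcr_duplicates(reads: Iterable[MappedRead]) -> List[MappedRead]:
    """Keep the first read for each (chrom, strand, 5' end, barcode) key.

    Reads that share a 5' end but carry different barcodes are treated as
    independent ligation events, so all of them are kept.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, read.five_prime, read.barcode)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    MappedRead("chr2", "+", 500, "ACGTA"),
    MappedRead("chr2", "+", 500, "ACGTA"),  # PCR duplicate of the first read
    MappedRead("chr2", "+", 500, "TTGCA"),  # same 5' end, different barcode
    MappedRead("chr2", "-", 500, "ACGTA"),  # opposite strand
]
print(len(remove_pcr_duplicates(reads)))
```

Position-only deduplication would keep 2 of these 4 reads; including the barcode in the key keeps 3, which matters for ChIP-nexus because exonuclease trimming makes many genuine reads stop at the same border base.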
Affiliation(s)
- Peter Hansen
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Jochen Hecht
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jonas Ibn-Salem
- Faculty of Biology, Johannes Gutenberg University Mainz, Ackermannweg 4, Mainz, 55128, Germany; Institute of Molecular Biology, Ackermannweg 4, Mainz, 55128, Germany
- Benjamin S Menkuec
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Sebastian Roskosch
- Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany
- Matthias Truss
- Labor für Pädiatrische Molekularbiologie, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany
- Peter N Robinson
- Institute for Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Berlin Brandenburg Center for Regenerative Therapies (BCRT), Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany; Institute for Bioinformatics, Department of Mathematics and Computer Science, Freie Universität Berlin, Arnimallee 14, Berlin, 14195, Germany; Max Planck Institute for Molecular Genetics, Ihnestr. 63-73, Berlin, 14195, Germany; Current address: The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, 06032, CT, USA.
6
Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, Deng X. Identification of genes escaping X inactivation by allelic expression analysis in a novel hybrid mouse model. Data Brief 2015; 5:761-9. [PMID: 26693509 PMCID: PMC4659812 DOI: 10.1016/j.dib.2015.10.033] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/31/2015] [Revised: 10/02/2015] [Accepted: 10/19/2015] [Indexed: 11/29/2022]
Abstract
X chromosome inactivation (XCI) is a female-specific mechanism that serves to balance gene dosage between the sexes whereby one X chromosome in females is inactivated during early development. Despite this silencing, a small portion of genes escape inactivation and remain expressed from the inactive X (Xi). Little is known about the distribution of escape from XCI in different tissues in vivo and about the mechanisms that control tissue-specific differences. Using a new binomial model in conjunction with a mouse model with identifiable alleles and skewed X inactivation, we are able to survey genes that escape XCI in vivo. We show that escape from X inactivation can be a common feature of some genes, whereas others escape in a tissue-specific manner. Furthermore, we characterize the chromatin environment of escape genes and show that expression from the Xi correlates with factors associated with open chromatin and that CTCF co-localizes with escape genes. Here, we provide a detailed description of the experimental design and data analysis pipeline we used to assay allele-specific expression and epigenetic characteristics of genes escaping X inactivation. The data are publicly available through the GEO database under accession numbers GSM1014171, GSE44255, and GSE59779. Interpretation and discussion of these data are included in a previously published study (Berletch et al., 2015) [1].
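A binomial model of allelic expression can be sketched as a one-sided test on SNP-informative reads: if a gene were completely silenced on the Xi, Xi-allele reads should appear only at some small background rate. The sketch below assumes per-gene counts of Xi and Xa reads and a hypothetical background rate `p0 = 0.01`; the authors' actual model, correction for skewing and significance thresholds may differ.

```python
from math import comb
from typing import Tuple


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def escape_test(xi_reads: int, xa_reads: int, p0: float = 0.01) -> Tuple[float, float]:
    """Return (fraction of reads from the Xi, one-sided binomial p-value).

    The null hypothesis is complete silencing: Xi reads arise only at the
    assumed background rate p0 (hypothetical value).
    """
    n = xi_reads + xa_reads
    fraction = xi_reads / n
    p_value = binomial_upper_tail(xi_reads, n, p0)
    return fraction, p_value


# Hypothetical counts per gene: (Xi reads, Xa reads)
for gene, (xi, xa) in {"geneA": (1, 199), "geneB": (30, 170)}.items():
    frac, p = escape_test(xi, xa)
    print(gene, round(frac, 3), f"{p:.2e}")
```

Reporting the Xi fraction alongside the p-value reflects the "continuum" view of escape: a gene can be significantly expressed from the Xi while still contributing only a small fraction of total reads.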
Affiliation(s)
- Joel B Berletch
- Department of Pathology, University of Washington, Seattle, WA, USA
- Wenxiu Ma
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Fan Yang
- Department of Pathology, University of Washington, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Christine M Disteche
- Department of Pathology, University of Washington, Seattle, WA, USA; Department of Medicine, University of Washington, Seattle, WA, USA
- Xinxian Deng
- Department of Pathology, University of Washington, Seattle, WA, USA
7
Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, Deng X. Escape from X inactivation varies in mouse tissues. PLoS Genet 2015; 11:e1005079. [PMID: 25785854 PMCID: PMC4364777 DOI: 10.1371/journal.pgen.1005079] [Citation(s) in RCA: 185] [Impact Index Per Article: 20.6] [Received: 09/02/2014] [Accepted: 02/17/2015] [Indexed: 12/22/2022]
Abstract
X chromosome inactivation (XCI) silences most genes on one X chromosome in female mammals, but some genes escape XCI. To identify escape genes in vivo and to explore molecular mechanisms that regulate this process we analyzed the allele-specific expression and chromatin structure of X-linked genes in mouse tissues and cells with skewed XCI and distinguishable alleles based on single nucleotide polymorphisms. Using a binomial model to assess allelic expression, we demonstrate a continuum between complete silencing and expression from the inactive X (Xi). The validity of the RNA-seq approach was verified using RT-PCR with species-specific primers or Sanger sequencing. Both common escape genes and genes with significant differences in XCI status between tissues were identified. Such genes may be candidates for tissue-specific sex differences. Overall, few genes (3-7%) escape XCI in any of the mouse tissues examined, suggesting stringent silencing and escape controls. In contrast, an in vitro system represented by the embryonic-kidney-derived Patski cell line showed a higher density of escape genes (21%), representing both kidney-specific escape genes and cell-line specific escape genes. Allele-specific RNA polymerase II occupancy and DNase I hypersensitivity at the promoter of genes on the Xi correlated well with levels of escape, consistent with an open chromatin structure at escape genes. Allele-specific CTCF binding on the Xi clustered at escape genes and was denser in brain compared to the Patski cell line, possibly contributing to a more compartmentalized structure of the Xi and fewer escape genes in brain compared to the cell line where larger domains of escape were observed.
Affiliation(s)
- Joel B. Berletch
- Department of Pathology, University of Washington, Seattle, Washington, United States of America
- Wenxiu Ma
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Fan Yang
- Department of Pathology, University of Washington, Seattle, Washington, United States of America
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Christine M. Disteche
- Department of Pathology, University of Washington, Seattle, Washington, United States of America
- Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Xinxian Deng
- Department of Pathology, University of Washington, Seattle, Washington, United States of America
8
van Dam JCJ, Schaap PJ, Martins dos Santos VAP, Suárez-Diez M. Integration of heterogeneous molecular networks to unravel gene-regulation in Mycobacterium tuberculosis. BMC Syst Biol 2014; 8:111. [PMID: 25279447 PMCID: PMC4181829 DOI: 10.1186/s12918-014-0111-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/15/2014] [Accepted: 09/05/2014] [Indexed: 12/23/2022]
Abstract
BACKGROUND Different methods have been developed to infer regulatory networks from heterogeneous omics datasets and to construct co-expression networks. Each algorithm produces different networks and efforts have been devoted to automatically integrate them into consensus sets. However, each separate set has an intrinsic value that is diluted and partly lost when building a consensus network. Here we present a methodology to generate co-expression networks and, instead of a consensus network, we propose an integration framework where the different networks are kept and analysed with additional tools to efficiently combine the information extracted from each network. RESULTS We developed a workflow to efficiently analyse information generated by different inference and prediction methods. Our methodology relies on providing the user the means to simultaneously visualise and analyse the coexisting networks generated by different algorithms, heterogeneous datasets, and a suite of analysis tools. As a showcase, we have analysed the gene co-expression networks of Mycobacterium tuberculosis generated using over 600 expression experiments. Regarding DNA damage repair, we identified SigC as a key control element, 12 new targets for LexA, an updated LexA binding motif, and a potential mismatch repair system. We expanded the DevR regulon with 27 genes while identifying 9 targets wrongly assigned to this regulon. We discovered 10 new genes linked to zinc uptake and a new regulatory mechanism for ZuR. The use of co-expression networks to perform system level analysis allows the development of custom-made methodologies. As showcases, we implemented a pipeline to integrate ChIP-seq data and another method to uncover multiple regulatory layers. 
CONCLUSIONS Our workflow is based on representing the multiple types of information as network representations and presenting these networks in a synchronous framework that allows their simultaneous visualization while keeping specific associations from the different networks. By simultaneously exploring these networks and metadata, we gained insights into regulatory mechanisms in M. tuberculosis that could not be obtained through the separate analysis of each data type.
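The co-expression networks this workflow builds on link genes whose expression profiles rise and fall together across experiments. A minimal sketch, assuming Pearson correlation and a hypothetical absolute-correlation cutoff of 0.9, with made-up profiles for three hypothetical genes; the authors' network construction may use other similarity measures and thresholds.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_edges(expr: Dict[str, List[float]],
                       cutoff: float = 0.9) -> List[Tuple[str, str, float]]:
    """Undirected edges between gene pairs with |r| >= cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= cutoff:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values of three genes across five experiments
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 5.1],
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(coexpression_edges(expr))
```

Keeping the correlation value on each edge, rather than a bare adjacency, is what allows several such networks to be overlaid and compared later instead of being collapsed into one consensus.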
Affiliation(s)
- Jesse CJ van Dam
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- Vitor AP Martins dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
- LifeGlimmer GmbH, Markelstrasse 38, Berlin, Germany
- María Suárez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University, Dreijenplein 10, 6703 HB Wageningen, The Netherlands
9
Warton K, Lin V, Navin T, Armstrong NJ, Kaplan W, Ying K, Gloss B, Mangs H, Nair SS, Hacker NF, Sutherland RL, Clark SJ, Samimi G. Methylation-capture and Next-Generation Sequencing of free circulating DNA from human plasma. BMC Genomics 2014; 15:476. [PMID: 24929644 PMCID: PMC4078241 DOI: 10.1186/1471-2164-15-476] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Received: 03/25/2014] [Accepted: 06/04/2014] [Indexed: 02/06/2023]
Abstract
BACKGROUND Free circulating DNA (fcDNA) has many potential clinical applications, due to the non-invasive way in which it is collected. However, because of the low concentration of fcDNA in blood, genome-wide analysis carries many technical challenges that must be overcome before fcDNA studies can reach their full potential. There are currently no definitive standards for fcDNA collection, processing and whole-genome sequencing. We report novel detailed methodology for the capture of high-quality methylated fcDNA, library preparation and downstream genome-wide Next-Generation Sequencing. We also describe the effects of sample storage, processing and scaling on fcDNA recovery and quality. RESULTS Use of serum versus plasma, and storage of blood prior to separation resulted in genomic DNA contamination, likely due to leukocyte lysis. Methylated fcDNA fragments were isolated from 5 donors using a methyl-binding protein-based protocol and appear as a discrete band of ~180 bases. This discrete band allows minimal sample loss at the size restriction step in library preparation for Next-Generation Sequencing, allowing for high-quality sequencing from minimal amounts of fcDNA. Following sequencing, we obtained 37 × 10⁶ to 86 × 10⁶ unique mappable reads, representing more than 50% of total mappable reads. The methylation status of 9 genomic regions as determined by DNA capture and sequencing was independently validated by clonal bisulphite sequencing. CONCLUSIONS Our optimized methods provide high-quality methylated fcDNA suitable for whole-genome sequencing, and allow good library complexity and accurate sequencing, despite using less than half of the recommended minimum input DNA.
Affiliation(s)
- Goli Samimi
- Garvan Institute and The Kinghorn Cancer Centre, 370 Victoria Street, Darlinghurst, Sydney, NSW 2010, Australia.
10
Silva IT, Rosales RA, Holanda AJ, Nussenzweig MC, Jankovic M. Identification of chromosomal translocation hotspots via scan statistics. Bioinformatics 2014; 30:2551-8. [PMID: 24860160 DOI: 10.1093/bioinformatics/btu351] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]
Abstract
MOTIVATION The detection of genomic regions unusually rich in a given pattern is an important undertaking in the analysis of next-generation sequencing data. Recent studies of chromosomal translocations in activated B lymphocytes have identified regions that are frequently translocated to the c-myc oncogene. A quantitative method for the identification of translocation hotspots was crucial to this study. Here we improve this analysis by using a simple probabilistic model and the framework provided by scan statistics to define the number and location of translocation breakpoint hotspots. A key feature of our method is that it provides a global chromosome-wide nominal control level to clustering, as opposed to previous methods based on local criteria. While being motivated by a specific application, the detection of unusual clusters is a widespread problem in bioinformatics. We expect our method to be useful in the analysis of data from other experimental approaches such as ChIP-seq and 4C-seq. RESULTS The analysis of translocations from B lymphocytes with the method described here reveals the presence of longer hotspots when compared with those defined previously. Further, we show that the hotspot size changes substantially in the absence of DNA repair protein 53BP1. When 53BP1 deficiency is combined with overexpression of activation-induced cytidine deaminase, the hotspot length increases even further. These changes are not detected by previous methods that use local significance criteria for clustering. Our method is also able to identify several exclusive translocation hotspots located in genes of known tumor suppressors. AVAILABILITY AND IMPLEMENTATION The detection of translocation hotspots is done with hot_scan, a program implemented in R and Perl. Source code and documentation are freely available for download at https://github.com/itojal/hot_scan.
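hot_scan itself is written in R and Perl; the Python sketch below only illustrates the scan-statistic idea of a chromosome-wide rather than local threshold. It assumes breakpoints are integer positions on a chromosome of known length, a fixed hypothetical window width, and calibrates the threshold by Monte Carlo simulation with breakpoints placed uniformly at random, which stands in for the paper's probabilistic model. All data are simulated.

```python
import random
from bisect import bisect_right
from typing import List, Tuple


def max_window_count(positions: List[int], width: int) -> int:
    """Scan statistic: the largest number of events in any window of `width` bp."""
    pts = sorted(positions)
    best = 0
    for i, start in enumerate(pts):
        # Count events in [start, start + width - 1]; a maximal window can
        # always be slid so that it begins at an event.
        j = bisect_right(pts, start + width - 1)
        best = max(best, j - i)
    return best


def null_threshold(n_events: int, chrom_len: int, width: int,
                   alpha: float = 0.05, n_sim: int = 200, seed: int = 1) -> int:
    """(1 - alpha) quantile of the scan statistic under uniform placement."""
    rng = random.Random(seed)
    maxima = sorted(
        max_window_count([rng.randrange(chrom_len) for _ in range(n_events)], width)
        for _ in range(n_sim)
    )
    return maxima[min(n_sim - 1, int((1 - alpha) * n_sim))]


def hotspot_windows(positions: List[int], width: int,
                    threshold: int) -> List[Tuple[int, int]]:
    """(first event, last event) of each window whose count exceeds the threshold."""
    pts = sorted(positions)
    hits = []
    for i, start in enumerate(pts):
        j = bisect_right(pts, start + width - 1)
        if j - i > threshold:
            hits.append((start, pts[j - 1]))
    return hits


# Simulated data: 50 background breakpoints plus a cluster of 15 near 500 kb
rng = random.Random(0)
chrom_len, width = 1_000_000, 5_000
positions = [rng.randrange(chrom_len) for _ in range(50)]
positions += [500_000 + rng.randrange(2_000) for _ in range(15)]

threshold = null_threshold(len(positions), chrom_len, width)
hits = hotspot_windows(positions, width, threshold)
print(threshold, hits[:3])
```

Because the threshold comes from the distribution of the chromosome-wide maximum, a window is only called when it is unusual relative to the whole chromosome, which is the contrast the abstract draws with local significance criteria.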
Affiliation(s)
- Israel T Silva
- Laboratory of Molecular Immunology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA, Departamento de Computação e Matemática, Universidade de São Paulo. Av. Bandeirantes, 3900, Ribeirão Preto, CEP 14049-901 and National Institute of Science and Technology in Stem Cell and Cell Therapy and Center for Cell Based Therapy. Rua Catão Roxo, 2501, Ribeirão Preto, CEP 14051-140, SP, Brazil
- Rafael A Rosales
- Laboratory of Molecular Immunology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA, Departamento de Computação e Matemática, Universidade de São Paulo. Av. Bandeirantes, 3900, Ribeirão Preto, CEP 14049-901 and National Institute of Science and Technology in Stem Cell and Cell Therapy and Center for Cell Based Therapy. Rua Catão Roxo, 2501, Ribeirão Preto, CEP 14051-140, SP, Brazil
- Adriano J Holanda
- Laboratory of Molecular Immunology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA, Departamento de Computação e Matemática, Universidade de São Paulo. Av. Bandeirantes, 3900, Ribeirão Preto, CEP 14049-901 and National Institute of Science and Technology in Stem Cell and Cell Therapy and Center for Cell Based Therapy. Rua Catão Roxo, 2501, Ribeirão Preto, CEP 14051-140, SP, Brazil
- Michel C Nussenzweig
- Laboratory of Molecular Immunology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA, Departamento de Computação e Matemática, Universidade de São Paulo. Av. Bandeirantes, 3900, Ribeirão Preto, CEP 14049-901 and National Institute of Science and Technology in Stem Cell and Cell Therapy and Center for Cell Based Therapy. Rua Catão Roxo, 2501, Ribeirão Preto, CEP 14051-140, SP, Brazil
- Mila Jankovic
- Laboratory of Molecular Immunology, The Rockefeller University, 1230 York Avenue, New York, NY 10065, USA, Departamento de Computação e Matemática, Universidade de São Paulo. Av. Bandeirantes, 3900, Ribeirão Preto, CEP 14049-901 and National Institute of Science and Technology in Stem Cell and Cell Therapy and Center for Cell Based Therapy. Rua Catão Roxo, 2501, Ribeirão Preto, CEP 14051-140, SP, Brazil
11
Transcription factor binding sites prediction based on modified nucleosomes. PLoS One 2014; 9:e89226. [PMID: 24586611 PMCID: PMC3931712 DOI: 10.1371/journal.pone.0089226] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/09/2013] [Accepted: 01/17/2014] [Indexed: 11/19/2022]
Abstract
In computational methods, position weight matrices (PWMs) are commonly applied for transcription factor binding site (TFBS) prediction. Although these matrices are more accurate than simple consensus sequences to predict actual binding sites, they usually produce a large number of false positive (FP) predictions and so are impoverished sources of information. Several studies have employed additional sources of information such as sequence conservation or the vicinity to transcription start sites to distinguish true binding regions from random ones. Recently, the spatial distribution of modified nucleosomes has been shown to be associated with different promoter architectures. These aligned patterns can facilitate DNA accessibility for transcription factors. We hypothesize that using data from these aligned and periodic patterns can improve the performance of binding region prediction. In this study, we propose two effective features, “modified nucleosomes neighboring” and “modified nucleosomes occupancy”, to decrease FP in binding site discovery. Based on these features, we designed a logistic regression classifier which estimates the probability of a region as a TFBS. Our model learned each feature based on Sp1 binding sites on chromosome 1 and was tested on the other chromosomes in human CD4+ T cells. In this work, we investigated 21 histone modifications and found that only 8 out of 21 marks are strongly correlated with transcription factor binding regions. To prove that these features are not specific to Sp1, we combined the logistic regression classifier with the PWM, and created a new model to search TFBSs on the genome. We tested the model using transcription factors MAZ, PU.1 and ELF1 and compared the results to those using only the PWM. The results show that our model can predict transcription factor binding regions more successfully. 
The relative simplicity of the model and capability of integrating other features make it a superior method for TFBS prediction.
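The PWM baseline that this study extends scores each candidate site by summing per-position log-odds. A minimal sketch, assuming a hypothetical 4-position count matrix, a pseudocount of 1 and a uniform background of 0.25 per base; the nucleosome features and the logistic combination described in the abstract are not reproduced here.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def counts_to_log_odds(counts: List[Dict[str, int]], pseudocount: float = 1.0,
                       background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2(P(base | site) / background)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]]) -> List[Tuple[int, float]]:
    """Score every window of motif length on the forward strand."""
    w = len(pwm)
    return [(i, sum(pwm[j][sequence[i + j]] for j in range(w)))
            for i in range(len(sequence) - w + 1)]


# Hypothetical counts from 10 aligned binding sites of a GGGC-like motif
counts = [{"G": 9, "A": 1}, {"G": 10}, {"G": 8, "C": 2}, {"C": 9, "T": 1}]
pwm = counts_to_log_odds(counts)
hits = scan("ATGGGCAT", pwm)
best_pos, best_score = max(hits, key=lambda h: h[1])
print(best_pos, round(best_score, 2))
```

Applied genome-wide, any window above a score cutoff is reported, which is why PWM scans alone tend to produce the many false positives the abstract describes.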
12
Wang S, Sun H, Ma J, Zang C, Wang C, Wang J, Tang Q, Meyer CA, Zhang Y, Liu XS. Target analysis by integration of transcriptome and ChIP-seq data with BETA. Nat Protoc 2013; 8:2502-15. [PMID: 24263090 DOI: 10.1038/nprot.2013.150] [Citation(s) in RCA: 353] [Impact Index Per Article: 32.1] [Indexed: 11/09/2022]
Abstract
The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression. Several recently published methods combine transcription factor (TF) binding and gene expression for target prediction, but few of them provide an efficient software package for the community. Binding and expression target analysis (BETA) is a software package that integrates ChIP-seq of TFs or chromatin regulators with differential gene expression data to infer direct target genes. BETA has three functions: (i) to predict whether the factor has activating or repressive function; (ii) to infer the factor's target genes; and (iii) to identify the motif of the factor and its collaborators, which might modulate the factor's activating or repressive function. Here we describe the implementation and features of BETA to demonstrate its application to several data sets. BETA requires ~1 GB of RAM, and the procedure takes 20 min to complete. BETA is available open source at http://cistrome.org/BETA/.
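Target inference of this kind combines how strongly a gene is bound with how strongly it responds in expression data. A minimal sketch, assuming a BETA-style distance-weighted regulatory potential (each peak within 100 kb of the TSS contributes exp(-(0.5 + 4d/100 kb))) and a normalized rank product of binding and expression ranks; coordinates and ranks are hypothetical, and the BETA documentation should be consulted for the tool's actual scoring and defaults.

```python
import math
from typing import Dict, List


def regulatory_potential(tss: int, summits: List[int], max_dist: int = 100_000) -> float:
    """Distance-weighted sum over peak summits within max_dist of the TSS.

    Each summit contributes exp(-(0.5 + 4 * d / max_dist)), so proximal peaks
    weigh more than distal ones (assumed BETA-style weighting).
    """
    score = 0.0
    for summit in summits:
        d = abs(summit - tss)
        if d <= max_dist:
            score += math.exp(-(0.5 + 4 * d / max_dist))
    return score


def rank_products(rp: Dict[str, float], de_rank: Dict[str, int]) -> Dict[str, float]:
    """Normalized product of binding rank and expression rank (lower = stronger target)."""
    ordered = sorted(rp, key=rp.get, reverse=True)
    bind_rank = {gene: i + 1 for i, gene in enumerate(ordered)}
    n = len(ordered)
    return {gene: (bind_rank[gene] / n) * (de_rank[gene] / n) for gene in ordered}


# Hypothetical peak summits, TSS coordinates and differential-expression ranks
summits = [10_000, 12_000, 250_000]
tss = {"geneA": 11_000, "geneB": 90_000, "geneC": 400_000}
de_rank = {"geneA": 1, "geneB": 3, "geneC": 2}

rp = {gene: regulatory_potential(pos, summits) for gene, pos in tss.items()}
products = rank_products(rp, de_rank)
print(min(products, key=products.get))
```

Ranking by a product of ranks means a gene must score well on both binding and expression to be called a likely direct target.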
Affiliation(s)
- Su Wang
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai, China
13
Peterson KA, Nishi Y, Ma W, Vedenko A, Shokri L, Zhang X, McFarlane M, Baizabal JM, Junker JP, van Oudenaarden A, Mikkelsen T, Bernstein BE, Bailey TL, Bulyk ML, Wong WH, McMahon AP. Neural-specific Sox2 input and differential Gli-binding affinity provide context and positional information in Shh-directed neural patterning. Genes Dev 2012; 26:2802-16. [PMID: 23249739 DOI: 10.1101/gad.207142.112] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Indexed: 11/25/2022]
Abstract
In the vertebrate neural tube, regional Sonic hedgehog (Shh) signaling invokes a time- and concentration-dependent induction of six different cell populations mediated through Gli transcriptional regulators. Elsewhere in the embryo, Shh/Gli responses invoke different tissue-appropriate regulatory programs. A genome-scale analysis of DNA binding by Gli1 and Sox2, a pan-neural determinant, identified a set of shared regulatory regions associated with key factors central to cell fate determination and neural tube patterning. Functional analysis in transgenic mice validates core enhancers for each of these factors and demonstrates the dual requirement for Gli1 and Sox2 inputs for neural enhancer activity. Furthermore, through an unbiased determination of Gli-binding site preferences and analysis of binding site variants in the developing mammalian CNS, we demonstrate that differential Gli-binding affinity underlies threshold-level activator responses to Shh input. In summary, our results highlight Sox2 input as a context-specific determinant of the neural-specific Shh response and differential Gli-binding site affinity as an important cis-regulatory property critical for interpreting Shh morphogen action in the mammalian neural tube.
Affiliation(s)
- Kevin A Peterson
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA
14
Christova R. Detecting DNA–Protein Interactions in Living Cells—ChIP Approach. Protein-Nucleic Acids Interactions 2013; 91:101-33. [DOI: 10.1016/b978-0-12-411637-5.00004-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
15
Uenishi H, Morozumi T, Toki D, Eguchi-Ogawa T, Rund LA, Schook LB. Large-scale sequencing based on full-length-enriched cDNA libraries in pigs: contribution to annotation of the pig genome draft sequence. BMC Genomics 2012; 13:581. [PMID: 23150988 PMCID: PMC3499286 DOI: 10.1186/1471-2164-13-581] [Received: 12/12/2011] [Accepted: 08/09/2012]
Abstract
BACKGROUND: Along with the draft sequencing of the pig genome, which has been completed by an international consortium, collection of the nucleotide sequences of genes expressed in various tissues and determination of entire cDNA sequences are necessary for investigations of gene function. The sequences of expressed genes are also useful for genome annotation, which is important for isolating the genes responsible for particular traits.
RESULTS: We performed a large-scale expressed sequence tag (EST) analysis in pigs by using 32 full-length-enriched cDNA libraries derived from 28 kinds of tissues and cells, including seven tissues (brain, cerebellum, colon, hypothalamus, inguinal lymph node, ovary, and spleen) derived from pigs that were cloned from a sow subjected to genome sequencing. We obtained more than 330,000 EST reads from the 5'-ends of the cDNA clones. Comparison with human and bovine gene catalogs revealed that the ESTs corresponded to at least 15,000 genes. cDNA clones representing contigs and singlets generated by assembly of the EST reads were subjected to full-length determination of inserts. We have finished sequencing 31,079 cDNA clones corresponding to more than 12,000 genes. Mapping of the sequences of these cDNA clones on the draft sequence of the pig genome has indicated that the clones are derived from about 15,000 independent loci on the pig genome.
CONCLUSIONS: ESTs and cDNA sequences derived from full-length-enriched libraries are valuable for annotation of the draft sequence of the pig genome. This information will also contribute to the exploration of promoter sequences on the genome and to molecular biology-based analyses in pigs.
Affiliation(s)
- Hirohide Uenishi
- Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Division of Animal Sciences, National Institute of Agrobiological Sciences, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Animal Genome Research Program, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Takeya Morozumi
- Animal Genome Research Program, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Animal Research Division, Japan Institute of Association for Techno-innovation in Agriculture, Forestry and Fisheries, 446-1 Ippaizuka, Kamiyokoba, Tsukuba, Ibaraki, 305-0854, Japan
- Daisuke Toki
- Animal Genome Research Program, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Animal Research Division, Japan Institute of Association for Techno-innovation in Agriculture, Forestry and Fisheries, 446-1 Ippaizuka, Kamiyokoba, Tsukuba, Ibaraki, 305-0854, Japan
- Tomoko Eguchi-Ogawa
- Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Animal Genome Research Program, 2 Ikenodai, Tsukuba, Ibaraki, 305-8602, Japan
- Lauretta A Rund
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, IL, 61801, USA
- Lawrence B Schook
- Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1206 West Gregory Drive, Urbana, IL, 61801, USA
16
Park JS, Ma W, O'Brien LL, Chung E, Guo JJ, Cheng JG, Valerius MT, McMahon JA, Wong WH, McMahon AP. Six2 and Wnt regulate self-renewal and commitment of nephron progenitors through shared gene regulatory networks. Dev Cell 2012; 23:637-51. [PMID: 22902740 DOI: 10.1016/j.devcel.2012.07.008] [Received: 07/12/2011] [Revised: 05/24/2012] [Accepted: 07/15/2012]
Abstract
A balance between Six2-dependent self-renewal and canonical Wnt signaling-directed commitment regulates mammalian nephrogenesis. Intersectional studies using chromatin immunoprecipitation and transcriptional profiling identified direct target genes shared by each pathway within nephron progenitors. Wnt4 and Fgf8 are essential for progenitor commitment; cis-regulatory modules flanking each gene are cobound by Six2 and β-catenin and are dependent on conserved Lef/Tcf binding sites for activity. In vitro and in vivo analyses suggest that Six2 and Lef/Tcf factors form a regulatory complex that promotes progenitor maintenance while entry of β-catenin into this complex promotes nephrogenesis. Alternative transcriptional responses associated with Six2 and β-catenin cobinding events occur through non-Lef/Tcf DNA binding mechanisms, highlighting the regulatory complexity downstream of Wnt signaling in the developing mammalian kidney.
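Identifying regions "cobound by Six2 and β-catenin" rests on intersecting ChIP peak sets from the two factors. The sketch below shows only that basic overlap test; it is not the authors' pipeline, the peak coordinates are hypothetical, and the `min_overlap` threshold is an assumed parameter. Genome-scale analyses would normally use a dedicated interval tool such as BEDTools `intersect` rather than this quadratic loop.

```python
"""Minimal sketch of calling co-bound regions from two ChIP peak sets.
Peaks are hypothetical (chrom, start, end) half-open intervals; a pair counts
as co-bound if the two peaks share at least min_overlap base pairs."""

from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlap_bp(a: Peak, b: Peak) -> int:
    """Number of base pairs shared by two peaks (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def cobound(peaks_a: List[Peak], peaks_b: List[Peak], min_overlap: int = 1) -> List[Tuple[Peak, Peak]]:
    """Return all (a, b) peak pairs that overlap by >= min_overlap bp.

    O(n * m) comparison: adequate for small illustrative sets only.
    """
    return [(a, b) for a in peaks_a for b in peaks_b if overlap_bp(a, b) >= min_overlap]


if __name__ == "__main__":
    # Hypothetical peaks for two factors.
    factor_a = [("chr1", 100, 400), ("chr2", 5_000, 5_300)]
    factor_b = [("chr1", 350, 600), ("chr3", 10, 200)]
    for a, b in cobound(factor_a, factor_b):
        print(a, b, overlap_bp(a, b))
```

With these inputs only the two chr1 peaks pair up, sharing 50 bp; intervals that merely touch (one ends where the other starts) share 0 bp under the half-open convention and are not reported.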
Affiliation(s)
- Joo-Seop Park
- Division of Pediatric Urology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA.
17
Abstract
Chromatin immunoprecipitation (ChIP) is used to map the interaction between proteins and DNA at a specific genomic locus in the living cell. The protein-DNA complexes are stabilized in vivo by reversible crosslinking, and the DNA is then sheared by sonication or enzymatic digestion into fragments suitable for the subsequent immunoprecipitation step. Antibodies recognizing chromatin-linked proteins, transcription factors, artificial tags, or specific protein modifications are then used to pull down DNA-protein complexes containing the target. After reversal of crosslinks and DNA purification, locus-specific quantitative PCR is used to determine the amount of DNA that was associated with the target at a given time point and experimental condition. DNA quantification can be carried out for several genomic regions by multiple qPCRs or at a genome-wide scale by massively parallel sequencing (ChIP-Seq).
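The abstract does not say how the qPCR signal is normalized. One widely used option is the "percent input" method, sketched below under stated assumptions: about 100% PCR efficiency (one doubling per cycle) and an input sample that is a known fraction of the starting chromatin. The function name and parameters are illustrative, not taken from the source.

```python
"""Percent-input calculation for ChIP-qPCR at one locus (one common
normalization; assumed here, since the source does not specify a method)."""

import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of total chromatin recovered by the IP at one locus.

    ct_ip          -- qPCR Ct of the immunoprecipitated sample
    ct_input       -- qPCR Ct of the input sample
    input_fraction -- fraction of chromatin saved as input (e.g. 0.01 for 1%)

    Assumes ~100% amplification efficiency, i.e. a factor of 2 per cycle.
    """
    if not 0.0 < input_fraction <= 1.0:
        raise ValueError("input_fraction must be in (0, 1]")
    # Rescale the input Ct to represent 100% of chromatin:
    # a 1/input_fraction dilution corresponds to log2(1/input_fraction) cycles.
    ct_input_adjusted = ct_input - math.log2(1.0 / input_fraction)
    # Each cycle of difference is a twofold difference in template amount.
    return 100.0 * 2.0 ** (ct_input_adjusted - ct_ip)


if __name__ == "__main__":
    # Hypothetical Ct values: 1% input at Ct 25, IP at Ct 24.
    print(f"{percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01):.2f}% of input")
```

For the hypothetical values shown, the IP amplifies one cycle earlier than a 1% input, so it holds twice that amount, about 2% of input. Real assays often also compare against an IgG or negative-control locus to judge specific enrichment.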