1
Jalili V, Cremona MA, Palluzzi F. Rescuing biologically relevant consensus regions across replicated samples. BMC Bioinformatics 2023; 24:240. [PMID: 37286963 PMCID: PMC10246347 DOI: 10.1186/s12859-023-05340-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 05/16/2023] [Indexed: 06/09/2023]
Abstract
BACKGROUND Protein-DNA binding sites of ChIP-seq experiments are identified where the binding affinity is significant based on a given threshold. The choice of the threshold is a trade-off between conservative region identification and discarding weak, but true binding sites. RESULTS We rescue weak binding sites using MSPC, which efficiently exploits replicates to lower the threshold required to identify a site while keeping a low false-positive rate, and we compare it to IDR, a widely used post-processing method for identifying highly reproducible peaks across replicates. We observe several master transcription regulators (e.g., SP1 and GATA3) and HDAC2-GATA1 regulatory networks on rescued regions in the K562 cell line. CONCLUSIONS We argue the biological relevance of weak binding sites and the information they add when rescued by MSPC. An implementation of the proposed extended MSPC methodology and the scripts to reproduce the performed analysis are freely available at https://genometric.github.io/MSPC/; MSPC is distributed as a command-line application and an R package available from Bioconductor (https://doi.org/doi:10.18129/B9.bioc.rmspc).
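The abstract does not say how evidence from replicates is pooled. As a minimal sketch of the general idea only, the Python below assumes Fisher's combined probability test, one common way to combine independent p-values; the function name `fisher_combined_pvalue` and the example thresholds are hypothetical and are not taken from MSPC's code or documentation.

```python
import math


def fisher_combined_pvalue(p_values):
    """Combine independent p-values with Fisher's method.

    Illustrative assumption: this is a generic statistical sketch,
    not the MSPC implementation.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)
    # Chi-square survival function with 2k degrees of freedom has a closed
    # form for even df: exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Two replicates each report a weak peak (p = 1e-4) at the same locus.
combined = fisher_combined_pvalue([1e-4, 1e-4])
print(f"combined p-value: {combined:.3g}")
```

Under these assumptions, a peak that is only weakly significant in each replicate becomes much more significant once the replicates are considered together, which is the intuition behind rescuing weak but reproducible sites.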
Affiliation(s)
- Vahid Jalili
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Marzia A Cremona
  - Department of Operations and Decision Systems, Université Laval, Quebec, Canada.
  - CHU de Québec - Université Laval Research Center, Quebec, Canada.
- Fernando Palluzzi
  - Department of Brain and Behavioral Sciences, Università di Pavia, Pavia, Italy.
2
Comprehensive assessment of differential ChIP-seq tools guides optimal algorithm selection. Genome Biol 2022; 23:119. [PMID: 35606795 PMCID: PMC9128273 DOI: 10.1186/s13059-022-02686-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2021] [Accepted: 05/09/2022] [Indexed: 11/21/2022]
Abstract
Background The analysis of chromatin binding patterns of proteins in different biological states is a main application of chromatin immunoprecipitation followed by sequencing (ChIP-seq). A large number of algorithms and computational tools for quantitative comparison of ChIP-seq datasets exist, but their performance is strongly dependent on the parameters of the biological system under investigation. Thus, a systematic assessment of available computational tools for differential ChIP-seq analysis is required to guide the optimal selection of analysis tools based on the present biological scenario. Results We created standardized reference datasets by in silico simulation and sub-sampling of genuine ChIP-seq data to represent different biological scenarios and binding profiles. Using these data, we evaluated the performance of 33 computational tools and approaches for differential ChIP-seq analysis. Tool performance was strongly dependent on peak size and shape as well as on the scenario of biological regulation. Conclusions Our analysis provides unbiased guidelines for the optimized choice of software tools in differential ChIP-seq analysis. Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02686-y.
3
Complete loss of H3K9 methylation dissolves mouse heterochromatin organization. Nat Commun 2021; 12:4359. [PMID: 34272378 PMCID: PMC8285382 DOI: 10.1038/s41467-021-24532-8] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/17/2020] [Accepted: 06/17/2021] [Indexed: 12/26/2022]
Abstract
Histone H3 lysine 9 (H3K9) methylation is a central epigenetic modification that defines heterochromatin from unicellular to multicellular organisms. In mammalian cells, H3K9 methylation can be catalyzed by at least six distinct SET domain enzymes: Suv39h1/Suv39h2, Eset1/Eset2 and G9a/Glp. We used mouse embryonic fibroblasts (MEFs) with a conditional mutation for Eset1 and introduced progressive deletions for the other SET domain genes by CRISPR/Cas9 technology. Compound mutant MEFs for all six SET domain lysine methyltransferase (KMT) genes lack all H3K9 methylation states, derepress nearly all families of repeat elements and display genomic instabilities. Strikingly, the 6KO H3K9 KMT MEF cells no longer maintain heterochromatin organization and have lost electron-dense heterochromatin. This is a compelling analysis of H3K9 methylation-deficient mammalian chromatin and reveals a definitive function for H3K9 methylation in protecting heterochromatin organization and genome integrity.
4
Beacon TH, Delcuve GP, López C, Nardocci G, Kovalchuk I, van Wijnen AJ, Davie JR. The dynamic broad epigenetic (H3K4me3, H3K27ac) domain as a mark of essential genes. Clin Epigenetics 2021; 13:138. [PMID: 34238359 PMCID: PMC8264473 DOI: 10.1186/s13148-021-01126-1] [Citation(s) in RCA: 74] [Impact Index Per Article: 24.7] [Received: 02/25/2021] [Accepted: 06/30/2021] [Indexed: 02/06/2023]
Abstract
Transcriptionally active chromatin is marked by tri-methylation of histone H3 at lysine 4 (H3K4me3) located after first exons and around transcription start sites. This epigenetic mark is typically restricted to narrow regions at the 5′ end of the gene body, though a small subset of genes have a broad H3K4me3 domain which extensively covers the coding region. Although most studies focus on the H3K4me3 mark, the broad H3K4me3 domain is associated with a plethora of histone modifications (e.g., H3 acetylated at K27) and is herein termed the broad epigenetic domain. Genes marked with the broad epigenetic domain are involved in cell identity and essential cell functions and have clinical potential as biomarkers for patient stratification. Reducing expression of genes with the broad epigenetic domain may increase the metastatic potential of cancer cells. Enhancers and super-enhancers interact with the broad epigenetic domain marked genes, forming a hub of interactions involving nucleosome-depleted regions. Together, the regulatory elements coalesce with transcription factors, chromatin modifying/remodeling enzymes, coactivators, and the Mediator and/or Integrator complex into a transcription factory which may be analogous to a liquid–liquid phase-separated condensate. The broad epigenetic domain has a dynamic chromatin structure which supports frequent transcription bursts. In this review, we present the current knowledge of broad epigenetic domains.
Affiliation(s)
- Tasnim H Beacon
  - CancerCare Manitoba Research Institute, CancerCare Manitoba, Winnipeg, MB, R3E 0V9, Canada.
  - Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Avenue, Room 333A, Winnipeg, MB, Canada.
- Geneviève P Delcuve
  - Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Avenue, Room 333A, Winnipeg, MB, Canada.
- Camila López
  - CancerCare Manitoba Research Institute, CancerCare Manitoba, Winnipeg, MB, R3E 0V9, Canada.
  - Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Avenue, Room 333A, Winnipeg, MB, Canada.
- Gino Nardocci
  - Faculty of Medicine, Universidad de Los Andes, Santiago, Chile.
  - Molecular Biology and Bioinformatics Lab, Program in Molecular Biology and Bioinformatics, Center for Biomedical Research and Innovation (CIIB), Universidad de Los Andes, Santiago, Chile.
- Igor Kovalchuk
  - Department of Biological Sciences, University of Lethbridge, Lethbridge, AB, Canada.
- Andre J van Wijnen
  - Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA.
  - Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA.
- James R Davie
  - CancerCare Manitoba Research Institute, CancerCare Manitoba, Winnipeg, MB, R3E 0V9, Canada.
  - Department of Biochemistry and Medical Genetics, University of Manitoba, 745 Bannatyne Avenue, Room 333A, Winnipeg, MB, Canada.
5
Nakato R, Sakata T. Methods for ChIP-seq analysis: A practical workflow and advanced applications. Methods 2021; 187:44-53. [PMID: 32240773 DOI: 10.1016/j.ymeth.2020.03.005] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 02/18/2020] [Revised: 03/17/2020] [Accepted: 03/18/2020] [Indexed: 12/13/2022]
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is a central method in epigenomic research. Genome-wide analysis of histone modifications, such as enhancer analysis and genome-wide chromatin state annotation, enables systematic analysis of how the epigenomic landscape contributes to cell identity, development, lineage specification, and disease. In this review, we first present a typical ChIP-seq analysis workflow, from quality assessment to chromatin-state annotation. We focus on practical, rather than theoretical, approaches for biological studies. Next, we outline various advanced ChIP-seq applications and introduce several state-of-the-art methods, including prediction of gene expression level and chromatin loops from epigenome data and data imputation. Finally, we discuss recently developed single-cell ChIP-seq analysis methodologies that elucidate the cellular diversity within complex tissues and cancers.
Affiliation(s)
- Ryuichiro Nakato
  - Laboratory of Computational Genomics, Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
- Toyonori Sakata
  - Laboratory of Genome Structure and Function, Institute for Quantitative Biosciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan.
6
Ma G, Babarinde IA, Zhuang Q, Hutchins AP. Unified Analysis of Multiple ChIP-Seq Datasets. Methods Mol Biol 2021; 2198:451-465. [PMID: 32822050 DOI: 10.1007/978-1-0716-0876-0_33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
High-throughput sequencing technologies are increasingly used in molecular cell biology to assess genome-wide chromatin dynamics of proteins bound to DNA, through techniques such as chromatin immunoprecipitation sequencing (ChIP-seq). These techniques often rely on an analysis strategy based on identifying genomic regions with increased sequencing signal to infer the binding location or chemical modifications of proteins bound to DNA. Peak calling within individual samples has been well described; however, relatively little attention has been devoted to the merging of replicate samples and the cross-comparison of many samples. Here, we present a generalized strategy to unify ChIP-seq datasets, enabling enhanced cross-comparison of binding patterns. The strategy works by merging peak data between different (even unrelated) samples, and then using a local background to recalculate enrichment. This strategy redefines the peaks within each experiment, allowing for more accurate cross-comparison of datasets.
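The first step of the strategy, merging peak data between samples, can be sketched as a union of overlapping intervals. This is a minimal Python illustration under stated assumptions: peaks are `(chrom, start, end)` tuples with half-open coordinates, touching or overlapping peaks are fused, and the function name `merge_peaks` is hypothetical. It does not reproduce the authors' software, and the local-background enrichment step is not shown.

```python
from collections import defaultdict


def merge_peaks(samples):
    """Merge peaks from several samples into one unified peak set.

    samples: list of peak lists; each peak is (chrom, start, end).
    Overlapping or touching peaks on the same chromosome are fused.
    """
    by_chrom = defaultdict(list)
    for peaks in samples:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is not None and start <= current_end:
                # Extend the open region to cover the overlapping peak.
                current_end = max(current_end, end)
            else:
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        # Each chromosome has at least one peak, so one region is still open.
        merged.append((chrom, current_start, current_end))
    return merged


sample_a = [("chr1", 100, 200), ("chr2", 50, 80)]
sample_b = [("chr1", 150, 260), ("chr1", 400, 450)]
unified = merge_peaks([sample_a, sample_b])
print(unified)
```

Every sample can then be re-scored against the same unified regions, which is what makes the datasets directly comparable.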
Affiliation(s)
- Gang Ma
  - Department of Biology, Southern University of Science and Technology, Shenzhen, China.
- Isaac A Babarinde
  - Department of Biology, Southern University of Science and Technology, Shenzhen, China.
- Qiang Zhuang
  - Department of Biology, Southern University of Science and Technology, Shenzhen, China.
  - State Key Laboratory of Medicinal Chemical Biology and College of Life Sciences, Nankai University, Tianjin, China.
- Andrew P Hutchins
  - Department of Biology, Southern University of Science and Technology, Shenzhen, China.
7
Höllbacher B, Balázs K, Heinig M, Uhlenhaut NH. Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J 2020; 18:1330-1341. [PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 02/27/2020] [Revised: 05/21/2020] [Accepted: 05/23/2020] [Indexed: 02/06/2023]
Abstract
Advancements in the field of next generation sequencing lead to the generation of ever-more data, with the challenge often being how to combine and reconcile results from different OMICs studies such as genome, epigenome and transcriptome. Here we provide an overview of the standard processing pipelines for ChIP-seq and RNA-seq as well as common downstream analyses. We describe popular multi-omics data integration approaches used to identify target genes and co-factors, and we discuss how machine learning techniques may predict transcriptional regulators and gene expression.
Affiliation(s)
- Barbara Höllbacher
  - Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.
  - Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.
  - Department of Informatics, TUM, Munich 85748, Garching, Germany.
- Kinga Balázs
  - Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.
- Matthias Heinig
  - Institute of Computational Biology ICB, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.
  - Department of Informatics, TUM, Munich 85748, Garching, Germany.
- N Henriette Uhlenhaut
  - Institute for Diabetes and Cancer IDC, Helmholtz Zentrum Muenchen (HMGU) and German Center for Diabetes Research (DZD), Munich 85764, Neuherberg, Germany.
  - Metabolic Programming, TUM School of Life Sciences Weihenstephan, Munich 85354, Freising, Germany.
8
Zhang X, Gan Y, Zou G, Guan J, Zhou S. Genome-wide analysis of epigenetic dynamics across human developmental stages and tissues. BMC Genomics 2019; 20:221. [PMID: 30967107 PMCID: PMC6457072 DOI: 10.1186/s12864-019-5472-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/19/2022]
Abstract
BACKGROUND The epigenome is highly dynamic during the early stages of embryonic development. Epigenetic modifications provide the necessary regulation for lineage specification and enable the maintenance of cellular identity. Given the rapid accumulation of genome-wide epigenomic modification maps across the cellular differentiation process, there is an urgent need to characterize epigenetic dynamics and reveal their impact on differential gene regulation. METHODS We proposed DiffEM, a computational method for differential analysis of epigenetic modifications, and identified highly dynamic modification sites along the cellular differentiation process. We applied this approach to 6 epigenetic marks across 20 human early developmental stages and tissues, including hESCs, 4 hESC-derived lineages and 15 human primary tissues. RESULTS We identified highly dynamic modification sites where different cell types exhibit distinctive modification patterns, and found that these highly dynamic sites are enriched in genes related to cellular development and differentiation. Further, to evaluate the effectiveness of our method, we correlated the dynamics scores of epigenetic modifications with the variance of gene expression, and compared the results of our method with those of existing algorithms. The comparison demonstrates the power of our method in evaluating epigenetic dynamics and identifying highly dynamic regions along the cell differentiation process.
Affiliation(s)
- Xia Zhang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China.
- Yanglan Gan
  - School of Computer Science and Technology, Donghua University, Shanghai, China.
- Guobing Zou
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China.
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai, China.
- Shuigeng Zhou
  - Shanghai Key Lab of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai, China.
9
RELACS nuclei barcoding enables high-throughput ChIP-seq. Commun Biol 2018; 1:214. [PMID: 30534606 PMCID: PMC6281648 DOI: 10.1038/s42003-018-0219-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/13/2018] [Accepted: 10/31/2018] [Indexed: 02/07/2023]
Abstract
Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is an invaluable tool for mapping chromatin-associated proteins. Current barcoding strategies aim to improve assay throughput and scalability but intense sample handling and lack of standardization over cell types, cell numbers and epitopes hinder wide-spread use in the field. Here, we present a barcoding method to enable high-throughput ChIP-seq using common molecular biology techniques. The method, called RELACS (restriction enzyme-based labeling of chromatin in situ) relies on standardized nuclei extraction from any source and employs chromatin cutting and barcoding within intact nuclei. Barcoded nuclei are pooled and processed within the same ChIP reaction, for maximal comparability and workload reduction. The innovative barcoding concept is particularly user-friendly and suitable for implementation to standardized large-scale clinical studies and scarce samples. Aiming to maximize universality and scalability, RELACS can generate ChIP-seq libraries for transcription factors and histone modifications from hundreds of samples within three days.
Laura Arrigoni et al. present RELACS, a method enabling high-throughput ChIP-seq which involves barcoding and processing intact nuclei in the same ChIP reaction. The method is useful for broad cell types and epitopes, robust to experimental conditions, and drastically decreases workload.
10
Maatz H, van Heesch S, Kreuchwig F, Faber A, Adami E, Hubner N, Heinig M. Epigenetics and Control of RNAs. Methods Mol Biol 2017; 1488:217-237. [PMID: 27933526 DOI: 10.1007/978-1-4939-6427-7_9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Histone modifications are epigenetic marks that fundamentally impact the regulation of gene expression. Integrating histone modification information in the analysis of gene expression traits (eQTL mapping) has been shown to significantly enhance the prediction of eQTLs. In this chapter, we describe (1) how to perform quantitative trait locus (QTL) analysis using histone modification levels as traits and (2) how to integrate these data with information on RNA expression for the elucidation of the epigenetic control of transcript levels. We will provide a comprehensive introduction into the topic, describe in detail how ChIP-seq data are analyzed and elaborate on how to integrate ChIP-seq and RNA-seq data from a segregating disease animal model for the identification of the epigenetic control of RNA expression.
Affiliation(s)
- Henrike Maatz
  - Max-Delbrück-Center for Molecular Medicine (MDC), 13125, Berlin, Germany.
- Allison Faber
  - Max-Delbrück-Center for Molecular Medicine (MDC), 13125, Berlin, Germany.
- Eleonora Adami
  - Max-Delbrück-Center for Molecular Medicine (MDC), 13125, Berlin, Germany.
- Norbert Hubner
  - Max-Delbrück-Center for Molecular Medicine (MDC), 13125, Berlin, Germany.
  - DZHK (German Centre for Cardiovascular Research), Partner Site, 13347, Berlin, Germany.
  - Charité-Universitätsmedizin, 10117, Berlin, Germany.
- Matthias Heinig
  - Helmholtz Zentrum München, Institute of Computational Biology (ICB), Neuherberg, 85764, Germany.
11
Lun ATL, Smyth GK. csaw: a Bioconductor package for differential binding analysis of ChIP-seq data using sliding windows. Nucleic Acids Res 2015; 44:e45. [PMID: 26578583 PMCID: PMC4797262 DOI: 10.1093/nar/gkv1191] [Citation(s) in RCA: 249] [Impact Index Per Article: 27.7] [Received: 08/11/2015] [Accepted: 10/24/2015] [Indexed: 01/20/2023]
Abstract
Chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq) is widely used to identify binding sites for a target protein in the genome. An important scientific application is to identify changes in protein binding between different treatment conditions, i.e. to detect differential binding (DB). This can reveal potential mechanisms through which changes in binding may contribute to the treatment effect. The csaw package provides a framework for the de novo detection of differentially bound genomic regions. It uses a window-based strategy to summarize read counts across the genome. It exploits existing statistical software to test for significant differences in each window. Finally, it clusters windows into regions for output and controls the false discovery rate properly over all detected regions. The csaw package can handle arbitrarily complex experimental designs involving biological replicates. It can be applied to both transcription factor and histone mark datasets, and, more generally, to any type of sequencing data measuring genomic coverage. csaw performs favorably against existing methods for de novo DB analyses on both simulated and real data. csaw is implemented as an R software package and is freely available from the open-source Bioconductor project.
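The window-based strategy described above (count reads in sliding windows, then cluster windows into regions) can be illustrated with a small conceptual sketch. csaw itself is an R package; the Python below is not its API. The function names `window_counts` and `cluster_windows`, the half-open coordinates, and the default window width and spacing are all assumptions made for illustration, and the per-window statistical test and false discovery rate control are omitted.

```python
from bisect import bisect_left


def window_counts(read_starts, chrom_length, width=150, spacing=50):
    """Count reads whose start falls in each sliding window [start, start + width)."""
    starts = sorted(read_starts)
    counts = []
    for w_start in range(0, max(chrom_length - width, 0) + 1, spacing):
        w_end = w_start + width
        # Binary search gives the number of sorted read starts inside the window.
        n = bisect_left(starts, w_end) - bisect_left(starts, w_start)
        counts.append((w_start, w_end, n))
    return counts


def cluster_windows(windows, max_gap=0):
    """Fuse windows separated by at most max_gap bases into regions."""
    regions = []
    for start, end in sorted(windows):
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(region) for region in regions]


counts = window_counts([10, 20, 160, 400], chrom_length=500, width=100, spacing=100)
# Keep windows with at least one read as a stand-in for "significant" windows.
selected = [(start, end) for start, end, n in counts if n > 0]
print(cluster_windows(selected))
```

In a real analysis the selection step would come from a statistical test across conditions and replicates rather than a simple count cutoff.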
Affiliation(s)
- Aaron T L Lun
  - The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
  - Department of Medical Biology, The University of Melbourne, Parkville, VIC 3010, Australia.
- Gordon K Smyth
  - The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
  - Department of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia.
12
Pravenec M, Křen V, Landa V, Mlejnek P, Musilová A, Šilhavý J, Šimáková M, Zídek V. Recent Progress in the Genetics of Spontaneously Hypertensive Rats. Physiol Res 2014; 63:S1-8. [DOI: 10.33549/physiolres.932622] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]
Abstract
The spontaneously hypertensive rat (SHR) is the most widely used animal model of essential hypertension and accompanying metabolic disturbances. Recent advances in sequencing of genomes of BN-Lx and SHR progenitors of the BXH/HXB recombinant inbred (RI) strains as well as accumulation of multiple data sets of intermediary phenotypes in the RI strains, including mRNA and microRNA abundance, quantitative metabolomics, proteomics, methylomics or histone modifications, will make it possible to systematically search for genetic variants involved in regulation of gene expression and in the etiology of complex pathophysiological traits. New advances in manipulation of the rat genome, including efficient transgenesis and gene targeting, will enable in vivo functional analyses of selected candidate genes to identify QTL at the molecular level or to provide insight into mechanisms whereby targeted genes affect pathophysiological traits in the SHR.
Affiliation(s)
- M. Pravenec
  - Institute of Physiology Academy of Sciences of the Czech Republic, Prague, Czech Republic.