1
Foroozandeh Shahraki M, Farahbod M, Libbrecht MW. Robust chromatin state annotation. Genome Res 2024; 34:469-483. [PMID: 38514204 PMCID: PMC11067878 DOI: 10.1101/gr.278343.123] [Received: 07/28/2023] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
With the goal of mapping genomic activity, international projects have recently measured epigenetic activity in hundreds of cell and tissue types. Chromatin state annotations produced by segmentation and genome annotation (SAGA) methods have emerged as the predominant way to summarize these epigenomic data sets in order to annotate the genome. These chromatin state annotations are essential for many genomic tasks, including identifying active regulatory elements and interpreting disease-associated genetic variation. However, despite the widespread applications of SAGA methods, no principled approach exists to evaluate the statistical significance of chromatin state assignments. Here, we propose the first method for assigning calibrated confidence scores to chromatin state annotations. Toward this goal, we performed a comprehensive evaluation of the reproducibility of the two most widely used existing SAGA methods, ChromHMM and Segway. We found that their predictions are frequently irreproducible. For example, when applying the same SAGA method on two sets of experimental replicates, 27%-69% of predicted enhancers fail to replicate. This suggests that a substantial fraction of predicted elements in existing chromatin state annotations cannot be relied upon. To remedy this problem, we introduce SAGAconf, a method for assigning a measure of confidence (r-value) to chromatin state annotations. SAGAconf works with any SAGA method and assigns an r-value to each genomic bin of a chromatin state annotation that represents the probability that the label of this bin will be reproduced in a replicated experiment. Thus, SAGAconf allows a researcher to select only the reliable predictions from a chromatin annotation for use in downstream analyses.
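The replication failure rate quoted above (27%-69% of predicted enhancers) can be illustrated with a much simpler statistic than SAGAconf's r-value: for each label, the fraction of bins assigned that label in one replicate annotation that receive the same label in the other. The sketch below is not SAGAconf. It assumes both annotations cover identical bins in the same order and that their labels have already been matched to one another (separately trained SAGA runs produce arbitrary label names, so real comparisons need a matching step first). The function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def label_reproducibility(rep1: Sequence[str], rep2: Sequence[str]) -> Dict[str, float]:
    """Fraction of bins carrying each label in rep1 that carry the same label in rep2.

    Hypothetical helper. Assumes rep1 and rep2 cover the same genomic bins in
    the same order and that their label names have already been matched.
    """
    if len(rep1) != len(rep2):
        raise ValueError("annotations must cover the same bins")
    total = Counter(rep1)                                    # bins per label in replicate 1
    agree = Counter(a for a, b in zip(rep1, rep2) if a == b)  # bins whose label reproduces
    return {label: agree[label] / n for label, n in total.items()}


# Toy annotations of six bins (hypothetical labels).
rep1 = ["Enh", "Enh", "Prom", "Quies", "Enh", "Quies"]
rep2 = ["Enh", "Quies", "Prom", "Quies", "Prom", "Quies"]
scores = label_reproducibility(rep1, rep2)
print(scores)  # {'Enh': 0.3333333333333333, 'Prom': 1.0, 'Quies': 1.0}
```

Here 1 - 1/3 = 2/3 of the toy enhancer bins fail to replicate. A calibrated score such as the r-value instead estimates, for each bin, the probability that its label reproduces, which this raw agreement count does not attempt.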
Affiliation(s)
- Marjan Farahbod
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
- Maxwell W Libbrecht
- School of Computing Science, Simon Fraser University, Burnaby, British Columbia V5A 1S6, Canada
2
Maji RK, Czepukojc B, Scherer M, Tierling S, Cadenas C, Gianmoena K, Gasparoni N, Nordström K, Gasparoni G, Laggai S, Yang X, Sinha A, Ebert P, Falk-Paulsen M, Kinkley S, Hoppstädter J, Chung HR, Rosenstiel P, Hengstler JG, Walter J, Schulz MH, Kessler SM, Kiemer AK. Alterations in the hepatocyte epigenetic landscape in steatosis. Epigenetics Chromatin 2023; 16:30. [PMID: 37415213 DOI: 10.1186/s13072-023-00504-8] [Received: 03/19/2023] [Accepted: 06/21/2023] [Indexed: 07/08/2023]
Abstract
Fatty liver disease, or the accumulation of fat in the liver, has been reported to affect the global population. This comes with an increased risk for the development of fibrosis, cirrhosis, and hepatocellular carcinoma. Yet, little is known about the effects of a diet high in fat and alcohol on epigenetic aging, with respect to changes in transcriptional and epigenomic profiles. In this study, we took a multi-omics approach and integrated gene expression, methylation signals, and chromatin signals to study the epigenomic effects of a high-fat and alcohol-containing diet on mouse hepatocytes. We identified four relevant gene network clusters associated with pathways that promote steatosis. Using a machine learning approach, we predict specific transcription factors that might be responsible for modulating the functionally relevant clusters. Finally, we discover four additional CpG loci and validate aging-related differential CpG methylation. Differential CpG methylation linked to aging showed minimal overlap with altered methylation in steatosis.
Affiliation(s)
- Ranjan Kumar Maji
- Institute for Cardiovascular Regeneration, Goethe-University, 60590, Frankfurt, Germany
- German Centre for Cardiovascular Research (DZHK), Partner Site Rhine-Main, 60590, Frankfurt, Germany
- Beate Czepukojc
- Department of Pharmacy, Pharmaceutical Biology, Saarland University, 66123, Saarbrücken, Germany
- Michael Scherer
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08003, Barcelona, Spain
- Sascha Tierling
- Department of Genetics, Saarland University, 66123, Saarbrücken, Germany
- Cristina Cadenas
- IfADo: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Kathrin Gianmoena
- IfADo: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Nina Gasparoni
- Department of Genetics, Saarland University, 66123, Saarbrücken, Germany
- Karl Nordström
- Department of Genetics, Saarland University, 66123, Saarbrücken, Germany
- Gilles Gasparoni
- Department of Genetics, Saarland University, 66123, Saarbrücken, Germany
- Stephan Laggai
- Department of Pharmacy, Pharmaceutical Biology, Saarland University, 66123, Saarbrücken, Germany
- Xinyi Yang
- Institute of Medical Bioinformatics and Biostatistics, Philipps University of Marburg, 35032, Marburg, Germany
- Anupam Sinha
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, 24105, Kiel, Germany
- Peter Ebert
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, 40225, Düsseldorf, Germany
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
- Maren Falk-Paulsen
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, 24105, Kiel, Germany
- Sarah Kinkley
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, 14195, Berlin, Germany
- Jessica Hoppstädter
- Department of Pharmacy, Pharmaceutical Biology, Saarland University, 66123, Saarbrücken, Germany
- Ho-Ryun Chung
- Institute of Medical Bioinformatics and Biostatistics, Philipps University of Marburg, 35032, Marburg, Germany
- Philip Rosenstiel
- Institute of Clinical Molecular Biology, Christian-Albrechts-University, 24105, Kiel, Germany
- Jan G Hengstler
- IfADo: Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Jörn Walter
- Department of Genetics, Saarland University, 66123, Saarbrücken, Germany
- Marcel H Schulz
- Institute for Cardiovascular Regeneration, Goethe-University, 60590, Frankfurt, Germany
- German Centre for Cardiovascular Research (DZHK), Partner Site Rhine-Main, 60590, Frankfurt, Germany
- Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, 66123, Saarbrücken, Germany
- Excellence Cluster on Multimodal Computing and Interaction, Saarland University, 66123, Saarbrücken, Germany
- Sonja M Kessler
- Department of Pharmacy, Pharmaceutical Biology, Saarland University, 66123, Saarbrücken, Germany
- Institute of Pharmacy, Experimental Pharmacology for Natural Sciences, Martin Luther University Halle-Wittenberg, Halle, Germany
- Halle Research Centre for Drug Therapy (HRCDT), Halle, Germany
- Alexandra K Kiemer
- Department of Pharmacy, Pharmaceutical Biology, Saarland University, 66123, Saarbrücken, Germany
3
Vu H, Koch Z, Fiziev P, Ernst J. A framework for group-wise summarization and comparison of chromatin state annotations. Bioinformatics 2023; 39:btac722. [PMID: 36342196 PMCID: PMC9805555 DOI: 10.1093/bioinformatics/btac722] [Received: 05/02/2022] [Revised: 10/12/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022]
Abstract
MOTIVATION Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetic marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution.
RESULTS We developed CSREP, which takes as input chromatin state annotations for a group of samples. CSREP then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers that predict the chromatin state assignment of each sample given the state maps from all other samples. The difference in CSREP's probability assignments for two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution.
AVAILABILITY AND IMPLEMENTATION The CSREP source code and generated data are available at http://github.com/ernstlab/csrep.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
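CSREP itself trains an ensemble of multi-class logistic regression classifiers. As a much simpler stand-in that only shows the shape of the inputs and outputs, one can summarize a group by the per-bin frequency of each state across its samples, take the most frequent state as a representative map, and score differences between two groups by subtracting per-bin frequencies. This frequency-counting approach is a swapped-in simplification, not CSREP's model, and all function names and toy labels here are hypothetical. It assumes every sample's annotation covers the same bins in the same order.

```python
from collections import Counter
from typing import Dict, List, Sequence

Annotation = Sequence[str]  # one chromatin state label per genomic bin


def state_frequencies(group: Sequence[Annotation]) -> List[Dict[str, float]]:
    """Per-bin fraction of samples in `group` assigned each state (hypothetical helper)."""
    n_bins = len(group[0])
    if any(len(sample) != n_bins for sample in group):
        raise ValueError("all samples must cover the same bins")
    summary = []
    for pos in range(n_bins):
        counts = Counter(sample[pos] for sample in group)
        summary.append({state: c / len(group) for state, c in counts.items()})
    return summary


def representative_map(group: Sequence[Annotation]) -> List[str]:
    """Most frequent state per bin; ties go to the state encountered first."""
    return [max(freqs, key=freqs.get) for freqs in state_frequencies(group)]


def differential_score(group_a: Sequence[Annotation],
                       group_b: Sequence[Annotation],
                       state: str) -> List[float]:
    """Per-bin frequency of `state` in group_a minus its frequency in group_b."""
    freq_a = state_frequencies(group_a)
    freq_b = state_frequencies(group_b)
    if len(freq_a) != len(freq_b):
        raise ValueError("groups must cover the same bins")
    return [fa.get(state, 0.0) - fb.get(state, 0.0) for fa, fb in zip(freq_a, freq_b)]


# Two toy groups of three-bin annotations (hypothetical data).
group_a = [["Enh", "Prom", "Quies"], ["Enh", "Prom", "Quies"], ["Quies", "Prom", "Quies"]]
group_b = [["Quies", "Prom", "Enh"], ["Quies", "Prom", "Quies"]]
print(representative_map(group_a))                  # ['Enh', 'Prom', 'Quies']
print(differential_score(group_a, group_b, "Enh"))  # [0.6666666666666666, 0.0, -0.5]
```

Positive scores mark bins where "Enh" is more common in group_a, negative scores bins where it is more common in group_b. Unlike a trained classifier ensemble, plain frequencies cannot borrow information across samples, so with few samples per group they are coarse (multiples of 1/n).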
Affiliation(s)
- Ha Vu
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Zane Koch
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Petko Fiziev
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Illumina Artificial Intelligence Laboratory, Illumina Inc., Foster City, CA 94404, USA
- Jason Ernst
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Computer Science Department, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Computational Medicine Department, University of California, Los Angeles, Los Angeles, CA 90095, USA
4
Orouji E, Raman AT. Computational methods to explore chromatin state dynamics. Brief Bioinform 2022; 23:6751148. [PMID: 36208178 PMCID: PMC9677473 DOI: 10.1093/bib/bbac439] [Received: 03/23/2022] [Revised: 08/25/2022] [Accepted: 09/09/2022] [Indexed: 12/14/2022]
Abstract
The human genome is marked by several singular and combinatorial histone modifications that shape the different states of chromatin and its three-dimensional organization. Genome-wide mapping of these marks as well as histone variants and open chromatin regions is commonly carried out via profiling DNA-protein binding or via chromatin accessibility methods. After the generation of epigenomic datasets in a cell type, statistical models can be used to annotate the noncoding regions of DNA and infer the combinatorial histone marks or chromatin states (CS). These methods involve partitioning the genome and labeling individual segments based on their CS patterns. Chromatin labels enable the systematic discovery of genomic function and activity and can label the gene body, promoters or enhancers without using other genomic maps. CSs are dynamic and change under different cell conditions, such as in normal, preneoplastic or tumor cells. This review aims to explore the available computational tools that have been developed to capture CS alterations under two or more cellular conditions.
Affiliation(s)
- Elias Orouji
- Corresponding author: Elias Orouji, Epigenomics Lab, Princess Margaret Cancer Centre, University Health Network (UHN), 101 College St., Toronto, ON M5G 1L7, Canada. Tel: +1 (917) 647-2202; E-mail:
- Ayush T Raman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, USA
5
Libbrecht MW, Chan RCW, Hoffman MM. Segmentation and genome annotation algorithms for identifying chromatin state and other genomic patterns. PLoS Comput Biol 2021; 17:e1009423. [PMID: 34648491 PMCID: PMC8516206 DOI: 10.1371/journal.pcbi.1009423] [Indexed: 11/19/2022]
Abstract
Segmentation and genome annotation (SAGA) algorithms are widely used to understand genome activity and gene regulation. These algorithms take as input epigenomic datasets, such as chromatin immunoprecipitation-sequencing (ChIP-seq) measurements of histone modifications or transcription factor binding. They partition the genome and assign a label to each segment such that positions with the same label exhibit similar patterns of input data. SAGA algorithms discover categories of activity such as promoters, enhancers, or parts of genes without prior knowledge of known genomic elements. In this sense, they generally act in an unsupervised fashion like clustering algorithms, but with the additional simultaneous function of segmenting the genome. Here, we review the common methodological framework that underlies these methods, review variants of and improvements upon this basic framework, and discuss the outlook for future work. This review is intended for those interested in applying SAGA methods and for computational researchers interested in improving upon them.
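The two functions of a SAGA algorithm described above, unsupervised labeling and segmentation, can be illustrated by clustering per-bin signal vectors with k-means and then merging runs of identical labels into segments. This is a toy simplification: published SAGA methods such as ChromHMM and Segway fit a hidden Markov model and a dynamic Bayesian network, respectively, which model labels and their transitions along the genome jointly rather than clustering bins independently. The signal values, the two-track input, the 200 bp bin size, and the function names are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # signal values of all input tracks in one genomic bin


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two signal vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(bins: Sequence[Vector], k: int, n_iter: int = 20) -> List[int]:
    """Cluster bin signal vectors; returns one integer label per bin.

    Deterministic initialisation: the first k bins seed the centroids
    (an arbitrary choice for this sketch).
    """
    centroids = [list(b) for b in bins[:k]]
    labels = [0] * len(bins)
    for _ in range(n_iter):
        labels = [nearest(b, centroids) for b in bins]
        for j in range(k):
            members = [b for b, lab in zip(bins, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def segments(labels: Sequence[int], bin_size: int) -> List[Tuple[int, int, int]]:
    """Merge runs of identical labels into (start, end, label) segments; end is exclusive."""
    out = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            out.append((start * bin_size, i * bin_size, labels[start]))
            start = i
    return out


# Six bins with two hypothetical signal tracks each: low-signal and high-signal bins.
bins = [[0.1, 0.0], [0.2, 0.1], [5.0, 4.8], [5.2, 5.1], [0.0, 0.0], [4.9, 5.0]]
labels = kmeans(bins, k=2)
print(labels)                 # [0, 0, 1, 1, 0, 1]
print(segments(labels, 200))  # [(0, 400, 0), (400, 800, 1), (800, 1000, 0), (1000, 1200, 1)]
```

The integer labels are arbitrary cluster IDs. As with real SAGA output, a researcher would still have to interpret them (for example, as high-signal or quiescent) by inspecting each cluster's mean signal.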
Affiliation(s)
- Rachel C. W. Chan
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Michael M. Hoffman
- Department of Computer Science, University of Toronto, Toronto, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Vector Institute for Artificial Intelligence, Toronto, Canada