1
Banerjee A, Zhang S, Bahar I. Genome structural dynamics: insights from Gaussian network analysis of Hi-C data. Brief Funct Genomics 2024; 23:525-537. [PMID: 38654598 PMCID: PMC11428154 DOI: 10.1093/bfgp/elae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/11/2024] [Accepted: 04/02/2024] [Indexed: 04/26/2024] Open
Abstract
Characterization of the spatiotemporal properties of the chromatin is essential to gaining insights into the physical bases of gene co-expression, transcriptional regulation and epigenetic modifications. The Gaussian network model (GNM) has proven in recent work to serve as a useful tool for modeling chromatin structural dynamics, using as input high-throughput chromosome conformation capture data. We focus here on the exploration of the collective dynamics of chromosomal structures at hierarchical levels of resolution, from single gene loci to topologically associating domains or entire chromosomes. The GNM permits us to identify long-range interactions between gene loci, shedding light on the role of cross-correlations between distal regions of the chromosomes in regulating gene expression. Notably, GNM analysis performed across diverse cell lines highlights the conservation of the global/cooperative movements of the chromatin across different types of cells. Variations driven by localized couplings between genomic loci, on the other hand, underlie cell differentiation, underscoring the significance of the four-dimensional properties of the genome in defining cellular identity. Finally, we demonstrate the close relation between the cell type-dependent mobility profiles of gene loci and their gene expression patterns, providing a clear demonstration of the role of chromosomal 4D features in defining cell-specific differential expression of genes.
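As a rough illustration of how a Gaussian network model can be driven by a Hi-C contact map, the sketch below builds a Kirchhoff (graph Laplacian) matrix from a small symmetric contact matrix and reads per-locus mobilities off the diagonal of its pseudo-inverse. The construction (off-diagonal entries set to the negated contact counts, diagonal set to the row sums) is a common GNM convention assumed here, not a detail taken from the abstract. The function name `gnm_mobility` and the toy matrix are hypothetical.

```python
import numpy as np

def gnm_mobility(contacts):
    """Return (mean-square fluctuations, covariance) for a symmetric contact map."""
    c = np.asarray(contacts, dtype=float)
    c = (c + c.T) / 2.0                      # enforce symmetry
    np.fill_diagonal(c, 0.0)                 # self-contacts do not act as springs
    # Assumed convention: off-diagonal = -contacts, diagonal = row sums.
    kirchhoff = np.diag(c.sum(axis=1)) - c
    # The pseudo-inverse discards the zero mode (uniform translation of all loci).
    covariance = np.linalg.pinv(kirchhoff, rcond=1e-10)
    return np.diag(covariance).copy(), covariance

# Hypothetical 4-locus contact map, for illustration only.
toy = np.array([
    [0, 5, 1, 0],
    [5, 0, 4, 1],
    [1, 4, 0, 6],
    [0, 1, 6, 0],
])
msf, cov = gnm_mobility(toy)
print(msf)   # larger values correspond to more mobile loci in this model
```

The off-diagonal entries of `cov` play the role of cross-correlations between loci in this simplified picture; on real data the map would normally be normalized before this step.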
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
- She Zhang
  - OpenEye, Cadence Molecular Sciences, Santa Fe, NM 87508, USA
- Ivet Bahar
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, NY 11794, USA
2
Maisuradze L, King MC, Surovtsev IV, Mochrie SGJ, Shattuck MD, O’Hern CS. Identifying topologically associating domains using differential kernels. PLoS Comput Biol 2024; 20:e1012221. [PMID: 39008525 PMCID: PMC11249266 DOI: 10.1371/journal.pcbi.1012221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 06/03/2024] [Indexed: 07/17/2024] Open
Abstract
Chromatin is a polymer complex of DNA and proteins that regulates gene expression. The three-dimensional (3D) structure and organization of chromatin controls DNA transcription and replication. High-throughput chromatin conformation capture techniques generate Hi-C maps that can provide insight into the 3D structure of chromatin. Hi-C maps can be represented as a symmetric matrix [Formula: see text], where each element represents the average contact probability or number of contacts between chromatin loci i and j. Previous studies have detected topologically associating domains (TADs), or self-interacting regions in [Formula: see text] within which the contact probability is greater than that outside the region. Many algorithms have been developed to identify TADs within Hi-C maps. However, most TAD identification algorithms are unable to identify nested or overlapping TADs and for a given Hi-C map there is significant variation in the location and number of TADs identified by different methods. We develop a novel method to identify TADs, KerTAD, using a kernel-based technique from computer vision and image processing that is able to accurately identify nested and overlapping TADs. We benchmark this method against state-of-the-art TAD identification methods on both synthetic and experimental data sets. We find that the new method consistently has higher true positive rates (TPR) and lower false discovery rates (FDR) than all tested methods for both synthetic and manually annotated experimental Hi-C maps. The TPR for KerTAD is also largely insensitive to increasing noise and sparsity, in contrast to the other methods. We also find that KerTAD is consistent in the number and size of TADs identified across replicate experimental Hi-C maps for several organisms. Thus, KerTAD will improve automated TAD identification and enable researchers to better correlate changes in TADs to biological phenomena, such as enhancer-promoter interactions and disease states.
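The true positive rate and false discovery rate used for benchmarking above can be sketched for TAD boundary calls as follows. The matching rule (a called boundary counts as correct if it lies within `tol` bins of a reference boundary) and the function name `boundary_scores` are assumptions made for illustration; the abstract does not state the exact matching criterion used by the authors.

```python
def boundary_scores(called, truth, tol=1):
    """Return (TPR, FDR) for called boundaries against reference boundaries (bin indices)."""
    called, truth = sorted(set(called)), sorted(set(truth))
    # Reference boundaries recovered by at least one call within the tolerance.
    matched_truth = {t for t in truth if any(abs(t - c) <= tol for c in called)}
    # Calls that land within the tolerance of at least one reference boundary.
    matched_called = {c for c in called if any(abs(c - t) <= tol for t in truth)}
    tpr = len(matched_truth) / len(truth) if truth else 0.0
    fdr = 1 - len(matched_called) / len(called) if called else 0.0
    return tpr, fdr

# Hypothetical boundary positions, for illustration only.
tpr, fdr = boundary_scores(called=[10, 25, 41, 60], truth=[10, 24, 40, 80])
print(tpr, fdr)  # 0.75 0.25
```

Here three of the four reference boundaries are recovered (TPR = 0.75) and one of the four calls has no nearby reference boundary (FDR = 0.25).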
Affiliation(s)
- Luka Maisuradze
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Megan C. King
  - Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, United States of America
- Ivan V. Surovtsev
  - Department of Cell Biology, Yale School of Medicine, New Haven, Connecticut, United States of America
- Simon G. J. Mochrie
  - Department of Physics, Yale University, New Haven, Connecticut, United States of America
- Mark D. Shattuck
  - Benjamin Levich Institute and Physics Department, The City College of New York, New York, New York, United States of America
- Corey S. O’Hern
  - Department of Physics, Yale University, New Haven, Connecticut, United States of America
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, United States of America
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
3
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023] Open
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
  - Department of Biosciences, University of Oslo, 0316 Oslo, Norway
  - Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
4
He Y, Xue Y, Wang J, Huang Y, Liu L, Huang Y, Gao YQ. Diffusion-enhanced characterization of 3D chromatin structure reveals its linkage to gene regulatory networks and the interactome. Genome Res 2023; 33:1354-1368. [PMID: 37491077 PMCID: PMC10547250 DOI: 10.1101/gr.277737.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 07/21/2023] [Indexed: 07/27/2023]
Abstract
The interactome networks at the DNA, RNA, and protein levels are crucial for cellular functions, and the diverse variations of these networks are heavily involved in the establishment of different cell states. We have developed a diffusion-based method, Hi-C to geometry (CTG), to obtain reliable geometric information on the chromatin from Hi-C data. CTG produces a consistent and reproducible framework for the 3D genomic structure and provides a reliable and quantitative understanding of the alterations of genomic structures under different cellular conditions. The genomic structure yielded by CTG serves as an architectural blueprint of the dynamic gene regulatory network, based on which cell-specific correspondence between gene-gene and corresponding protein-protein physical interactions, as well as transcription correlation, is revealed. We also find that gene fusion events are significantly enriched between genes of short CTG distances and are thus close in 3D space. These findings indicate that 3D chromatin structure is at least partially correlated with downstream processes such as transcription, gene regulation, and even regulatory networking through affecting protein-protein interactions.
Affiliation(s)
- Yueying He
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yue Xue
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Jingyao Wang
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yupeng Huang
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Lu Liu
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
  - School of Life Sciences, Peking University, Beijing 100871, China
- Yanyi Huang
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
- Yi Qin Gao
  - Beijing National Laboratory for Molecular Sciences, College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
  - Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China
5
Liu K, Li HD, Li Y, Wang J, Wang J. A Comparison of Topologically Associating Domain Callers Based on Hi-C Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:15-29. [PMID: 35104223 DOI: 10.1109/tcbb.2022.3147805] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/04/2023]
Abstract
Topologically associating domains (TADs) are local chromatin interaction domains, which have been shown to play an important role in gene expression regulation. TADs were originally discovered in the investigation of 3D genome organization based on High-throughput Chromosome Conformation Capture (Hi-C) data. Considerable effort has since been dedicated to developing methods for detecting TADs from Hi-C data. Different computational methods for TAD identification vary in their assumptions and criteria for calling TADs. As a consequence, the TADs called by these methods differ in their similarities and in the biological features they are enriched in. In this work, we performed a systematic comparison of twenty-six TAD callers. We first compared the TADs and gaps between adjacent TADs across different methods, resolutions, and sequencing depths. We then assessed the quality of TADs and TAD boundaries according to three criteria: the decay of contact frequencies over the genomic distance, enrichment and depletion of regulatory elements around TAD boundaries, and reproducibility of TADs and TAD boundaries in replicate samples. Last, due to the lack of a gold standard of TADs, we also evaluated the performance of the methods on synthetic datasets. We discussed the key principles of TAD callers and pinpointed the current situation in the detection of TADs. We provide a concise, comprehensive, and systematic framework for evaluating the performance of TAD callers, and expect our work will provide useful guidance in choosing suitable approaches for the detection and evaluation of TADs.
6
Sefer E. A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics 2022; 23:127. [PMID: 35413815 PMCID: PMC9006547 DOI: 10.1186/s12859-022-04674-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2021] [Accepted: 04/07/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Topologically associating domains (TADs) are locally highly-interacting genome regions, which also play a critical role in regulating gene expression in the cell. TADs were first identified while investigating the 3D genome structure using High-throughput Chromosome Conformation Capture (Hi-C) interaction datasets. Substantial effort has been devoted to developing techniques for inferring TADs from Hi-C interaction datasets. Many TAD-calling methods have been developed, which differ in their criteria and assumptions for TAD inference. Correspondingly, TADs inferred via these callers vary in terms of both similarities and the biological features they are enriched in. RESULT We have carried out a systematic comparison of 27 TAD-calling methods over mammals. We use Micro-C, a recent high-resolution variant of Hi-C, to compare TADs at a very high resolution, and classify the methods into 3 categories: feature-based methods, clustering methods, and graph-partitioning methods. We have evaluated TAD boundaries, gaps between adjacent TADs, and quality of TADs across various criteria. We also found the CTCF and Cohesin proteins in particular to be effective in the formation of TADs with corner dots. We have also assessed the callers' performance on simulated datasets, since a gold standard for TADs is missing. TAD sizes and numbers change remarkably between TAD callers and dataset resolutions, indicating that TADs are hierarchically organized domains rather than disjoint regions. A core subset of feature-based TAD callers regularly performs the best while inferring reproducible domains, which are also enriched for TAD-related biological properties. CONCLUSION We have analyzed the fundamental principles of TAD-calling methods and identified the existing situation in TAD inference across high-resolution Micro-C interaction datasets over mammals. We provide a systematic, comprehensive, and concise framework to evaluate the performance of TAD-calling methods across Micro-C datasets. Our research will be useful in selecting appropriate methods for TAD inference and evaluation based on available data, experimental design, and biological question of interest. We also introduce our analysis as a benchmarking tool with publicly available source code.
Affiliation(s)
- Emre Sefer
  - Department of Computer Science, Ozyegin University, Istanbul, Turkey
7
Galan S, Machnik N, Kruse K, Díaz N, Marti-Renom MA, Vaquerizas JM. CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nat Genet 2020; 52:1247-1255. [PMID: 33077914 PMCID: PMC7610641 DOI: 10.1038/s41588-020-00712-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 07/23/2018] [Accepted: 09/04/2020] [Indexed: 12/11/2022]
Abstract
Dynamic changes in the three-dimensional (3D) organization of chromatin are associated with central biological processes, such as transcription, replication and development. Therefore, the comprehensive identification and quantification of these changes is fundamental to understanding of evolutionary and regulatory mechanisms. Here, we present Comparison of Hi-C Experiments using Structural Similarity (CHESS), an algorithm for the comparison of chromatin contact maps and automatic differential feature extraction. We demonstrate the robustness of CHESS to experimental variability and showcase its biological applications on (1) interspecies comparisons of syntenic regions in human and mouse models; (2) intraspecies identification of conformational changes in Zelda-depleted Drosophila embryos; (3) patient-specific aberrant chromatin conformation in a diffuse large B-cell lymphoma sample; and (4) the systematic identification of chromatin contact differences in high-resolution Capture-C data. In summary, CHESS is a computationally efficient method for the comparison and classification of changes in chromatin contact data.
Affiliation(s)
- Silvia Galan
  - Max Planck Institute for Molecular Biomedicine, Münster, Germany
  - National Centre for Genomic Analysis, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Nick Machnik
  - Max Planck Institute for Molecular Biomedicine, Münster, Germany
  - Institute of Science and Technology Austria, Klosterneuburg, Austria
- Kai Kruse
  - Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Noelia Díaz
  - Max Planck Institute for Molecular Biomedicine, Münster, Germany
- Marc A Marti-Renom
  - National Centre for Genomic Analysis, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
  - Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
  - Pompeu Fabra University, Barcelona, Spain
  - Catalan Institution for Research and Advanced Studies, Barcelona, Spain
- Juan M Vaquerizas
  - Max Planck Institute for Molecular Biomedicine, Münster, Germany
  - Medical Research Council London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
8
Lyu H, Li L, Wu Z, Wang T, Zheng J, Wang H. TADBD: a sensitive and fast method for detection of typologically associated domain boundaries. Biotechniques 2020; 69:376-383. [DOI: 10.2144/btn-2019-0165] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022] Open
Abstract
A topologically associated domain (TAD) is a self-interacting genomic block. Detection of TAD boundaries on Hi-C contact matrix is one of the most important issues in the analysis of 3D genome architecture at TAD level. Here, we present TAD boundary detection (TADBD), a sensitive and fast computational method for detection of TAD boundaries on Hi-C contact matrix. This method implements a Haar-based algorithm by considering Haar diagonal template, acceleration via a compact integrogram, multi-scale aggregation at template size and statistical filtering. In most cases, comparison results from simulated and experimental data show that TADBD outperforms the other five methods. In addition, a new R package for TADBD is freely available online.
Affiliation(s)
- Hongqiang Lyu
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Lin Li
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Zhifang Wu
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Tian Wang
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
  - School of Automation Science & Electrical Engineering, Beihang University, Beijing 100191, China
- Jiguang Zheng
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
- Hongda Wang
  - School of Electronic & Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
9
Stansfield JC, Cresswell KG, Dozmorov MG. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 2020; 35:2916-2923. [PMID: 30668639 DOI: 10.1093/bioinformatics/btz048] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 09/19/2018] [Revised: 12/14/2018] [Accepted: 01/17/2019] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION With the development of chromatin conformation capture technology and its high-throughput derivative Hi-C sequencing, studies of the three-dimensional interactome of the genome that involve multiple Hi-C datasets are becoming available. To account for the technology-driven biases unique to each dataset, there is a distinct need for methods to jointly normalize multiple Hi-C datasets. Previous attempts at removing biases from Hi-C data have made use of techniques which normalize individual Hi-C datasets, or, at best, jointly normalize two datasets. RESULTS Here, we present multiHiCcompare, a cyclic loess regression-based joint normalization technique for removing biases across multiple Hi-C datasets. In contrast to other normalization techniques, it properly handles the Hi-C-specific decay of chromatin interaction frequencies with the increasing distance between interacting regions. multiHiCcompare uses the general linear model framework for comparative analysis of multiple Hi-C datasets, adapted for the Hi-C-specific decay of chromatin interaction frequencies. multiHiCcompare outperforms other methods when detecting a priori known chromatin interaction differences from jointly normalized datasets. Applied to the analysis of auxin-treated versus untreated experiments, and CTCF depletion experiments, multiHiCcompare was able to recover the expected epigenetic and gene expression signatures of loss of chromatin interactions and reveal novel insights. AVAILABILITY AND IMPLEMENTATION multiHiCcompare is freely available on GitHub and as a Bioconductor R package https://bioconductor.org/packages/multiHiCcompare. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- John C Stansfield
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Kellen G Cresswell
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
10
Cook KB, Hristov BH, Le Roch KG, Vert JP, Noble WS. Measuring significant changes in chromatin conformation with ACCOST. Nucleic Acids Res 2020; 48:2303-2311. [PMID: 32034421 PMCID: PMC7049724 DOI: 10.1093/nar/gkaa069] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/06/2019] [Revised: 01/17/2020] [Accepted: 02/03/2020] [Indexed: 12/17/2022] Open
Abstract
Chromatin conformation assays such as Hi-C cannot directly measure differences in 3D architecture between cell types or cell states. For this purpose, two or more Hi-C experiments must be carried out, but direct comparison of the resulting Hi-C matrices is confounded by several features of Hi-C data. Most notably, the genomic distance effect, whereby pairs of genomic loci that are proximal along the chromosome exhibit many more Hi-C contacts than distal pairs of loci, dominates every Hi-C matrix. Furthermore, the form that this distance effect takes often varies between different Hi-C experiments, even between replicate experiments. Thus, a statistical confidence measure designed to identify differential Hi-C contacts must accurately account for the genomic distance effect or risk being misled by large-scale but artifactual differences. ACCOST (Altered Chromatin COnformation STatistics) accomplishes this goal by extending the statistical model employed by DESeq, re-purposing the ‘size factors,’ which were originally developed to account for differences in read depth between samples, to instead model the genomic distance effect. We show via analysis of simulated and real data that ACCOST provides unbiased statistical confidence estimates that compare favorably with competing methods such as diffHiC, FIND and HiCcompare. ACCOST is freely available with an Apache license at https://bitbucket.org/noblelab/accost.
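The genomic distance effect described above can be made concrete with a generic observed-over-expected calculation: average the contacts along each diagonal of the matrix to get an expected count per genomic separation, then divide each entry by the expectation for its separation. This is a standard illustration of the effect, not ACCOST's statistical model. The function names and the toy matrix are hypothetical.

```python
import numpy as np

def expected_by_distance(contacts):
    """Mean contact count at each genomic separation d (the d-th diagonal)."""
    c = np.asarray(contacts, dtype=float)
    return np.array([np.diagonal(c, offset=d).mean() for d in range(c.shape[0])])

def observed_over_expected(contacts):
    """Divide every entry by the mean of its diagonal; zero where no expectation exists."""
    c = np.asarray(contacts, dtype=float)
    expected = expected_by_distance(c)
    idx = np.arange(c.shape[0])
    separation = np.abs(idx[:, None] - idx[None, :])   # |i - j| for every matrix cell
    exp_matrix = expected[separation]
    out = np.zeros_like(c)
    np.divide(c, exp_matrix, out=out, where=exp_matrix > 0)
    return out

# Hypothetical map whose contacts depend only on separation: 8 / (1 + |i - j|).
idx = np.arange(5)
toy = 8.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
expected = expected_by_distance(toy)
oe = observed_over_expected(toy)
print(expected)  # decays with separation: 8, 4, 8/3, 2, 1.6
```

Because the toy map contains nothing but distance decay, every observed-over-expected entry equals 1; departures from 1 in real data are what remains after the distance trend is removed.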
Affiliation(s)
- Kate B Cook
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA
- Borislav H Hristov
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA
- Karine G Le Roch
  - Department of Cell Biology, University of California, Riverside, CA 92521, USA
- Jean Philippe Vert
  - Google Brain, Paris, 75009, France
  - Centre for Computational Biology, MINES ParisTech, PSL University, Paris, 75009, France
- William Stafford Noble
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195-5065, USA
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195-2355, USA
11
Kumari K, Duenweg B, Padinhateeri R, Prakash JR. Computing 3D Chromatin Configurations from Contact Probability Maps by Inverse Brownian Dynamics. Biophys J 2020; 118:2193-2208. [PMID: 32389215 PMCID: PMC7203009 DOI: 10.1016/j.bpj.2020.02.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/29/2019] [Revised: 02/04/2020] [Accepted: 02/11/2020] [Indexed: 01/20/2023] Open
Abstract
The three-dimensional (3D) organization of chromatin, on the length scale of a few genes, is crucial in determining the functional state (accessibility and amount of gene expression) of the chromatin. Recent advances in chromosome conformation capture experiments provide partial information on the chromatin organization in a cell population, namely the contact count between any segment pairs, but not on the interaction strength that leads to these contact counts. However, given the contact matrix, determining the complete 3D organization of the whole chromatin polymer is an inverse problem. In this work, a novel inverse Brownian dynamics method based on a coarse-grained bead-spring chain model has been proposed to compute the optimal interaction strengths between different segments of chromatin such that the experimentally measured contact count probability constraints are satisfied. Applying this method to the α-globin gene locus in two different cell types, we predict the 3D organizations corresponding to active and repressed states of chromatin at the locus. We show that the average distance between any two segments of the region has a broad distribution and cannot be computed as a simple inverse relation based on the contact probability alone. The results presented for multiple normalization methods suggest that all measurable quantities may crucially depend on the nature of normalization. We argue that by experimentally measuring predicted quantities, one may infer the appropriate form of normalization.
Affiliation(s)
- Kiran Kumari
  - Department of Chemical Engineering, Monash University, Melbourne, Victoria, Australia
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
  - IITB-Monash Research Academy, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
- Burkhard Duenweg
  - Department of Chemical Engineering, Monash University, Melbourne, Victoria, Australia
  - Max Planck Institute for Polymer Research, Mainz, Germany
- Ranjith Padinhateeri
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India
- J Ravi Prakash
  - Department of Chemical Engineering, Monash University, Melbourne, Victoria, Australia
12
Cresswell KG, Dozmorov MG. TADCompare: An R Package for Differential and Temporal Analysis of Topologically Associated Domains. Front Genet 2020; 11:158. [PMID: 32211023 PMCID: PMC7076128 DOI: 10.3389/fgene.2020.00158] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/08/2019] [Accepted: 02/11/2020] [Indexed: 12/02/2022] Open
Abstract
Recent research using chromatin conformation capture technologies, such as Hi-C, has demonstrated the importance of topologically associated domains (TADs) and smaller chromatin loops, collectively referred to hereafter as "interacting domains." Many such domains change during development or disease, and exhibit cell- and condition-specific differences. Quantification of the dynamic behavior of interacting domains will help to better understand genome regulation. Methods for comparing interacting domains between cells and conditions are highly limited. We developed TADCompare, a method for differential analysis of boundaries of interacting domains between two or more Hi-C datasets. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a locus-by-locus comparison of boundary differences. Using this measure, we introduce methods for identifying differential and consensus boundaries of interacting domains and tracking boundary changes over time. We further propose a novel framework for the systematic classification of boundary changes. Colocalization and gene enrichment analyses of different types of boundary changes demonstrated distinct biological functionality associated with them. TADCompare is available on https://github.com/dozmorovlab/TADCompare and Bioconductor (submitted).
13
Gisselbrecht SS, Palagi A, Kurland JV, Rogers JM, Ozadam H, Zhan Y, Dekker J, Bulyk ML. Transcriptional Silencers in Drosophila Serve a Dual Role as Transcriptional Enhancers in Alternate Cellular Contexts. Mol Cell 2019; 77:324-337.e8. [PMID: 31704182 DOI: 10.1016/j.molcel.2019.10.004] [Citation(s) in RCA: 68] [Impact Index Per Article: 13.6] [Received: 08/16/2018] [Revised: 08/15/2019] [Accepted: 10/01/2019] [Indexed: 12/26/2022]
Abstract
A major challenge in biology is to understand how complex gene expression patterns are encoded in the genome. While transcriptional enhancers have been studied extensively, few transcriptional silencers have been identified, and they remain poorly understood. Here, we used a novel strategy to screen hundreds of sequences for tissue-specific silencer activity in whole Drosophila embryos. Almost all of the transcriptional silencers that we identified were also active enhancers in other cellular contexts. These elements are bound by more transcription factors than non-silencers. A subset of these silencers forms long-range contacts with promoters. Deletion of a silencer caused derepression of its target gene. Our results challenge the common practice of treating enhancers and silencers as separate classes of regulatory elements and suggest the possibility that thousands or more bifunctional CRMs remain to be discovered in Drosophila and 10^4 to 10^5 in humans.
Affiliation(s)
- Stephen S Gisselbrecht
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Alexandre Palagi
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Doctoral School of Life and Health Sciences, University of Nice Sophia Antipolis, 06560 Valbonne, France
| | - Jesse V Kurland
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
| | - Julia M Rogers
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA
| | - Hakan Ozadam
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Ye Zhan
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA
| | - Job Dekker
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01655, USA; Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
| | - Martha L Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA 02138, USA; Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
|
14
|
Di Filippo L, Righelli D, Gagliardi M, Matarazzo MR, Angelini C. HiCeekR: A Novel Shiny App for Hi-C Data Analysis. Front Genet 2019; 10:1079. [PMID: 31749839 PMCID: PMC6844183 DOI: 10.3389/fgene.2019.01079] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/19/2019] [Accepted: 10/09/2019] [Indexed: 01/14/2023] Open
Abstract
The High-throughput Chromosome Conformation Capture (Hi-C) technique combines the power of Next Generation Sequencing technologies with the chromosome conformation capture approach to study 3D chromatin organization at the genome-wide scale. Although the technique is quite recent, many tools are already available for pre-processing and analyzing Hi-C data, making it possible to identify chromatin loops, topologically associating domains and A/B compartments. However, only a few of them provide an exhaustive analysis pipeline or allow easy integration and visualization of other omic layers. Moreover, most of the available tools are designed for expert users who are comfortable with command-line applications. In this paper, we present HiCeekR (https://github.com/lucidif/HiCeekR), a novel R Graphical User Interface (GUI) that allows researchers to easily perform a complete Hi-C data analysis. With the aid of the Shiny libraries, it integrates several R/Bioconductor packages for Hi-C data analysis and visualization, guiding the user during the entire process. Here, we describe its architecture and functionalities, then illustrate its capabilities using a publicly available dataset.
Affiliation(s)
- Lucio Di Filippo
- Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
| | - Dario Righelli
- Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy
| | - Miriam Gagliardi
- Max Planck Institute for Psychiatry, Munich, Germany; Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
| | - Maria Rosaria Matarazzo
- Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
| | - Claudia Angelini
- Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy
|
15
|
Abstract
Hi-C has been predominantly used to study genome-wide chromatin interactions. In Hi-C experiments, it is believed that biases originating from different systematic deviations lead to extraneous variability among raw samples and affect the reliability of downstream interpretations. As an important step in Hi-C analysis, normalization seeks to remove these unwanted systematic biases; a comparison of Hi-C normalization methods therefore helps in choosing among them and supports downstream analysis. In this article, a comprehensive comparison of six Hi-C normalization methods is carried out with respect to multiple criteria. The comparison shows that a cross-sample approach significantly outperforms individual-sample methods on most criteria. The differences between these methods are analyzed, some practical recommendations are given, and the results are summarized in a table to facilitate the choice among the six normalization methods. The source code for the implementation of these methods is available at https://github.com/lhqxinghun/bioinformatics/tree/master/Hi-C/NormCompare.
|
16
|
Zhang X, Zhang Y, Zhu X, Purmann C, Haney MS, Ward T, Khechaduri A, Yao J, Weissman SM, Urban AE. Local and global chromatin interactions are altered by large genomic deletions associated with human brain development. Nat Commun 2018; 9:5356. [PMID: 30559385 PMCID: PMC6297223 DOI: 10.1038/s41467-018-07766-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 05/05/2017] [Accepted: 11/09/2018] [Indexed: 01/18/2023] Open
Abstract
Large copy number variants (CNVs) in the human genome are strongly associated with common neurodevelopmental, neuropsychiatric disorders such as schizophrenia and autism. Here we report on the epigenomic effects of the prominent large deletion CNVs on chromosome 22q11.2 and on chromosome 1q21.1. We use Hi-C analysis of long-range chromosome interactions, including haplotype-specific Hi-C analysis, ChIP-Seq analysis of regulatory histone marks, and RNA-Seq analysis of gene expression patterns. We observe changes on all the levels of analysis, within the deletion boundaries, in the deletion flanking regions, along chromosome 22q, and genome wide. We detect gene expression changes as well as pronounced and multilayered effects on chromatin states, chromosome folding and on the topological domains of the chromatin, that emanate from the large CNV locus. These findings suggest basic principles of how such large genomic deletions can alter nuclear organization and affect genomic molecular activity. Copy number variants in the human genome (CNVs) are associated with neurodevelopmental and psychiatric disorders such as schizophrenia and autism. Here the authors investigate how the large deletion CNV on chromosome 22q11.2 alters chromatin organization.
Affiliation(s)
- Xianglong Zhang
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA
| | - Ying Zhang
- Department of Genetics, Yale University, New Haven, 06520, CT, USA; Department of Genetics and Genomics Sciences, Icahn School of Medicine at Mount Sinai & Sema4 NYC Laboratory, New York, 10029, NY, USA
| | - Xiaowei Zhu
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA
| | - Carolin Purmann
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA
| | - Michael S Haney
- Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA
| | - Thomas Ward
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA
| | - Arineh Khechaduri
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA; Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, 98109, WA, USA
| | - Jie Yao
- Department of Cell Biology, Yale University School of Medicine, New Haven, 06520, CT, USA; Sun Yat-sen University, Guangzhou, 510080, Guangdong, China
| | | | - Alexander E Urban
- Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, 94304, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, 94304, CA, USA.
|
17
|
Calandrelli R, Wu Q, Guan J, Zhong S. GITAR: An Open Source Tool for Analysis and Visualization of Hi-C Data. GENOMICS PROTEOMICS & BIOINFORMATICS 2018; 16:365-372. [PMID: 30553884 PMCID: PMC6364044 DOI: 10.1016/j.gpb.2018.06.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/02/2018] [Revised: 05/20/2018] [Accepted: 06/19/2018] [Indexed: 01/01/2023]
Abstract
Interactions between chromatin segments play a large role in functional genomic assays and developments in genomic interaction detection methods have shown interacting topological domains within the genome. Among these methods, Hi-C plays a key role. Here, we present the Genome Interaction Tools and Resources (GITAR), a software to perform a comprehensive Hi-C data analysis, including data preprocessing, normalization, and visualization, as well as analysis of topologically-associated domains (TADs). GITAR is composed of two main modules: (1) HiCtool, a Python library to process and visualize Hi-C data, including TAD analysis; and (2) processed data library, a large collection of human and mouse datasets processed using HiCtool. HiCtool leads the user step-by-step through a pipeline, which goes from the raw Hi-C data to the computation, visualization, and optimized storage of intra-chromosomal contact matrices and TAD coordinates. A large collection of standardized processed data allows the users to compare different datasets in a consistent way, while saving time to obtain data for visualization or additional analyses. More importantly, GITAR enables users without any programming or bioinformatic expertise to work with Hi-C data. GITAR is publicly available at http://genomegitar.org as an open-source software.
Affiliation(s)
- Riccardo Calandrelli
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA.
| | - Qiuyang Wu
- Department of Computer Science and Technology, Tongji University, Shanghai 200092, China
| | - Jihong Guan
- Department of Computer Science and Technology, Tongji University, Shanghai 200092, China
| | - Sheng Zhong
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
|
18
|
Zufferey M, Tavernari D, Oricchio E, Ciriello G. Comparison of computational methods for the identification of topologically associating domains. Genome Biol 2018; 19:217. [PMID: 30526631 PMCID: PMC6288901 DOI: 10.1186/s13059-018-1596-9] [Citation(s) in RCA: 119] [Impact Index Per Article: 19.8] [Received: 05/17/2018] [Accepted: 11/26/2018] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Chromatin folding gives rise to structural elements among which are clusters of densely interacting DNA regions termed topologically associating domains (TADs). TADs have been characterized across multiple species, tissue types, and differentiation stages, sometimes in association with regulation of biological functions. The reliability and reproducibility of these findings are intrinsically related with the correct identification of these domains from high-throughput chromatin conformation capture (Hi-C) experiments. RESULTS Here, we test and compare 22 computational methods to identify TADs across 20 different conditions. We find that TAD sizes and numbers vary significantly among callers and data resolutions, challenging the definition of an average TAD size, but strengthening the hypothesis that TADs are hierarchically organized domains, rather than disjoint structural elements. Performances of these methods differ based on data resolution and normalization strategy, but a core set of TAD callers consistently retrieve reproducible domains, even at low sequencing depths, that are enriched for TAD-associated biological features. CONCLUSIONS This study provides a reference for the analysis of chromatin domains from Hi-C experiments and useful guidelines for choosing a suitable approach based on the experimental design, available data, and biological question of interest.
Affiliation(s)
- Marie Zufferey
- Department of Computational Biology, University of Lausanne (UNIL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Daniele Tavernari
- Department of Computational Biology, University of Lausanne (UNIL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Elisa Oricchio
- Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Giovanni Ciriello
- Department of Computational Biology, University of Lausanne (UNIL), Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
|
19
|
HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics 2018; 19:279. [PMID: 30064362 PMCID: PMC6069782 DOI: 10.1186/s12859-018-2288-x] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Received: 05/16/2018] [Accepted: 07/18/2018] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Changes in spatial chromatin interactions are now emerging as a unifying mechanism orchestrating the regulation of gene expression. Hi-C sequencing technology allows insight into chromatin interactions on a genome-wide scale. However, Hi-C data contains many DNA sequence- and technology-driven biases. These biases prevent effective comparison of chromatin interactions aimed at identifying genomic regions differentially interacting between, e.g., disease-normal states or different cell types. Several methods have been developed for normalizing individual Hi-C datasets. However, they fail to account for biases between two or more Hi-C datasets, hindering comparative analysis of chromatin interactions. RESULTS We developed a simple and effective method, HiCcompare, for the joint normalization and differential analysis of multiple Hi-C datasets. The method introduces a distance-centric analysis and visualization of the differences between two Hi-C datasets on a single plot that allows for a data-driven normalization of biases using locally weighted linear regression (loess). HiCcompare outperforms methods for normalizing individual Hi-C datasets and methods for differential analysis (diffHiC, FIND) in detecting a priori known chromatin interaction differences while preserving the detection of genomic structures, such as A/B compartments. CONCLUSIONS HiCcompare is able to remove between-dataset bias present in Hi-C matrices. It also provides a user-friendly tool to allow the scientific community to perform direct comparisons between the growing number of pre-processed Hi-C datasets available at online repositories. HiCcompare is freely available as a Bioconductor R package at https://bioconductor.org/packages/HiCcompare/.
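As a rough illustration of the distance-centric normalization described in this abstract, the sketch below computes, for each pair of bins, the log ratio M = log2(IF2/IF1) and the distance D between the bins, then removes the distance-dependent trend in M. It is not the HiCcompare implementation: a per-distance mean stands in for the loess fit, the even split of the correction between the two datasets is an assumption of this sketch, and the function name `md_normalize` and the tuple layout are hypothetical.

```python
import math
from collections import defaultdict


def md_normalize(contacts):
    """Jointly normalize two Hi-C datasets with a distance-centric correction.

    contacts: list of (bin_i, bin_j, if1, if2) tuples, where if1 and if2 are
    positive interaction frequencies for the same bin pair in datasets 1 and 2.
    Returns a list of (bin_i, bin_j, if1_adjusted, if2_adjusted).
    """
    # Group the log ratios M = log2(if2 / if1) by distance D = |bin_j - bin_i|.
    m_by_distance = defaultdict(list)
    for bin_i, bin_j, if1, if2 in contacts:
        m_by_distance[abs(bin_j - bin_i)].append(math.log2(if2 / if1))

    # Stand-in for the loess fit f(D): the mean of M at each distance.
    trend = {d: sum(ms) / len(ms) for d, ms in m_by_distance.items()}

    # Assumption: split the correction evenly, raising dataset 1 and
    # lowering dataset 2 by f(D) / 2 on the log2 scale.
    normalized = []
    for bin_i, bin_j, if1, if2 in contacts:
        f = trend[abs(bin_j - bin_i)]
        normalized.append((bin_i, bin_j, if1 * 2 ** (f / 2), if2 * 2 ** (-f / 2)))
    return normalized


# Dataset 2 is uniformly twice as deep as dataset 1 at distance 1.
example = [(0, 1, 10.0, 20.0), (1, 2, 8.0, 16.0), (0, 2, 5.0, 5.0)]
result = md_normalize(example)
```

After the correction, the two datasets agree at every bin pair in this toy example, because the only difference between them was a distance-specific scaling factor.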
|
20
|
Mifsud B, Martincorena I, Darbo E, Sugar R, Schoenfelder S, Fraser P, Luscombe NM. GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in Hi-C data. PLoS One 2017; 12:e0174744. [PMID: 28379994 PMCID: PMC5381888 DOI: 10.1371/journal.pone.0174744] [Citation(s) in RCA: 65] [Impact Index Per Article: 9.3] [Received: 12/08/2016] [Accepted: 03/14/2017] [Indexed: 01/16/2023] Open
Abstract
Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).
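The binomial test described in this abstract can be sketched as follows. This is an illustrative reading of the abstract, not the GOTHiC package code: it assumes that the chance of a random ligation between two loci is proportional to the product of their relative coverages, and the names `binom_sf` and `interaction_pvalue` are hypothetical.

```python
import math


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for x in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(x + 1)
                    - math.lgamma(n - x + 1) + x * log_p + (n - x) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def interaction_pvalue(observed, coverage_i, coverage_j, total_pairs):
    """P-value for seeing at least `observed` read pairs between loci i and j.

    coverage_i, coverage_j: number of read ends mapped to each locus.
    total_pairs: total number of read pairs in the experiment.
    Assumption: under random ligation, a pair joins i and j with probability
    2 * rel_i * rel_j, where rel is a locus's share of all read ends.
    """
    total_ends = 2 * total_pairs
    p_random = 2 * (coverage_i / total_ends) * (coverage_j / total_ends)
    return binom_sf(observed, total_pairs, p_random)


# With these toy counts, about 4 pairs are expected by chance.
p_background = interaction_pvalue(4, 100, 80, 1000)
p_enriched = interaction_pvalue(15, 100, 80, 1000)
```

The direct summation is adequate for toy counts like these; for real sequencing depths a library routine for the binomial survival function would be a more practical choice.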
Affiliation(s)
- Borbala Mifsud
- The Francis Crick Institute, London, United Kingdom
- UCL Genetics Institute, Department of Genetics Evolution and Environment, University College London, London, United Kingdom
| | | | - Elodie Darbo
- The Francis Crick Institute, London, United Kingdom
| | - Robert Sugar
- The Francis Crick Institute, London, United Kingdom
| | | | - Peter Fraser
- Nuclear Dynamics Programme, Babraham Institute, Cambridge, United Kingdom
| | - Nicholas M. Luscombe
- The Francis Crick Institute, London, United Kingdom
- UCL Genetics Institute, Department of Genetics Evolution and Environment, University College London, London, United Kingdom
- Okinawa Institute of Science & Technology, Okinawa, Japan
|
21
|
Wu HJ, Michor F. A computational strategy to adjust for copy number in tumor Hi-C data. Bioinformatics 2016; 32:3695-3701. [PMID: 27531101 PMCID: PMC6078171 DOI: 10.1093/bioinformatics/btw540] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 05/16/2016] [Revised: 07/28/2016] [Accepted: 08/11/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The Hi-C technology was designed to decode the three-dimensional conformation of the genome. Despite progress towards more and more accurate contact maps, several systematic biases have been demonstrated to affect the resulting data matrix. Here we report a new source of bias that can arise in tumor Hi-C data, which is related to the copy number of genomic DNA. To address this bias, we designed a chromosome-adjusted iterative correction method called caICB. Our caICB correction method leads to significant improvements when compared with the original iterative correction in terms of eliminating copy number bias. AVAILABILITY AND IMPLEMENTATION The method is available at https://bitbucket.org/mthjwu/hicapp CONTACT michor@jimmy.harvard.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hua-Jun Wu
- Department of Computational Biology and Biostatistics, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
| | - Franziska Michor
- Department of Computational Biology and Biostatistics, Dana-Farber Cancer Institute, and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
|
22
|
Abstract
Chromosomes of eukaryotes adopt highly dynamic and complex hierarchical structures in the nucleus. The three-dimensional (3D) organization of chromosomes profoundly affects DNA replication, transcription and the repair of DNA damage. Thus, a thorough understanding of nuclear architecture is fundamental to the study of nuclear processes in eukaryotic cells. Recent years have seen rapid proliferation of technologies to investigate genome organization and function. Here, we review experimental and computational methodologies for 3D genome analysis, with special focus on recent advances in high-throughput chromatin conformation capture (3C) techniques and data analysis.
Affiliation(s)
- Anthony D Schmitt
- Ludwig Institute for Cancer Research and the University of California, San Diego (UCSD) Biomedical Sciences Graduate Program, 9500 Gilman Drive, La Jolla, California 92093, USA
| | - Ming Hu
- Department of Population Health, Division of Biostatistics, New York University School of Medicine, 650 First Avenue, Room 540, New York, New York 10016, USA
- Present address: Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, 9500 Euclid Avenue, Cleveland, Ohio 44195, USA
| | - Bing Ren
- Ludwig Institute for Cancer Research, Department of Cellular and Molecular Medicine, Moores Cancer Center and Institute of Genomic Medicine, University of California, San Diego (UCSD) School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
|
23
|
Shavit Y, Walker BJ, Lio' P. Hierarchical block matrices as efficient representations of chromosome topologies and their application for 3C data integration. Bioinformatics 2015; 32:1121-9. [PMID: 26685310 DOI: 10.1093/bioinformatics/btv736] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/06/2015] [Accepted: 12/12/2015] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Recent advancements in molecular methods have made it possible to capture physical contacts between multiple chromatin fragments. The resulting association matrices provide a noisy estimate for average spatial proximity that can be used to gain insights into the genome organization inside the nucleus. However, extracting topological information from these data is challenging and their integration across resolutions is still poorly addressed. Recent findings suggest that a hierarchical approach could be advantageous for addressing these challenges. RESULTS We present an algorithmic framework, which is based on hierarchical block matrices (HBMs), for topological analysis and integration of chromosome conformation capture (3C) data. We first describe chromoHBM, an algorithm that compresses high-throughput 3C (HiT-3C) data into topological features that are efficiently summarized with an HBM representation. We suggest that instead of directly combining HiT-3C datasets across resolutions, which is a difficult task, we can integrate their HBM representations, and describe chromoHBM-3C, an algorithm which merges HBMs. Since three-dimensional (3D) reconstruction can also benefit from topological information, we further present chromoHBM-3D, an algorithm which exploits the HBM representation in order to gradually introduce topological constraints to the reconstruction process. We evaluate our approach in light of previous image microscopy findings and epigenetic data, and show that it can relate multiple spatial scales and provide a more complete view of the 3D genome architecture. AVAILABILITY AND IMPLEMENTATION The presented algorithms are available from: https://github.com/yolish/hbm CONTACT ys388@cam.ac.uk or pl219@cam.ac.uk SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yoli Shavit
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
| | - Barnabas James Walker
- University of Cambridge, Cambridge CB3 0FD, UK and Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Pietro Lio'
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
|
24
|
Shavit Y, Merelli I, Milanesi L, Lio’ P. How computer science can help in understanding the 3D genome architecture. Brief Bioinform 2015; 17:733-44. [DOI: 10.1093/bib/bbv085] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/10/2015] [Indexed: 01/20/2023] Open
|
25
|
Schmid MW, Grob S, Grossniklaus U. HiCdat: a fast and easy-to-use Hi-C data analysis tool. BMC Bioinformatics 2015; 16:277. [PMID: 26334796 PMCID: PMC4559209 DOI: 10.1186/s12859-015-0678-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 01/28/2015] [Accepted: 07/20/2015] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND The study of nuclear architecture using Chromosome Conformation Capture (3C) technologies is a novel frontier in biology. With further reduction in sequencing costs, the potential of Hi-C in describing nuclear architecture as a phenotype is only about to unfold. To use Hi-C for phenotypic comparisons among different cell types, conditions, or genetic backgrounds, Hi-C data processing needs to be more accessible to biologists. RESULTS HiCdat provides a simple graphical user interface for data pre-processing and a collection of higher-level data analysis tools implemented in R. Data pre-processing also supports a wide range of additional data types required for in-depth analysis of the Hi-C data (e.g. RNA-Seq, ChIP-Seq, and BS-Seq). CONCLUSIONS HiCdat is easy-to-use and provides solutions starting from aligned reads up to in-depth analyses. Importantly, HiCdat is focussed on the analysis of larger structural features of chromosomes, their correlation to genomic and epigenomic features, and on comparative studies. It uses simple input and output formats and can therefore easily be integrated into existing workflows or combined with alternative tools.
Affiliation(s)
- Marc W Schmid
- Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, Zürich, 8008, Switzerland; Zurich-Basel Plant Science Center, Universitätstrasse 2, Zürich, 8092, Switzerland.
| | - Stefan Grob
- Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, Zürich, 8008, Switzerland; Zurich-Basel Plant Science Center, Universitätstrasse 2, Zürich, 8092, Switzerland.
| | - Ueli Grossniklaus
- Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, Zürich, 8008, Switzerland; Zurich-Basel Plant Science Center, Universitätstrasse 2, Zürich, 8092, Switzerland.
|
26
|
Ay F, Noble WS. Analysis methods for studying the 3D architecture of the genome. Genome Biol 2015; 16:183. [PMID: 26328929 PMCID: PMC4556012 DOI: 10.1186/s13059-015-0745-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Received: 05/11/2015] [Accepted: 08/10/2015] [Indexed: 11/10/2022] Open
Abstract
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.
Affiliation(s)
- Ferhat Ay
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Feinberg School of Medicine, Northwestern University, Chicago, 60661, IL, USA.
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Department of Computer Science and Engineering, University of Washington, Seattle, 98195, WA, USA.
|
27
|
Shavit Y, Hamey FK, Lio P. FisHiCal: an R package for iterative FISH-based calibration of Hi-C data. Bioinformatics 2014; 30:3120-2. [PMID: 25061071 PMCID: PMC4609013 DOI: 10.1093/bioinformatics/btu491] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/16/2014] [Revised: 06/26/2014] [Accepted: 07/16/2014] [Indexed: 11/14/2022] Open
Abstract
The fluorescence in situ hybridization (FISH) method has been providing valuable information on physical distances between loci (via image analysis) for several decades. Recently, high-throughput data on nearby chemical contacts between and within chromosomes became available with the Hi-C method. Here, we present FisHiCal, an R package for an iterative FISH-based Hi-C calibration that exploits in full the information coming from these methods. We describe here our calibration model and present 3D inference methods that we have developed for increasing its usability, namely, 3D reconstruction through local stress minimization and detection of spatial inconsistencies. We next confirm our calibration across three human cell lines and explain how the output of our methods could inform our model, defining an iterative calibration pipeline, with applications for quality assessment and meta-analysis. AVAILABILITY AND IMPLEMENTATION FisHiCal v1.1 is available from http://cran.r-project.org/.
Affiliation(s)
- Yoli Shavit
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
| | - Fiona Kathryn Hamey
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
| | - Pietro Lio
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
|