1
Raffo A, Paulsen J. The shape of chromatin: insights from computational recognition of geometric patterns in Hi-C data. Brief Bioinform 2023; 24:bbad302. [PMID: 37646128 PMCID: PMC10516369 DOI: 10.1093/bib/bbad302] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/18/2023] [Revised: 07/05/2023] [Accepted: 08/03/2023] [Indexed: 09/01/2023] Open
Abstract
The three-dimensional organization of chromatin plays a crucial role in gene regulation and cellular processes like deoxyribonucleic acid (DNA) transcription, replication and repair. Hi-C and related techniques provide detailed views of spatial proximities within the nucleus. However, data analysis is challenging partially due to a lack of well-defined, underpinning mathematical frameworks. Recently, recognizing and analyzing geometric patterns in Hi-C data has emerged as a powerful approach. This review provides a summary of algorithms for automatic recognition and analysis of geometric patterns in Hi-C data and their correspondence with chromatin structure. We classify existing algorithms on the basis of the data representation and pattern recognition paradigm they make use of. Finally, we outline some of the challenges ahead and promising future directions.
Affiliation(s)
- Andrea Raffo
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Jonas Paulsen
- Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
2
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023] Open
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim can generate high-fidelity data by comparing the performance of single-cell clustering and detection of chromosomal high-order structures with raw data. Furthermore, scHi-CSim is flexible to change sequencing depth and the number of simulated replicates. We showed that increasing sequencing depth could improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
Affiliation(s)
- Shichen Fan
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
- School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
3
Liu J, Li P, Sun J, Guo J. LPAD: using network construction and label propagation to detect topologically associating domains from Hi-C data. Brief Bioinform 2023; 24:7150739. [PMID: 37139561 DOI: 10.1093/bib/bbad165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 03/06/2023] [Accepted: 04/09/2023] [Indexed: 05/05/2023] Open
Abstract
With the development of chromosome conformation capture techniques, the study of the spatial conformation of genomes based on Hi-C has made a quantum leap. Previous studies reveal that genomes are folded into a hierarchy of three-dimensional (3D) structures associated with topologically associating domains (TADs), and detecting TAD boundaries is of great significance in the chromosome-level analysis of 3D genome architecture. In this paper, we propose a novel TAD identification method, LPAD, which first extracts node correlations from global interactions of chromosomes based on random walk with restart and then builds an undirected graph from the Hi-C contact matrix. Next, LPAD uses a label propagation-based approach to discover communities and generate TADs. Experimental results verify the effectiveness and quality of its TAD detections compared with existing methods. Furthermore, evaluation with chromatin immunoprecipitation sequencing data shows that histone modifications are highly enriched near the TAD boundaries identified by LPAD, and these results demonstrate LPAD's advantages in TAD identification accuracy.
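The random walk with restart mentioned in this abstract can be illustrated with a minimal sketch. This is not LPAD's implementation: the function name, the restart probability of 0.5, the convergence tolerance and the six-bin toy contact matrix are all assumptions chosen for illustration.

```python
def random_walk_with_restart(contacts, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return stationary visiting probabilities of a walker that restarts at `seed`.

    `contacts` is a square list-of-lists contact matrix (hypothetical toy input).
    """
    n = len(contacts)
    # Row-normalise contact counts into transition probabilities.
    transition = []
    for row in contacts:
        total = sum(row)
        transition.append([v / total if total else 0.0 for v in row])

    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        # One step: follow an edge with probability (1 - restart), else jump back to the seed.
        nxt = [
            (1 - restart) * sum(p[i] * transition[i][j] for i in range(n))
            + restart * start[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy matrix with two dense blocks (bins 0-2 and bins 3-5), values invented for illustration.
contacts = [
    [0, 10, 8, 1, 0, 0],
    [10, 0, 9, 1, 1, 0],
    [8, 9, 0, 2, 1, 1],
    [1, 1, 2, 0, 9, 8],
    [0, 1, 1, 9, 0, 10],
    [0, 0, 1, 8, 10, 0],
]
scores = random_walk_with_restart(contacts, seed=0)
```

In this toy case the walker seeded at bin 0 spends more time in bins 0-2 than in bins 3-5, which is the kind of global correlation signal a graph-based TAD caller could build on.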
Affiliation(s)
- Jian Liu
- College of Computer Science, Nankai University, Tianjin 300071, China
- Pingjing Li
- College of Computer Science, Nankai University, Tianjin 300071, China
- Jialiang Sun
- College of Computer Science, Nankai University, Tianjin 300071, China
- Centre for Bioinformatics and Intelligent Medicine, Nankai University, Tianjin 300071, China
- Jun Guo
- College of Software, Northeastern University, Shenyang 110819, China
4
Dang D, Zhang SW, Duan R, Zhang S. Defining the separation landscape of topological domains for decoding consensus domain organization of the 3D genome. Genome Res 2023; 33:386-400. [PMID: 36894325 PMCID: PMC10078287 DOI: 10.1101/gr.277187.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 02/23/2023] [Indexed: 03/11/2023]
Abstract
Topologically associating domains (TADs) have emerged as basic structural and functional units of genome organization and have been determined by many computational methods from Hi-C contact maps. However, the TADs obtained by different methods vary greatly, which makes the accurate determination of TADs a challenging issue and hinders subsequent biological analyses about their organization and functions. Obvious inconsistencies among the TADs identified by different methods indeed make the statistical and biological properties of TADs overly depend on the chosen method rather than on the data. To this end, we use the consensus structural information captured by these methods to define the TAD separation landscape for decoding the consensus domain organization of the 3D genome. We show that the TAD separation landscape could be used to compare domain boundaries across multiple cell types for discovering conserved and divergent topological structures, decipher three types of boundary regions with diverse biological features, and identify consensus TADs (ConsTADs). We illustrate that these analyses could deepen our understanding of the relationships between the topological domains and chromatin states, gene expression, and DNA replication timing.
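The idea of pooling boundary calls from several methods can be sketched as a simple per-bin vote count. The abstract does not define how the TAD separation landscape is actually scored, so the function below, its `window` parameter and the three invented caller outputs are purely hypothetical.

```python
def boundary_votes(n_bins, boundary_sets, window=0):
    """Count, for each bin, how many callers place a boundary within `window` bins of it."""
    votes = [0] * n_bins
    for boundaries in boundary_sets:
        # Collect every bin this caller supports, so a caller votes at most once per bin.
        supported = set()
        for b in boundaries:
            for pos in range(max(0, b - window), min(n_bins, b + window + 1)):
                supported.add(pos)
        for pos in supported:
            votes[pos] += 1
    return votes


# Boundary bins reported by three hypothetical TAD callers on a 10-bin region.
caller_boundaries = [{3, 7}, {3, 8}, {4, 7}]
exact_votes = boundary_votes(10, caller_boundaries)
tolerant_votes = boundary_votes(10, caller_boundaries, window=1)
```

Allowing a one-bin tolerance makes the callers agree on the boundaries near bins 3 and 7, illustrating why consensus summaries are less sensitive to the choice of a single method.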
Affiliation(s)
- Dachang Dang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Ran Duan
- Department of Software Engineering, Yunnan University, Kunming 650500, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
5
Ye Y, Zhang S, Gao L, Zhu Y, Zhang J. Deciphering Hierarchical Chromatin Domains and Preference of Genomic Position Forming Boundaries in Single Mouse Embryonic Stem Cells. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2023; 10:e2205162. [PMID: 36658736 PMCID: PMC10015865 DOI: 10.1002/advs.202205162] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/07/2022] [Revised: 12/15/2022] [Indexed: 06/17/2023]
Abstract
The exploration of single-cell 3D genome maps reveals that chromatin domains are indeed physical structures presenting in single cells, and domain boundaries vary from cell to cell. However, systematic analysis of the association between regulatory factor binding and elements and the formation of chromatin domains in single cells has not yet emerged. To this end, a hierarchical chromatin domain structure identification algorithm (named as HiCS) is first developed from individual single-cell Hi-C maps, with superior performance in both accuracy and efficiency. The results suggest that in addition to the known CTCF-cohesin complex, Polycomb, TrxG, pluripotent protein families, and other multiple factors also contribute to shaping chromatin domain boundaries in single embryonic stem cells. Different cooperation patterns of these regulatory factors drive genomic position categories with differential preferences forming boundaries, and the most extensive six types of retrotransposons are differentially distributed in these genomic position categories with preferential localization. The above results suggest that these different retrotransposons within genomic regions interplay with regulatory factors navigating the preference of genomic positions forming boundaries, driving the formation of higher-order chromatin structures, and thus regulating cell functions in single mouse embryonic stem cells.
Affiliation(s)
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, P. R. China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, P. R. China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, P. R. China
- Yuqing Zhu
- Center for Stem Cell and Translational Medicine, School of Life Sciences, Anhui University, Hefei, Anhui 230601, P. R. China
- Jin Zhang
- Center for Stem Cell and Regenerative Medicine, Department of Basic Medical Sciences, and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, P. R. China
- Zhejiang Laboratory for Systems and Precision Medicine, Zhejiang University Medical Center, Hangzhou, Zhejiang 311121, P. R. China
- Institute of Hematology, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Center of Gene/Cell Engineering and Genome Medicine, Hangzhou, Zhejiang 310058, P. R. China
6
Liu K, Li HD, Li Y, Wang J, Wang J. A Comparison of Topologically Associating Domain Callers Based on Hi-C Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:15-29. [PMID: 35104223 DOI: 10.1109/tcbb.2022.3147805] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/04/2023]
Abstract
Topologically associating domains (TADs) are local chromatin interaction domains, which have been shown to play an important role in gene expression regulation. TADs were originally discovered in the investigation of 3D genome organization based on High-throughput Chromosome Conformation Capture (Hi-C) data. Continuous considerable efforts have been dedicated to developing methods for detecting TADs from Hi-C data. Different computational methods for TADs identification vary in their assumptions and criteria in calling TADs. As a consequence, the TADs called by these methods differ in their similarities and biological features they are enriched in. In this work, we performed a systematic comparison of twenty-six TAD callers. We first compared the TADs and gaps between adjacent TADs across different methods, resolutions, and sequencing depths. We then assessed the quality of TADs and TAD boundaries according to three criteria: the decay of contact frequencies over the genomic distance, enrichment and depletion of regulatory elements around TAD boundaries, and reproducibility of TADs and TAD boundaries in replicate samples. Last, due to the lack of a gold standard of TADs, we also evaluated the performance of the methods on synthetic datasets. We discussed the key principles of TAD callers, and pinpointed current situation in the detection of TADs. We provide a concise, comprehensive, and systematic framework for evaluating the performance of TAD callers, and expect our work will provide useful guidance in choosing suitable approaches for the detection and evaluation of TADs.
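One of the quality criteria named in this abstract, the decay of contact frequency over genomic distance, can be computed as the mean of each diagonal of a contact matrix. The sketch below is an assumption-laden illustration, not the benchmark's code: the function name and the synthetic matrix (contacts halving with every bin of separation) are invented.

```python
def mean_contact_by_distance(matrix):
    """Average contact frequency at each genomic distance d (in bins).

    Distance d corresponds to the d-th upper diagonal of the square matrix.
    """
    n = len(matrix)
    return [
        sum(matrix[i][i + d] for i in range(n - d)) / (n - d)
        for d in range(n)
    ]


# Synthetic 4-bin matrix whose contacts halve with each additional bin of separation.
toy_matrix = [[16 / 2 ** abs(i - j) for j in range(4)] for i in range(4)]
decay = mean_contact_by_distance(toy_matrix)
```

Restricting the same average to bin pairs inside a called TAD versus pairs spanning a boundary is one way such a curve could be turned into a quality score.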
7
Sefer E. A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics 2022; 23:127. [PMID: 35413815 PMCID: PMC9006547 DOI: 10.1186/s12859-022-04674-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/30/2021] [Accepted: 04/07/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Topologically associating domains (TADs) are locally highly-interacting genome regions, which also play a critical role in regulating gene expression in the cell. TADs were first identified while investigating the 3D genome structure in High-throughput Chromosome Conformation Capture (Hi-C) interaction datasets. Substantial effort has been devoted to developing techniques for inferring TADs from Hi-C interaction datasets. Many TAD-calling methods have been developed, which differ in their criteria and assumptions for TAD inference. Correspondingly, TADs inferred via these callers vary in terms of both their similarities and the biological features they are enriched in.
RESULT: We have carried out a systematic comparison of 27 TAD-calling methods over mammals. We use Micro-C, a recent high-resolution variant of Hi-C, to compare TADs at a very high resolution, and classify the methods into 3 categories: feature-based methods, clustering methods, and graph-partitioning methods. We have evaluated TAD boundaries, gaps between adjacent TADs, and quality of TADs across various criteria. We also found CTCF and cohesin proteins in particular to be effective in the formation of TADs with corner dots. We have also assessed the callers' performance on simulated datasets since a gold standard for TADs is missing. TAD sizes and numbers change remarkably between TAD callers and dataset resolutions, indicating that TADs are hierarchically-organized domains instead of disjoint regions. A core subset of feature-based TAD callers regularly performs the best at inferring reproducible domains, which are also enriched for TAD-related biological properties.
CONCLUSION: We have analyzed the fundamental principles of TAD-calling methods, and identified the existing situation in TAD inference across high-resolution Micro-C interaction datasets over mammals. We present a systematic, comprehensive, and concise framework to evaluate the performance of TAD-calling methods across Micro-C datasets. Our research will be useful in selecting appropriate methods for TAD inference and evaluation based on available data, experimental design, and biological question of interest. We also introduce our analysis as a benchmarking tool with publicly available source code.
Affiliation(s)
- Emre Sefer
- Department of Computer Science, Ozyegin University, Istanbul, Turkey.
8
Mapping nucleosome and chromatin architectures: A survey of computational methods. Comput Struct Biotechnol J 2022; 20:3955-3962. [PMID: 35950186 PMCID: PMC9340519 DOI: 10.1016/j.csbj.2022.07.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 07/22/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022] Open
Abstract
With ever-growing genomic sequencing data, the data variabilities and the underlying biases of the sequencing technologies pose significant computational challenges ranging from the need for accurately detecting the nucleosome positioning or chromatin interaction to the need for developing normalization methods to eliminate systematic biases. This review mainly surveys the computational methods for mapping the higher-resolution nucleosome and higher-order chromatin architectures. While a detailed discussion of the underlying algorithms is beyond the scope of our survey, we have discussed the methods and tools that can detect the nucleosomes in the genome, then demonstrated the computational methods for identifying 3D chromatin domains and interactions. We further illustrated computational approaches for integrating multi-omics data with Hi-C data and the advance of single-cell (sc)Hi-C data analysis. Our survey provides a comprehensive and valuable resource for biomedical scientists interested in studying nucleosome organization and chromatin structures as well as for computational scientists who are interested in improving upon them.
9
Wang W, Gao L, Ye Y, Gao Y. CCIP: Predicting CTCF-mediated chromatin loops with transitivity. Bioinformatics 2021; 37:4635-4642. [PMID: 34289010 PMCID: PMC8665748 DOI: 10.1093/bioinformatics/btab534] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2021] [Revised: 06/18/2021] [Accepted: 07/19/2021] [Indexed: 11/14/2022] Open
Abstract
Motivation: CTCF-mediated chromatin loops underlie the formation of topological associating domains and serve as the structural basis for transcriptional regulation. However, the formation mechanism of these loops remains unclear, and the genome-wide mapping of these loops is costly and difficult. Motivated by the recent studies on the formation mechanism of CTCF-mediated loops, we studied the possibility of making use of transitivity-related information of interacting CTCF anchors to predict CTCF loops computationally. In this context, transitivity arises when two CTCF anchors interact with the same third anchor by the loop extrusion mechanism and bring themselves close to each other spatially to form an indirect loop.
Results: To determine whether transitivity is informative for predicting CTCF loops and to obtain an accurate and low-cost predicting method, we proposed a two-stage random-forest-based machine learning method, CTCF-mediated Chromatin Interaction Prediction (CCIP), to predict CTCF-mediated chromatin loops. Our two-stage learning approach makes it possible for us to train a prediction model by taking advantage of transitivity-related information as well as functional genome data and genomic data. Experimental studies showed that our method predicts CTCF-mediated loops more accurately than other methods and that transitivity, when used as a properly defined attribute, is informative for predicting CTCF loops. Furthermore, we found that transitivity explains the formation of tandem CTCF loops and facilitates enhancer–promoter interactions. Our work contributes to the understanding of the formation mechanism and function of CTCF-mediated chromatin loops.
Availability and implementation: The source code of CCIP can be accessed at: https://github.com/GaoLabXDU/CCIP.
Supplementary information: Supplementary data are available at Bioinformatics online.
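The notion of transitivity described in this abstract (two anchors interacting with the same third anchor) can be turned into a simple numeric feature by counting shared partners. This is only a hypothetical proxy: the abstract does not give CCIP's actual attribute definition, and the function name, anchor labels and loop list below are invented.

```python
from collections import defaultdict


def shared_anchor_counts(loops, candidates):
    """For each candidate anchor pair, count third anchors that loop with both members."""
    neighbours = defaultdict(set)
    for a, b in loops:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {
        # Exclude the pair itself so only genuine third anchors are counted.
        (a, b): len((neighbours[a] & neighbours[b]) - {a, b})
        for a, b in candidates
    }


# Known loops between hypothetical CTCF anchors A-F.
known_loops = [("A", "C"), ("B", "C"), ("A", "D"), ("B", "D"), ("E", "F")]
features = shared_anchor_counts(known_loops, [("A", "B"), ("A", "E")])
```

A count like this could be supplied to a classifier alongside other genomic features; here the pair A-B shares two partners while A-E shares none.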
Affiliation(s)
- Weibing Wang
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, 710071, China
- Yong Gao
- Department of Computer Science, The University of British Columbia Okanagan, Kelowna, BC, V1V 1V5, Canada
10
SBTD: A Novel Method for Detecting Topological Associated Domains from Hi-C Data. Interdiscip Sci 2021; 13:638-651. [PMID: 34160760 DOI: 10.1007/s12539-021-00453-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2021] [Revised: 06/16/2021] [Accepted: 06/17/2021] [Indexed: 10/21/2022]
Abstract
The development of Hi-C technology has generated terabytes of chromatin interaction data, which open up possibilities for in-depth study of chromatin structure. Several studies revealed that mammalian chromosomes are folded into topological associated domains (TADs), which are conserved across cell types. Accurate detection of topological associated domains is now a vital process for revealing the relationship between the structure and function of genome organization. Unfortunately, the current TAD detection methods require massive computing resources, careful parameter adjustment and/or encounter inconsistent results. In this paper, we propose a novel method, Spectral-Based TAD Detector (SBTD), and evaluate its performance with a set of widely accepted statistical methods. We treat the chromatin interaction matrix as a graph and first introduce cosine similarity as a measure of the interaction patterns between bins. The results show that SBTD identifies higher quality TADs than the popular methods (DomainCaller, TopDom and SpectralTAD) and the internal bins of TADs identified by SBTD have higher correlation. Besides, the TADs identified by SBTD show a highly similar histone modification signal enrichment pattern at the boundary as reported in the previous literature. Finally, the motif enrichment analysis shows that compared with the background region, the DNA motifs of known insulator proteins are significantly enriched in the TAD boundary region identified by our method, which proves the high performance of our proposed method. Overall, SBTD is much more effective than existing methods with only one easy-to-adjust parameter, cluster number, for which we provide optimization guidelines.
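The cosine similarity between bins mentioned in this abstract can be sketched by treating each row of the contact matrix as that bin's interaction profile. This covers only the similarity step, not SBTD's spectral clustering, and the function names and the four-bin toy matrix are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two interaction profiles (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bin_similarity_matrix(contacts):
    """Pairwise cosine similarity between the rows (bins) of a contact matrix."""
    return [[cosine_similarity(r1, r2) for r2 in contacts] for r1 in contacts]


# Toy matrix with two domains (bins 0-1 and bins 2-3) that never contact each other.
toy_contacts = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
]
similarity = bin_similarity_matrix(toy_contacts)
```

Bins in the same toy domain have similarity 40/41, while bins in different domains have similarity 0, so a clustering step applied to this matrix would separate the two domains cleanly.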
11
Dong K, Zhang S. Joint reconstruction of cis-regulatory interaction networks across multiple tissues using single-cell chromatin accessibility data. Brief Bioinform 2020; 22:5860691. [PMID: 32578841 PMCID: PMC8138825 DOI: 10.1093/bib/bbaa120] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/20/2020] [Revised: 05/16/2020] [Accepted: 05/18/2020] [Indexed: 12/11/2022] Open
Abstract
The rapid accumulation of single-cell chromatin accessibility data offers a unique opportunity to investigate common and specific regulatory mechanisms across different cell types. However, existing methods for cis-regulatory network reconstruction using single-cell chromatin accessibility data were only designed for cells belonging to one cell type, and resulting networks may be incomparable directly due to diverse cell numbers of different cell types. Here, we adopt a computational method to jointly reconstruct cis-regulatory interaction maps (JRIM) of multiple cell populations based on patterns of co-accessibility in single-cell data. We applied JRIM to explore common and specific regulatory interactions across multiple tissues from single-cell ATAC-seq dataset containing ~80 000 cells across 13 mouse tissues. Reconstructed common interactions among 13 tissues indeed relate to basic biological functions, and individual cis-regulatory networks show strong tissue specificity and functional relevance. More importantly, tissue-specific regulatory interactions are mediated by coordination of histone modifications and tissue-related TFs, and many of them may reveal novel regulatory mechanisms.
Affiliation(s)
- Kangning Dong
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences
- Shihua Zhang
- Academy of Mathematics and Systems Science, Chinese Academy of Sciences