1
Fang T, Liu Y, Woicik A, Lu M, Jha A, Wang X, Li G, Hristov B, Liu Z, Xu H, Noble WS, Wang S. Enhancing Hi-C contact matrices for loop detection with Capricorn: a multiview diffusion model. Bioinformatics 2024; 40:i471-i480. [PMID: 38940142 PMCID: PMC11211821 DOI: 10.1093/bioinformatics/btae211] [Indexed: 06/29/2024]
Abstract
MOTIVATION: High-resolution Hi-C contact matrices reveal the detailed three-dimensional architecture of the genome, but high-coverage experimental Hi-C data are expensive to generate. Simultaneously, chromatin structure analyses struggle with extremely sparse contact matrices. To address this problem, computational methods to enhance low-coverage contact matrices have been developed, but existing methods are largely based on resolution enhancement methods for natural images and hence often employ models that do not distinguish between biologically meaningful contacts, such as loops, and other stochastic contacts.
RESULTS: We present Capricorn, a machine learning model for Hi-C resolution enhancement that incorporates small-scale chromatin features as additional views of the input Hi-C contact matrix and leverages a diffusion probability model backbone to generate a high-coverage matrix. We show that Capricorn outperforms the state of the art in a cross-cell-line setting, improving on existing methods by 17% in mean squared error and 26% in F1 score for chromatin loop identification from the generated high-coverage data. We also demonstrate that Capricorn performs well in the cross-chromosome setting and in the combined cross-chromosome, cross-cell-line setting, improving the downstream loop F1 score by 14% relative to existing methods. We further show that our multiview idea can also be used to improve several existing methods, HiCARN and HiCNN, indicating the wide applicability of this approach. Finally, we use DNA sequence to validate discovered loops and find that the fraction of CTCF-supported loops from Capricorn is similar to that of loops identified from the high-coverage data. Capricorn is a powerful Hi-C resolution enhancement method that enables scientists to find chromatin features that cannot be identified in the low-coverage contact matrix.
AVAILABILITY AND IMPLEMENTATION: Implementation of Capricorn and source code for reproducing all figures in this paper are available at https://github.com/CHNFTQ/Capricorn.
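The abstract above reports an F1 score for chromatin loop identification but does not state how predicted and reference loops are matched. As a minimal sketch, assuming loops are given as pairs of anchor bins and that a loop counts as recovered when both anchors fall within a tolerance of `tol` bins, one way to compute such a score is shown below. The function name `loop_f1` and the matching rule are illustrative assumptions, not the procedure used in the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a loop is a pair of anchor bin indices.
Loop = Tuple[int, int]


def loop_f1(predicted: Iterable[Loop], reference: Iterable[Loop], tol: int = 0) -> float:
    """F1 between two loop sets; a loop matches if both anchors are within `tol` bins."""
    pred: Set[Loop] = set(predicted)
    ref: Set[Loop] = set(reference)
    if not pred or not ref:
        return 0.0

    def has_match(loop: Loop, others: Set[Loop]) -> bool:
        i, j = loop
        return any(abs(i - a) <= tol and abs(j - b) <= tol for a, b in others)

    # Precision: fraction of predicted loops supported by the reference set.
    precision = sum(has_match(p, ref) for p in pred) / len(pred)
    # Recall: fraction of reference loops recovered by the predictions.
    recall = sum(has_match(r, pred) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, loops called on the enhanced matrix would be passed as `predicted` and loops called on the high-coverage matrix as `reference`; a nonzero `tol` makes the comparison less sensitive to one-bin shifts in anchor position.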
Affiliation(s)
- Tangqi Fang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Yifeng Liu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Addie Woicik
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Minsi Lu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Anupama Jha
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, United States
- Gang Li
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
  - eScience Institute, University of Washington, Seattle, WA 98195, United States
- Borislav Hristov
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Zixuan Liu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- William S Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
2
Liu T, Zhu H, Wang Z. Learning Micro-C from Hi-C with diffusion models. PLoS Comput Biol 2024; 20:e1012136. [PMID: 38758956 PMCID: PMC11139321 DOI: 10.1371/journal.pcbi.1012136] [Received: 11/11/2023] [Revised: 05/30/2024] [Accepted: 05/05/2024] [Indexed: 05/19/2024]
Abstract
In the last few years, Micro-C has emerged as an improved alternative to Hi-C. It replaces the restriction enzymes used in Hi-C assays with micrococcal nuclease (MNase), thereby capturing chromatin interactions at nucleosome resolution. The improved signal-to-noise ratio of Micro-C allows it to detect more chromatin loops than high-resolution Hi-C. However, compared with the massive Hi-C datasets available in the literature, there are only a limited number of Micro-C datasets. To take full advantage of these Hi-C datasets, we present HiC2MicroC, a computational method that learns and then predicts Micro-C from Hi-C based on denoising diffusion probabilistic models (DDPM). We trained our DDPM and other regression models on the human foreskin fibroblast (HFFc6) cell line and evaluated these methods in six different cell types at 5-kb and 1-kb resolution. Our evaluations demonstrate that both HiC2MicroC and regression methods can markedly improve Hi-C towards Micro-C, and that our DDPM-based HiC2MicroC outperforms regression in several respects. First, HiC2MicroC successfully recovers most of the Micro-C loops, even those not detected in Hi-C maps. Second, a majority of the HiC2MicroC-recovered loops are anchored at CTCF binding sites in a convergent orientation. Third, HiC2MicroC loops share genomic and epigenetic properties with Micro-C loops, including linking promoters and enhancers, and their anchors are enriched for structural proteins (CTCF and cohesin) and histone modifications. Lastly, we find that our recovered loops are also consistent with the loops identified from promoter capture Micro-C (PCMicro-C) and Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET). Overall, HiC2MicroC is an effective tool for further studying Hi-C data with Micro-C as a template. HiC2MicroC is publicly available at https://github.com/zwang-bioinformatics/HiC2MicroC/.
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Hao Zhu
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Zheng Wang
  - Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
3
Liu R, Xu R, Yan S, Li P, Jia C, Sun H, Sheng K, Wang Y, Zhang Q, Guo J, Xin X, Li X, Guo D. Hi-C, a chromatin 3D structure technique advancing the functional genomics of immune cells. Front Genet 2024; 15:1377238. [PMID: 38586584 PMCID: PMC10995239 DOI: 10.3389/fgene.2024.1377238] [Received: 02/07/2024] [Accepted: 03/13/2024] [Indexed: 04/09/2024]
Abstract
The functional performance of immune cells relies on a complex transcriptional regulatory network. The three-dimensional structure of chromatin can affect chromatin status and gene expression patterns, and plays an important regulatory role in gene transcription. Currently available techniques for studying chromatin spatial structure include chromatin conformation capture techniques and their derivatives, chromatin accessibility sequencing techniques, and others. Additionally, the recently emerged deep learning technology can be utilized as a tool to enhance the analysis of data. In this review, we elucidate the definition and significance of the three-dimensional chromatin structure, summarize the technologies available for studying it, and describe the research progress on the chromatin spatial structure of dendritic cells, macrophages, T cells, B cells, and neutrophils.
Affiliation(s)
- Dianhao Guo
  - School of Clinical and Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
4
Murtaza G, Jain A, Hughes M, Wagner J, Singh R. A Comprehensive Evaluation of Generalizability of Deep Learning-Based Hi-C Resolution Improvement Methods. Genes (Basel) 2023; 15:54. [PMID: 38254945 PMCID: PMC10815746 DOI: 10.3390/genes15010054] [Received: 12/04/2023] [Revised: 12/24/2023] [Accepted: 12/26/2023] [Indexed: 01/24/2024]
Abstract
Hi-C is a widely used technique to study the 3D organization of the genome. Due to its high sequencing cost, most of the generated datasets are of a coarse resolution, which makes it impractical to study finer chromatin features such as Topologically Associating Domains (TADs) and chromatin loops. Multiple deep learning-based methods have recently been proposed to increase the resolution of these datasets by imputing Hi-C reads (typically called upscaling). However, the existing works evaluate these methods on either synthetically downsampled datasets or a small subset of experimentally generated sparse Hi-C datasets, making it hard to establish their generalizability in the real-world use case. We present our framework, Hi-CY, which compares existing Hi-C resolution upscaling methods on seven experimentally generated low-resolution Hi-C datasets, spanning various levels of read sparsity and originating from three cell lines, on a comprehensive set of evaluation metrics. Hi-CY also includes four downstream analysis tasks, such as TAD and chromatin loop recall, to provide a thorough report on the generalizability of these methods. We observe that existing deep learning methods fail to generalize to experimentally generated sparse Hi-C datasets, showing a performance reduction of up to 57%. As a potential solution, we find that retraining deep learning-based methods with experimentally generated Hi-C datasets improves performance by up to 31%. More importantly, Hi-CY shows that even with retraining, the existing deep learning-based methods struggle to recover biological features such as chromatin loops and TADs when provided with sparse Hi-C datasets. Our study, through the Hi-CY framework, highlights the need for rigorous evaluation in the future. We identify specific avenues for improvements in the current deep learning-based Hi-C upscaling methods, including but not limited to using experimentally generated datasets for training.
Affiliation(s)
- Ghulam Murtaza
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
- Atishay Jain
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
- Madeline Hughes
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA
- Ritambhara Singh
  - Department of Computer Science, Brown University, Providence, RI 02912, USA
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
5
Liu T, Wang Z. HiC4D: forecasting spatiotemporal Hi-C data with residual ConvLSTM. Brief Bioinform 2023; 24:bbad263. [PMID: 37478379 PMCID: PMC10516390 DOI: 10.1093/bib/bbad263] [Received: 02/09/2023] [Revised: 06/12/2023] [Accepted: 06/28/2023] [Indexed: 07/23/2023]
Abstract
Hi-C experiments have been extensively used to study genomic structures. In the last few years, spatiotemporal Hi-C has contributed substantially to the investigation of dynamic genome reorganization. However, computational modeling and forecasting of spatiotemporal Hi-C data have not yet been reported in the literature. We present HiC4D to address the problem of forecasting spatiotemporal Hi-C data. We designed and benchmarked a novel network named residual ConvLSTM (ResConvLSTM), which combines a residual network with convolutional long short-term memory (ConvLSTM). We evaluated our new ResConvLSTM networks and compared them with five other methods, including a naïve network (NaiveNet) that we designed as a baseline method and four outstanding video-prediction methods from the literature: ConvLSTM, spatiotemporal LSTM (ST-LSTM), self-attention LSTM (SA-LSTM) and simple video prediction (SimVP). We used eight different spatiotemporal Hi-C datasets for the blind test, including two from mouse embryogenesis, one from somatic cell nuclear transfer (SCNT) embryos, three embryogenesis datasets from different species and two non-embryogenesis datasets. Our evaluation results indicate that our ResConvLSTM networks almost always outperform the other methods on the eight blind-test datasets in terms of accurately predicting the Hi-C contact matrices at future time-steps. Our benchmarks also indicate that all of the methods that we benchmarked can successfully recover the boundaries of topologically associating domains called on the experimental Hi-C contact matrices. Taken together, our benchmarks suggest that HiC4D is an effective tool for predicting spatiotemporal Hi-C data. HiC4D is publicly available at both http://dna.cs.miami.edu/HiC4D/ and https://github.com/zwang-bioinformatics/HiC4D/.
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, 33124, FL, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, 33124, FL, USA
6
Senapati S, Irshad IU, Sharma AK, Kumar H. Fundamental insights into the correlation between chromosome configuration and transcription. Phys Biol 2023; 20:051002. [PMID: 37467757 DOI: 10.1088/1478-3975/ace8e5] [Received: 04/18/2023] [Accepted: 07/19/2023] [Indexed: 07/21/2023]
Abstract
Eukaryotic chromosomes exhibit a hierarchical organization that spans a spectrum of length scales, ranging from sub-regions known as loops, which typically comprise hundreds of base pairs, to much larger chromosome territories that can encompass a few megabase pairs. Chromosome conformation capture experiments that combine high-throughput sequencing with microscopy techniques have enabled a new understanding of inter- and intra-chromosomal interactions in unprecedented detail. This information also provides mechanistic insights into the relationship between genome architecture and gene expression. In this article, we review recent findings on three-dimensional interactions among chromosomes at the compartment, topologically associating domain, and loop levels and the impact of these interactions on the transcription process. We also discuss the current understanding of various biophysical processes involved in the multi-layer structural organization of chromosomes. Then, we discuss the relationships between gene expression and genome structure from perturbative genome-wide association studies. Furthermore, for a better understanding of how chromosome architecture and function are linked, we emphasize the role of epigenetic modifications in the regulation of gene expression. Such an understanding of the relationship between genome architecture and gene expression can provide a new perspective on the range of potential future discoveries and therapeutic research.
Affiliation(s)
- Swayamshree Senapati
  - School of Basic Sciences, Indian Institute of Technology, Bhubaneswar, Argul, Odisha 752050, India
- Inayat Ullah Irshad
  - Department of Physics, Indian Institute of Technology, Jammu, Jammu 181221, India
- Ajeet K Sharma
  - Department of Physics, Indian Institute of Technology, Jammu, Jammu 181221, India
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Jammu, Jammu 181221, India
- Hemant Kumar
  - School of Basic Sciences, Indian Institute of Technology, Bhubaneswar, Argul, Odisha 752050, India
7
Li K, Zhang P, Wang Z, Shen W, Sun W, Xu J, Wen Z, Li L. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution. Brief Bioinform 2023; 24:bbad245. [PMID: 37381618 DOI: 10.1093/bib/bbad245] [Received: 03/21/2023] [Revised: 06/06/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023]
Abstract
Although sequencing-based high-throughput chromatin interaction data are widely used to uncover genome-wide three-dimensional chromatin architecture, their sparseness and low signal-to-noise ratio greatly restrict the precision of the obtained structural elements. To improve data quality, we here present iEnhance (chromatin interaction data resolution enhancement), a multi-scale spatial projection and encoding network, to predict high-resolution chromatin interaction matrices from low-resolution and noisy input data. Specifically, iEnhance projects the input data into matrix spaces to extract multi-scale global and local feature sets, then hierarchically fuses these features using an attention mechanism. After that, dense channel encoding and residual channel decoding are used to effectively infer robust chromatin interaction maps. iEnhance outperforms state-of-the-art Hi-C resolution enhancement tools in both visual and quantitative evaluation. Comprehensive analysis shows that, unlike other tools, iEnhance can recover both short-range structural elements and long-range interaction patterns precisely. More importantly, iEnhance can be transferred to data enhancement of other tissues or cell lines of unknown resolution. Furthermore, iEnhance performs robustly in the enhancement of diverse chromatin interaction data, including those from single-cell Hi-C and Micro-C experiments.
Affiliation(s)
- Kai Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zilin Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wei Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Zi Wen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
8
Kalluchi A, Harris HL, Reznicek TE, Rowley MJ. Considerations and caveats for analyzing chromatin compartments. Front Mol Biosci 2023; 10:1168562. [PMID: 37091873 PMCID: PMC10113542 DOI: 10.3389/fmolb.2023.1168562] [Received: 02/17/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023]
Abstract
Genomes are organized into nuclear compartments, separating active from inactive chromatin. Chromatin compartments are readily visible in a large number of species by experiments that map chromatin conformation genome-wide. When analyzing these maps, a common step is the identification of genomic intervals that interact within A (active) and B (inactive) compartments. It has also become increasingly common to identify and analyze subcompartments. We review different strategies to identify A/B and subcompartment intervals, including a discussion of various machine-learning approaches to predict these features. We then discuss the strengths and limitations of current strategies and examine how these aspects of analysis may have impacted our understanding of chromatin compartments.
Affiliation(s)
- M. Jordan Rowley
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
9
Zhang S, Plummer D, Lu L, Cui J, Xu W, Wang M, Liu X, Prabhakar N, Shrinet J, Srinivasan D, Fraser P, Li Y, Li J, Jin F. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat Genet 2022; 54:1013-1025. [PMID: 35817982 PMCID: PMC10082397 DOI: 10.1038/s41588-022-01116-w] [Received: 04/15/2021] [Accepted: 05/30/2022] [Indexed: 11/09/2022]
Abstract
Mapping chromatin loops from noisy Hi-C heatmaps remains a major challenge. Here we present DeepLoop, which performs rigorous bias correction followed by deep-learning-based signal enhancement for robust chromatin interaction mapping from low-depth Hi-C data. DeepLoop enables loop-resolution, single-cell Hi-C analysis. It also achieves a cross-platform convergence between different Hi-C protocols and micrococcal nuclease (micro-C). DeepLoop allowed us to map the genetic and epigenetic determinants of allele-specific chromatin interactions in the human genome. We nominate new loci with allele-specific interactions governed by imprinting or allelic DNA methylation. We also discovered that, in the inactivated X chromosome (Xi), local loops at the DXZ4 'megadomain' boundary escape X-inactivation but the FIRRE 'superloop' locus does not. Importantly, DeepLoop can pinpoint heterozygous single-nucleotide polymorphisms and large structure variants that cause allelic chromatin loops, many of which rewire enhancers with transcription consequences. Taken together, DeepLoop expands the use of Hi-C to provide loop-resolution insights into the genetics of the three-dimensional genome.
Affiliation(s)
- Shanshan Zhang
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - The Biomedical Sciences Training Program, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Dylan Plummer
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
- Leina Lu
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Jian Cui
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Wanying Xu
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - The Biomedical Sciences Training Program, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Miao Wang
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Xiaoxiao Liu
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Nachiketh Prabhakar
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
- Jatin Shrinet
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Divyaa Srinivasan
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Peter Fraser
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Yan Li
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Jing Li
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
  - Department of Population and Quantitative Health Sciences, Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH, USA
- Fulai Jin
  - Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
  - Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, OH, USA
  - Department of Population and Quantitative Health Sciences, Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH, USA
10
Huang L, Yang Y, Li G, Jiang M, Wen J, Abnousi A, Rosen JD, Hu M, Li Y. A systematic evaluation of Hi-C data enhancement methods for enhancing PLAC-seq and HiChIP data. Brief Bioinform 2022; 23:bbac145. [PMID: 35488276 PMCID: PMC9116213 DOI: 10.1093/bib/bbac145] [Received: 11/04/2021] [Revised: 03/30/2022] [Accepted: 03/31/2022] [Indexed: 11/12/2022]
Abstract
The three-dimensional organization of chromatin plays a critical role in gene regulation. Recently developed technologies, such as HiChIP and proximity ligation-assisted ChIP-Seq (PLAC-seq) (hereafter referred to as HP for brevity), can measure chromosome spatial organization by interrogating chromatin interactions mediated by a protein of interest. While offering cost-efficiency over genome-wide unbiased high-throughput chromosome conformation capture (Hi-C) data, HP data remain sparse at kilobase (Kb) resolution with the current sequencing depth on the order of 10^8 reads per sample. Deep learning models, including HiCPlus, HiCNN, HiCNN2, DeepHiC and Variationally Encoded Hi-C Loss Enhancer (VEHiCLE), have been developed to enhance the sequencing depth of Hi-C data, but their performance on HP data has not been benchmarked. Here, we performed a comprehensive evaluation of HP data sequencing depth enhancement using models developed for Hi-C data. Specifically, we analyzed various HP data, including Smc1a HiChIP data of the human lymphoblastoid cell line GM12878, H3K4me3 PLAC-seq data of four human neural cell types as well as of mouse embryonic stem cells (mESC), and mESC CCCTC-binding factor (CTCF) PLAC-seq data. Our evaluations lead to the following three findings: (i) most models developed for Hi-C data achieve reasonable performance when applied to HP data (e.g. with Pearson correlation ranging from 0.76 to 0.95 for pairs of loci within 300 Kb), and the enhanced datasets lead to improved statistical power for detecting long-range chromatin interactions, (ii) models trained on HP data outperform those trained on Hi-C data and (iii) most models are transferable across cell types. Our results provide a general guideline for HP data enhancement using existing methods designed for Hi-C data.
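The abstract above quotes a Pearson correlation restricted to pairs of loci within 300 Kb. As a rough sketch of what such a distance-restricted comparison can look like, assuming two square contact matrices binned at the same resolution and a single pooled correlation over all qualifying bin pairs, the code below may help. The function names and the pooling choice are assumptions for illustration; the paper may stratify or aggregate differently.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def pearson_within(a: Matrix, b: Matrix, resolution_bp: int,
                   max_dist_bp: int = 300_000) -> float:
    """Pearson correlation over upper-triangle bin pairs at most max_dist_bp apart."""
    max_bins = max_dist_bp // resolution_bp
    n = len(a)
    xs: List[float] = []
    ys: List[float] = []
    for i in range(n):
        # Only visit bin pairs (i, j) with j - i <= max_bins.
        for j in range(i, min(n, i + max_bins + 1)):
            xs.append(a[i][j])
            ys.append(b[i][j])
    return pearson(xs, ys)
```

Here `a` would hold the enhanced matrix and `b` the deeply sequenced reference; at 10-Kb resolution the default cutoff corresponds to bin pairs separated by at most 30 bins.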
Affiliation(s)
- Le Huang
  - Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, North Carolina 27599, USA
- Yuchen Yang
  - State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, 510275 Guangzhou, China
- Gang Li
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, NC 27599, USA
- Minzhi Jiang
  - Department of Applied Physical Sciences, University of North Carolina at Chapel Hill, NC 27599, USA
- Jia Wen
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina 27599, USA
- Armen Abnousi
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio 44195
- Jonathan D Rosen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina 27599, USA
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio 44195
- Yun Li
  - Department of Genetics, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina 27599, USA
  - Department of Computer Science, University of North Carolina at Chapel Hill, North Carolina 27599, USA
11
Hicks P, Oluwadare O. HiCARN: Resolution Enhancement of Hi-C Data Using Cascading Residual Networks. Bioinformatics 2022; 38:2414-2421. [PMID: 35274679 PMCID: PMC9048669 DOI: 10.1093/bioinformatics/btac156] [Received: 11/01/2021] [Revised: 02/15/2022] [Accepted: 03/10/2022] [Indexed: 11/29/2022]
Abstract
Motivation: High throughput chromosome conformation capture (Hi-C) contact matrices are used to predict 3D chromatin structures in eukaryotic cells. High-resolution Hi-C data are less available than low-resolution Hi-C data due to sequencing costs but provide greater insight into the intricate details of 3D chromatin structures such as enhancer–promoter interactions and sub-domains. To provide a cost-effective solution to high-resolution Hi-C data collection, deep learning models are used to predict high-resolution Hi-C matrices from existing low-resolution matrices across multiple cell types.
Results: Here, we present two Cascading Residual Networks called HiCARN-1 and HiCARN-2, a convolutional neural network and a generative adversarial network, that use a novel framework of cascading connections throughout the network for Hi-C contact matrix prediction from low-resolution data. Shown by image evaluation and Hi-C reproducibility metrics, both HiCARN models, overall, outperform state-of-the-art Hi-C resolution enhancement algorithms in predictive accuracy for both human and mouse 1/16, 1/32, 1/64 and 1/100 downsampled high-resolution Hi-C data. Also, validation by extracting topologically associating domains, chromosome 3D structure and chromatin loop predictions from the enhanced data shows that HiCARN can proficiently reconstruct biologically significant regions.
Availability and implementation: HiCARN can be accessed and utilized as an open-sourced software at: https://github.com/OluwadareLab/HiCARN and is also available as a containerized application that can be run on any platform.
Supplementary information: Supplementary data are available at Bioinformatics online.
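The abstract above evaluates on 1/16, 1/32, 1/64 and 1/100 downsampled Hi-C data without describing the downsampling step itself. One common way to simulate lower coverage, offered here only as an assumption, is to keep each read independently with probability 1/ratio (binomial thinning). The sketch below applies that idea to a small symmetric count matrix; `downsample_contacts` is a hypothetical helper, not a function from the HiCARN repository.

```python
import random
from typing import List

Matrix = List[List[int]]


def downsample_contacts(matrix: Matrix, ratio: int, seed: int = 0) -> Matrix:
    """Keep each read with probability 1/ratio; thin the upper triangle and mirror it."""
    rng = random.Random(seed)
    keep = 1.0 / ratio
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # One Bernoulli trial per read in bin pair (i, j).
            kept = sum(rng.random() < keep for _ in range(matrix[i][j]))
            # Mirror the result so the output stays symmetric.
            out[i][j] = out[j][i] = kept
    return out
```

Calling `downsample_contacts(matrix, 16)` would give a matrix with roughly one sixteenth of the original reads. The per-read loop is written for clarity and is only practical for small matrices; a vectorized binomial draw would be the usual choice at genome scale.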
Affiliation(s)
- Parker Hicks
  - Concordia University Irvine, Irvine, CA 92612, USA
- Oluwatosin Oluwadare
  - Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA
12
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021] [Indexed: 02/07/2023]
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), the study of the spatial structure of the genomic sequence in the nucleus helps researchers understand the functions of biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. After that, we discuss Hi-C data analysis methods for genome 3D structure, A/B compartments, TADs (topologically associating domains), and loop detection. We also discuss how to apply genome visualization technologies to the identification of chromosome feature structures. We continue with a review of correlation analysis differences among multi-omics data, and of how to apply Hi-C and other omics data analysis to cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
- Yi Yang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Sichen Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Xiaotong Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China