1
Fang T, Liu Y, Woicik A, Lu M, Jha A, Wang X, Li G, Hristov B, Liu Z, Xu H, Noble WS, Wang S. Enhancing Hi-C contact matrices for loop detection with Capricorn: a multiview diffusion model. Bioinformatics 2024; 40:i471-i480. [PMID: 38940142 PMCID: PMC11211821 DOI: 10.1093/bioinformatics/btae211] [Indexed: 06/29/2024]
Abstract
MOTIVATION High-resolution Hi-C contact matrices reveal the detailed three-dimensional architecture of the genome, but high-coverage experimental Hi-C data are expensive to generate. At the same time, chromatin structure analyses struggle with extremely sparse contact matrices. To address this problem, computational methods to enhance low-coverage contact matrices have been developed, but existing methods are largely based on resolution enhancement methods for natural images and hence often employ models that do not distinguish between biologically meaningful contacts, such as loops, and other stochastic contacts.
RESULTS We present Capricorn, a machine learning model for Hi-C resolution enhancement that incorporates small-scale chromatin features as additional views of the input Hi-C contact matrix and leverages a diffusion probability model backbone to generate a high-coverage matrix. We show that Capricorn outperforms the state of the art in a cross-cell-line setting, improving on existing methods by 17% in mean squared error and 26% in F1 score for chromatin loop identification from the generated high-coverage data. We also demonstrate that Capricorn performs well in the cross-chromosome setting and in the combined cross-chromosome, cross-cell-line setting, improving the downstream loop F1 score by 14% relative to existing methods. We further show that our multiview idea can also be used to improve several existing methods, HiCARN and HiCNN, indicating the wide applicability of this approach. Finally, we use DNA sequence to validate discovered loops and find that the fraction of CTCF-supported loops from Capricorn is similar to that of loops identified from the high-coverage data. Capricorn is a powerful Hi-C resolution enhancement method that enables scientists to find chromatin features that cannot be identified in the low-coverage contact matrix.
AVAILABILITY AND IMPLEMENTATION Implementation of Capricorn and source code for reproducing all figures in this paper are available at https://github.com/CHNFTQ/Capricorn.
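The two evaluation quantities named in the abstract, mean squared error between contact matrices and F1 score for loop identification, can be illustrated with a minimal Python sketch. This is not taken from the Capricorn repository: the function names `mse` and `loop_f1`, the representation of a loop as a pair of bin indices, the `tol` matching tolerance, and the greedy one-to-one matching are all assumptions made for illustration, and the paper's actual evaluation protocol may differ.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error between two contact matrices of equal shape."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


def loop_f1(pred_loops, true_loops, tol=0):
    """F1 score between predicted and reference loop calls.

    Each loop is a (bin_i, bin_j) pair. A predicted loop counts as a true
    positive when an unmatched reference loop lies within `tol` bins in both
    coordinates. Matching is greedy and one-to-one (an assumption, not
    necessarily the paper's protocol).
    """
    pred = sorted(set(pred_loops))
    unmatched = set(true_loops)
    n_true = len(unmatched)
    tp = 0
    for i, j in pred:
        hit = next(
            ((a, b) for a, b in sorted(unmatched)
             if abs(a - i) <= tol and abs(b - j) <= tol),
            None,
        )
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / n_true
    return 2 * precision * recall / (precision + recall)


# Toy usage with made-up numbers.
enhanced = [[1.0, 2.0], [3.0, 4.0]]
high_cov = [[1.0, 2.0], [3.0, 6.0]]
print(mse(enhanced, high_cov))

called = [(10, 50), (20, 80), (5, 9)]
reference = [(10, 51), (20, 80)]
print(loop_f1(called, reference, tol=1))
```

In this toy case two of the three called loops match a reference loop within one bin, so precision is 2/3 and recall is 1.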
Affiliation(s)
- Tangqi Fang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Yifeng Liu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Addie Woicik
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Minsi Lu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Anupama Jha
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Xiao Wang
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, United States
- Gang Li
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
  - eScience Institute, University of Washington, Seattle, WA 98195, United States
- Borislav Hristov
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Zixuan Liu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- William S Noble
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
2
Zhang Y, Cameron CJF, Blanchette M. Posterior inference of Hi-C contact frequency through sampling. FRONTIERS IN BIOINFORMATICS 2024; 3:1285828. [PMID: 38455089 PMCID: PMC10919286 DOI: 10.3389/fbinf.2023.1285828] [Received: 08/30/2023] [Accepted: 12/20/2023] [Indexed: 03/09/2024]
Abstract
Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are, moreover, reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimates. Here, we present the Hi-C interaction frequency sampler (HiCSampler), which reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequencies. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.
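As a rough illustration of how posterior samples of a contact map can be turned into per-entry uncertainty summaries, the following Python sketch computes a posterior mean and a credible interval for each matrix entry. It is not HiCSampler's algorithm: the samples come from a hypothetical stand-in, `toy_posterior_samples`, which assumes an independent Poisson likelihood with a Gamma prior for every entry and therefore ignores the dependencies between neighboring loci that HiCSampler exploits. The function names, prior parameters, and the 95% level are all assumptions.

```python
import numpy as np


def toy_posterior_samples(counts, n_samples=500, prior_shape=1.0,
                          prior_rate=1.0, seed=0):
    """Draw toy posterior samples of interaction frequencies.

    Stand-in model (an assumption, NOT HiCSampler): each observed count k is
    Poisson with its own rate, and the rate has a Gamma(prior_shape,
    prior_rate) prior, so the posterior is Gamma(prior_shape + k,
    prior_rate + 1). Entries are sampled independently, so individual
    samples are not forced to be symmetric matrices.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    shape = prior_shape + counts
    scale = 1.0 / (prior_rate + 1.0)  # NumPy's gamma takes scale = 1 / rate
    return rng.gamma(shape, scale, size=(n_samples,) + counts.shape)


def summarize(samples, level=0.95):
    """Per-entry posterior mean and equal-tailed credible interval."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(samples, [tail, 100.0 - tail], axis=0)
    return {
        "mean": samples.mean(axis=0),
        "lower": lower,
        "upper": upper,
        "width": upper - lower,  # wider interval = more uncertain entry
    }


# Toy usage: a 2 x 2 contact map with made-up counts.
counts = np.array([[50, 2], [2, 40]])
samples = toy_posterior_samples(counts)
summary = summarize(samples)
print(summary["mean"])
print(summary["width"])
```

Each sample has the same shape as the input contact map, so under this sketch any downstream tool that accepts a contact matrix could be run once per sample and its outputs compared across samples.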
Affiliation(s)
- Yanlin Zhang
  - School of Computer Science, McGill University, Montréal, QC, Canada
- Christopher J. F. Cameron
  - School of Computer Science, McGill University, Montréal, QC, Canada
  - Department of Biochemistry and Goodman Cancer Research Center, McGill University, Montreal, QC, Canada