1. Jorge E, Foissac S, Neuvial P, Zytnicki M, Vialaneix N. A comprehensive review and benchmark of differential analysis tools for Hi-C data. Brief Bioinform 2025; 26:bbaf074. [PMID: 40037641; PMCID: PMC11879411; DOI: 10.1093/bib/bbaf074] [Received: 09/26/2024] [Revised: 01/24/2025] [Accepted: 02/10/2025] [Indexed: 03/06/2025]
Abstract
MOTIVATION The 3D organization of the genome plays a crucial role in various biological processes. Hi-C technology is widely used to investigate chromosome structures by quantifying 3D proximity between genomic regions. While numerous computational tools exist for detecting differences in Hi-C data between conditions, a comprehensive review and benchmark comparing their effectiveness is lacking. RESULTS This study offers a comprehensive review and benchmark of 10 generic tools for differential analysis of Hi-C matrices at the interaction count level. The benchmark assesses the statistical methods, usability, and performance (in terms of precision and power) of these tools, using both real and simulated Hi-C data. Results reveal a striking variability in performance among the tools, highlighting the substantial impact of preprocessing filters and the difficulty all tools encounter in effectively controlling the false discovery rate across varying resolutions and chromosome sizes. AVAILABILITY The complete benchmark is available at https://forgemia.inra.fr/scales/replication-chrocodiff using processed data deposited at https://doi.org/10.57745/LR0W9R. CONTACT nathalie.vialaneix@inrae.fr.
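The benchmark's central finding concerns false discovery rate (FDR) control. As background only, the sketch below shows the Benjamini-Hochberg step-up procedure, a widely used way to turn per-cell p-values from a differential Hi-C test into FDR-controlled calls. It is a generic textbook illustration: the abstract does not say which multiple-testing correction each benchmarked tool applies, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha (BH step-up)."""
    m = len(pvalues)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha; 0 means nothing is rejected.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.5]))  # [True, True, False, False, False]
print(benjamini_hochberg([0.04, 0.001, 0.045]))  # [True, True, True]
```

The second call shows the step-up behaviour: 0.04 fails its own rank-2 threshold (2/3 × 0.05 ≈ 0.033), but it is still rejected because a larger p-value, 0.045, passes at rank 3.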
Affiliation(s)
- Elise Jorge
- GenPhySE, Université de Toulouse, INRAE, ENVT, 31326 Castanet-Tolosan, France
- Sylvain Foissac
- GenPhySE, Université de Toulouse, INRAE, ENVT, 31326 Castanet-Tolosan, France
- Pierre Neuvial
- Institut de Mathématiques de Toulouse, UMR 5219, Université de Toulouse, CNRS UPS, 31062 Toulouse, France
- Matthias Zytnicki
- Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
- Nathalie Vialaneix
- Université Fédérale de Toulouse, INRAE, MIAT, 31326 Castanet-Tolosan, France
2. Tavallaee G, Orouji E. Mapping the 3D genome architecture. Comput Struct Biotechnol J 2024; 27:89-101. [PMID: 39816913; PMCID: PMC11732852; DOI: 10.1016/j.csbj.2024.12.018] [Received: 11/05/2024] [Revised: 12/17/2024] [Accepted: 12/20/2024] [Indexed: 01/18/2025]
Abstract
The spatial organization of the genome plays a critical role in regulating gene expression, cellular differentiation, and genome stability. This review provides an in-depth examination of the methodologies, computational tools, and frameworks developed to map the three-dimensional (3D) architecture of the genome, focusing on both ligation-based and ligation-free techniques. We also explore the limitations of these methods, including biases introduced by restriction enzyme digestion and ligation inefficiencies, and compare them to more recent ligation-free approaches such as Genome Architecture Mapping (GAM) and Split-Pool Recognition of Interactions by Tag Extension (SPRITE). These techniques offer unique insights into higher-order chromatin structures by bypassing ligation steps, thus enabling the capture of complex multi-way interactions that are often challenging to resolve with traditional methods. Furthermore, we discuss the integration of chromatin interaction data with other genomic layers through multimodal approaches, including recent advances in single-cell technologies like sci-HiC and scSPRITE, which help unravel the heterogeneity of chromatin architecture in development and disease.
Affiliation(s)
- Ghazaleh Tavallaee
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Elias Orouji
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
3. Zhou X, Wu H. scHiClassifier: a deep learning framework for cell type prediction by fusing multiple feature sets from single-cell Hi-C data. Brief Bioinform 2024; 26:bbaf009. [PMID: 39831891; PMCID: PMC11744636; DOI: 10.1093/bib/bbaf009] [Received: 09/21/2024] [Revised: 12/01/2024] [Accepted: 01/06/2025] [Indexed: 01/22/2025]
Abstract
Single-cell high-throughput chromosome conformation capture (Hi-C) technology enables the capture of chromosomal spatial structure information at the single-cell level. However, to effectively investigate changes in chromosomal structure across different cell types, methods that can identify cell types from single-cell Hi-C data are needed. Current frameworks for cell type prediction based on single-cell Hi-C data are limited: they often struggle with feature interpretability and biological significance, and lack convincing, robust validation of their classification performance. In this study, we propose four new feature sets based on the contact matrix, with clear interpretability and biological significance. Furthermore, we develop a novel deep learning framework named scHiClassifier, based on a multi-head self-attention encoder, 1D convolution, and feature fusion, which integrates information from these four feature sets to predict cell types accurately. Through comprehensive comparisons with benchmark frameworks on six datasets, we demonstrate the superior classification performance and the universality of the scHiClassifier framework. We further assess the robustness of scHiClassifier through data perturbation and data dropout experiments. Moreover, comparisons of different feature set combinations show that using all feature sets in the scHiClassifier framework yields optimal performance. The effectiveness and superiority of the multiple feature set extraction are demonstrated by comparison with four unsupervised dimensionality reduction methods. Additionally, we analyze the importance of different feature sets and chromosomes using the SHapley Additive exPlanations (SHAP) method. Finally, enrichment analysis supports the accuracy and reliability of the scHiClassifier framework in classifying cells from single-cell Hi-C data.
The source code of scHiClassifier is freely available at https://github.com/HaoWuLab-Bioinformatics/scHiClassifier.
Affiliation(s)
- Xiangfei Zhou
- School of Software, Shandong University, No. 1500, Shunhua Road, Hi-Tech Industrial Development Zone, Jinan 250100, Shandong, China
- Hao Wu
- School of Software, Shandong University, No. 1500, Shunhua Road, Hi-Tech Industrial Development Zone, Jinan 250100, Shandong, China
- Shenzhen Research Institute of Shandong University, Shandong University, No. 19, Gaoxin South 4th Road, Nanshan District, Shenzhen 518063, Guangdong, China
4. Wang Y, Kong S, Zhou C, Wang Y, Zhang Y, Fang Y, Li G. A review of deep learning models for the prediction of chromatin interactions with DNA and epigenomic profiles. Brief Bioinform 2024; 26:bbae651. [PMID: 39708837; DOI: 10.1093/bib/bbae651] [Received: 09/23/2024] [Revised: 10/29/2024] [Accepted: 12/03/2024] [Indexed: 12/23/2024]
Abstract
Advances in three-dimensional (3D) genomics have revealed the spatial characteristics of chromatin interactions in gene expression regulation, which is crucial for understanding molecular mechanisms in biological processes. High-throughput technologies such as ChIA-PET, Hi-C, and their derivative methods have greatly enhanced our knowledge of 3D chromatin architecture. However, the mechanisms underlying chromatin interactions remain largely unexplored. Deep learning, with its powerful feature extraction and pattern recognition capabilities, offers a promising approach for integrating multi-omics data to build accurate predictive models of chromatin interaction matrices. This review systematically summarizes recent advances in chromatin interaction matrix prediction models, focusing on the latest methods that integrate DNA sequences and epigenetic signals. This article details various models, with emphasis on how one-dimensional (1D) information is transformed into 3D chromatin interaction structure and how the integration of different deep learning modules affects model accuracy. Additionally, we discuss the critical role of DNA sequence information and epigenetic markers in shaping 3D genome interaction patterns. Finally, this review addresses the challenges in predicting chromatin interaction matrices, with the aim of improving the precise mapping between DNA sequence and chromatin interaction matrices and supporting the application and theoretical development of 3D genomics across biological systems.
Affiliation(s)
- Yunlong Wang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Siyuan Kong
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Cong Zhou
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Yanfang Wang
- State Key Laboratory of Animal Biotech Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), No. 2 West Yuanmingyuan Rd, Haidian District, Beijing 100193, China
- Yubo Zhang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No. 97 Buxin Road, Dapeng New District, Shenzhen 518120, China
- Sequencing Facility, Frederick National Laboratory for Cancer Research, 8560 Progress Drive, Frederick, MD 21701, United States
- Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- Hubei Engineering Technology Research Center of Agricultural Big Data, 3D Genomics Research Center, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
- College of Informatics, Huazhong Agricultural University, No. 1 Shizishan Street, Hongshan District, Wuhan 430070, China
5. Rossini R, Oshaghi M, Nekrasov M, Bellanger A, Domaschenz R, Dijkwel Y, Abdelhalim M, Collas P, Tremethick D, Paulsen J. Loss of multi-level 3D genome organization during breast cancer progression. bioRxiv [Preprint] 2024:2023.11.26.568711. [PMID: 38076897; PMCID: PMC10705249; DOI: 10.1101/2023.11.26.568711] [Indexed: 12/23/2023]
Abstract
Breast cancer entails intricate alterations in genome organization and expression. However, how three-dimensional (3D) chromatin structure changes during progression from a normal to a malignant breast cancer state remains unknown. To address this, we conducted an analysis combining Hi-C data with lamina-associated domains (LADs), epigenomic marks, and gene expression in an in vitro model of breast cancer progression. Our results reveal that while the fundamental properties of topologically associating domains (TADs) are largely maintained, significant changes occur in the organization of compartments and subcompartments. These changes are closely correlated with alterations in the expression of oncogenes. We also observe a restructuring of TAD-TAD interactions, coinciding with a loss of spatial compartmentalization and radial positioning of the 3D genome. Notably, we identify a previously unrecognized interchromosomal insertion event, in which a locus on chromosome 8 harboring the MYC oncogene is inserted into a highly active subcompartment on chromosome 10. This insertion is accompanied by the formation of de novo enhancer contacts and activation of MYC, illustrating how structural genomic variants can alter the 3D genome to drive oncogenic states. In summary, our findings provide evidence for the loss of genome organization at multiple scales during breast cancer progression, revealing novel relationships between 3D genome structure and oncogenic processes.
Affiliation(s)
- Roberto Rossini
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Mohammadsaleh Oshaghi
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Maxim Nekrasov
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Aurélie Bellanger
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Renae Domaschenz
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Yasmin Dijkwel
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Mohamed Abdelhalim
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Philippe Collas
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway
- Department of Immunology and Transfusion Medicine, Oslo University Hospital, 0424 Oslo, Norway
- David Tremethick
- Department of Genome Sciences, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory, Australia
- Jonas Paulsen
- Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, 0316 Oslo, Norway
- Centre for Bioinformatics, Department of Informatics, University of Oslo, 0316 Oslo, Norway
6. Wu Y, Shi Z, Zhou X, Zhang P, Yang X, Ding J, Wu H. scHiCyclePred: a deep learning framework for predicting cell cycle phases from single-cell Hi-C data using multi-scale interaction information. Commun Biol 2024; 7:923. [PMID: 39085477; PMCID: PMC11291681; DOI: 10.1038/s42003-024-06626-3] [Received: 10/30/2023] [Accepted: 07/24/2024] [Indexed: 08/02/2024]
Abstract
The emergence of single-cell Hi-C (scHi-C) technology has provided unprecedented opportunities for investigating the intricate relationship between cell cycle phases and the three-dimensional (3D) structure of chromatin. However, accurately predicting cell cycle phases from scHi-C data remains a formidable challenge. Here, we present scHiCyclePred, a prediction model that integrates multiple feature sets to leverage scHi-C data for predicting cell cycle phases. scHiCyclePred extracts 3D chromatin structure features by incorporating multi-scale interaction information. Comparative analysis shows that scHiCyclePred surpasses existing methods such as Nagano_method and CIRCLET across various metrics, including accuracy (ACC), F1 score, Precision, Recall, and balanced accuracy (BACC). In addition, we evaluate scHiCyclePred against the previously published CIRCLET on a dataset of complex tissues (Liu_dataset). On this dataset, scHiCyclePred exceeds CIRCLET by 0.39, 0.52, 0.52, and 0.39 in ACC, F1 score, Precision, and Recall, respectively. Furthermore, we analyze 3D chromatin dynamics and gene features during the cell cycle, providing a more comprehensive understanding of cell cycle dynamics through chromatin structure. scHiCyclePred not only offers insights into cell biology but also holds promise for catalyzing breakthroughs in disease research. scHiCyclePred is available on GitHub at https://github.com/HaoWuLab-Bioinformatics/scHiCyclePred.
Affiliation(s)
- Yingfu Wu
- School of Software, Shandong University, Jinan, Shandong, China
- Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Zhenqi Shi
- School of Software, Shandong University, Jinan, Shandong, China
- Xiangfei Zhou
- School of Software, Shandong University, Jinan, Shandong, China
- Pengyu Zhang
- College of Information Engineering, Northwest A&F University, Yangling, Shaanxi, China
- Xiuhui Yang
- School of Software, Shandong University, Jinan, Shandong, China
- Jun Ding
- Department of Medicine, Meakins-Christie Laboratories, McGill University, Montreal, QC, Canada.
- Hao Wu
- School of Software, Shandong University, Jinan, Shandong, China.
- Shenzhen Research Institute of Shandong University, Shenzhen, Guangdong, China.
7. Shi Z, Wu H. CTPredictor: A comprehensive and robust framework for predicting cell types by integrating multi-scale features from single-cell Hi-C data. Comput Biol Med 2024; 173:108336. [PMID: 38513390; DOI: 10.1016/j.compbiomed.2024.108336] [Received: 12/11/2023] [Revised: 03/01/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Single-cell Hi-C (scHi-C) has emerged as a powerful technology for deciphering cell-to-cell variability in three-dimensional (3D) chromatin organization, providing insights into genome-wide chromatin interactions and their correlation with cellular functions. Nevertheless, accurately identifying cell types across different datasets remains a formidable challenge, hindering comprehensive investigations into genome structure. In response, we introduce CTPredictor, a computational method that integrates multi-scale features to accurately predict cell types in various datasets. CTPredictor incorporates three distinct feature sets, namely small intra-domain contact probability (SICP), smoothed small intra-domain contact probability (SSICP), and smoothed bin contact probability (SBCP). The resulting fusion classification model significantly improves the accuracy of cell type prediction from scHi-C data. Rigorous benchmarking against established methods and three conventional machine learning approaches demonstrates the robust performance of CTPredictor, positioning it as an advanced tool for cell type prediction from scHi-C data. Beyond its prediction capabilities, CTPredictor holds promise for illuminating 3D genome structures and their functional significance across a wide array of biological processes.
Affiliation(s)
- Zhenqi Shi
- School of Software, Shandong University, 250100, Jinan, China
- Hao Wu
- School of Software, Shandong University, 250100, Jinan, China.
8. Liu R, Xu R, Yan S, Li P, Jia C, Sun H, Sheng K, Wang Y, Zhang Q, Guo J, Xin X, Li X, Guo D. Hi-C, a chromatin 3D structure technique advancing the functional genomics of immune cells. Front Genet 2024; 15:1377238. [PMID: 38586584; PMCID: PMC10995239; DOI: 10.3389/fgene.2024.1377238] [Received: 02/07/2024] [Accepted: 03/13/2024] [Indexed: 04/09/2024]
Abstract
The functional performance of immune cells relies on a complex transcriptional regulatory network. The three-dimensional structure of chromatin can affect chromatin status and gene expression patterns, and plays an important regulatory role in gene transcription. Currently available techniques for studying chromatin spatial structure include chromatin conformation capture techniques and their derivatives, chromatin accessibility sequencing techniques, and others. Additionally, deep learning, which has emerged recently, can be used as a tool to enhance data analysis. In this review, we explain the definition and significance of three-dimensional chromatin structure, summarize the technologies available for studying it, and describe research progress on the chromatin spatial structure of dendritic cells, macrophages, T cells, B cells, and neutrophils.
Affiliation(s)
- Dianhao Guo
- School of Clinical and Basic Medical Sciences, Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, Shandong, China
9. Chowdhury HMAM, Boult T, Oluwadare O. Comparative study on chromatin loop callers using Hi-C data reveals their effectiveness. BMC Bioinformatics 2024; 25:123. [PMID: 38515011; PMCID: PMC10958853; DOI: 10.1186/s12859-024-05713-w] [Received: 12/15/2023] [Accepted: 02/19/2024] [Indexed: 03/23/2024]
Abstract
BACKGROUND Chromosomes are among the most fundamental components of cell biology, and their DNA carries hierarchical information. DNA compacts itself by forming loops, and these regions are bound by various proteins, including CTCF, SMC3, and histone H3. Numerous sequencing methods, such as Hi-C, ChIP-seq, and Micro-C, have been developed to investigate these properties. Using these data, researchers have developed a variety of loop prediction techniques that have greatly improved the characterization of chromatin loops and related features. RESULTS In this study, we categorized 22 loop calling methods and conducted a comprehensive study of 11 of them. We provide detailed insights into the methodologies underlying these loop detection algorithms, grouping them into five distinct categories based on their fundamental approaches. We also include critical information such as resolution, input and output formats, and parameters. For this analysis, we used the GM12878 Hi-C datasets at 5 kb, 10 kb, 100 kb, and 250 kb resolutions. Our evaluation criteria encompassed various factors, including memory usage, running time, sequencing depth, and recovery of protein-specific sites such as CTCF, H3K27ac, and RNAPII. CONCLUSION This analysis offers insights into the loop detection processes of each method, along with the strengths and weaknesses of each, enabling readers to choose suitable methods for their datasets. We evaluate the capabilities of these tools and introduce a novel Biological, Consistency, and Computational robustness score (BCC score) to measure their overall robustness, ensuring a comprehensive evaluation of their performance.
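The resolutions compared in this study are bin sizes: read pairs are aggregated into genomic bins of that width to form a contact matrix, so a coarser resolution yields a smaller matrix with more counts per cell. The sketch below illustrates this under simplifying assumptions (a single chromosome, 0-based read-pair positions in base pairs, no filtering or normalization). The function `contact_matrix` is hypothetical and is not part of any of the loop callers evaluated here.

```python
def contact_matrix(pairs, chrom_length, resolution):
    """Bin intra-chromosomal read pairs (pos1, pos2) into a symmetric count matrix.

    Positions are 0-based base-pair coordinates; `resolution` is the bin width in bp.
    """
    n_bins = -(-chrom_length // resolution)  # ceiling division: last bin may be partial
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // resolution, pos2 // resolution
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts to keep the matrix symmetric
    return matrix


pairs = [(100, 12_000), (4_999, 5_000), (20_000, 20_500)]
at_5kb = contact_matrix(pairs, chrom_length=25_000, resolution=5_000)
at_10kb = contact_matrix(pairs, chrom_length=25_000, resolution=10_000)
print(len(at_5kb), len(at_10kb))  # 5 3
print(at_5kb[0][1], at_10kb[0][0])  # 1 1
```

The pair (4_999, 5_000) falls in two neighbouring bins at 5 kb but collapses onto the diagonal at 10 kb, which is one reason loop calls can differ between resolutions.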
Affiliation(s)
- H M A Mohit Chowdhury
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA
- Terrance Boult
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado at Colorado Springs, 1420 Austin Bluffs Pkwy, Colorado Springs, CO, 80918, USA.
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
10. Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975; PMCID: PMC11127719; DOI: 10.1038/s41576-023-00638-1] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
- Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA.
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
11. Yang J, Zhu X, Wang R, Li M, Tang Q. Revisiting Assessment of Computational Methods for Hi-C Data Analysis. Int J Mol Sci 2023; 24:13814. [PMID: 37762117; PMCID: PMC10531246; DOI: 10.3390/ijms241813814] [Received: 07/27/2023] [Revised: 08/30/2023] [Accepted: 09/03/2023] [Indexed: 09/29/2023]
Abstract
The performance of algorithms for Hi-C data preprocessing, identification of topologically associating domains, and detection of chromatin interactions and promoter-enhancer interactions has mostly been evaluated using semi-quantitative or synthetic-data approaches, and evaluations since 2017 have not included the most recent methods. In this study, we comprehensively evaluated 24 popular state-of-the-art methods covering the complete end-to-end pipeline of Hi-C data analysis, using manually curated or experimentally validated benchmark datasets, including a CRISPR dataset for promoter-enhancer interaction validation. Our results indicate that, although no single method performed best in all situations, HiC-Pro, DomainCaller, and Fit-Hi-C2 showed relatively balanced performance across most evaluation metrics for preprocessing, topologically associating domain identification, and chromatin interaction/promoter-enhancer interaction detection, respectively. The comprehensive comparison presented in this manuscript provides a reference to help researchers choose the Hi-C analysis tools that best suit their needs.
Affiliation(s)
- Jing Yang
- Livestock and Poultry Multi-Omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
- Xingxing Zhu
- Livestock and Poultry Multi-Omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
- Rui Wang
- Livestock and Poultry Multi-Omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
- Mingzhou Li
- Livestock and Poultry Multi-Omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
- Qianzi Tang
- Livestock and Poultry Multi-Omics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Sichuan Animal Science Academy, Chengdu 610066, China
12. Okabe A, Kaneda A. Hi-C Analysis to Identify Genome-Wide Chromatin Structural Aberration in Cancer. Methods Mol Biol 2023; 2519:127-140. [PMID: 36066718; DOI: 10.1007/978-1-0716-2433-3_15] [Indexed: 06/15/2023]
Abstract
Hi-C is a method that analyzes genome-wide chromatin structure using next-generation sequencing. Chromatin structure is crucial for regulating transcription or replication, and Hi-C has revealed hierarchical chromatin structures such as loops, domains, and compartments. Aberrant alterations of these structures cause disease, and a number of structural aberrations in cancer cells have been reported recently. In addition, Hi-C can identify chromosome rearrangements, which frequently occur in cancer. Therefore, Hi-C is a powerful technique for analyzing epigenomic and genomic aberrations in tumorigenesis. Here we introduce the basic Hi-C protocol from both experimental and analytical perspectives.
Affiliation(s)
- Atsushi Okabe
- Department of Molecular Oncology, Graduate School of Medicine, Chiba University, Chiba, Japan
- Atsushi Kaneda
- Department of Molecular Oncology, Graduate School of Medicine, Chiba University, Chiba, Japan

13
Agarwal A, Chen L. DeepPHiC: predicting promoter-centered chromatin interactions using a novel deep learning approach. Bioinformatics 2023; 39:6887158. [PMID: 36495179 PMCID: PMC9825766 DOI: 10.1093/bioinformatics/btac801] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/24/2022] [Revised: 11/23/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
MOTIVATION Promoter-centered chromatin interactions, which include promoter-enhancer (PE) and promoter-promoter (PP) interactions, are important for deciphering gene regulation and disease mechanisms. The development of next-generation sequencing technologies such as promoter capture Hi-C (pcHi-C) has led to the discovery of promoter-centered chromatin interactions. However, pcHi-C experiments are expensive and thus may be unavailable for tissues/cell types of interest. In addition, these experiments may be underpowered due to insufficient sequencing depth or various artifacts, which limits the number of interactions detected. Most existing computational methods for predicting chromatin interactions are based on in situ Hi-C and can detect chromatin interactions across the entire genome. However, they may not be optimal for predicting promoter-centered chromatin interactions. RESULTS We develop a supervised multi-modal deep learning model, which utilizes a comprehensive set of features such as genomic sequence, epigenetic signal, anchor distance, evolutionary features and DNA structural features to predict tissue/cell type-specific PE and PP interactions. We further extend the deep learning model in a multi-task learning and a transfer learning framework and demonstrate that the proposed approach outperforms state-of-the-art deep learning methods. Moreover, using predefined biologically relevant tissues/cell types in the pretraining achieves prediction performance comparable to using all tissues/cell types, especially for predicting PE interactions. The prediction performance can be further improved by using computationally inferred biologically relevant tissues/cell types in the pretraining, which are defined based on the common genes in the proximity of the two anchors of the chromatin interactions. AVAILABILITY AND IMPLEMENTATION https://github.com/lichen-lab/DeepPHiC.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aman Agarwal
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Li Chen
- To whom correspondence should be addressed.

14
Nicoletti C. Methods for the Differential Analysis of Hi-C Data. Methods Mol Biol 2022; 2301:61-95. [PMID: 34415531 DOI: 10.1007/978-1-0716-1390-0_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The 3D organization of chromatin within the nucleus enables dynamic regulation and cell type-specific transcription of the genome. This is true at multiple levels of resolution: on a large scale, with chromosomes occupying distinct volumes (chromosome territories); at the level of individual chromatin fibers, which are organized into compartmentalized domains (e.g., Topologically Associating Domains, TADs); and at the level of short-range chromatin interactions between functional elements of the genome (e.g., enhancer-promoter loops). The widespread availability of Chromosome Conformation Capture (3C)-based high-throughput techniques has been instrumental in advancing our knowledge of chromatin nuclear organization. In particular, Hi-C has the potential to achieve the most comprehensive characterization of chromatin 3D interactions, as it is theoretically able to detect any pair of restriction fragments connected as a result of ligation by proximity. This chapter illustrates how to compare the chromatin interactome in different experimental conditions starting from pre-computed Hi-C contact matrices, how to visualize the results, and how to correlate the observed variations in chromatin interaction strength with changes in gene expression.
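As a rough illustration of what comparing two conditions "starting from pre-computed Hi-C contact matrices" involves at its very simplest, the Python sketch below scales two sparse contact maps to counts per million and reports a log2 fold change per bin pair. It is not the chapter's workflow: the function name `log2_fold_changes`, the dictionary input keyed by `(bin_i, bin_j)`, the CPM scaling and the pseudocount are all assumptions made for illustration. Dedicated differential tools additionally model replicates, distance-dependent decay and statistical significance.

```python
import math
from typing import Dict, Tuple

BinPair = Tuple[int, int]


def log2_fold_changes(
    cond_a: Dict[BinPair, float],
    cond_b: Dict[BinPair, float],
    pseudocount: float = 1.0,
) -> Dict[BinPair, float]:
    """Hypothetical helper: log2(B / A) per bin pair after CPM scaling.

    Bin pairs missing from one condition are treated as zero counts.
    No replicate handling or statistical test is performed.
    """
    total_a = sum(cond_a.values())
    total_b = sum(cond_b.values())
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each condition needs a positive total count")
    result = {}
    for pair in set(cond_a) | set(cond_b):
        # Scale raw counts to counts per million to remove depth differences.
        cpm_a = cond_a.get(pair, 0.0) / total_a * 1e6
        cpm_b = cond_b.get(pair, 0.0) / total_b * 1e6
        # The pseudocount keeps the ratio finite for pairs absent in one map.
        result[pair] = math.log2((cpm_b + pseudocount) / (cpm_a + pseudocount))
    return result


if __name__ == "__main__":
    a = {(0, 1): 10, (1, 2): 30}
    b = {(0, 1): 20, (1, 2): 60}  # same contact profile, twice the depth
    print(log2_fold_changes(a, b))
```

Because both maps are scaled to the same total, a library that is simply sequenced twice as deeply yields fold changes of zero; only shifts in the relative contact profile produce non-zero values.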
Affiliation(s)
- Chiara Nicoletti
- Development, Aging and Regeneration Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA, USA

15
Mohanta TK, Mishra AK, Al-Harrasi A. The 3D Genome: From Structure to Function. Int J Mol Sci 2021; 22:11585. [PMID: 34769016 PMCID: PMC8584255 DOI: 10.3390/ijms222111585] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/16/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 01/09/2023]
Abstract
The genome is the most functional part of a cell, and genomic contents are organized in a compact three-dimensional (3D) structure. The genome contains millions of nucleotide bases organized in its proper frame. Rapid developments in genome sequencing and advanced microscopy techniques have enabled us to understand the 3D spatial organization of the genome. Ligation-based chromosome capture methods and visualization with 3D genome browsers have facilitated detailed exploration of the genome. Topologically associated domains (TADs), lamin-associated domains, CCCTC-binding factor domains, cohesin, and chromatin structures are the prominent identified components that encode the 3D structure of the genome. Although TADs are the major contributors to 3D genome organization, they are absent in Arabidopsis. However, a few research groups have reported the presence of TAD-like structures in the plant kingdom.
Affiliation(s)
- Tapan Kumar Mohanta
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa 616, Oman
- Awdhesh Kumar Mishra
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Gyeongsangbuk-do, Korea
- Ahmed Al-Harrasi
- Natural and Medical Sciences Research Center, University of Nizwa, Nizwa 616, Oman

16
Wu H, Wu Y, Jiang Y, Zhou B, Zhou H, Chen Z, Xiong Y, Liu Q, Zhang H. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2021; 23:6374065. [PMID: 34553746 DOI: 10.1093/bib/bbab396] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of cell chromosomes. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data. Quickly and effectively discriminating cell types has become an active research field. However, existing computational methods for predicting cell types from Hi-C data have low accuracy. Therefore, we propose a high-accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. In our work, we first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embedding to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our data preprocessing method improves by 0.23, 1.22, 1.46 and 1.61% compared with the cell embedding generated by the previously published method scHiCluster, in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over scHiCluster, in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
Affiliation(s)
- Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; School of Software, Shandong University, Jinan, 250101, Shandong, China
- Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yuhong Jiang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zhongli Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China

17
Jerkovic I, Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nat Rev Mol Cell Biol 2021; 22:511-528. [PMID: 33953379 DOI: 10.1038/s41580-021-00362-w] [Citation(s) in RCA: 184] [Impact Index Per Article: 46.0] [Accepted: 03/16/2021] [Indexed: 02/03/2023]
Abstract
Understanding how chromatin is folded in the nucleus is fundamental to understanding its function. Although 3D genome organization has been historically difficult to study owing to a lack of relevant methodologies, major technological breakthroughs in genome-wide mapping of chromatin contacts and advances in imaging technologies in the twenty-first century have considerably improved our understanding of chromosome conformation and nuclear architecture. In this Review, we discuss methods of 3D genome organization analysis, including sequencing-based techniques, such as Hi-C and its derivatives, Micro-C, DamID and others; microscopy-based techniques, such as super-resolution imaging coupled with fluorescence in situ hybridization (FISH), multiplex FISH, in situ genome sequencing and live microscopy methods; and computational and modelling approaches. We describe the most commonly used techniques and their contribution to our current knowledge of nuclear architecture and, finally, we provide a perspective on up-and-coming methods that open possibilities for future major discoveries.
Affiliation(s)
- Ivana Jerkovic
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
- Giacomo Cavalli
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France

18
Abstract
The spatial organization of the genome in the cell nucleus is pivotal to cell function. However, how the 3D genome organization and its dynamics influence cellular phenotypes remains poorly understood. The very recent development of single-cell technologies for probing the 3D genome, especially single-cell Hi-C (scHi-C), has ushered in a new era of unveiling cell-to-cell variability of 3D genome features at an unprecedented resolution. Here, we review recent developments in computational approaches to the analysis of scHi-C, including data processing, dimensionality reduction, imputation for enhancing data quality, and the revealing of 3D genome features at single-cell resolution. While much progress has been made in computational method development to analyze single-cell 3D genomes, substantial future work is needed to improve data interpretation and multimodal data integration, which are critical to reveal fundamental connections between genome structure and function among heterogeneous cell populations in various biological contexts.
Affiliation(s)
- Tianming Zhou
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Ruochi Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA

19
Meng XH, Xiao HM, Deng HW. Combining artificial intelligence: deep learning with Hi-C data to predict the functional effects of non-coding variants. Bioinformatics 2021; 37:1339-1344. [PMID: 33196774 DOI: 10.1093/bioinformatics/btaa970] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/18/2019] [Revised: 09/12/2020] [Accepted: 11/05/2020] [Indexed: 12/20/2022]
Abstract
MOTIVATION Although genome-wide association studies (GWASs) have identified thousands of variants for various traits, the causal variants and the mechanisms underlying the significant loci are largely unknown. In this study, we aim to predict non-coding variants that may functionally affect translation initiation through long-range chromatin interaction. RESULTS By incorporating Hi-C data, we propose a novel and powerful deep learning model to classify interacting and non-interacting fragment pairs and to predict the functional effects of single-nucleotide sequence alterations on chromatin interaction, and thus on gene expression. The change in chromatin interaction probability between the reference sequence and the altered sequence reflects the degree of functional impact of the variant. The model was effective and efficient in classifying interacting and non-interacting fragment pairs. The predicted causal SNPs that had a larger impact on chromatin interaction were more likely to be identified by GWAS and eQTL analyses. We demonstrate that an integrative approach combining artificial intelligence (deep learning) with high-throughput experimental evidence of chromatin interaction leads to prioritizing the functional variants in disease- and phenotype-related loci, and thus will greatly expedite uncovering the biological mechanisms underlying the associations identified in genomic studies. AVAILABILITY AND IMPLEMENTATION Source code used in data preparation and model training is available at the GitHub website (https://github.com/biocai/DeepHiC). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiang-He Meng
- Centers of System Biology, Data Information and Reproductive Health, School of Basic Medical Science, Central South University, Changsha, Hunan 410008, China; Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA; Centers of System Biology, Data Information and Reproductive Health, Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, China
- Hong-Mei Xiao
- Centers of System Biology, Data Information and Reproductive Health, School of Basic Medical Science, Central South University, Changsha, Hunan 410008, China
- Hong-Wen Deng
- Centers of System Biology, Data Information and Reproductive Health, School of Basic Medical Science, Central South University, Changsha, Hunan 410008, China; Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA; Centers of System Biology, Data Information and Reproductive Health, Laboratory of Molecular and Statistical Genetics, College of Life Sciences, Hunan Normal University, Changsha, Hunan 410081, China

20
O'Donoghue SI. Grand Challenges in Bioinformatics Data Visualization. Front Bioinform 2021; 1:669186. [PMID: 36303723 PMCID: PMC9581027 DOI: 10.3389/fbinf.2021.669186] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/18/2021] [Accepted: 04/30/2021] [Indexed: 01/17/2023]
Affiliation(s)
- Seán I. O'Donoghue
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW, Australia
- CSIRO Data61, Eveleigh, NSW, Australia

21
Li T, Li R, Dong X, Shi L, Lin M, Peng T, Wu P, Liu Y, Li X, He X, Han X, Kang B, Wang Y, Liu Z, Chen Q, Shen Y, Feng M, Wang X, Wu D, Wang J, Li C. Integrative Analysis of Genome, 3D Genome, and Transcriptome Alterations of Clinical Lung Cancer Samples. Genomics Proteomics Bioinformatics 2021; 19:741-753. [PMID: 34116262 PMCID: PMC9170781 DOI: 10.1016/j.gpb.2020.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2019] [Revised: 03/28/2020] [Accepted: 06/11/2020] [Indexed: 10/31/2022]
Abstract
Genomic studies of cancer cell alterations, such as mutations, copy number variations (CNVs), and translocations, greatly promote our understanding of the genesis and development of cancer. However, the 3D genome architecture of cancers remains less studied due to the complexity of cancer genomes and technical difficulties. To explore the 3D genome structure in clinical lung cancer, we performed Hi-C experiments using paired normal and tumor cells harvested from patients with lung cancer, combined with RNA-seq analysis. We demonstrated the feasibility of studying the 3D genome of clinical lung cancer samples with a small number of cells (1 × 10⁴), compared the genome architecture between clinical samples and cell lines of lung cancer, and identified conserved and changed spatial chromatin structures between normal and cancer samples. We also showed that Hi-C data can be used to infer CNVs and point mutations in cancer. By integrating those different types of cancer alterations, we showed significant associations between CNVs, 3D genome, and gene expression. We propose that the 3D genome mediates the effects of cancer genomic alterations on gene expression through altering regulatory chromatin structures. Our study highlights the importance of analyzing 3D genomes of clinical cancer samples in addition to cancer cell lines and provides an integrative genomic analysis pipeline for future larger-scale studies in lung cancer and other cancers.
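The abstract notes that Hi-C data can be used to infer CNVs. One common intuition behind this, shown here as a minimal Python sketch and not as the authors' pipeline, is that a bin's total contact count (its row sum, or coverage) scales with its copy number, so comparing depth-normalized coverage between tumor and normal maps gives a crude copy-number signal. The function name `coverage_log2_ratio`, the dense list-of-lists input and the pseudocount are assumptions; real CNV callers also correct for GC content, mappability and restriction-site density.

```python
import math
from typing import List


def coverage_log2_ratio(
    tumor: List[List[float]],
    normal: List[List[float]],
    pseudocount: float = 0.5,
) -> List[float]:
    """Hypothetical helper: per-bin log2(tumor / normal) coverage ratio.

    Both inputs are square contact matrices over the same genomic bins.
    """
    if len(tumor) != len(normal):
        raise ValueError("matrices must cover the same bins")
    tumor_cov = [sum(row) for row in tumor]  # marginal coverage per bin
    normal_cov = [sum(row) for row in normal]
    tumor_total = sum(tumor_cov)
    normal_total = sum(normal_cov)
    ratios = []
    for t, n in zip(tumor_cov, normal_cov):
        # Fractions of total coverage remove sequencing-depth differences.
        t_frac = (t + pseudocount) / tumor_total
        n_frac = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_frac / n_frac))
    return ratios


if __name__ == "__main__":
    normal = [[4, 2, 2], [2, 4, 2], [2, 2, 4]]
    # Bin 2 "amplified": its row and column doubled, its diagonal quadrupled.
    tumor = [[4, 2, 4], [2, 4, 4], [4, 4, 16]]
    print(coverage_log2_ratio(tumor, normal))
```

Because coverage is expressed as a fraction of the total, an amplified bin pushes the ratios of the other bins below zero; a real analysis would recentre the values on a diploid baseline before calling gains and losses.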
Affiliation(s)
- Tingting Li
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China; State Key Laboratory of Proteomics, National Center of Biomedical Analysis, Institute of Basic Medical Sciences, Beijing 100850, China
- Ruifeng Li
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Xuan Dong
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Lin Shi
- Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China; Fudan University Center for Clinical Bioinformatics, Shanghai 200433, China
- Miao Lin
- Department of Thoracic Surgery, Zhongshan Hospital of Fudan University, Shanghai 200032, China
- Ting Peng
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Pengze Wu
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Yuting Liu
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Xiaoting Li
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xuheng He
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Xu Han
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Bin Kang
- BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Yinan Wang
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Zhiheng Liu
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Qing Chen
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Yue Shen
- BGI-Shenzhen, Shenzhen 518083, China; BGI-Qingdao, Qingdao 266426, China; Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, BGI-Shenzhen, Shenzhen 518083, China
- Mingxiang Feng
- Department of Thoracic Surgery, Zhongshan Hospital of Fudan University, Shanghai 200032, China
- Xiangdong Wang
- Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China; Fudan University Center for Clinical Bioinformatics, Shanghai 200433, China
- Duojiao Wu
- Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China
- Jian Wang
- iCarbonX, Shenzhen 518053, China; Digital Life Research Institute, Shenzhen 518110, China
- Cheng Li
- Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China

22
Wu H, Wang X, Chu M, Li D, Cheng L, Zhou K. HCMB: A stable and efficient algorithm for processing the normalization of highly sparse Hi-C contact data. Comput Struct Biotechnol J 2021; 19:2637-2645. [PMID: 34025950 PMCID: PMC8120939 DOI: 10.1016/j.csbj.2021.04.064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2021] [Revised: 04/11/2021] [Accepted: 04/24/2021] [Indexed: 11/17/2022]
Abstract
The high-throughput genome-wide chromosome conformation capture (Hi-C) method has recently become an important tool for studying chromosomal interactions, from which one can extract meaningful biological information including the P(s) curve, topologically associated domains, A/B compartments, and other biologically relevant signals. Normalization is a critical pre-processing step for downstream analyses, eliminating systematic and technical biases in chromatin contact matrices caused by differences in mappability, GC content, and restriction fragment length. In particular, high sparsity poses a major challenge for this correction, indicating the urgent need for a stable and efficient method for Hi-C data normalization. Several matrix balancing methods have been developed to normalize Hi-C data, such as the Knight-Ruiz (KR) algorithm, but the KR algorithm fails to normalize contact matrices with high sparsity. Here, we present an algorithm, Hi-C Matrix Balancing (HCMB), based on an iterative solution of equations, combined with a linear search and projection strategy, to normalize the original Hi-C interaction data. Both simulated and experimental data demonstrated that HCMB is robust and efficient in normalizing Hi-C data and preserves biologically relevant Hi-C features even at very high sparsity. HCMB is implemented in Python and is freely accessible to non-commercial users at GitHub: https://github.com/HUST-DataMan/HCMB.
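To make "matrix balancing" concrete, here is a minimal Python sketch of iterative correction (ICE-style), one of the simplest balancing schemes: each row and column of a symmetric contact matrix is repeatedly divided by its coverage relative to the mean until all non-empty rows have the same sum. This is neither HCMB nor the Knight-Ruiz algorithm; it only illustrates the goal they share. Leaving all-zero rows untouched is an assumption of this sketch, and HCMB's line search and projection strategy are not reproduced.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def ice_balance(
    matrix: Matrix, max_iter: int = 500, tol: float = 1e-8
) -> Tuple[Matrix, List[float]]:
    """Iteratively scale a symmetric matrix towards equal non-zero row sums.

    Returns the balanced matrix and the cumulative per-bin bias factors.
    All-zero rows (empty bins) are skipped by giving them a factor of 1.
    """
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(max_iter):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        if not nonzero:
            break
        mean = sum(nonzero) / len(nonzero)
        # Relative coverage of each bin; empty bins keep a factor of 1.
        factors = [s / mean if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                # Dividing by f_i * f_j keeps the matrix symmetric.
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, bias


if __name__ == "__main__":
    raw = [[4, 2, 1], [2, 3, 1], [1, 1, 2]]
    balanced, bias = ice_balance(raw)
    print([round(sum(row), 6) for row in balanced])
```

Simple schemes like this one can stall or distort the matrix when many bins are nearly empty, which is the high-sparsity situation the HCMB abstract targets.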
Affiliation(s)
- Honglong Wu
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Xuebin Wang
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Mengtian Chu
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Dongfang Li
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China
- BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China
- Lixin Cheng
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen 518020, China
- Ke Zhou
- Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, Hubei 430000, China

23
Holgersen EM, Gillespie A, Leavy OC, Baxter JS, Zvereva A, Muirhead G, Johnson N, Sipos O, Dryden NH, Broome LR, Chen Y, Kozin I, Dudbridge F, Fletcher O, Haider S. Identifying high-confidence capture Hi-C interactions using CHiCANE. Nat Protoc 2021; 16:2257-2285. [PMID: 33837305 DOI: 10.1038/s41596-021-00498-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/20/2020] [Accepted: 01/12/2021] [Indexed: 02/07/2023]
Abstract
The ability to identify regulatory interactions that mediate gene expression changes through distal elements, such as risk loci, is transforming our understanding of how genomes are spatially organized and regulated. Capture Hi-C (CHi-C) is a powerful tool to delineate such regulatory interactions. However, primary analysis and downstream interpretation of CHi-C profiles remain challenging and rely on disparate tools with ad-hoc input/output formats and specific assumptions for statistical modeling. Here we present a data processing and interaction calling toolkit (CHiCANE), specialized for the analysis and meaningful interpretation of CHi-C assays. In this protocol, we demonstrate applications of CHiCANE to region capture Hi-C (rCHi-C) and promoter capture Hi-C (pCHi-C) libraries, followed by quality assessment of interaction peaks, as well as downstream analysis specific to rCHi-C and pCHi-C to aid functional interpretation. For a typical rCHi-C/pCHi-C dataset, this protocol takes up to 3 d for users with a moderate understanding of R programming and statistical concepts, although this depends on dataset size and available compute power. CHiCANE is freely available at https://cran.r-project.org/web/packages/chicane.
Affiliation(s)
- Erle M Holgersen
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Andrea Gillespie
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Olivia C Leavy
- Department of Non-communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK; Department of Health Sciences, University of Leicester, Leicester, UK
- Joseph S Baxter
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Alisa Zvereva
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Gareth Muirhead
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Nichola Johnson
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Orsolya Sipos
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Nicola H Dryden
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Laura R Broome
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Yi Chen
- Scientific Computing, The Institute of Cancer Research, London, UK
- Igor Kozin
- Scientific Computing, The Institute of Cancer Research, London, UK
- Frank Dudbridge
- Department of Health Sciences, University of Leicester, Leicester, UK
- Olivia Fletcher
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
- Syed Haider
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK

24
Lin D, Sanders J, Noble WS. HiCRep.py: fast comparison of Hi-C contact matrices in Python. Bioinformatics 2021; 37:2996-2997. [PMID: 33576390 PMCID: PMC8479650 DOI: 10.1093/bioinformatics/btab097] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/27/2020] [Revised: 12/17/2020] [Accepted: 02/08/2021] [Indexed: 02/02/2023]
Abstract
MOTIVATION Hi-C is the most widely used assay for investigating genome-wide 3D organization of chromatin. When working with Hi-C data, it is often useful to calculate the similarity between contact matrices in order to assess experimental reproducibility or to quantify relationships among Hi-C data from related samples. The HiCRep algorithm has been widely adopted for this task, but the existing R implementation suffers from run time limitations on high-resolution Hi-C data or on large single-cell Hi-C datasets. RESULTS We introduce a Python implementation of HiCRep and demonstrate that it is much faster and consumes much less memory than the existing R implementation. Furthermore, we give examples of HiCRep's ability to accurately distinguish replicates from non-replicates and to reveal cell type structure among collections of Hi-C data. AVAILABILITY AND IMPLEMENTATION HiCRep.py and its documentation are available with a GPL license at https://github.com/Noble-Lab/hicrep. The software may be installed automatically using the pip package installer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
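HiCRep's central statistic is the stratum-adjusted correlation coefficient (SCC), which compares two matrices diagonal by diagonal (each diagonal is a genomic-distance stratum) so that the strong distance decay shared by all Hi-C maps does not inflate the correlation. The Python sketch below is a deliberately simplified version for intuition only: it computes a Pearson correlation per stratum and averages the strata weighted by their length. HiCRep itself also smooths the matrices and uses a variance-based weight per stratum, so `simple_scc` (a hypothetical name) will not reproduce hicrep's numbers; skipping the main diagonal and the `max_dist` cut-off are likewise assumptions.

```python
import math
from typing import List, Optional, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation, or None when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def simple_scc(a: List[List[float]], b: List[List[float]], max_dist: int) -> float:
    """Length-weighted mean of per-diagonal Pearson correlations (simplified SCC)."""
    if len(a) != len(b):
        raise ValueError("matrices must have the same dimensions")
    num = den = 0.0
    for k in range(1, max_dist + 1):  # stratum k = contacts k bins apart
        xs = [a[i][i + k] for i in range(len(a) - k)]
        ys = [b[i][i + k] for i in range(len(b) - k)]
        if len(xs) < 2:
            continue
        r = pearson(xs, ys)
        if r is None:  # constant stratum: correlation undefined, skip it
            continue
        num += len(xs) * r
        den += len(xs)
    return num / den if den else float("nan")


if __name__ == "__main__":
    a = [[(i + 1) * (j + 2) for j in range(6)] for i in range(6)]
    b = [[2 * v + 1 for v in row] for row in a]  # linear rescaling of a
    print(simple_scc(a, b, max_dist=3))
```

Correlating within strata rather than over the whole matrix is the key design point: two unrelated Hi-C maps would still agree strongly if all entries were pooled, simply because both decay with distance.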
Affiliation(s)
- Dejun Lin
- Department of Genome Sciences, University of Washington, Seattle, WA 98040, USA
- Justin Sanders
- Department of Computer Science, Brown University, Providence, RI 02912, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98040, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98040, USA (corresponding author)

25
Kashangura C. Artificial intelligence enhanced molecular databases can enable improved user-friendly bioinformatics and pave the way for novel applications. S Afr J Sci 2021. [DOI: 10.17159/sajs.2021/8151] [Indexed: 11/05/2022]

26
Promoter-interacting expression quantitative trait loci are enriched for functional genetic variants. Nat Genet 2021; 53:110-119. [PMID: 33349701 PMCID: PMC8053422 DOI: 10.1038/s41588-020-00745-3] [Received: 09/12/2019] [Accepted: 11/02/2020] [Indexed: 01/28/2023]
Abstract
Expression quantitative trait loci (eQTL) studies provide associations of genetic variants with gene expression but fall short of pinpointing functionally important eQTLs. Here, using H3K27ac HiChIP assays, we mapped eQTLs overlapping active cis-regulatory elements that interact with their target gene promoters (promoter-interacting eQTLs, pieQTLs) in five common immune cell types (Database of Immune Cell Expression, Expression quantitative trait loci and Epigenomics (DICE) cis-interactome project). This approach allowed us to identify functionally important eQTLs and show mechanisms that explain their cell-type restriction. We also devised an approach to eQTL discovery that relies on HiChIP-based promoter interaction maps as a structural framework for deciding which SNPs to test for association with gene expression, and observed ultra-long-distance pieQTLs (>1 megabase away), including several disease-risk variants. We validated the functional role of pieQTLs using reporter assays, CRISPRi, dCas9-tiling guides and Cas9-mediated base-pair editing. In this article we present a method for functional eQTL discovery and provide insights into the relevance of noncoding variants for cell-specific gene regulation and for disease association beyond conventional eQTL mapping.

27
Kruse K, Hug CB, Vaquerizas JM. FAN-C: a feature-rich framework for the analysis and visualisation of chromosome conformation capture data. Genome Biol 2020; 21:303. [PMID: 33334380 PMCID: PMC7745377 DOI: 10.1186/s13059-020-02215-9] [Received: 04/02/2020] [Accepted: 11/30/2020] [Indexed: 01/01/2023]
Abstract
Chromosome conformation capture data, particularly from high-throughput approaches such as Hi-C, are typically very complex to analyse. Existing analysis tools are often single-purpose, or limited in compatibility to a small number of data formats, frequently making Hi-C analyses tedious and time-consuming. Here, we present FAN-C, an easy-to-use command-line tool and powerful Python API with a broad feature set covering matrix generation, analysis, and visualisation for C-like data (https://github.com/vaquerizaslab/fanc). Due to its compatibility with the most prevalent Hi-C storage formats, FAN-C can be used in combination with a large number of existing analysis tools, thus greatly simplifying Hi-C matrix analysis.
Affiliation(s)
- Kai Kruse
- Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
- Clemens B Hug
- Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
- Juan M Vaquerizas
- Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, 48149, Muenster, Germany
- MRC London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Du Cane Road, London, W12 0NN, UK

28
Roayaei Ardakany A, Gezer HT, Lonardi S, Ay F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol 2020; 21:256. [PMID: 32998764 PMCID: PMC7528378 DOI: 10.1186/s13059-020-02167-0] [Received: 02/24/2020] [Accepted: 09/16/2020] [Indexed: 12/20/2022]
Abstract
We present MUSTACHE, a new method for multi-scale detection of chromatin loops from Hi-C and Micro-C contact maps. MUSTACHE employs scale-space theory, a technical advance in computer vision, to detect blob-shaped objects in contact maps. MUSTACHE is scalable to kilobase-resolution maps and reports loops that are highly consistent between replicates and between Hi-C and Micro-C datasets. Compared to other loop callers, such as HiCCUPS and SIP, MUSTACHE recovers a higher number of published ChIA-PET and HiChIP loops as well as loops linking promoters to regulatory elements. Overall, MUSTACHE enables an efficient and comprehensive analysis of chromatin loops. Available at: https://github.com/ay-lab/mustache.
Affiliation(s)
- Abbas Roayaei Ardakany
- Centers for Autoimmunity and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, 92037 CA USA
- Computer Science and Engineering, University of California, Riverside, Riverside, 92521 CA USA
- Halil Tuvan Gezer
- Centers for Autoimmunity and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, 92037 CA USA
- Computer Science and Engineering, Sabanci University, Tuzla, Istanbul, 34956 Turkey
- Stefano Lonardi
- Computer Science and Engineering, University of California, Riverside, Riverside, 92521 CA USA
- Ferhat Ay
- Centers for Autoimmunity and Cancer Immunotherapy, La Jolla Institute for Immunology, La Jolla, 92037 CA USA
- School of Medicine, University of California, San Diego, San Diego, 92093 CA USA

29
Oluwadare O, Highsmith M, Turner D, Lieberman Aiden E, Cheng J. GSDB: a database of 3D chromosome and genome structures reconstructed from Hi-C data. BMC Mol Cell Biol 2020; 21:60. [PMID: 32758136 PMCID: PMC7405446 DOI: 10.1186/s12860-020-00304-y] [Received: 03/26/2020] [Accepted: 07/29/2020] [Indexed: 11/10/2022]
Abstract
Advances in chromosome conformation capture technologies, such as the Hi-C technique, which captures chromosomal interactions on a genome-wide scale, have led to the development of methods for reconstructing three-dimensional chromosome and genome structures from Hi-C data. Three-dimensional genome structure is important because it plays a role in a variety of biological activities such as DNA replication, gene regulation, genome interaction, and gene expression. In recent years, numerous Hi-C datasets have been generated and, likewise, a number of genome structure reconstruction algorithms have been developed. In this work, we outline the construction of a novel Genome Structure Database (GSDB), a comprehensive repository of 3D structures for Hi-C datasets reconstructed by a variety of 3D structure reconstruction tools. GSDB contains over 50,000 structures from 12 state-of-the-art Hi-C structure prediction algorithms for 32 Hi-C datasets. It functions as a centralized collection of genome structures that will enable exploration of the dynamic architectures of chromosomes and genomes for biomedical research. GSDB is accessible at http://sysbio.rnet.missouri.edu/3dgenome/GSDB
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, USA
- Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Douglass Turner
- Elastic Image Software LLC, 21 Walnut Street, Lexington, MA, 02421, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA

30
Kong S, Li Q, Zhang G, Li Q, Huang Q, Huang L, Zhang H, Huang Y, Peng Y, Qin B, Zhang Y. Exonuclease combinations reduce noises in 3D genomics technologies. Nucleic Acids Res 2020; 48:e44. [PMID: 32128590 PMCID: PMC7192622 DOI: 10.1093/nar/gkaa106] [Received: 01/04/2019] [Revised: 02/04/2020] [Accepted: 02/19/2020] [Indexed: 12/21/2022]
Abstract
Chromosome conformation capture technologies are widely used in 3D genomics. Experimentally, however, these methods are limited by high noise and therefore require significant bioinformatics effort to extract reliable distal interactions. Miscellaneous undesired linear DNAs present during proximity ligation are a major noise source that needs to be minimized or eliminated. In this study, different exonuclease combinations were tested to remove linear DNA fragments from a circularized DNA preparation. This method efficiently removed linear DNAs, raised the proportion of annulation (circularization) and increased the valid-pairs ratio from ∼40% to ∼80% for enhanced interaction detection in standard Hi-C. This strategy is applicable to the development of various 3D genomics technologies and to the optimization of Hi-C sequencing efficiency.
Affiliation(s)
- Siyuan Kong
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Qing Li
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Gaolin Zhang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Qiujia Li
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Qitong Huang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Lei Huang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Hui Zhang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yinghua Huang
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yanling Peng
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Baoming Qin
- CAS Key Laboratory of Regenerative Biology, Guangdong Provincial Key Laboratory of Stem Cell and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou 510530, China
- Yubo Zhang
- Animal Functional Genomics Group, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China

31
Zheng Y, Zhou P, Keleş S. FreeHi-C spike-in simulations for benchmarking differential chromatin interaction detection. Methods 2020; 189:3-11. [PMID: 32663510 DOI: 10.1016/j.ymeth.2020.07.001] [Received: 03/11/2020] [Revised: 05/23/2020] [Accepted: 07/03/2020] [Indexed: 11/16/2022]
Abstract
High-throughput genome-wide chromatin conformation capture assay (Hi-C) is routinely used to profile long-range genomic interactions and three-dimensional organization of genomes. A key application of Hi-C is the comparative analysis of genomic interactions across different time points, cellular conditions, or multiple stimuli. While operating characteristics of methods for Hi-C data processing such as normalization, pairwise interaction and higher-order organization detection have been relatively well studied, properties of methods for differential chromatin interaction detection are less investigated. We have recently developed FreeHi-C to enable data-driven non-parametric simulations from Hi-C experiments. Here, we extend FreeHi-C with a user/data-driven spike-in module to facilitate comparisons of differential chromatin interaction detection methods where the ground-truth differential chromatin interactions are known under a wide variety of settings. We use FreeHi-C to benchmark four differential chromatin interaction detection methods, namely HiCcompare, multiHiCcompare, diffHic, and Selfish, using three comparative analysis settings with different sequencing depths and spike-in proportions. This comparison reveals distinct performances in terms of standard metrics such as false discovery rate control, detection power, significance order, precision-recall curves, and receiver operating characteristic curves, as well as the overall genomic properties of the types of differential chromatin interactions detectable by each method. Furthermore, it highlights the lack of power for all methods in small replication settings.
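Because spike-ins make the ground truth known, each method's calls can be scored directly against it. The sketch below is a generic illustration, not FreeHi-C code. It applies a Benjamini-Hochberg cutoff to per-interaction p-values; this calling rule is an assumption, since each benchmarked method applies its own. It then reports the false discovery proportion and power against a known set of spiked-in indices. Averaging the false discovery proportion over simulation replicates gives an empirical FDR. All names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def benjamini_hochberg(pvals: Sequence[float], alpha: float) -> Set[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank                  # largest rank that passes its threshold
    return set(order[:k_max])


def fdp_and_power(called: Set[int], truth: Set[int]) -> Tuple[float, float]:
    """False discovery proportion among calls, and power (recall) over true differences."""
    true_pos = len(called & truth)
    fdp = (len(called) - true_pos) / len(called) if called else 0.0
    power = true_pos / len(truth) if truth else float("nan")
    return fdp, power


if __name__ == "__main__":
    # Hypothetical p-values for six interactions; indices 0, 2 and 4 were spiked in.
    called = benjamini_hochberg([0.001, 0.01, 0.03, 0.04, 0.5, 0.9], alpha=0.05)
    print(called, fdp_and_power(called, truth={0, 2, 4}))  # {0, 1} (0.5, 0.333...)
```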
Affiliation(s)
- Ye Zheng
- Biostatistics, Bioinformatics and Epidemiology Program, Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Peigen Zhou
- Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Sündüz Keleş
- Department of Statistics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53706, USA

32
Li FZ, Liu ZE, Li XY, Bu LM, Bu HX, Liu H, Zhang CM. Chromatin 3D structure reconstruction with consideration of adjacency relationship among genomic loci. BMC Bioinformatics 2020; 21:272. [PMID: 32611376 PMCID: PMC7329537 DOI: 10.1186/s12859-020-03612-4] [Received: 08/26/2019] [Accepted: 06/18/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND Chromatin 3D conformation plays important roles in regulating gene or protein functions. High-throughput chromosome conformation capture (3C)-based technologies, such as Hi-C, have been exploited to acquire the contact frequencies among genomic loci at genome scale. Various computational tools have been proposed to recover the underlying chromatin 3D structures from in situ Hi-C contact map data. As connected residues in a polymer, neighboring genomic loci have intrinsic mutual dependencies in building a 3D conformation. However, current methods seldom take this feature into account. RESULTS We present a method called ShNeigh, which combines classical multidimensional scaling (MDS) with the local dependence of neighboring loci, modeled by a Gaussian formula, to infer the best 3D structure from noisy and incomplete contact frequency matrices. We validated ShNeigh by comparing it with two typical distance-based algorithms, ShRec3D and ChromSDE. On a simulated Hi-C dataset, ShNeigh kept the high speed of classical MDS while recovering the true structure better than ShRec3D and ChromSDE. ShNeigh is also more robust to data noise. On the publicly available human GM06990 Hi-C data, we demonstrated that the structures reconstructed by ShNeigh are more reproducible between different restriction enzymes than those from ShRec3D and ChromSDE, especially at high resolutions where contact maps are sparse. This indicates that ShNeigh is more robust to sparse signal coverage. CONCLUSIONS Our method can recover stable structures in high-noise and sparse-signal settings. It can also reconstruct similar structures from Hi-C data obtained using different restriction enzymes. Therefore, our method provides a new direction for enhancing the reconstruction quality of chromatin 3D structures.
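ShNeigh combines classical MDS with a model of neighbouring loci. For reference, the sketch below implements only plain classical MDS, not ShNeigh.

- It converts contact frequencies to distances with an assumed power law d = f^(-alpha), a common heuristic in which `alpha` is a hypothetical parameter.
- It double-centers the squared distances.
- It keeps the top eigenvectors, found here with a stdlib-only Jacobi eigenvalue routine.

ShNeigh's Gaussian modelling of neighbouring loci is not included, and zero contacts would need to be imputed first. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def contacts_to_distances(f: Matrix, alpha: float = 1.0) -> Matrix:
    """Assumed conversion d_ij = f_ij ** -alpha; diagonal distances are set to 0."""
    n = len(f)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                if f[i][j] <= 0:
                    raise ValueError("zero contact count: impute before converting")
                d[i][j] = f[i][j] ** (-alpha)
    return d


def jacobi_eigen(a: Matrix, max_sweeps: int = 100, tol: float = 1e-24) -> Tuple[List[float], Matrix]:
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix, via Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    norm = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * max(norm, 1e-300):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d: Matrix, dim: int = 3) -> Matrix:
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate d."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(r) / n for r in sq]  # equals column means because d is symmetric
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda k: vals[k], reverse=True)[:dim]
    return [[vecs[i][k] * math.sqrt(max(vals[k], 0.0)) for k in top] for i in range(n)]
```

When the input distances really are Euclidean in three dimensions, the returned coordinates reproduce them up to rotation and reflection. With noisy Hi-C-derived distances, this is exactly where methods such as ShNeigh try to improve on the plain fit.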
Affiliation(s)
- Fang-Zhen Li
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Zhi-E Liu
- College of Physics and Electronic Engineering, Qilu Normal University, Jinan, China
- Xiu-Yuan Li
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Li-Mei Bu
- Department of Gastroenterology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Shanghai, China
- Hong-Xia Bu
- Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Hui Liu
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Digital Media Technology Key Lab of Shandong Province, Jinan, China
- Cai-Ming Zhang
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
- Digital Media Technology Key Lab of Shandong Province, Jinan, China

33
Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2. Nat Protoc 2020; 15:991-1012. [PMID: 31980751 DOI: 10.1038/s41596-019-0273-0] [Received: 06/19/2018] [Accepted: 11/27/2019] [Indexed: 11/08/2022]
Abstract
Fit-Hi-C is a software application for computing statistical confidence estimates for Hi-C contact maps to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities for bin- or locus-specific biases, accounts for previously characterized covariates impacting Hi-C contact counts. Fit-Hi-C is best applied to the study of mid-range (e.g., 20 kb-2 Mb for the human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis of high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at single restriction cut site (EcoRI) resolution. The procedure takes ~12 h with preprocessing when all use cases are run sequentially (~4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) is also scalable to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (~48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
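The core Fit-Hi-C idea described above is a monotonically non-increasing fit of contact probability against genomic distance, followed by a significance estimate for each contact. It can be sketched as follows. This is not FitHiC2 code, and it makes several simplifying assumptions:

- The spline is replaced by isotonic regression (pool-adjacent-violators).
- The prior at each distance is the mean count per bin pair divided by the total count.
- Each contact's p-value is a binomial upper tail, with `n` equal to the total count and success probability equal to the prior times optional per-bin bias factors.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def isotonic_nonincreasing(y: Sequence[float], w: Sequence[float]) -> List[float]:
    """Weighted least-squares fit constrained to be non-increasing (pool-adjacent-violators)."""
    blocks: List[List[float]] = []  # each block: [weighted mean, total weight, size]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge while the previous block's mean is below the current one (an increase).
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for mean, _, size in blocks:
        fitted.extend([mean] * int(size))
    return fitted


def binom_sf(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Binomial(n, p), summing upper-tail terms in log space."""
    if c <= 0:
        return 1.0
    if c > n or p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for x in range(c, n + 1):
        term = math.exp(log_nfact - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                        + x * log_p + (n - x) * log_q)
        total += term
        # Past the mean the terms only shrink; stop once they are negligible.
        if x > n * p and (term == 0.0 or term < total * 1e-17):
            break
    return min(total, 1.0)


def fithic_like_pvalues(pairs: Sequence[Tuple[int, int, int]],
                        biases: Optional[Dict[int, float]] = None) -> List[float]:
    """pairs: (bin_i, bin_j, count) for intra-chromosomal contacts; distance = |j - i|."""
    total = sum(c for _, _, c in pairs)
    by_dist: Dict[int, List[int]] = {}
    for i, j, c in pairs:
        by_dist.setdefault(abs(j - i), []).append(c)
    dists = sorted(by_dist)
    means = [sum(by_dist[d]) / len(by_dist[d]) for d in dists]
    fitted = isotonic_nonincreasing(means, [len(by_dist[d]) for d in dists])
    prior = {d: m / total for d, m in zip(dists, fitted)}  # contact probability per pair
    pvals = []
    for i, j, c in pairs:
        p = prior[abs(j - i)]
        if biases is not None:
            p *= biases[i] * biases[j]
        pvals.append(binom_sf(c, total, min(p, 1.0)))
    return pvals
```

In a realistic setting, the fit would be restricted to mid-range distances, as the abstract recommends, and the p-values would then be corrected for multiple testing.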

34
Abstract
The invention of Hi-C has greatly facilitated 3D genome research through unbiased probing of 3D chromatin interactions. It produces an enormous amount of sequencing data that capture multiscale chromatin conformation structures. In the last decade, numerous computational methods have been developed to analyze Hi-C data and predict A/B compartments, topologically associating domains (TADs), and significant chromatin contacts. This chapter introduces the iHiC package, which provides several utilities to facilitate Hi-C data analysis with public software, and demonstrates its application to a Hi-C dataset generated for mouse embryonic stem (ES) cells.

35
Ben-Elazar S, Chor B, Yakhini Z. The Functional 3D Organization of Unicellular Genomes. Sci Rep 2019; 9:12734. [PMID: 31484964 PMCID: PMC6726614 DOI: 10.1038/s41598-019-48798-7] [Received: 05/09/2019] [Accepted: 08/12/2019] [Indexed: 11/09/2022]
Abstract
Genome conformation capture techniques permit a systematic investigation into the functional spatial organization of genomes, including functional aspects such as the co-localization of sets of genomic elements, for example, the co-localization of genes targeted by a transcription factor (TF) within a transcription factory. We quantify spatial co-localization using a rigorous statistical model that measures the enrichment of a subset of elements in neighbourhoods inferred from Hi-C data. We also control for co-localization that can be attributed to genomic order. We systematically apply our open-source framework, spatial-mHG, to search for spatial co-localization phenomena in multiple unicellular Hi-C datasets with corresponding genomic annotations. Our biological findings shed new light on the functional spatial organization of genomes. In C. crescentus, DNA replication genes reside in two genomic clusters that are spatially co-localized; furthermore, these clusters contain similar gene copies and lie in the genomic vicinity of the ori and ter sequences. In S. cerevisiae, Ty5 retrotransposon family elements spatially co-localize at a spatially adjacent subset of telomeres. In N. crassa, both proteasome lid subcomplex genes and protein refolding genes jointly co-localize at a shared location. An implementation of our algorithms is available online.
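As its name suggests, spatial-mHG builds on the minimum-hypergeometric (mHG) statistic. Below is a minimal sketch of the plain mHG score. It assumes the elements have already been ranked by some Hi-C-derived proximity. The ranking step, the control for genomic order, and the exact mHG p-value computation (the score itself is not a calibrated p-value, because it minimizes over all cutoffs) are not reproduced here. For each prefix of the ranked list, the sketch computes the hypergeometric probability of seeing at least that many annotated elements, and it returns the smallest one. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hypergeom_sf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws), computed exactly."""
    upper = min(successes, draws)
    numer = sum(math.comb(successes, x) * math.comb(total - successes, draws - x)
                for x in range(k, upper + 1))
    return numer / math.comb(total, draws)


def mhg_score(labels: Sequence[int]) -> Tuple[float, int]:
    """Minimum hypergeometric tail over all prefixes of a ranked 0/1 label vector.

    labels[r] is 1 if the r-th ranked element (e.g. the r-th closest locus) is annotated.
    Returns (mHG score, prefix length attaining it); a length of 0 means that no
    prefix improved on 1.0.
    """
    total, successes = len(labels), sum(labels)
    best, best_n, hits = 1.0, 0, 0
    for n, lab in enumerate(labels, start=1):
        hits += lab
        p = hypergeom_sf(hits, total, successes, n)
        if p < best:
            best, best_n = p, n
    return best, best_n
```

For example, `mhg_score([1, 1, 1, 0, 0, 0, 0, 0])` is attained at the first three elements, where the tail probability is 1/56.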

36
Di Filippo L, Righelli D, Gagliardi M, Matarazzo MR, Angelini C. HiCeekR: A Novel Shiny App for Hi-C Data Analysis. Front Genet 2019; 10:1079. [PMID: 31749839 PMCID: PMC6844183 DOI: 10.3389/fgene.2019.01079] [Received: 07/19/2019] [Accepted: 10/09/2019] [Indexed: 01/14/2023]
Abstract
The High-throughput Chromosome Conformation Capture (Hi-C) technique combines the power of Next Generation Sequencing technologies with the chromosome conformation capture approach to study 3D chromatin organization at the genome-wide scale. Although the technique is quite recent, many tools are already available for pre-processing and analyzing Hi-C data, allowing users to identify chromatin loops, topologically associating domains and A/B compartments. However, only a few of them provide an exhaustive analysis pipeline or make it easy to integrate and visualize other omic layers. Moreover, most of the available tools are designed for expert users who are comfortable with command-line applications. In this paper, we present HiCeekR (https://github.com/lucidif/HiCeekR), a novel R Graphical User Interface (GUI) that allows researchers to easily perform a complete Hi-C data analysis. Built with the Shiny libraries, it integrates several R/Bioconductor packages for Hi-C data analysis and visualization, guiding the user through the entire process. Here, we describe its architecture and functionalities, then illustrate its capabilities using a publicly available dataset.
Affiliation(s)
- Lucio Di Filippo
- Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Dario Righelli
- Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Miriam Gagliardi
- Max Planck Institute for Psychiatry, Munich, Germany
- Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Maria Rosaria Matarazzo
- Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Claudia Angelini
- Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy

37
Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022]
Abstract
Cancer is a complex disease at many different levels, and its molecular phenomenology is also quite rich. The mutational and genomic origins of cancer, and their downstream effects on processes such as the reprogramming of gene regulatory control and the molecular pathways that depend on such control, have been recognized as central to the characterization of the disease. More important, though, is understanding their causes, prognosis, and therapeutics. A multitude of factors are associated with anomalous control of gene expression in cancer. Many of these factors are now amenable to comprehensive study through experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven insufficient to present a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that can be used, as well as the computational genomic analysis needed to track down these factors. We then present theoretical and computational frameworks developed to integrate the diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Helena Reyes-Gopar
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Jesús Espinal-Enríquez
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Soledad Ochoa
- Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico

38
Sauerwald N, Singhal A, Kingsford C. Analysis of the structural variability of topologically associated domains as revealed by Hi-C. NAR Genom Bioinform 2019; 2. [PMID: 31687663 PMCID: PMC6824515 DOI: 10.1093/nargab/lqz008] [Indexed: 12/15/2022]
Abstract
Three-dimensional chromosome structure plays an integral role in gene expression and regulation, replication timing, and other cellular processes. Topologically associated domains (TADs), building blocks of chromosome structure, are genomic regions with higher contact frequencies within the region than outside it. A central question is the degree to which TADs are conserved or vary between conditions. We analyze 137 Hi-C samples from 9 studies using 3 measures to quantify the effects of various sources of biological and experimental variation. We observe significant variation in TAD sets between both non-replicate and replicate samples, and provide initial evidence that this variability does not come from genetic sequence differences. The effects of experimental protocol differences are also measured, demonstrating that samples can have protocol-specific structural changes, but that TADs are generally robust to lab-specific differences. This study represents a systematic quantification of key factors influencing comparisons of chromosome structure, suggesting significant variability and the potential for cell-type-specific structural features, which had not previously been systematically explored. The lack of observed influence of heredity and genetic differences on chromosome structure suggests that factors other than the genetic sequence drive this structure, which plays an important role in human disease and cellular functioning.
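The abstract does not name its three measures. One common way to compare two TAD sets is to treat each as a partition of the chromosome's bins into contiguous domains. The variation of information (VI) between the partitions is then computed as VI(A, B) = H(A) + H(B) - 2I(A; B) = 2H(A, B) - H(A) - H(B). The sketch below is an illustrative assumption and not necessarily one of the paper's measures. It simplifies by letting every boundary start a new segment, so inter-TAD gaps are not modelled separately. Names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence


def labels_from_boundaries(n_bins: int, boundaries: Iterable[int]) -> List[int]:
    """Label each bin with a domain index; a boundary at b starts a new domain at bin b."""
    cuts = {b for b in boundaries if 0 < b < n_bins}
    labels, domain = [], 0
    for i in range(n_bins):
        if i in cuts:
            domain += 1
        labels.append(domain)
    return labels


def entropy(counts: Iterable[int], n: int) -> float:
    """Shannon entropy (natural log) of a distribution given as counts summing to n."""
    return -sum(c / n * math.log(c / n) for c in counts if c)


def variation_of_information(a: Sequence[int], b: Sequence[int]) -> float:
    """VI between two labelings of the same bins; 0 means identical partitions."""
    n = len(a)
    h_a = entropy(Counter(a).values(), n)
    h_b = entropy(Counter(b).values(), n)
    h_ab = entropy(Counter(zip(a, b)).values(), n)
    return 2 * h_ab - h_a - h_b
```

VI is a true metric on partitions, so it is symmetric and zero only for identical TAD sets. This makes it convenient for comparing replicate and non-replicate pairs on a common scale.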
Collapse
Affiliation(s)
- Natalie Sauerwald
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Akshat Singhal
- Department of Computer Science, Stony Brook University, Stony Brook, NY 11790, USA
| | - Carl Kingsford
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
39
Ing-Simmons E, Vaquerizas JM. Visualising three-dimensional genome organisation in two dimensions. Development 2019; 146:dev177162. [PMID: 31558569 DOI: 10.1242/dev.177162] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/16/2022]
Abstract
The three-dimensional organisation of the genome plays a crucial role in developmental gene regulation. In recent years, techniques to investigate this organisation have become more accessible to labs worldwide due to improvements in protocols and decreases in the cost of high-throughput sequencing. However, the resulting datasets are complex and can be challenging to analyse and interpret. Here, we provide a guide to visualisation approaches that can aid the interpretation of such datasets and the communication of biological results.
Affiliation(s)
- Elizabeth Ing-Simmons
- Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, DE-48149 Muenster, Germany
- Juan M Vaquerizas
- Max Planck Institute for Molecular Biomedicine, Roentgenstrasse 20, DE-48149 Muenster, Germany
40
Identification of significant chromatin contacts from HiChIP data by FitHiChIP. Nat Commun 2019; 10:4221. [PMID: 31530818 PMCID: PMC6748947 DOI: 10.1038/s41467-019-11950-y] [Citation(s) in RCA: 129] [Impact Index Per Article: 21.5] [Received: 01/17/2019] [Accepted: 08/14/2019] [Indexed: 02/06/2023]
Abstract
HiChIP/PLAC-seq is becoming increasingly popular for profiling 3D chromatin contacts among regulatory elements and for annotating functions of genetic variants. Here we describe FitHiChIP, a computational method for loop calling from HiChIP/PLAC-seq data, which jointly models the non-uniform coverage and genomic distance scaling of contact counts to compute statistical significance estimates. We also develop a technique to filter putative bystander loops that can be explained by stronger adjacent loops. Compared to existing methods, FitHiChIP performs better in recovering contacts reported by Hi-C, promoter capture Hi-C and ChIA-PET experiments and in capturing previously validated promoter-enhancer interactions. FitHiChIP loop calls are reproducible among replicates and are consistent across different experimental settings. Our work also provides a framework for differential HiChIP analysis with an option to utilize ChIP-seq data for further characterizing differential loops. Even though designed for HiChIP, FitHiChIP is also applicable to other conformation capture assays. The HiChIP/PLAC-seq assay is popular for profiling 3D genome interactions among regulatory elements at kilobase resolution. Here the authors describe FitHiChIP, an empirical null-based, flexible computational method for statistical significance estimation and loop calling from HiChIP data.
41
Malik L, Patro R. Rich Chromatin Structure Prediction from Hi-C Data. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:1448-1458. [PMID: 29994683 DOI: 10.1109/tcbb.2018.2851200] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
Recent studies involving the 3-dimensional conformation of chromatin have revealed the important role it has to play in different processes within the cell. These studies have also led to the discovery of densely interacting segments of the chromosome, called topologically associating domains. The accurate identification of these domains from Hi-C interaction data is an interesting and important computational problem for which numerous methods have been proposed. Unfortunately, most existing algorithms designed to identify these domains assume that they are non-overlapping whereas there is substantial evidence to believe a nested structure exists. We present a methodology to predict hierarchical chromatin domains using chromatin conformation capture data. Our method predicts domains at different resolutions, calculated using intrinsic properties of the chromatin data, and effectively clusters these to construct the hierarchy. At each individual level, the domains are non-overlapping in such a way that the intra-domain interaction frequencies are maximized. We show that our predicted structure is highly enriched for actively transcribing housekeeping genes and various chromatin markers, including CTCF, around the domain boundaries. We also show that large-scale domains, at multiple resolutions within our hierarchy, are conserved across cell types and species. We also provide comparisons against existing tools for extracting hierarchical domains. Our software, Matryoshka, is written in C++11 and licensed under GPL v3; it is available at https://github.com/COMBINE-lab/matryoshka.
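The per-level criterion described above (non-overlapping domains chosen so that intra-domain interaction frequencies are maximized) resembles a classic segmentation problem that dynamic programming can solve. The following is a generic sketch under stated assumptions, not Matryoshka's actual scoring function or its hierarchy construction: each candidate domain is scored as the sum of its intra-domain contacts minus a hypothetical background equal to the mean off-diagonal contact, so that sparse stretches are penalized instead of always being merged into a single domain. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def domain_score(contacts: Matrix, start: int, end: int, background: float) -> float:
    """Sum of (contact - background) over bin pairs a < b inside [start, end]."""
    return sum(
        contacts[a][b] - background
        for a in range(start, end + 1)
        for b in range(a + 1, end + 1)
    )


def partition_domains(contacts: Matrix) -> List[Tuple[int, int]]:
    """Split bins 0..n-1 into contiguous, non-overlapping segments maximizing total score.

    Returns the multi-bin segments as (start, end) domains; a single bin scores 0
    and is treated as lying outside any domain.
    """
    n = len(contacts)
    if n < 2:
        return []
    # Hypothetical background: mean of all off-diagonal upper-triangle contacts.
    pairs = [contacts[a][b] for a in range(n) for b in range(a + 1, n)]
    background = sum(pairs) / len(pairs)

    # best[j]: best total score for bins 0..j-1; last_start[j]: start of its final segment.
    best = [0.0] * (n + 1)
    last_start = [0] * (n + 1)
    for j in range(1, n + 1):
        best[j] = float("-inf")
        for i in range(j):
            candidate = best[i] + domain_score(contacts, i, j - 1, background)
            if candidate > best[j]:
                best[j] = candidate
                last_start[j] = i

    # Walk back through the stored segment starts to recover the partition.
    segments = []
    j = n
    while j > 0:
        i = last_start[j]
        segments.append((i, j - 1))
        j = i
    segments.reverse()
    return [(s, e) for s, e in segments if e > s]


if __name__ == "__main__":
    # Toy 6-bin matrix with two dense blocks (made-up data).
    toy = [[10.0 if (a < 3) == (b < 3) else 1.0 for b in range(6)] for a in range(6)]
    print(partition_domains(toy))  # → [(0, 2), (3, 5)]
```

Scoring each candidate by direct summation keeps the sketch readable but is slow; for realistic matrix sizes, 2D prefix sums over the contact matrix would be the usual way to score segments in constant time.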
42
Zhu G, Deng W, Hu H, Ma R, Zhang S, Yang J, Peng J, Kaplan T, Zeng J. Reconstructing spatial organizations of chromosomes through manifold learning. Nucleic Acids Res 2019; 46:e50. [PMID: 29408992 PMCID: PMC5934626 DOI: 10.1093/nar/gky065] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 09/03/2017] [Accepted: 01/23/2018] [Indexed: 01/09/2023]
Abstract
Decoding the spatial organizations of chromosomes has crucial implications for studying eukaryotic gene regulation. Recently, chromosomal conformation capture based technologies, such as Hi-C, have been widely used to uncover the interaction frequencies of genomic loci in a high-throughput and genome-wide manner and provide new insights into the folding of three-dimensional (3D) genome structure. In this paper, we develop a novel manifold learning based framework, called GEM (Genomic organization reconstructor based on conformational Energy and Manifold learning), to reconstruct the three-dimensional organizations of chromosomes by integrating Hi-C data with biophysical feasibility. Unlike previous methods, which explicitly assume specific relationships between Hi-C interaction frequencies and spatial distances, our model directly embeds the neighboring affinities from Hi-C space into 3D Euclidean space. Extensive validations demonstrated that GEM not only greatly outperformed other state-of-the-art modeling methods but also provided physically and physiologically valid 3D representations of the organizations of chromosomes. Furthermore, we apply, for the first time, the modeled chromatin structures to recover long-range genomic interactions missing from the original Hi-C data.
Affiliation(s)
- Guangxiang Zhu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Wenxuan Deng
- Department of Biostatistics, Yale University, New Haven, CT, USA
- Hailin Hu
- School of Medicine, Tsinghua University, Beijing 100084, China
- Rui Ma
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Sai Zhang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jinglin Yang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Tommy Kaplan
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, 91904, Israel
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
43
Computational Processing and Quality Control of Hi-C, Capture Hi-C and Capture-C Data. Genes (Basel) 2019; 10:genes10070548. [PMID: 31323892 PMCID: PMC6678864 DOI: 10.3390/genes10070548] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/15/2019] [Revised: 07/06/2019] [Accepted: 07/14/2019] [Indexed: 01/08/2023]
Abstract
Hi-C, capture Hi-C (CHC) and Capture-C have contributed greatly to our present understanding of the three-dimensional organization of genomes in the context of transcriptional regulation by characterizing the roles of topologically associating domains, enhancer-promoter loops and other three-dimensional genomic interactions. The analysis is based on counts of chimeric read pairs that map to interacting regions of the genome. However, processing and quality control present a number of unique challenges. We review here the experimental and computational foundations and explain how the characteristics of restriction digests, sonication fragments and read pairs can be exploited to distinguish technical artefacts from valid read pairs originating from true chromatin interactions.
44
Golov AK, Ulianov SV, Luzhin AV, Kalabusheva EP, Kantidze OL, Flyamer IM, Razin SV, Gavrilov AA. C-TALE, a new cost-effective method for targeted enrichment of Hi-C/3C-seq libraries. Methods 2019; 170:48-60. [PMID: 31252062 DOI: 10.1016/j.ymeth.2019.06.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/25/2019] [Accepted: 06/22/2019] [Indexed: 11/17/2022]
Abstract
Studies performed using Hi-C and other high-throughput whole-genome C-methods have demonstrated that 3D organization of eukaryotic genomes is functionally relevant. Unfortunately, ultra-deep sequencing of Hi-C libraries necessary to detect loop structures in large vertebrate genomes remains rather expensive. However, many studies are in fact aimed at determining the fine-scale 3D structure of comparatively small genomic regions up to several Mb in length. Such studies typically focus on the spatial structure of domains of coregulated genes, molecular mechanisms of loop formation, and interrogation of functional significance of GWAS-revealed polymorphisms. Therefore, a handful of molecular techniques based on Hi-C have been developed to address such issues. These techniques commonly rely on in-solution hybridization of Hi-C/3C-seq libraries with pools of biotinylated baits covering the region of interest, followed by deep sequencing of the enriched library. Here, we describe a new protocol of this kind, C-TALE (Chromatin TArget Ligation Enrichment). Preparation of hybridization probes from bacterial artificial chromosomes and an additional round of enrichment make C-TALE a cost-effective alternative to existing many-versus-all C-methods.
Affiliation(s)
- Arkadiy K Golov
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia; Mental Health Research Center, Moscow, Russia
- Sergey V Ulianov
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia; Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Artem V Luzhin
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Ekaterina P Kalabusheva
- Koltzov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, Russia; Pirogov Russian National Research Medical University, Research Institute of Translational Medicine, Department of Regenerative Medicine, Moscow, Russia
- Omar L Kantidze
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Ilya M Flyamer
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK
- Sergey V Razin
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia; Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, Russia
- Alexey A Gavrilov
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
45
Qian M, Cheng Y, Wang X. The methodology study of three-dimensional (3D) genome research. Semin Cell Dev Biol 2019; 90:12-18. [DOI: 10.1016/j.semcdb.2018.07.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/03/2018] [Accepted: 07/03/2018] [Indexed: 12/12/2022]
46
Stansfield JC, Tran D, Nguyen T, Dozmorov MG. R Tutorial: Detection of Differentially Interacting Chromatin Regions From Multiple Hi-C Datasets. Curr Protoc Bioinformatics 2019; 66:e76. [PMID: 31125519 PMCID: PMC6588411 DOI: 10.1002/cpbi.76] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
Abstract
The three-dimensional (3D) interactions of chromatin regulate cell-type-specific gene expression, recombination, X-chromosome inactivation, and many other genomic processes. High-throughput chromatin conformation capture (Hi-C) technologies capture the structure of the chromatin on a global scale by measuring all-vs.-all interactions and can provide new insights into genomic regulation. The workflow presented here describes how to analyze and interpret a comparative Hi-C experiment. We describe the process of obtaining Hi-C data from public repositories and give suggestions for pre-processing pipelines for users who intend to analyze their own raw data. We then describe the data normalization and comparative analysis process. We present three protocols describing the use of the multiHiCcompare, diffHic, and FIND R packages, respectively, to perform a comparative analysis of Hi-C experiments. Finally, visualization of the results and downstream interpretation of the differentially interacting regions are discussed. The bulk of this tutorial uses the R programming environment, and the processes described can be performed with most operating systems and a single computer. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- John C. Stansfield
- Dept. of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Duc Tran
- Dept. of Computer Science & Engineering, University of Nevada, Reno, NV, 89557, USA
- Tin Nguyen
- Dept. of Computer Science & Engineering, University of Nevada, Reno, NV, 89557, USA
- Mikhail G. Dozmorov
- Dept. of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
47
Abbas A, He X, Niu J, Zhou B, Zhu G, Ma T, Song J, Gao J, Zhang MQ, Zeng J. Integrating Hi-C and FISH data for modeling of the 3D organization of chromosomes. Nat Commun 2019; 10:2049. [PMID: 31053705 PMCID: PMC6499832 DOI: 10.1038/s41467-019-10005-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 04/26/2018] [Accepted: 04/12/2019] [Indexed: 12/13/2022]
Abstract
The new advances in various experimental techniques that provide complementary information about the spatial conformations of chromosomes have inspired researchers to develop computational methods to fully exploit the merits of individual data sources and combine them to improve the modeling of chromosome structure. Here we propose GEM-FISH, a method for reconstructing the 3D models of chromosomes through systematically integrating both Hi-C and FISH data with the prior biophysical knowledge of a polymer model. Comprehensive tests on a set of chromosomes, for which both Hi-C and FISH data are available, demonstrate that GEM-FISH can outperform previous chromosome structure modeling methods and accurately capture the higher order spatial features of chromosome conformations. Moreover, our reconstructed 3D models of chromosomes revealed interesting patterns of spatial distributions of super-enhancers which can provide useful insights into understanding the functional roles of these super-enhancers in gene regulation.
Affiliation(s)
- Ahmed Abbas
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Xuan He
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Jing Niu
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- Bin Zhou
- School of Life Science, Tsinghua University, Beijing, 100084, China
- Guangxiang Zhu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Tszshan Ma
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- Jiangpeikun Song
- School of Life Science, Tsinghua University, Beijing, 100084, China
- Juntao Gao
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
- Michael Q Zhang
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
- Department of Biological Sciences, Center for Systems Biology, the University of Texas at Dallas, Richardson, TX, 75080-3021, USA
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
48
Oluwadare O, Highsmith M, Cheng J. An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data. Biol Proced Online 2019; 21:7. [PMID: 31049033 PMCID: PMC6482566 DOI: 10.1186/s12575-019-0094-0] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Received: 01/15/2019] [Accepted: 04/01/2019] [Indexed: 01/08/2023]
Abstract
Over the past decade, methods for predicting three-dimensional (3-D) chromosome and genome structures have proliferated. This has been primarily due to the development of high-throughput, next-generation chromosome conformation capture (3C) technologies, which have provided next-generation sequencing data about chromosome conformations in order to map the 3-D genome structure. The introduction of the Hi-C technique, a variant of the 3C method, has allowed researchers to extract the interaction frequency (IF) for all loci of a genome at high throughput and at a genome-wide scale. In this review we describe, categorize, and compare the various methods developed to map chromosome and genome structures from 3C data, particularly Hi-C data. We summarize the improvements introduced by these methods, describe the approach used for method evaluation, and discuss how these advancements shape the future of genome structure construction.
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
49
Zhou Y, Gerrard DL, Wang J, Li T, Yang Y, Fritz AJ, Rajendran M, Fu X, Stein G, Schiff R, Lin S, Frietze S, Jin VX. Temporal dynamic reorganization of 3D chromatin architecture in hormone-induced breast cancer and endocrine resistance. Nat Commun 2019; 10:1522. [PMID: 30944316 PMCID: PMC6447566 DOI: 10.1038/s41467-019-09320-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Received: 11/22/2017] [Accepted: 02/27/2019] [Indexed: 01/01/2023]
Abstract
Recent studies have demonstrated that chromatin architecture is linked to the progression of cancers. However, the roles of 3D structure and its dynamics in hormone-dependent breast cancer and endocrine resistance are largely unknown. Here we report the dynamics of 3D chromatin structure across a time course of estradiol (E2) stimulation in human estrogen receptor α (ERα)-positive breast cancer cells. We identified subsets of temporally highly dynamic compartments predominantly associated with active open chromatin and found that these highly dynamic compartments showed higher alteration in tamoxifen-resistant breast cancer cells. Remarkably, these compartments are characterized by active chromatin states, and enhanced ERα binding but decreased transcription factor CCCTC-binding factor (CTCF) binding. We finally identified a set of ERα-bound promoter-enhancer looping genes enclosed within altered domains that are enriched with cancer invasion, aggressiveness or metabolism signaling pathways. This large-scale analysis expands our understanding of high-order temporal chromatin reorganization underlying hormone-dependent breast cancer.
Affiliation(s)
- Yufan Zhou
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Diana L Gerrard
- MLRS Department, University of Vermont, Burlington, VT, 05405, USA
- Junbai Wang
- Department of Pathology, Oslo University Hospital-Norwegian Radium Hospital, 0310, Montebello, Oslo, Norway
- Tian Li
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Yini Yang
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
- Andrew J Fritz
- Department of Biochemistry, University of Vermont, Burlington, VT, 05405, USA
- Mahitha Rajendran
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA; Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Xiaoyong Fu
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA; Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Gary Stein
- Department of Surgery, University of Vermont Larner College of Medicine, 89 Beaumont Avenue, Given C401, Burlington, Vermont, 05405, USA
- Rachel Schiff
- Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA; Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, 77030, USA; Dan L. Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Shili Lin
- Department of Statistics, The Ohio State University, Columbus, OH, 43210, USA
- Seth Frietze
- MLRS Department, University of Vermont, Burlington, VT, 05405, USA
- Victor X Jin
- Department of Molecular Medicine, University of Texas Health San Antonio, San Antonio, TX, 78229, USA
50
Huynh L, Hormozdiari F. TAD fusion score: discovery and ranking the contribution of deletions to genome structure. Genome Biol 2019; 20:60. [PMID: 30898144 PMCID: PMC6427865 DOI: 10.1186/s13059-019-1666-7] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 07/25/2018] [Accepted: 03/01/2019] [Indexed: 11/17/2022]
Abstract
Deletions that fuse two adjacent topologically associating domains (TADs) can cause severe developmental disorders. We provide a formal method to quantify deletions based on their potential disruption of the three-dimensional genome structure, denoted as the TAD fusion score. Furthermore, we show that deletions that cause TAD fusion are rare and under negative selection in the general population. Finally, we show that our method correctly gives higher scores to deletions reported to cause various disorders, including developmental disorders and cancer, in comparison to the deletions reported in the 1000 Genomes Project. The TAD fusion score tool is publicly available at https://github.com/HormozdiariLab/TAD-fusion-score.
Affiliation(s)
- Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, USA
- UC Davis MIND Institute, Sacramento, USA
- Department of Biochemistry and Molecular Medicine, UC Davis, Sacramento, USA