1
|
MacKay K, Kusalik A. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data. Brief Funct Genomics 2021; 19:292-308. [PMID: 32353112 PMCID: PMC7388788 DOI: 10.1093/bfgp/elaa004] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2019] [Revised: 01/30/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022] Open
Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
Collapse
|
2
|
Abstract
BACKGROUND Recent advances in genome analysis have established that chromatin has preferred 3D conformations, which bring distant loci into contact. Identifying these contacts is important for us to understand possible interactions between these loci. This has motivated the creation of the Hi-C technology, which detects long-range chromosomal interactions. Distance geometry-based algorithms, such as ChromSDE and ShRec3D, have been able to utilize Hi-C data to infer 3D chromosomal structures. However, these algorithms, being matrix-based, are space- and time-consuming on very large datasets. A human genome of 100 kilobase resolution would involve ∼30,000 loci, requiring gigabytes just in storing the matrices. RESULTS We propose a succinct representation of the distance matrices which tremendously reduces the space requirement. We give a complete solution, called SuperRec, for the inference of chromosomal structures from Hi-C data, through iterative solving the large-scale weighted multidimensional scaling problem. CONCLUSIONS SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec .
Collapse
Affiliation(s)
- Yanlin Zhang
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
| | - Weiwei Liu
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
| | - Yu Lin
- Research School of Computer Science, the Australian National University, Canberra, Australia
| | - Yen Kaow Ng
- Department of Computer Science, Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia
| | - Shuaicheng Li
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
| |
Collapse
|
3
|
Diament A, Tuller T. Modeling three-dimensional genomic organization in evolution and pathogenesis. Semin Cell Dev Biol 2018; 90:78-93. [PMID: 30030143 DOI: 10.1016/j.semcdb.2018.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2018] [Accepted: 07/08/2018] [Indexed: 12/17/2022]
Abstract
The regulation of gene expression is mediated via the complex three-dimensional (3D) conformation of the genetic material and its interactions with various intracellular factors. Various experimental and computational approaches have been developed in recent years for understating the relation between the 3D conformation of the genome and the phenotypes of cells in normal condition and diseases. In this review, we will discuss novel approaches for analyzing and modeling the 3D genomic conformation, focusing on deciphering disease-causing mutations that affect gene expression. We conclude that as this is a very challenging mission, an important direction should involve the comparative analysis of various 3D models from various organisms or cells.
Collapse
Affiliation(s)
- Alon Diament
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Tamir Tuller
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel; The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv 6997801, Israel.
| |
Collapse
|
4
|
Diament A, Tuller T. Tracking the evolution of 3D gene organization demonstrates its connection to phenotypic divergence. Nucleic Acids Res 2017; 45:4330-4343. [PMID: 28369658 PMCID: PMC5416853 DOI: 10.1093/nar/gkx205] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2016] [Accepted: 03/20/2017] [Indexed: 12/20/2022] Open
Abstract
It has recently been shown that the organization of genes in eukaryotic genomes, and specifically in 3D, is strongly related to gene expression and function and partially conserved between organisms. However, previous studies of 3D genomic organization analyzed each organism independently from others. Here, we propose an approach for unified inter-organismal analysis of gene organization based on a network representation of Hi-C data. We define and detect four classes of spatially co-evolving orthologous modules (SCOMs), i.e. gene families that co-evolve in their 3D organization, based on patterns of divergence and conservation of distances. We demonstrate our methodology on Hi-C data from Saccharomyces cerevisiae and Schizosaccharomyces pombe, and identify, among others, modules relating to RNA splicing machinery and chromatin silencing by small RNA which are central to S. pombe's lifestyle. Our results emphasize the importance of 3D genomic organization in eukaryotes and suggest that the evolutionary mechanisms that shape gene organization affect the organism fitness and phenotypes. The proposed algorithms can be utilized in future studies of genome evolution and comparative analysis of spatial genomic organization in different tissues, conditions and single cells.
Collapse
Affiliation(s)
- Alon Diament
- Biomedical Engineering Dept., Tel Aviv University, Tel Aviv 6997801, Israel
| | - Tamir Tuller
- Biomedical Engineering Dept., Tel Aviv University, Tel Aviv 6997801, Israel.,The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 6997801, Israel
| |
Collapse
|
5
|
Sefer E, Duggal G, Kingsford C. Deconvolution of Ensemble Chromatin Interaction Data Reveals the Latent Mixing Structures in Cell Subpopulations. J Comput Biol 2016; 23:425-38. [PMID: 27267775 PMCID: PMC4904159 DOI: 10.1089/cmb.2015.0210] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open
Abstract
Chromosome conformation capture (3C) experiments provide a window into the spatial packing of a genome in three dimensions within the cell. This structure has been shown to be correlated with gene regulation, cancer mutations, and other genomic functions. However, 3C provides mixed measurements on a population of typically millions of cells, each with a different genome structure due to the fluidity of the genome and differing cell states. Here, we present several algorithms to deconvolve these measured 3C matrices into estimations of the contact matrices for each subpopulation of cells and relative densities of each subpopulation. We formulate the problem as that of choosing matrices and densities that minimize the Frobenius distance between the observed 3C matrix and the weighted sum of the estimated subpopulation matrices. Results on HeLa 5C and mouse and bacteria Hi-C data demonstrate the methods' effectiveness. We also show that domain boundaries from deconvolved matrices are often more enriched or depleted for regulatory chromatin markers when compared to boundaries from convolved matrices.
Collapse
Affiliation(s)
- Emre Sefer
- School of Computer Science, Carnegie Mellon University , Pittsburgh, Pennsylvania
| | - Geet Duggal
- School of Computer Science, Carnegie Mellon University , Pittsburgh, Pennsylvania
| | - Carl Kingsford
- School of Computer Science, Carnegie Mellon University , Pittsburgh, Pennsylvania
| |
Collapse
|
6
|
Yun X, Xia L, Tang B, Zhang H, Li F, Zhang Z. 3CDB: a manually curated database of chromosome conformation capture data. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw044. [PMID: 27081154 PMCID: PMC4831724 DOI: 10.1093/database/baw044] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/18/2015] [Accepted: 03/11/2016] [Indexed: 01/23/2023]
Abstract
Chromosome conformation capture (3C) is a biochemical technology to analyse contact frequencies between selected genomic sites in a cell population. Its recent genomic variants, e.g. Hi-C/ chromatin interaction analysis by paired-end tag (ChIA-PET), have enabled the study of nuclear organization at an unprecedented level. However, due to the inherent low resolution and ultrahigh cost of Hi-C/ChIA-PET, 3C is still the gold standard for determining interactions between given regulatory DNA elements, such as enhancers and promoters. Therefore, we developed a database of 3C determined functional chromatin interactions (3CDB;http://3cdb.big.ac.cn). To construct 3CDB, we searched PubMed and Google Scholar with carefully designed keyword combinations and retrieved more than 5000 articles from which we manually extracted 3319 interactions in 17 species. Moreover, we proposed a systematic evaluation scheme for data reliability and classified the interactions into four categories. Contact frequencies are not directly comparable as a result of various modified 3C protocols employed among laboratories. Our evaluation scheme provides a plausible solution to this long-standing problem in the field. A user-friendly web interface was designed to assist quick searches in 3CDB. We believe that 3CDB will provide fundamental information for experimental design and phylogenetic analysis, as well as bridge the gap between molecular and systems biologists who must now contend with noisy high-throughput data.Database URL:http://3cdb.big.ac.cn.
Collapse
Affiliation(s)
- Xiaoxiao Yun
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Lili Xia
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Bixia Tang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Hui Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Feifei Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, 100101, China
| |
Collapse
|
7
|
Three-dimensional Genomic Organization of Genes’ Function in Eukaryotes. Evol Biol 2016. [DOI: 10.1007/978-3-319-41324-2_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
8
|
Segal MR, Bengtsson HL. Reconstruction of 3D genome architecture via a two-stage algorithm. BMC Bioinformatics 2015; 16:373. [PMID: 26553003 PMCID: PMC4638111 DOI: 10.1186/s12859-015-0799-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2015] [Accepted: 10/27/2015] [Indexed: 11/10/2022] Open
Abstract
Background The three-dimensional (3D) configuration of chromosomes within the eukaryote nucleus is an important factor for several cellular functions, including gene expression regulation, and has also been linked with cancer-causing translocation events. While visualization of such architecture remains limited to low resolutions, the ability to infer structures at increasing resolutions has been enabled by recently-devised chromosome conformation capture techniques. In particular, when coupled with next generation sequencing, such methods yield an inventory of genome-wide chromatin contacts or interactions. Various algorithms have been advanced to operate on such contact data to produce reconstructed 3D configurations. Studies have shown that these reconstructions can provide added value over raw interaction data with respect to downstream biological insights. However, only limited, low-resolution reconstructions have been realized for mammals due to computational bottlenecks. Results Here we propose a two-stage algorithm to partially overcome these computational barriers. The central idea is to initially utilize existing reconstruction techniques on an individual chromosome basis, using intra-chromosomal contacts, and then to relatively position these chromosome-level reconstructions using inter-chromosomal contacts. This two-stage strategy represents a natural approach in view of the within- versus between- chromosome distribution of contacts. It can increase resolution ≈ 20 fold for mouse and human. After describing the algorithm we present 3D architectures for mouse embryonic stem cells and human lymphoblastoid cells. We evaluate the impact of several factors on reconstruction reproducibility and explore a variety of sampling schemes. We further analyze replicate data at differing resolutions obtained from recently devised in situ Hi-C assays. In all instances we demonstrate insensitivity of the whole-genome 3D reconstruction obtained by the two-stage algorithm to the sampling strategy used. Conclusions Our two-stage algorithm has the potential to significantly increase the resolution of 3D genome reconstructions. The improvements are such that we can progress from 1 Mb resolution to 100 kb resolution, notable since this latter value has been identified as critical to inferring topological domains in analyses performed on the contact (rather than 3D) level.
Collapse
Affiliation(s)
- Mark R Segal
- Division of Bioinformatics, Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, 94158, CA, USA.
| | - Henrik L Bengtsson
- Division of Bioinformatics, Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, 94158, CA, USA.
| |
Collapse
|