1
Segal MR. Assessing chromatin relocalization in 3D using the patient rule induction method. Biostatistics 2023; 24:618-634. [PMID: 34494087 PMCID: PMC10449022 DOI: 10.1093/biostatistics/kxab033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 05/10/2021] [Accepted: 08/07/2021] [Indexed: 11/12/2022]
Abstract
Three-dimensional (3D) genome architecture is critical for numerous cellular processes, including transcription, while certain conformation-driven structural alterations are frequently oncogenic. Inferring 3D chromatin configurations has been advanced by the emergence of chromatin conformation capture assays, notably Hi-C, and attendant 3D reconstruction algorithms. These have enhanced understanding of chromatin spatial organization and afforded numerous downstream biological insights. Until recently, comparisons of 3D reconstructions between conditions and/or cell types were limited to prescribed structural features. However, multiMDS, a pioneering approach developed by Rieber and Mahony (2019) that performs joint reconstruction and alignment, enables quantification of all locus-specific differences between paired Hi-C data sets. By subsequently mapping these differences to the linear (1D) genome, the identification of relocalization regions is facilitated through the use of peak calling in conjunction with continuous wavelet transformation. Here, we seek to refine this approach by performing the search for significant relocalization regions in terms of the 3D structures themselves, thereby retaining the benefits of 3D reconstruction and avoiding limitations associated with the 1D perspective. The search for (extreme) relocalization regions is conducted using the patient rule induction method (PRIM). Considerations surrounding orienting structures with respect to compartmental and principal component axes are discussed, as are approaches to inference and reconstruction accuracy assessment. The illustration makes recourse to comparisons between four different cell types.
Affiliation(s)
- Mark R Segal
- Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, CA 94143-0560, USA
2
Olshen AB, Segal MR. Does multi-way, long-range chromatin contact data advance 3D genome reconstruction? BMC Bioinformatics 2023; 24:64. [PMID: 36829114 PMCID: PMC9951495 DOI: 10.1186/s12859-023-05170-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 02/02/2023] [Indexed: 02/26/2023]
Abstract
BACKGROUND Methods for inferring the three-dimensional (3D) configuration of chromatin from conformation capture assays that provide strictly pairwise interactions, notably Hi-C, utilize the attendant contact matrix as input. More recent assays, in particular split-pool recognition of interactions by tag extension (SPRITE), capture multi-way interactions instead of solely pairwise contacts. These assays yield contacts that straddle appreciably greater genomic distances than Hi-C, in addition to instances of exceptionally high-order chromatin interaction. Such attributes are anticipated to be consequential with respect to 3D genome reconstruction, a task yet to be undertaken with multi-way contact data. However, performing such 3D reconstruction using distance-based reconstruction techniques requires framing multi-way contacts as (pairwise) distances. Comparing approaches for so doing, and assessing the resultant impact of long-range and multi-way contacts, are the objectives of this study. RESULTS We obtained 3D reconstructions via multi-dimensional scaling under a variety of weighting schemes for mapping SPRITE multi-way contacts to pairwise distances. Resultant configurations were compared following Procrustes alignment and relationships were assessed between associated Procrustes root mean square errors and key features such as the extent of multi-way and/or long-range contacts. We found that these features had surprisingly limited influence on 3D reconstruction, a finding we attribute to their influence being diminished by the preponderance of pairwise contacts. CONCLUSION Distance-based 3D genome reconstruction using SPRITE multi-way contact data is not appreciably affected by the weighting scheme used to convert multi-way interactions to pairwise distances.
Affiliation(s)
- Adam B. Olshen
- Department of Epidemiology and Biostatistics and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA USA
- Mark R. Segal
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA USA
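The pipeline summarized in the Olshen and Segal abstract above (map multi-way contacts to pairwise quantities, reconstruct by multi-dimensional scaling, compare configurations after Procrustes alignment) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `2/n` down-weighting of a cluster of `n` loci, the choice of classical (Torgerson) MDS, and the unscaled orthogonal Procrustes fit are all assumptions made for brevity.

```python
import numpy as np


def pairwise_contacts(clusters, n_loci, weight=lambda size: 2.0 / size):
    """Spread each multi-way cluster over its locus pairs (hypothetical weighting scheme)."""
    contacts = np.zeros((n_loci, n_loci))
    for cluster in clusters:
        loci = sorted(set(cluster))
        if len(loci) < 2:
            continue  # a singleton carries no pairwise information
        w = weight(len(loci))
        for pos, a in enumerate(loci):
            for b in loci[pos + 1:]:
                contacts[a, b] += w
                contacts[b, a] += w
    return contacts


def classical_mds(dist, dim=3):
    """Classical MDS: coordinates whose Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dim]  # indices of the largest eigenvalues
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


def procrustes_rmse(x, y):
    """RMSE after translating and rotating/reflecting y onto x (no rescaling)."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, _, vt = np.linalg.svd(yc.T @ xc)
    rotation = u @ vt  # orthogonal matrix minimizing ||yc @ R - xc||
    residual = xc - yc @ rotation
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


# Toy check: exact distances from a known configuration are recovered
# up to rotation, reflection, and translation.
rng = np.random.default_rng(0)
truth = rng.normal(size=(20, 3))
dist = np.linalg.norm(truth[:, None, :] - truth[None, :, :], axis=-1)
recon = classical_mds(dist)
rmse = procrustes_rmse(truth, recon)

# Two clusters over three loci: one three-way contact and one pairwise contact.
toy_contacts = pairwise_contacts([[0, 1, 2], [0, 1]], n_loci=3)
```

The sketch deliberately omits the step that converts weighted contacts into distances (for instance an inverse power-law transform), since the abstract does not specify it; swapping the `weight` argument is the hook for trying alternative weighting schemes.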
3
Segal MR. Can 3D diploid genome reconstruction from unphased Hi-C data be salvaged? NAR Genom Bioinform 2022; 4:lqac038. [PMID: 35571676 PMCID: PMC9097817 DOI: 10.1093/nargab/lqac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 03/31/2022] [Accepted: 04/29/2022] [Indexed: 11/13/2022]
Abstract
The three-dimensional (3D) configuration of chromatin impacts numerous cellular processes. However, directly observing chromatin architecture at high resolution is challenging. Accordingly, inferring 3D structure utilizing chromatin conformation capture assays, notably Hi-C, has received considerable attention, with a multitude of reconstruction algorithms advanced. While these have enhanced appreciation of chromatin organization, most suffer from a serious shortcoming when faced with diploid genomes: inability to disambiguate contacts between corresponding loci on homologous chromosomes, making attendant reconstructions potentially meaningless. Three recent proposals offer a computational way forward at the expense of strong assumptions. Here, we show that making plausible assumptions about the components of homologous chromosome contacts provides a basis for rescuing conventional consensus-based, unphased reconstruction. This would be consequential since not only are assumptions needed for diploid reconstruction considerable, but the sophistication of select unphased algorithms affords substantive advantages with regard to resolution and folding complexity. Rather than presuming that the requisite salvaging assumptions are met, we exploit a recent imaging technology, in situ genome sequencing (IGS), to comprehensively evaluate their reasonableness. We analogously use IGS to assess assumptions underpinning diploid reconstruction algorithms. Results convincingly demonstrate that, in all instances, assumptions are not met, making further algorithm development, potentially informed by IGS data, essential.
Affiliation(s)
- Mark R Segal
- Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, CA 94143-0560, USA
4
Collins B, Oluwadare O, Brown P. ChromeBat: A Bio-Inspired Approach to 3D Genome Reconstruction. Genes (Basel) 2021; 12:1757. [PMID: 34828363 PMCID: PMC8617892 DOI: 10.3390/genes12111757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Revised: 10/28/2021] [Accepted: 11/01/2021] [Indexed: 11/20/2022]
Abstract
With the advent of Next Generation Sequencing and the Hi-C experiment, high quality genome-wide contact data are becoming increasingly available. These data represent an empirical measure of how a genome interacts inside the nucleus. Genome conformation is of particular interest as it has been experimentally shown to be a driving force for many genomic functions from regulation to transcription. Thus, the Three Dimensional-Genome Reconstruction Problem (3D-GRP) seeks to take Hi-C data and produce a complete physical genome structure as it appears in the nucleus for genomic analysis. We propose and develop a novel method to solve the Chromosome and Genome Reconstruction problem based on the Bat Algorithm (BA), which we call ChromeBat. We demonstrate on real Hi-C data that ChromeBat is capable of state-of-the-art performance. Additionally, the domain of Genome Reconstruction has been criticized for lacking algorithmic diversity, and the bio-inspired nature of ChromeBat contributes algorithmic diversity to the problem domain. ChromeBat is an effective approach for solving the Genome Reconstruction Problem.
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA; (B.C.); (P.B.)
5
Lin X, Qi Y, Latham AP, Zhang B. Multiscale modeling of genome organization with maximum entropy optimization. J Chem Phys 2021; 155:010901. [PMID: 34241389 PMCID: PMC8253599 DOI: 10.1063/5.0044150] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 01/14/2021] [Accepted: 04/28/2021] [Indexed: 12/15/2022]
Abstract
Three-dimensional (3D) organization of the human genome plays an essential role in all DNA-templated processes, including gene transcription, gene regulation, and DNA replication. Computational modeling can be an effective way of building high-resolution genome structures and improving our understanding of these molecular processes. However, it faces significant challenges as the human genome consists of over 6 × 10^9 base pairs, a system size that exceeds the capacity of traditional modeling approaches. In this perspective, we review the progress that has been made in modeling the human genome. Coarse-grained models parameterized to reproduce experimental data via the maximum entropy optimization algorithm serve as effective means to study genome organization at various length scales. They have provided insight into the principles of whole-genome organization and enabled de novo predictions of chromosome structures from epigenetic modifications. Applications of these models at a near-atomistic resolution further revealed physicochemical interactions that drive the phase separation of disordered proteins and dictate chromatin stability in situ. We conclude with an outlook on the opportunities and challenges in studying chromosome dynamics.
Affiliation(s)
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Yifeng Qi
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Andrew P Latham
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
6
MacKay K, Kusalik A. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data. Brief Funct Genomics 2021; 19:292-308. [PMID: 32353112 PMCID: PMC7388788 DOI: 10.1093/bfgp/elaa004] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 01/30/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
7
Guarnera E, Tan ZW, Berezovsky IN. Three-dimensional chromatin ensemble reconstruction via stochastic embedding. Structure 2021; 29:622-634.e3. [PMID: 33567266 DOI: 10.1016/j.str.2021.01.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2020] [Revised: 11/17/2020] [Accepted: 01/13/2021] [Indexed: 01/04/2023]
Abstract
We propose a comprehensive method for reconstructing the whole-genome chromatin ensemble from the Hi-C data. The procedure starts from Markov state modeling (MSM), delineating the structural hierarchy of chromatin organization with partitioning and effective interactions archetypal for corresponding levels of hierarchy. The stochastic embedding procedure introduced in this work provides the 3D ensemble reconstruction, using effective interactions obtained by the MSM as the input. As a result, we obtain the structural ensemble of a genome, allowing one to model the functional and the cell-type variability in the chromatin structure. The whole-genome reconstructions performed on the human B lymphoblastoid (GM12878) and lung fibroblast (IMR90) Hi-C data unravel distinctions in their morphologies and in the spatial arrangement of intermingling chromosomal territories, paving the way to studies of chromatin dynamics, developmental changes, and conformational transitions taking place in normal cells and during potential pathological developments.
Affiliation(s)
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, Singapore 117597, Singapore.
8
Tuzhilina E, Hastie TJ, Segal MR. Principal curve approaches for inferring 3D chromatin architecture. Biostatistics 2020; 23:626-642. [PMID: 33221831 DOI: 10.1093/biostatistics/kxaa046] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/14/2020] [Revised: 09/26/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]
Abstract
Three-dimensional (3D) genome spatial organization is critical for numerous cellular processes, including transcription, while certain conformation-driven structural alterations are frequently oncogenic. Genome architecture had been notoriously difficult to elucidate, but the advent of the suite of chromatin conformation capture assays, notably Hi-C, has transformed understanding of chromatin structure and provided downstream biological insights. Although many findings have flowed from direct analysis of the pairwise proximity data produced by these assays, there is added value in generating corresponding 3D reconstructions deriving from superposing genomic features on the reconstruction. Accordingly, many methods for inferring 3D architecture from proximity data have been advanced. However, none of these approaches exploit the fact that single chromosome solutions constitute a one-dimensional (1D) curve in 3D. Rather, this aspect has either been addressed by imposition of constraints, which is both computationally burdensome and cell type specific, or ignored with contiguity imposed after the fact. Here, we target finding a 1D curve by extending principal curve methodology to the metric scaling problem. We illustrate how this approach yields a sequence of candidate solutions, indexed by an underlying smoothness or degrees-of-freedom parameter, and propose methods for selection from this sequence. We apply the methodology to Hi-C data obtained on IMR90 cells and so are positioned to evaluate reconstruction accuracy by referencing orthogonal imaging data. The results indicate the utility and reproducibility of our principal curve approach in the face of underlying structural variation.
Affiliation(s)
- Elena Tuzhilina
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
- Trevor J Hastie
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
- Mark R Segal
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
9
Meluzzi D, Arya G. Computational approaches for inferring 3D conformations of chromatin from chromosome conformation capture data. Methods 2020; 181-182:24-34. [PMID: 31470090 PMCID: PMC7044057 DOI: 10.1016/j.ymeth.2019.08.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/15/2019] [Revised: 06/24/2019] [Accepted: 08/23/2019] [Indexed: 02/08/2023]
Abstract
Chromosome conformation capture (3C) and its variants are powerful experimental techniques for probing intra- and inter-chromosomal interactions within cell nuclei at high resolution and in a high-throughput, quantitative manner. The contact maps derived from such experiments provide an avenue for inferring the 3D spatial organization of the genome. This review provides an overview of the various computational methods developed in the past decade for addressing the very important but challenging problem of deducing the detailed 3D structure or structure population of chromosomal domains, chromosomes, and even entire genomes from 3C contact maps.
Affiliation(s)
- Dario Meluzzi
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Gaurav Arya
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, United States.
10
Segal MR, Fletez-Brant K. Assessing stationary distributions derived from chromatin contact maps. BMC Bioinformatics 2020; 21:73. [PMID: 32093610 PMCID: PMC7041182 DOI: 10.1186/s12859-020-3424-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2019] [Accepted: 02/17/2020] [Indexed: 11/20/2022]
Abstract
Background The spatial configuration of chromosomes is essential to various cellular processes, notably gene regulation, while architecture related alterations, such as translocations and gene fusions, are often cancer drivers. Thus, eliciting chromatin conformation is important, yet challenging due to compaction, dynamics and scale. However, a variety of recent assays, in particular Hi-C, have generated new details of chromatin structure, spawning a number of novel biological findings. Many findings have resulted from analyses on the level of native contact data as generated by the assays. Alternatively, reconstruction based approaches often proceed by first converting contact frequencies into distances, then generating a three dimensional (3D) chromatin configuration that best recapitulates these distances. Subsequent analyses can enrich contact level analyses via superposition of genomic attributes on the reconstruction. But, such advantages depend on the accuracy of the reconstruction which, absent gold standards, is inherently difficult to assess. Attempts at accuracy evaluation have relied on simulation and/or FISH imaging that typically features a handful of low resolution probes. While newly advanced multiplexed FISH imaging offers possibilities for refined 3D reconstruction accuracy evaluation, availability of such data is limited due to assay complexity and the resolution thereof is appreciably lower than the reconstructions being assessed. Accordingly, there is demand for new methods of reconstruction accuracy appraisal. Results Here we explore the potential of recently proposed stationary distributions, hereafter StatDns, derived from Hi-C contact matrices, to serve as a basis for reconstruction accuracy assessment. Current usage of such StatDns has focussed on the identification of highly interactive regions (HIRs): computationally defined regions of the genome purportedly involved in numerous long-range intra-chromosomal contacts. 
Consistent identification of HIRs would be informative with respect to inferred 3D architecture since the corresponding regions of the reconstruction would have an elevated number of k nearest neighbors (kNNs). More generally, we anticipate a monotone decreasing relationship between StatDn values and kNN distances. After initially evaluating the reproducibility of StatDns across replicate Hi-C data sets, we use this implied StatDn - kNN relationship to gauge the utility of StatDns for reconstruction validation, making recourse to both real and simulated examples. Conclusions Our analyses demonstrate that, as constructed, StatDns do not provide a suitable measure for assessing the accuracy of 3D genome reconstructions. Whether this is attributable to specific choices surrounding normalization in defining StatDns or to the logic underlying their very formulation remains to be determined.
Affiliation(s)
- Mark R Segal
- Division of Bioinformatics, Department of Epidemiology and Biostatistics, UCSF, 550 16th Street, San Francisco, 94158, CA, USA.
- Kipper Fletez-Brant
- Computational Biology, 23andMe, Inc., 899 West Evelyn Avenue, Mountain View, 94041, CA, USA
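For readers unfamiliar with the idea of a stationary distribution derived from a contact matrix, as discussed in the Segal and Fletez-Brant abstract above, the following is a textbook sketch. It is not the StatDn construction evaluated in that paper, whose normalization choices are not given here: the plain row normalization, the power-iteration solver, and all names are assumptions for illustration.

```python
import numpy as np


def stationary_distribution(contacts, n_iter=1000, tol=1e-12):
    """Power iteration for the stationary distribution of the row-normalized contact matrix."""
    # Treat each row as transition probabilities of a random walk over loci.
    transition = contacts / contacts.sum(axis=1, keepdims=True)
    pi = np.full(contacts.shape[0], 1.0 / contacts.shape[0])
    for _ in range(n_iter):
        updated = pi @ transition
        if np.abs(updated - pi).max() < tol:
            return updated
        pi = updated
    return pi


rng = np.random.default_rng(1)
raw = rng.uniform(0.1, 1.0, size=(6, 6))
contacts = (raw + raw.T) / 2  # symmetric, strictly positive toy contact matrix
pi = stationary_distribution(contacts)
row_share = contacts.sum(axis=1) / contacts.sum()  # each locus's share of total contacts
```

Under this particular normalization of a symmetric matrix, the stationary distribution coincides with each locus's share of total contacts (`row_share`), so any additional information would have to come from a different normalization; the toy example makes that easy to check numerically.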
11
Trieu T, Oluwadare O, Wopata J, Cheng J. GenomeFlow: a comprehensive graphical tool for modeling and analyzing 3D genome structure. Bioinformatics 2020; 35:1416-1418. [PMID: 30215673 PMCID: PMC6477968 DOI: 10.1093/bioinformatics/bty802] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 06/30/2018] [Revised: 08/29/2018] [Accepted: 09/11/2018] [Indexed: 02/01/2023]
Abstract
Motivation Three-dimensional (3D) genome organization plays important functional roles in cells. User-friendly tools for reconstructing 3D genome models from chromosomal conformation capturing data and analyzing them are needed for the study of 3D genome organization. Results We built a comprehensive graphical tool (GenomeFlow) to facilitate the entire process of modeling and analysis of 3D genome organization. This process includes the mapping of Hi-C data to one-dimensional (1D) reference genomes, the generation, normalization and visualization of two-dimensional (2D) chromosomal contact maps, the reconstruction and the visualization of the 3D models of chromosome and genome, the analysis of 3D models and the integration of these models with functional genomics data. This graphical tool is the first of its kind in reconstructing, storing, analyzing and annotating 3D genome models. It can reconstruct 3D genome models from Hi-C data and visualize them in real-time. This tool also allows users to overlay gene annotation, gene expression data and genome methylation data on top of 3D genome models. Availability and implementation The source code and user manual: https://github.com/jianlin-cheng/GenomeFlow. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tuan Trieu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Oluwatosin Oluwadare
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Julia Wopata
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
12
Abstract
BACKGROUND Recent advances in genome analysis have established that chromatin has preferred 3D conformations, which bring distant loci into contact. Identifying these contacts is important for us to understand possible interactions between these loci. This has motivated the creation of the Hi-C technology, which detects long-range chromosomal interactions. Distance geometry-based algorithms, such as ChromSDE and ShRec3D, have been able to utilize Hi-C data to infer 3D chromosomal structures. However, these algorithms, being matrix-based, are space- and time-consuming on very large datasets. A human genome at 100 kilobase resolution would involve ∼30,000 loci, requiring gigabytes just to store the matrices. RESULTS We propose a succinct representation of the distance matrices which tremendously reduces the space requirement. We give a complete solution, called SuperRec, for the inference of chromosomal structures from Hi-C data, by iteratively solving the large-scale weighted multidimensional scaling problem. CONCLUSIONS SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec.
Affiliation(s)
- Yanlin Zhang
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
- Weiwei Liu
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
- Yu Lin
- Research School of Computer Science, the Australian National University, Canberra, Australia
- Yen Kaow Ng
- Department of Computer Science, Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia
- Shuaicheng Li
- Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
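The storage claim in the abstract above (about 30,000 loci requiring gigabytes for the matrices) can be checked with simple arithmetic. The sketch below assumes dense storage with 8-byte floating-point entries; both are assumptions for illustration, and the function name is hypothetical. It says nothing about the succinct representation SuperRec itself uses.

```python
def dense_matrix_bytes(n_loci, bytes_per_entry=8):
    """Bytes needed for a dense n_loci x n_loci matrix (8 bytes = one float64 entry)."""
    return n_loci * n_loci * bytes_per_entry


n_loci = 30_000  # roughly the locus count quoted in the abstract
full_gb = dense_matrix_bytes(n_loci) / 1e9

# A symmetric matrix only needs its upper triangle, diagonal included.
upper_triangle_gb = (n_loci * (n_loci + 1) // 2) * 8 / 1e9
```

Under these assumptions one full matrix occupies about 7.2 GB and a symmetric half about 3.6 GB, and a weighted scaling method typically holds more than one such matrix (distances and weights), which is consistent with the abstract's motivation for a more compact representation.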
13
Hierarchical Reconstruction of High-Resolution 3D Models of Large Chromosomes. Sci Rep 2019; 9:4971. [PMID: 30899036 PMCID: PMC6428844 DOI: 10.1038/s41598-019-41369-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/23/2018] [Accepted: 03/07/2019] [Indexed: 11/08/2022]
Abstract
Eukaryotic chromosomes are often composed of components organized into multiple scales, such as nucleosomes, chromatin fibers, topologically associated domains (TAD), chromosome compartments, and chromosome territories. Therefore, reconstructing detailed 3D models of chromosomes in high resolution is useful for advancing genome research. However, the task of constructing quality high-resolution 3D models is still challenging with existing methods. Hence, we designed a hierarchical algorithm, called Hierarchical3DGenome, to reconstruct 3D chromosome models at high resolution (≤5 kilobase (KB)). The algorithm first reconstructs high-resolution 3D models at TAD level. The TAD models are then assembled to form complete high-resolution chromosomal models. The assembly of TAD models is guided by a complete low-resolution chromosome model. The algorithm is successfully used to reconstruct 3D chromosome models at 5 KB resolution for the human B-cell (GM12878). These high-resolution models satisfy Hi-C chromosomal contacts well and are consistent with models built at lower (i.e. 1 MB) resolution, and with the data of fluorescence in situ hybridization experiments. The Java source code of Hierarchical3DGenome and its user manual are available at https://github.com/BDM-Lab/Hierarchical3DGenome.
14
Caudai C, Salerno E, Zoppe M, Tonazzini A. Estimation of the Spatial Chromatin Structure Based on a Multiresolution Bead-Chain Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:550-559. [PMID: 29994172 DOI: 10.1109/tcbb.2018.2791439] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
We present a method to infer 3D chromatin configurations from Chromosome Conformation Capture data. Quite a few methods have been proposed to estimate the structure of the nuclear DNA in homogeneous populations of cells from this kind of data. Many of them transform contact frequencies into Euclidean distances between pairs of chromatin fragments, and then reconstruct the structure by solving a distance-to-geometry problem. To avoid inconsistencies, our method is based on a score function that does not require any frequency-to-distance translation. We propose a multiscale chromatin model where the chromatin fiber is suitably partitioned at each scale. The partial structures are estimated independently, and connected to rebuild the whole fiber. Our score function consists of a data-fit part and a penalty part, balanced automatically at each scale and each subchain. The penalty part enforces soft geometric constraints. As many different structures can fit the data, our sampling strategy produces a set of solutions with similar scores. The procedure contains a few parameters, independent of both the scale and the genomic segment treated. The partition of the fiber, along with intrinsically parallel parts, makes this method computationally efficient. Results from human genome data support the biological plausibility of our solutions.
15
Caudai C, Salerno E, Zoppe M, Merelli I, Tonazzini A. ChromStruct 4: A Python Code to Estimate the Chromatin Structure from Hi-C Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018:1-1. [PMID: 29993555 DOI: 10.1109/tcbb.2018.2838669] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
A method and a stand-alone Python(TM) code to estimate the 3D chromatin structure from chromosome conformation capture data are presented. The method is based on a multiresolution, modified-bead-chain chromatin model, evolved through quaternion operators in a Monte Carlo sampling. The solution space to be sampled is generated by a score function with a data-fit part and a constraint part where the available prior knowledge is implicitly coded. The final solution is a set of 3D configurations that are compatible with both the data and the prior knowledge. The iterative code, provided here as additional material, is equipped with a graphical user interface and stores its results in standard-format files for 3D visualization. We describe the mathematical-computational aspects of the method and explain the details of the code. Some experimental results are reported, with a demonstration of their fit to the data.
|
16
|
Shah FR, Bhat YA, Wani AH. Subnuclear distribution of proteins: Links with genome architecture. Nucleus 2018; 9:42-55. [PMID: 28910577 PMCID: PMC5973252 DOI: 10.1080/19491034.2017.1361578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/10/2017] [Revised: 07/25/2017] [Accepted: 07/26/2017] [Indexed: 02/08/2023] Open
Abstract
Metazoan genomes have a hierarchical 3-dimensional (3D) organization scaling from nucleosomes, loops, topologically associating domains (TADs), and compartments to chromosome territories. The 3D organization of the genome has been linked with development, differentiation and disease. However, the principles governing 3D chromatin architecture are only beginning to be unraveled. The nucleus has a very high concentration of proteins, and these proteins are either diffusely distributed throughout the nucleus, aggregated in the form of foci/bodies/clusters/speckles, or both. Several lines of evidence suggest that the distribution of proteins within the nuclear space is linked to the organization and function of the genome. Here, we describe advances made in understanding the relationship between subnuclear distribution of proteins and genome architecture.
Affiliation(s)
- Fouziya R. Shah
  - Biotechnology, School of Biological Sciences, University of Kashmir, Srinagar, India
- Younus A. Bhat
  - Biotechnology, School of Biological Sciences, University of Kashmir, Srinagar, India
- Ajazul H. Wani
  - Biotechnology, School of Biological Sciences, University of Kashmir, Srinagar, India
|
17
|
Network analysis identifies chromosome intermingling regions as regulatory hotspots for transcription. Proc Natl Acad Sci U S A 2017; 114:13714-13719. [PMID: 29229825 PMCID: PMC5748172 DOI: 10.1073/pnas.1708028115] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 01/04/2023] Open
Abstract
We develop a network analysis approach for identifying clusters of interactions between chromosomes, which we validate experimentally. Our method integrates 1D features of the genome, such as epigenetic marks, with 3D interactions, allowing us to study spatially colocalized regions between chromosomes that are functionally relevant. We observe that clusters of interchromosomal regions fall into active and inactive categories. We find that active clusters share transcription factors and are enriched for transcriptional machinery, suggesting that chromosome intermingling regions play a key role in genome regulation. Our method provides a unique quantitative framework that can be broadly applied to study the principles of genome organization and regulation during processes such as cell differentiation and reprogramming.

The 3D structure of the genome plays a key role in regulatory control of the cell. Experimental methods such as high-throughput chromosome conformation capture (Hi-C) have been developed to probe the 3D structure of the genome. However, it remains a challenge to deduce from these data chromosome regions that are colocalized and coregulated. Here, we present an integrative approach that leverages 1D functional genomic features (e.g., epigenetic marks) with 3D interactions from Hi-C data to identify functional interchromosomal interactions. We construct a weighted network with 250-kb genomic regions as nodes and Hi-C interactions as edges, where the edge weights are given by the correlation between 1D genomic features. Individual interacting clusters are determined using weighted correlation clustering on the network. We show that intermingling regions generally fall into either active or inactive clusters based on the enrichment for RNA polymerase II (RNAPII) and H3K9me3, respectively. We show that active clusters are hotspots for transcription factor binding sites. We also validate our predictions experimentally by 3D fluorescence in situ hybridization (FISH) experiments and show that active RNAPII is enriched in predicted active clusters. Our method provides a general quantitative framework that couples 1D genomic features with 3D interactions from Hi-C to probe the guiding principles that link the spatial organization of the genome with regulatory control.
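The network construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_weighted_edges`, the toy feature matrix, and the use of Pearson correlation as "the correlation between 1D genomic features" are assumptions, and the weighted correlation clustering step is not shown.

```python
import numpy as np

def build_weighted_edges(features, contacts):
    """Weight each Hi-C contact by the correlation of its two regions' 1D features.

    features: (n_regions, n_features) array, one row per genomic region
              (e.g. one 250-kb bin) and one column per 1D feature.
    contacts: iterable of (i, j) region-index pairs that interact in Hi-C.
    Returns a dict mapping (i, j) to the Pearson correlation of rows i and j
    (Pearson is an assumption made for this sketch).
    """
    corr = np.corrcoef(features)  # np.corrcoef treats each row as one variable
    return {(i, j): float(corr[i, j]) for i, j in contacts}

# Toy data: 4 regions x 3 hypothetical feature tracks (values are made up).
features = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0],
])
contacts = [(0, 1), (0, 2), (1, 3)]  # hypothetical interacting region pairs

edges = build_weighted_edges(features, contacts)
# Positively weighted edges join regions with similar 1D profiles;
# negatively weighted edges join regions with opposing profiles.
similar = [pair for pair, w in edges.items() if w > 0]
dissimilar = [pair for pair, w in edges.items() if w < 0]
```

A correlation clustering routine would then take the signed edge weights as input and group regions so that positively weighted edges tend to fall within clusters and negatively weighted edges between them.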
Collapse
|