1
Drogaris C, Zhang Y, Zhang E, Nazarova E, Sarrazin-Gendron R, Wilhelm-Landry S, Cyr Y, Majewski J, Blanchette M, Waldispühl J. ARGV: 3D genome structure exploration using augmented reality. BMC Bioinformatics 2024; 25:277. [PMID: 39192184 DOI: 10.1186/s12859-024-05882-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Accepted: 07/25/2024] [Indexed: 08/29/2024] Open
Abstract
Over the past two decades, scientists have increasingly realized the importance of the three-dimensional (3D) genome organization in regulating cellular activity. Hi-C and related experiments yield 2D contact matrices that can be used to infer 3D models of chromosome structure. Visualizing and analyzing genomes in 3D space remains challenging. Here, we present ARGV, an augmented reality 3D Genome Viewer. ARGV contains more than 350 pre-computed and annotated genome structures inferred from Hi-C and imaging data. It offers interactive and collaborative visualization of genomes in 3D space, using standard mobile phones or tablets. A user study comparing ARGV to existing tools demonstrates its benefits.
Affiliation(s)
- Yanlin Zhang
- School of Computer Science, McGill University, Montréal, QC, H3A 0E9, Canada
- Eric Zhang
- School of Computer Science, McGill University, Montréal, QC, H3A 0E9, Canada
- Elena Nazarova
- School of Computer Science, McGill University, Montréal, QC, H3A 0E9, Canada
- Yan Cyr
- Beam Me Up Inc., 5925 Monkland Ave, Suite 100, Montréal, H4A 1G7, Canada
- Jacek Majewski
- Department of Human Genetics, McGill University, Montréal, QC, H3A 1B1, Canada
- Mathieu Blanchette
- School of Computer Science, McGill University, Montréal, QC, H3A 0E9, Canada
- Jérôme Waldispühl
- School of Computer Science, McGill University, Montréal, QC, H3A 0E9, Canada
2
Kadlof M, Banecki K, Chiliński M, Plewczynski D. Chromatin image-driven modelling. Methods 2024; 226:54-60. [PMID: 38636797 DOI: 10.1016/j.ymeth.2024.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 03/13/2024] [Accepted: 04/05/2024] [Indexed: 04/20/2024] Open
Abstract
The challenge of modelling the spatial conformation of chromatin remains an open problem. While multiple data-driven approaches have been proposed, each has limitations. This work introduces two image-driven modelling methods based on the Molecular Dynamics Flexible Fitting (MDFF) approach: the force method and the correlational method. Both methods have already been used successfully in protein modelling. We propose a novel way to employ them for building chromatin models directly from 3D images. This approach is termed image-driven modelling. Additionally, we introduce the initial structure generator, a tool designed to generate optimal starting structures for the proposed algorithms. The methods are versatile and can be applied to various data types, with minor modifications to accommodate new generation imaging techniques.
Affiliation(s)
- Michał Kadlof
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Krzysztof Banecki
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Mateusz Chiliński
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland; Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Dariusz Plewczynski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland
3
Zhang Y, Cameron CJF, Blanchette M. Posterior inference of Hi-C contact frequency through sampling. Frontiers in Bioinformatics 2024; 3:1285828. [PMID: 38455089 PMCID: PMC10919286 DOI: 10.3389/fbinf.2023.1285828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Accepted: 12/20/2023] [Indexed: 03/09/2024] Open
Abstract
Hi-C is one of the most widely used approaches to study three-dimensional genome conformations. Contacts captured by a Hi-C experiment are represented in a contact frequency matrix. Due to the limited sequencing depth and other factors, Hi-C contact frequency matrices are only approximations of the true interaction frequencies and are further reported without any quantification of uncertainty. Hence, downstream analyses based on Hi-C contact maps (e.g., TAD and loop annotation) are themselves point estimations. Here, we present the Hi-C interaction frequency sampler (HiCSampler) that reliably infers the posterior distribution of the interaction frequency for a given Hi-C contact map by exploiting dependencies between neighboring loci. Posterior predictive checks demonstrate that HiCSampler can infer highly predictive chromosomal interaction frequency. Summary statistics calculated by HiCSampler provide a measurement of the uncertainty for Hi-C experiments, and samples inferred by HiCSampler are ready for use by most downstream analysis tools off the shelf and permit uncertainty measurements in these analyses without modifications.
Affiliation(s)
- Yanlin Zhang
- School of Computer Science, McGill University, Montréal, QC, Canada
- Christopher J. F. Cameron
- School of Computer Science, McGill University, Montréal, QC, Canada
- Department of Biochemistry and Goodman Cancer Research Center, McGill University, Montreal, QC, Canada
4
Cifuentes D, Draisma J, Henriksson O, Korchmaros A, Kubjas K. 3D Genome Reconstruction from Partially Phased Hi-C Data. Bull Math Biol 2024; 86:33. [PMID: 38386111 PMCID: PMC10884149 DOI: 10.1007/s11538-024-01263-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
The 3-dimensional (3D) structure of the genome is of significant importance for many cellular processes. In this paper, we study the problem of reconstructing the 3D structure of chromosomes from Hi-C data of diploid organisms, which poses additional challenges compared to the better-studied haploid setting. With the help of techniques from algebraic geometry, we prove that a small amount of phased data is sufficient to ensure finite identifiability, both for noiseless and noisy data. In the light of these results, we propose a new 3D reconstruction method based on semidefinite programming, paired with numerical algebraic geometry and local optimization. The performance of this method is tested on several simulated datasets under different noise levels and with different amounts of phased data. We also apply it to a real dataset from mouse X chromosomes, and we are then able to recover previously known structural features.
Affiliation(s)
- Diego Cifuentes
- School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Drive, NW, Atlanta, GA, 30332, USA
- Jan Draisma
- Mathematisches Institut, University of Bern, Sidlerstrasse 5, 3012, Bern, Switzerland
- Oskar Henriksson
- Department of Mathematical Sciences, University of Copenhagen, Universitetsparken 5, 2100, Copenhagen, Denmark
- Annachiara Korchmaros
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Kaie Kubjas
- Department of Mathematics and Systems Analysis, Aalto University, P.O. Box 11100, 00076, Aalto, Finland
5
Lin X, Zhang B. Explicit ion modeling predicts physicochemical interactions for chromatin organization. eLife 2024; 12:RP90073. [PMID: 38289342 PMCID: PMC10945522 DOI: 10.7554/elife.90073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023] Open
Abstract
Molecular mechanisms that dictate chromatin organization in vivo are under active investigation, and the extent to which intrinsic interactions contribute to this process remains debatable. A central quantity for evaluating their contribution is the strength of nucleosome-nucleosome binding, which previous experiments have estimated to range from 2 to 14 kBT. We introduce an explicit ion model to dramatically enhance the accuracy of residue-level coarse-grained modeling approaches across a wide range of ionic concentrations. This model allows for de novo predictions of chromatin organization and remains computationally efficient, enabling large-scale conformational sampling for free energy calculations. It reproduces the energetics of protein-DNA binding and unwinding of single nucleosomal DNA, and resolves the differential impact of mono- and divalent ions on chromatin conformations. Moreover, we showed that the model can reconcile various experiments on quantifying nucleosomal interactions, providing an explanation for the large discrepancy between existing estimations. We predict the interaction strength at physiological conditions to be 9 kBT, a value that is nonetheless sensitive to DNA linker length and the presence of linker histones. Our study strongly supports the contribution of physicochemical interactions to the phase behavior of chromatin aggregates and chromatin organization inside the nucleus.
Affiliation(s)
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, United States
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, United States
6
Liu T, Qiu QT, Hua KJ, Ma BG. Chromosome structure modeling tools and their evaluation in bacteria. Brief Bioinform 2024; 25:bbae044. [PMID: 38385874 PMCID: PMC10883143 DOI: 10.1093/bib/bbae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 12/31/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024] Open
Abstract
The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, 3D structure reconstruction algorithms have become powerful tools for studying bacterial chromosome structure and function. Recommendations on chromosome structure reconstruction tools are therefore highly desirable to facilitate prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them based on their underlying computational models into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms utilizing 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in the 3D reconstruction algorithms for bacterial chromosomes, primarily focusing on software usability. Finally, we briefly outline future research directions for bacterial chromosome structure reconstruction algorithms.
Affiliation(s)
- Tong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qin-Tian Qiu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Kang-Jian Hua
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Bin-Guang Ma
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
7
Lin X, Zhang B. Explicit Ion Modeling Predicts Physicochemical Interactions for Chromatin Organization. bioRxiv: the preprint server for biology 2023:2023.05.16.541030. [PMID: 37293007 PMCID: PMC10245791 DOI: 10.1101/2023.05.16.541030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Molecular mechanisms that dictate chromatin organization in vivo are under active investigation, and the extent to which intrinsic interactions contribute to this process remains debatable. A central quantity for evaluating their contribution is the strength of nucleosome-nucleosome binding, which previous experiments have estimated to range from 2 to 14 kBT. We introduce an explicit ion model to dramatically enhance the accuracy of residue-level coarse-grained modeling approaches across a wide range of ionic concentrations. This model allows for de novo predictions of chromatin organization and remains computationally efficient, enabling large-scale conformational sampling for free energy calculations. It reproduces the energetics of protein-DNA binding and unwinding of single nucleosomal DNA, and resolves the differential impact of mono and divalent ions on chromatin conformations. Moreover, we showed that the model can reconcile various experiments on quantifying nucleosomal interactions, providing an explanation for the large discrepancy between existing estimations. We predict the interaction strength at physiological conditions to be 9 kBT, a value that is nonetheless sensitive to DNA linker length and the presence of linker histones. Our study strongly supports the contribution of physicochemical interactions to the phase behavior of chromatin aggregates and chromatin organization inside the nucleus.
Affiliation(s)
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
8
Habeck M. Bayesian methods in integrative structure modeling. Biol Chem 2023; 404:741-754. [PMID: 37505205 DOI: 10.1515/hsz-2023-0145] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023]
Abstract
There is a growing interest in characterizing the structure and dynamics of large biomolecular assemblies and their interactions within the cellular environment. A diverse array of experimental techniques allows us to study biomolecular systems on a variety of length and time scales. These techniques range from imaging with light, X-rays or electrons, to spectroscopic methods, cross-linking mass spectrometry and functional genomics approaches, and are complemented by AI-assisted protein structure prediction methods. A challenge is to integrate all of these data into a model of the system and its functional dynamics. This review focuses on Bayesian approaches to integrative structure modeling. We sketch the principles of Bayesian inference, highlight recent applications to integrative modeling and conclude with a discussion of current challenges and future perspectives.
Affiliation(s)
- Michael Habeck
- Microscopic Image Analysis Group, Jena University Hospital, D-07743 Jena, Germany
- Max Planck Institute for Multidisciplinary Sciences, D-37077 Göttingen, Germany
9
Shi G, Thirumalai D. A maximum-entropy model to predict 3D structural ensembles of chromatin from pairwise distances with applications to interphase chromosomes and structural variants. Nat Commun 2023; 14:1150. [PMID: 36854665 PMCID: PMC9974990 DOI: 10.1038/s41467-023-36412-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2022] [Accepted: 01/31/2023] [Indexed: 03/02/2023] Open
Abstract
The principles that govern the organization of genomes, which are needed for an understanding of how chromosomes are packaged and function in eukaryotic cells, could be deciphered if the three-dimensional (3D) structures are known. Recently, single-cell imaging techniques have been developed to determine the 3D coordinates of genomic loci in vivo. Here, we introduce a computational method (Distance Matrix to Ensemble of Structures, DIMES), based on the maximum entropy principle, with experimental pairwise distances between loci as constraints, to generate a unique ensemble of 3D chromatin structures. Using the ensemble of structures, we quantitatively account for the distribution of pairwise distances, three-body co-localization, and higher-order interactions. The DIMES method can be applied to both small and chromosome-scale imaging data to quantify the extent of heterogeneity and fluctuations in the shapes across various length scales. We develop a perturbation method in conjunction with DIMES to predict the changes in 3D structures from structural variations. Our method also reveals quantitative differences between the 3D structures inferred from Hi-C and those measured in imaging experiments. Finally, the physical interpretation of the parameters extracted from DIMES provides insights into the origin of phase separation between euchromatin and heterochromatin domains.
Affiliation(s)
- Guang Shi
- Department of Chemistry, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Materials Science, University of Illinois, Urbana, Illinois, 61801, USA
- D Thirumalai
- Department of Chemistry, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Physics, University of Texas at Austin, Austin, Texas, 78712, USA
10
Varoquaux N, Noble WS, Vert JP. Inference of 3D genome architecture by modeling overdispersion of Hi-C data. Bioinformatics 2023; 39:btac838. [PMID: 36594573 PMCID: PMC9857972 DOI: 10.1093/bioinformatics/btac838] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 04/25/2022] [Revised: 11/16/2022] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION: We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS: We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION: A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
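The contrast between the two count models described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parametrization used here (variance = mean + mean²/dispersion) is a common textbook form, not necessarily the one used in Pastis-NB, and all function names are hypothetical.

```python
import numpy as np

def poisson_variance(mean):
    """Under a Poisson model the variance is tied to the mean."""
    return mean

def negbin_variance(mean, dispersion):
    """Assumed parametrization: variance = mean + mean**2 / dispersion."""
    return mean + mean ** 2 / dispersion

def sample_negbin(mean, dispersion, size, rng):
    """Draw counts with the requested mean using NumPy's (n, p) parametrization."""
    p = dispersion / (dispersion + mean)  # success probability
    return rng.negative_binomial(dispersion, p, size=size)

# Simulate counts with the same mean under both models and compare the spread.
rng = np.random.default_rng(0)
mean, dispersion = 20.0, 5.0
poisson_counts = rng.poisson(mean, size=200_000)
negbin_counts = sample_negbin(mean, dispersion, 200_000, rng)

print("Poisson mean/variance:", poisson_counts.mean(), poisson_counts.var())
print("Neg-bin mean/variance:", negbin_counts.mean(), negbin_counts.var())
```

With the same mean, the negative binomial sample shows a clearly larger variance than the Poisson sample, which is the extra freedom the dispersion parameter provides.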
Affiliation(s)
- Nelle Varoquaux
- TIMC, Université Grenoble Alpes, CNRS, Grenoble INP, Grenoble 38000, France
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Jean-Philippe Vert
- Brain Team, Google Research, Paris 75009, France
- Centre for Computational Biology, MINES ParisTech, PSL University, Paris 75006, France
11
Hovenga V, Kalita J, Oluwadare O. HiC-GNN: A generalizable model for 3D chromosome reconstruction using graph convolutional neural networks. Comput Struct Biotechnol J 2022; 21:812-836. [PMID: 36698967 PMCID: PMC9842867 DOI: 10.1016/j.csbj.2022.12.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Revised: 12/08/2022] [Accepted: 12/30/2022] [Indexed: 01/02/2023] Open
Abstract
Chromosome conformation capture (3C) is a method of measuring chromosome topology in terms of loci interaction. The Hi-C method is a derivative of 3C that allows for genome-wide quantification of chromosome interaction. From such interaction data, it is possible to infer the three-dimensional (3D) structure of the underlying chromosome. In this paper, we developed a novel method, HiC-GNN, for predicting the 3D structures of chromosomes from Hi-C data. HiC-GNN is unique among methods for chromosome structure prediction in that the models learned by HiC-GNN can be generalized to data that is distinct from the training data. This aspect of HiC-GNN allows models that were trained on one Hi-C contact map to be used for inference on entirely different maps. To the authors' knowledge, this generalizing capability is not present in any existing methods. HiC-GNN uses a node embedding algorithm and a graph neural network to predict the 3D coordinates of each genomic locus from the corresponding Hi-C contact data. Unlike other methods, our algorithm allows for the storage of pre-trained parameters, thus enabling prediction on data that is entirely different from the training data. We show that our method can accurately generalize a single model across Hi-C resolutions, multiple restriction enzymes, and multiple cell populations while maintaining reconstruction accuracy across three Hi-C datasets. Our algorithm outperforms the state-of-the-art methods in accuracy of prediction and runtime and introduces a novel method for 3D structure prediction from Hi-C data. All our source codes and data are available at https://github.com/OluwadareLab/HiC-GNN.
Affiliation(s)
- Van Hovenga
- Department of Mathematics, University of Colorado, Colorado Springs, CO, United States
- Jugal Kalita
- Department of Computer Science, University of Colorado, Colorado Springs, CO, United States
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO, United States (corresponding author)
12
Lamberti WF, Zang C. Extracting physical characteristics of higher-order chromatin structures from 3D image data. Comput Struct Biotechnol J 2022; 20:3387-3398. [PMID: 35832633 PMCID: PMC9260447 DOI: 10.1016/j.csbj.2022.06.018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Revised: 06/09/2022] [Accepted: 06/09/2022] [Indexed: 11/25/2022] Open
Abstract
Higher-order chromatin structures have functional impacts on gene regulation and cell identity determination. Using high-throughput sequencing (HTS)-based methods like Hi-C, active or inactive compartments and open or closed topologically associating domain (TAD) structures can be identified on a cell population level. Recently developed high-resolution three-dimensional (3D) molecular imaging techniques such as 3D electron microscopy with in situ hybridization (3D-EMISH) and 3D structured illumination microscopy (3D-SIM) enable direct detection of physical representations of chromatin structures in a single cell. However, computational analysis of 3D image data with explainability and interpretability on functional characteristics of chromatin structures is still challenging. We developed Extracting Physical-Characteristics from Images of Chromatin Structures (EPICS), a machine-learning-based computational method for processing high-resolution chromatin 3D image data. Using EPICS on images produced by 3D-EMISH or 3D-SIM techniques, we generated more direct 3D representations of higher-order chromatin structures, identified major chromatin domains, and determined the open or closed status of each domain. We identified several high-contributing features from the model as the major physical characteristics that define the open or closed chromatin domains, demonstrating the explainability and interpretability of EPICS. EPICS can be applied to the analysis of other high-resolution 3D molecular imaging data for spatial genomics studies. The R and Python codes of EPICS are available at https://github.com/zang-lab/epics.
Affiliation(s)
- William Franz Lamberti
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Chongzhi Zang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA 22908, USA
13
Liu T, Wang Z. scHiCEmbed: Bin-Specific Embeddings of Single-Cell Hi-C Data Using Graph Auto-Encoders. Genes (Basel) 2022; 13:genes13061048. [PMID: 35741810 PMCID: PMC9222580 DOI: 10.3390/genes13061048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 02/05/2023] Open
Abstract
Most publicly accessible single-cell Hi-C data are sparse and cannot reach a higher resolution. Therefore, learning latent representations (bin-specific embeddings) of sparse single-cell Hi-C matrices would provide us with a novel way of mining valuable information hidden in the limited number of single-cell Hi-C contacts. We present scHiCEmbed, an unsupervised computational method for learning bin-specific embeddings of single-cell Hi-C data, and the computational system is applied to the tasks of 3D structure reconstruction of whole genomes and detection of topologically associating domains (TAD). The only input of scHiCEmbed is a raw or scHiCluster-imputed single-cell Hi-C matrix. The main process of scHiCEmbed is to embed each node/bin in a higher dimensional space using graph auto-encoders. The learned n-by-3 bin-specific embedding/latent matrix is considered the final reconstructed 3D genome structure. For TAD detection, we use constrained hierarchical clustering on the latent matrix to classify bins: S_Dbw is used to determine the optimal number of clusters, and each cluster is considered as one potential TAD. Our reconstructed 3D structures for individual chromatins at different cell stages reveal the expanding process of chromatins during the cell cycle. We observe that the TADs called from single-cell Hi-C data are not shared across individual cells and that the TAD boundaries called from raw or imputed single-cell Hi-C are significantly different from those called from bulk Hi-C, confirming the cell-to-cell variability in terms of TAD definitions. The source code for scHiCEmbed is publicly available, and the URL can be found in the conclusion section.
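The TAD-detection step described in this abstract, clustering the rows of an n-by-3 latent matrix under a constraint, can be illustrated with a small sketch. Several things here are assumptions rather than details from the abstract: the constraint is taken to mean that only genomically adjacent groups of bins may merge, a Ward-style merge cost is used, and the number of segments is passed in directly instead of being selected with S_Dbw. The function name is hypothetical, and this is not the scHiCEmbed implementation.

```python
import numpy as np

def contiguous_segments(embedding, n_segments):
    """Merge adjacent bins bottom-up until n_segments contiguous groups remain."""
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    points = np.asarray(embedding, dtype=float)
    segments = [[i] for i in range(len(points))]

    def merge_cost(a, b):
        # Ward-style cost: increase in within-group sum of squares after merging.
        gap = points[a].mean(axis=0) - points[b].mean(axis=0)
        return len(a) * len(b) / (len(a) + len(b)) * float(gap @ gap)

    while len(segments) > n_segments:
        # Only neighbouring segments are candidates, which keeps groups contiguous.
        costs = [merge_cost(segments[i], segments[i + 1])
                 for i in range(len(segments) - 1)]
        best = int(np.argmin(costs))
        segments[best:best + 2] = [segments[best] + segments[best + 1]]

    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(segments):
        labels[members] = label
    return labels

# Toy n-by-3 embedding: three runs of five consecutive bins around distinct centres.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
toy_embedding = np.repeat(centres, 5, axis=0) + rng.normal(scale=0.1, size=(15, 3))
toy_labels = contiguous_segments(toy_embedding, n_segments=3)
print(toy_labels)
```

Each resulting label marks one run of consecutive bins, which is how a cluster can be read as a candidate TAD in this toy setting.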
14
Osuntoki IG, Harrison A, Dai H, Bao Y, Zabet NR. ZipHiC: a novel Bayesian framework to identify enriched interactions and experimental biases in Hi-C data. Bioinformatics 2022; 38:3523-3531. [PMID: 35678507 PMCID: PMC9272800 DOI: 10.1093/bioinformatics/btac387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Revised: 05/23/2022] [Accepted: 06/07/2022] [Indexed: 11/26/2022] Open
Abstract
Motivation: Several computational and statistical methods have been developed to analyze data generated through the 3C-based methods, especially Hi-C. Most of the existing methods do not account for dependency in Hi-C data. Results: Here, we present ZipHiC, a novel statistical method to explore Hi-C data focusing on the detection of enriched contacts. ZipHiC implements a Bayesian method based on a hidden Markov random field (HMRF) model and the Approximate Bayesian Computation (ABC) to detect interactions in two-dimensional space based on a Hi-C contact frequency matrix. ZipHiC uses data on the sources of biases related to the contact frequency matrix, allows borrowing information from neighbours using the Potts model and improves computation speed using the ABC model. In addition to outperforming existing tools on both simulated and real data, our model also provides insights into different sources of biases that affect Hi-C data. We show that some datasets display higher biases from DNA accessibility or Transposable Elements content. Furthermore, our analysis in Drosophila melanogaster showed that approximately half of the detected significant interactions connect promoters with other parts of the genome, indicating a functional biological role. Finally, we found that the micro-C datasets display higher biases from DNA accessibility compared to a similar Hi-C experiment, but this can be corrected by ZipHiC. Availability and implementation: The R scripts are available at https://github.com/igosungithub/HMRFHiC.git. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Itunu G Osuntoki
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom; Statistics, Modelling and Economics Department, UK Health Security Agency, London, NW9 5EQ, United Kingdom
- Andrew Harrison
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Hongsheng Dai
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Yanchun Bao
- Department of Mathematical Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom
- Nicolae Radu Zabet
- School of Life Sciences, University of Essex, Colchester, CO4 3SQ, United Kingdom; Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, E1 2AT, United Kingdom
15
Madsen-Østerbye J, Bellanger A, Galigniana NM, Collas P. Biology and Model Predictions of the Dynamics and Heterogeneity of Chromatin-Nuclear Lamina Interactions. Front Cell Dev Biol 2022; 10:913458. [PMID: 35693945 PMCID: PMC9178083 DOI: 10.3389/fcell.2022.913458] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/06/2022] [Accepted: 05/12/2022] [Indexed: 11/13/2022] Open
Abstract
Associations of chromatin with the nuclear lamina, at the nuclear periphery, help shape the genome in 3 dimensions. The genomic landscape of lamina-associated domains (LADs) is well characterized, but much remains unknown on the physical and mechanistic properties of chromatin conformation at the nuclear lamina. Computational models of chromatin folding at, and interactions with, a surface representing the nuclear lamina are emerging in attempts to characterize these properties and predict chromatin behavior at the lamina in health and disease. Here, we highlight the heterogeneous nature of the nuclear lamina and LADs, outline the main 3-dimensional chromatin structural modeling methods, review applications of modeling chromatin-lamina interactions and discuss biological insights inferred from these models in normal and disease states. Lastly, we address perspectives on future developments in modeling chromatin interactions with the nuclear lamina.
Affiliation(s)
- Julia Madsen-Østerbye
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Aurélie Bellanger
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Natalia M. Galigniana
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
- Department of Immunology and Transfusion Medicine, Oslo University Hospital, Oslo, Norway
| | - Philippe Collas
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
- Department of Immunology and Transfusion Medicine, Oslo University Hospital, Oslo, Norway
|
16
|
Wang H, Yang J, Zhang Y, Qian J, Wang J. Reconstruct high-resolution 3D genome structures for diverse cell-types using FLAMINGO. Nat Commun 2022; 13:2645. [PMID: 35551182 PMCID: PMC9098643 DOI: 10.1038/s41467-022-30270-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/19/2021] [Accepted: 04/22/2022] [Indexed: 11/30/2022]
Abstract
High-resolution reconstruction of spatial chromosome organizations from chromatin contact maps is highly demanded, but is hindered by extensive pairwise constraints, substantial missing data, and limited resolution and cell-type availabilities. Here, we present FLAMINGO, a computational method that addresses these challenges by compressing inter-dependent Hi-C interactions to delineate the underlying low-rank structures in 3D space, based on the low-rank matrix completion technique. FLAMINGO successfully generates 5 kb- and 1 kb-resolution spatial conformations for all chromosomes in the human genome across multiple cell-types, the largest resources to date. Compared to other methods using various experimental metrics, FLAMINGO consistently demonstrates superior accuracy in recapitulating observed structures with raises in scalability by orders of magnitude. The reconstructed 3D structures efficiently facilitate discoveries of higher-order multi-way interactions, imply biological interpretations of long-range QTLs, reveal geometrical properties of chromatin, and provide high-resolution references to understand structural variabilities. Importantly, FLAMINGO achieves robust predictions against high rates of missing data and significantly boosts 3D structure resolutions. Moreover, FLAMINGO shows vigorous cross cell-type structure predictions that capture cell-type specific spatial configurations via integration of 1D epigenomic signals. FLAMINGO can be widely applied to large-scale chromatin contact maps and expand high-resolution spatial genome conformations for diverse cell-types.
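Note: the low-rank matrix completion idea mentioned in this abstract can be illustrated on a toy problem. The sketch below is not FLAMINGO's algorithm; it fits a rank-1 model to the observed entries of a small matrix by alternating least squares and uses the fit to fill in the missing entries. The function name and the use of `None` for missing values are assumptions made for this illustration.

```python
def complete_rank1(matrix, n_iter=200):
    """Fill missing entries (None) of a matrix with a rank-1 fit.

    The model is matrix[i][j] ~ u[i] * v[j]; u and v are updated in turn
    by least squares over the observed entries only.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        # Update each row factor with the column factors held fixed.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(matrix[i][j] * v[j] for j in obs) / den
        # Update each column factor with the row factors held fixed.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(matrix[i][j] * u[i] for i in obs) / den
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy example: the outer product of [1, 2, 3] and [2, 1, 4] with two
# entries hidden.
observed = [
    [2.0, 1.0, None],
    [4.0, 2.0, 8.0],
    [None, 3.0, 12.0],
]
completed = complete_rank1(observed)
```

Real contact maps are neither exactly low-rank nor noise-free, so this toy only shows why a low-rank assumption lets observed entries constrain missing ones.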
Affiliation(s)
- Hao Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Jiaxin Yang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA
| | - Yu Zhang
- Center for Immunobiology, Department of Investigative Medicine, Western Michigan University Homer Stryker M.D. School of Medicine, Kalamazoo, MI, 49007, USA
| | - Jianliang Qian
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA.
| | - Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, 48824, USA.
|
17
|
Esposito A, Abraham A, Conte M, Vercellone F, Prisco A, Bianco S, Chiariello AM. The Physics of DNA Folding: Polymer Models and Phase-Separation. Polymers (Basel) 2022; 14:1918. [PMID: 35567087 PMCID: PMC9104579 DOI: 10.3390/polym14091918] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/02/2022] [Revised: 04/23/2022] [Accepted: 04/27/2022] [Indexed: 02/04/2023]
Abstract
Within cell nuclei, several biophysical processes occur in order to allow the correct activities of the genome such as transcription and gene regulation. To quantitatively investigate such processes, polymer physics models have been developed to unveil the molecular mechanisms underlying genome functions. Among these, phase-separation plays a key role since it controls gene activity and shapes chromatin spatial structure. In this paper, we review some recent experimental and theoretical progress in the field and show that polymer physics in synergy with numerical simulations can be helpful for several purposes, including the study of molecular condensates, gene-enhancer dynamics, and the three-dimensional reconstruction of real genomic regions.
Affiliation(s)
- Andrea Esposito
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
| | - Alex Abraham
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
| | - Mattia Conte
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
| | - Francesca Vercellone
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
| | | | - Simona Bianco
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Berlin Institute for Medical Systems Biology, Max-Delbrück Centre (MDC) for Molecular Medicine, 10115 Berlin, Germany
| | - Andrea M. Chiariello
- Dipartimento di Fisica, Università di Napoli Federico II, INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
|
18
|
Liang J, Perez-Rathke A. Minimalistic 3D chromatin models: Sparse interactions in single cells drive the chromatin fold and form many-body units. Curr Opin Struct Biol 2021; 71:200-214. [PMID: 34399301 DOI: 10.1016/j.sbi.2021.06.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/05/2021] [Revised: 06/27/2021] [Accepted: 06/29/2021] [Indexed: 11/26/2022]
Abstract
Computational three-dimensional chromatin modeling has helped uncover principles of genome organization. Here, we discuss methods for modeling three-dimensional chromatin structures, with focus on a minimalistic polymer model which inverts population Hi-C into single-cell conformations. Utilizing only basic physical properties, this model reveals that a few specific Hi-C interactions can fold chromatin into conformations consistent with single-cell imaging, Dip-C, and FISH measurements. Aggregated single-cell chromatin conformations also reproduce Hi-C frequencies. This approach allows quantification of structural heterogeneity and discovery of many-body interaction units and has revealed additional insights, including (1) topologically associating domains as a byproduct of folding driven by specific interactions, (2) cell subpopulations with different structural scaffolds are developmental stage dependent, and (3) the functional landscape of many-body units within enhancer-rich regions. We also discuss these findings in relation to the genome structure-function relationship.
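Note: the statement that aggregated single-cell conformations reproduce Hi-C frequencies can be made concrete with a small sketch. It assumes each conformation is a list of 3D bead coordinates and that two beads are "in contact" when they lie within a distance cutoff; the function name and the cutoff value are illustrative assumptions, not details from the reviewed model.

```python
import math


def contact_frequencies(conformations, cutoff):
    """Fraction of conformations in which each bead pair is within cutoff."""
    n_beads = len(conformations[0])
    counts = [[0] * n_beads for _ in range(n_beads)]
    for coords in conformations:
        for i in range(n_beads):
            for j in range(i + 1, n_beads):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
                    counts[j][i] += 1
    n_conf = len(conformations)
    return [[c / n_conf for c in row] for row in counts]


# Two toy three-bead conformations: beads 1 and 2 touch in only one of them.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
]
freq = contact_frequencies(ensemble, cutoff=1.2)
```

Here `freq[0][1]` is 1.0 because beads 0 and 1 are in contact in both conformations, while `freq[1][2]` is 0.5 and `freq[0][2]` is 0.0.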
Affiliation(s)
- Jie Liang
- Center for Bioinformatics and Quantitative Biology & Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, 60612, USA.
| | - Alan Perez-Rathke
- Center for Bioinformatics and Quantitative Biology & Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, 60612, USA
|
19
|
Polymer models are a versatile tool to study chromatin 3D organization. Biochem Soc Trans 2021; 49:1675-1684. [PMID: 34282837 DOI: 10.1042/bst20201004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/06/2021] [Revised: 06/21/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
The development of new experimental technologies is opening the way to a deeper investigation of the three-dimensional organization of chromosomes inside the cell nucleus. Genome architecture is linked to vital functional purposes, yet a full comprehension of the mechanisms behind DNA folding is still far from being accomplished. Theoretical approaches based on polymer physics have been employed to understand the complexity of chromatin architecture data and to unveil the basic mechanisms shaping its structure. Here, we review some recent advances in the field to discuss how Polymer Physics, combined with numerical Molecular Dynamics simulation and Machine Learning based inference, can capture important aspects of genome organization, including the description of tissue-specific structural rearrangements, the detection of novel, regulatory-linked architectural elements and the structural variability of chromatin at the single-cell level.
|
20
|
Lin X, Qi Y, Latham AP, Zhang B. Multiscale modeling of genome organization with maximum entropy optimization. J Chem Phys 2021; 155:010901. [PMID: 34241389 PMCID: PMC8253599 DOI: 10.1063/5.0044150] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 01/14/2021] [Accepted: 04/28/2021] [Indexed: 12/15/2022]
Abstract
Three-dimensional (3D) organization of the human genome plays an essential role in all DNA-templated processes, including gene transcription, gene regulation, and DNA replication. Computational modeling can be an effective way of building high-resolution genome structures and improving our understanding of these molecular processes. However, it faces significant challenges as the human genome consists of over 6 × 10^9 base pairs, a system size that exceeds the capacity of traditional modeling approaches. In this perspective, we review the progress that has been made in modeling the human genome. Coarse-grained models parameterized to reproduce experimental data via the maximum entropy optimization algorithm serve as effective means to study genome organization at various length scales. They have provided insight into the principles of whole-genome organization and enabled de novo predictions of chromosome structures from epigenetic modifications. Applications of these models at a near-atomistic resolution further revealed physicochemical interactions that drive the phase separation of disordered proteins and dictate chromatin stability in situ. We conclude with an outlook on the opportunities and challenges in studying chromosome dynamics.
Affiliation(s)
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Yifeng Qi
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Andrew P. Latham
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
21
|
MacKay K, Kusalik A. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data. Brief Funct Genomics 2021; 19:292-308. [PMID: 32353112 PMCID: PMC7388788 DOI: 10.1093/bfgp/elaa004] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 01/30/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
|
22
|
Zha M, Wang N, Zhang C, Wang Z. Inferring Single-Cell 3D Chromosomal Structures Based on the Lennard-Jones Potential. Int J Mol Sci 2021; 22:ijms22115914. [PMID: 34072879 PMCID: PMC8199262 DOI: 10.3390/ijms22115914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Revised: 05/23/2021] [Accepted: 05/28/2021] [Indexed: 11/16/2022]
Abstract
Reconstructing three-dimensional (3D) chromosomal structures based on single-cell Hi-C data is a challenging scientific problem due to the extreme sparseness of the single-cell Hi-C data. In this research, we used the Lennard-Jones potential to reconstruct both 500 kb and high-resolution 50 kb chromosomal structures based on single-cell Hi-C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads and put into a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse single-cell Hi-C contact matrices. We designed a novel loss function based on the Lennard-Jones potential, in which the ε value, i.e., the well depth, was used to indicate how stable the binding of every pair of beads is. For the bead pairs that have single-cell Hi-C contacts and their neighboring bead pairs, the loss function assigns them stronger binding stability. The Metropolis-Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We proved the correctness and validity of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing the models with 3D-FISH data.
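Note: two ingredients named in this abstract, the Lennard-Jones potential with well depth ε and the Metropolis acceptance rule used during simulated annealing, can be sketched in a few lines. This is the textbook 12-6 form and the standard acceptance test, not the authors' loss function; the function names and the default `sigma` are assumptions made for illustration.

```python
import math
import random


def lennard_jones(r, epsilon, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential.

    epsilon is the well depth; the minimum value -epsilon is reached at
    r = 2**(1/6) * sigma, and the potential is zero at r = sigma.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def metropolis_accept(delta_loss, temperature, rng=random):
    """Accept a proposed bead move under the Metropolis criterion.

    Moves that lower the loss are always accepted; moves that raise it
    are accepted with probability exp(-delta_loss / temperature).
    """
    if delta_loss <= 0:
        return True
    return rng.random() < math.exp(-delta_loss / temperature)
```

In simulated annealing the `temperature` argument is lowered step by step, so uphill moves become progressively rarer and the bead positions settle into a low-loss configuration.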
Affiliation(s)
- Mengsheng Zha
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
| | - Nan Wang
- Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
| | - Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
|
23
|
Integration of Multiple Resolution Data in 3D Chromatin Reconstruction Using ChromStruct. Biology 2021; 10:biology10040338. [PMID: 33923796 PMCID: PMC8072831 DOI: 10.3390/biology10040338] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/01/2021] [Revised: 03/30/2021] [Accepted: 04/15/2021] [Indexed: 11/29/2022]
Abstract
The three-dimensional structure of chromatin in the cellular nucleus carries important information that is connected to physiological and pathological correlates and dysfunctional cell behaviour. As direct observation is not feasible at present, on one side, several experimental techniques have been developed to provide information on the spatial organization of the DNA in the cell; on the other side, several computational methods have been developed to elaborate experimental data and infer 3D chromatin conformations. The most relevant experimental methods are Chromosome Conformation Capture and its derivatives, chromatin immunoprecipitation and sequencing techniques (CHIP-seq), RNA-seq, fluorescence in situ hybridization (FISH) and other genetic and biochemical techniques. All of them provide important and complementary information that relate to the three-dimensional organization of chromatin. However, these techniques employ very different experimental protocols and provide information that is not easily integrated, due to different contexts and different resolutions. Here, we present an open-source tool, which is an expansion of the previously reported code ChromStruct, for inferring the 3D structure of chromatin that, by exploiting a multilevel approach, allows an easy integration of information derived from different experimental protocols and referred to different resolution levels of the structure, from a few kilobases up to Megabases. Our results show that the introduction of chromatin modelling features related to CTCF CHIA-PET data, histone modification CHIP-seq, and RNA-seq data produce appreciable improvements in ChromStruct’s 3D reconstructions, compared to the use of HI-C data alone, at a local level and at a very high resolution.
|
24
|
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021] [Indexed: 02/07/2023]
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (High-throughput chromosome conformation capture) research, the study of the spatial structure of the genomic sequence in the nucleus helps researchers understand the functions of biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. After that, we discuss the Hi-C data analysis methods from genome 3D structure, A/B compartment, TADs (topologically associated domain), and loop detection. We also discuss how to apply genome visualization technologies to the identification of chromosome feature structures. We continue with a review of correlation analysis differences among multi-omics data, and how to apply Hi-C and other omics data analysis into cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
| | - Yi Yang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
| | - Sichen Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
| | - Minghong Li
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
| | - Xiaotong Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
|
25
|
Soto CJ, Zhao PA, Klein KN, Gilbert DM, Srivastava A. Statistical Comparisons of Chromosomal Shape Populations. Proceedings. IEEE International Symposium on Biomedical Imaging 2021; 2021:788-791. [PMID: 35165532 PMCID: PMC8840943 DOI: 10.1109/isbi48211.2021.9433812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This paper develops statistical tools for testing differences in shapes of chromosomes resulting from certain gene knockouts (KO), specifically RIF1 gene KO (RKO) and the cohesin subunit RAD21 gene KO (CKO). It utilizes a two-sample test for comparing shapes of KO chromosomes with wild type (WT) at two levels: (1) Coarse shape analysis, where one compares shapes of full or large parts of chromosomes, and (2) Fine shape analysis, where chromosomes are first segmented into (TAD-based) pieces and then the corresponding pieces are compared across populations. The shape comparisons - coarse and fine - are based on an elastic shape metric for comparing shapes of 3D curves. The experiments show that the KO populations, RKO and CKO, have statistically significant differences from WT at both coarse and fine levels. Furthermore, this framework highlights local regions where these differences are most prominent.
Affiliation(s)
- Carlos J Soto
- Department of Statistics, Pennsylvania State University, State College, PA, USA
- Department of Statistics, Florida State University, Tallahassee, FL, USA
| | - Peiyao A Zhao
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Kyle N Klein
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, FL, USA
|
26
|
Soto C, Bryner D, Neretti N, Srivastava A. Toward a Three-Dimensional Chromosome Shape Alphabet. J Comput Biol 2021; 28:601-618. [PMID: 33720766 DOI: 10.1089/cmb.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
The study of the three-dimensional (3D) structure of chromosomes-the largest macromolecules in biology-is one of the most challenging to date in structural biology. Here, we develop a novel representation of 3D chromosome structures, as sequences of shape letters from a finite shape alphabet, which provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to the analysis of texts in a language by using letters. We construct a Chromosome Shape Alphabet from an ensemble of chromosome 3D structures inferred from Hi-C data-via SIMBA3D or other methods-by segmenting curves based on topologically associating domains (TADs) boundaries, and by clustering all TADs' 3D structures into groups of similar shapes. The median shapes of these groups, with some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof of concept for these CSLs by reconstructing independent test curves by using only CSLs (and corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize shapes in an ensemble of chromosome 3D structures by using generalized sequence logos.
Affiliation(s)
- Carlos Soto
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
| | - Darshan Bryner
- Naval Surface Warfare Center Panama City Division, Panama City, Florida, USA
| | - Nicola Neretti
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
| | - Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
|
27
|
Cheng Z, Liu L, Lin G, Yi C, Chu X, Liang Y, Zhou W, Jin X. ReHiC: Enhancing Hi-C data resolution via residual convolutional network. J Bioinform Comput Biol 2021; 19:2150001. [PMID: 33685371 DOI: 10.1142/s0219720021500013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
High-throughput chromosome conformation capture (Hi-C) is one of the most popular methods for studying the three-dimensional organization of genomes. However, Hi-C protocols can be expensive since they require large amounts of sample material and may be time-consuming. Most commonly used Hi-C data are low-resolution. Such data can only be used to identify large-scale genomic interactions and are not sufficient to identify the small-scale patterns. We propose a novel deep learning-based computational approach (named ReHiC) that enhances the resolution of Hi-C data and allows us to achieve high-resolution Hi-C data at a relatively low cost. Our model only requires 1/16 down-sampling ratio of the original sequence reading to predict higher resolution Hi-C data. This is very close to high-resolution data in terms of numerical distribution and interaction distribution. More importantly, our framework stacks deeper and converges faster due to residual blocks in the core of the network. Extensive experiments show that ReHiC performs better than HiCPlus and HiCNN, two recently developed and frequently used methods to look at the spatial organization of chromatin structure in the cell. Moreover, the portability of our framework verified by extensive experiments shows that the trained model can also enhance the Hi-C matrix of other cell types efficiently. In conclusion, ReHiC offers more accurate high-resolution image reconstruction in a broad field.
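Note: the 1/16 down-sampling mentioned in this abstract is commonly simulated by keeping each read independently with a fixed probability. The sketch below shows that thinning step on a small symmetric contact matrix; it is an illustration under that assumption rather than the ReHiC preprocessing code, and the function name and seed handling are hypothetical.

```python
import random


def downsample_contacts(matrix, ratio, seed=0):
    """Thin a symmetric integer contact matrix.

    Each read in the upper triangle (diagonal included) is kept
    independently with probability `ratio`; the result is mirrored so
    the output stays symmetric.
    """
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < ratio)
            out[i][j] = kept
            out[j][i] = kept
    return out


# Toy 2 x 2 contact matrix thinned to roughly 1/16 of its reads.
full = [[160, 32], [32, 96]]
low = downsample_contacts(full, 1 / 16)
```

Pairs of thinned (`low`) and original (`full`) matrices of this kind are the usual inputs and targets when training a resolution-enhancement model.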
Affiliation(s)
- Zhe Cheng
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
| | - Lin Liu
- School of Information, Yunnan Normal University, Kunming 650000, China
| | - Guoliang Lin
- State Key Laboratory for Conservation and Utilization of Bio-resource and School of Life Sciences, Yunnan University, Kunming 650000, China
| | - Chao Yi
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
| | - Xing Chu
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
| | - Yu Liang
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
| | - Wei Zhou
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
| | - Xin Jin
- National Pilot School of Software, Yunnan University, Kunming 650000, China
- Engineering Research Center of Cyberspace, Yunnan University, Kunming 650000, China
|
28
|
Chiariello AM, Bianco S, Esposito A, Fiorillo L, Conte M, Irani E, Musella F, Abraham A, Prisco A, Nicodemi M. Physical mechanisms of chromatin spatial organization. FEBS J 2021; 289:1180-1190. [PMID: 33583147 DOI: 10.1111/febs.15762] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/25/2020] [Revised: 01/22/2021] [Accepted: 02/11/2021] [Indexed: 12/11/2022]
Affiliation(s)
- Andrea M. Chiariello
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Simona Bianco
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Andrea Esposito
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Luca Fiorillo
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Mattia Conte
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Ehsan Irani
- Berlin Institute for Medical Systems BiologyMax‐Delbrück Centre (MDC) for Molecular Medicine Berlin Germany
| | - Francesco Musella
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | - Alex Abraham
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
| | | | - Mario Nicodemi
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant’Angelo Naples Italy
- Berlin Institute for Medical Systems Biology, Max‐Delbrück Centre (MDC) for Molecular Medicine, Berlin, Germany
- Berlin Institute of Health (BIH), MDC‐Berlin, Germany
|
29
|
Han C, Xie Q, Lin S. Are dropout imputation methods for scRNA-seq effective for scHi-C data? Brief Bioinform 2020; 22:5985294. [PMID: 33201180 DOI: 10.1093/bib/bbaa289] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/04/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 01/07/2023] Open
Abstract
The prevalence of dropout events is a serious problem for single-cell Hi-C (scHiC) data due to insufficient sequencing depth and data coverage, which brings difficulties in downstream studies such as clustering and structural analysis. Complicating things further is the fact that dropouts are confounded with structural zeros due to underlying properties, leading to observed zeros being a mixture of both types of events. Although a great deal of progress has been made in imputing dropout events for single cell RNA-sequencing (RNA-seq) data, little has been done in identifying structural zeros and imputing dropouts for scHiC data. In this paper, we adapted several methods from the single-cell RNA-seq literature for inference on observed zeros in scHiC data and evaluated their effectiveness. Through an extensive simulation study and real data analysis, we have shown that a couple of the adapted single-cell RNA-seq algorithms can be powerful for correctly identifying structural zeros and accurately imputing dropout values. Downstream analysis using the imputed values showed considerable improvement for clustering cells of the same types together over clustering results before imputation.
Affiliation(s)
| | | | - Shili Lin
- Translational Data Analytics Institute at the Ohio State University
|
30
|
Meluzzi D, Arya G. Computational approaches for inferring 3D conformations of chromatin from chromosome conformation capture data. Methods 2020; 181-182:24-34. [PMID: 31470090 PMCID: PMC7044057 DOI: 10.1016/j.ymeth.2019.08.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/15/2019] [Revised: 06/24/2019] [Accepted: 08/23/2019] [Indexed: 02/08/2023] Open
Abstract
Chromosome conformation capture (3C) and its variants are powerful experimental techniques for probing intra- and inter-chromosomal interactions within cell nuclei at high resolution and in a high-throughput, quantitative manner. The contact maps derived from such experiments provide an avenue for inferring the 3D spatial organization of the genome. This review provides an overview of the various computational methods developed in the past decade for addressing the very important but challenging problem of deducing the detailed 3D structure or structure population of chromosomal domains, chromosomes, and even entire genomes from 3C contact maps.
Affiliation(s)
- Dario Meluzzi
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
| | - Gaurav Arya
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, United States.
|
31
|
Kadlof M, Rozycka J, Plewczynski D. Spring Model - Chromatin Modeling Tool Based on OpenMM. Methods 2020; 181-182:62-69. [PMID: 31790732 DOI: 10.1016/j.ymeth.2019.11.014] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/15/2019] [Revised: 11/22/2019] [Accepted: 11/26/2019] [Indexed: 12/01/2022] Open
Abstract
Chromatin structure modeling is a rapidly developing field. Parallel to the enormous growth of available experimental data, there is a growing need to build and visualize 3D structures of nuclei, chromosomes, chromatin domains, and single loops associated with particular gene loci. Here, we present a tool for chromatin domain modeling; it is available as a web service and as a standalone Python script. Our tool is based on molecular mechanics and utilizes the OpenMM engine for model generation. In this method the user provides contacts between chromatin regions and obtains a 3D structure that satisfies them. Additional parameters allow for the control of fibre stiffness, initial structure adjustments and simulation resolution. There are also options for structure refinement and modeling in a spherical container. The user may provide contacts in the form of bead indices, or insert interactions in genome coordinates sourced from BEDPE files. After the simulation is complete, the user is able to download the structure in the Protein Data Bank (PDB) format for further analysis. We dedicate this tool to all who are interested in chromatin structures. It is suitable for quick visualization of datasets, studying the impact of structural variants (SVs), inspecting the effects of adding and removing particular contacts, and measuring features such as maximum distances between sites (e.g. promoter-enhancer), or local chromatin density.
Affiliation(s)
- Michal Kadlof
- Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland; Faculty of Physics, University of Warsaw, Pasteura 5, 02-093 Warsaw, Poland.
| | - Julia Rozycka
- Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
| | - Dariusz Plewczynski
- Centre of New Technologies, University of Warsaw, S. Banacha 2c, 02-097 Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland.
|
32
|
Bulathsinghalage C, Liu L. Network-based method for regions with statistically frequent interchromosomal interactions at single-cell resolution. BMC Bioinformatics 2020; 21:369. [PMID: 32998686 PMCID: PMC7526258 DOI: 10.1186/s12859-020-03689-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Chromosome conformation capture-based methods, especially Hi-C, enable scientists to detect genome-wide chromatin interactions and study the spatial organization of chromatin, which plays important roles in gene expression regulation, DNA replication and repair etc. Thus, developing computational methods to unravel patterns behind the data becomes critical. Existing computational methods focus on intrachromosomal interactions and ignore interchromosomal interactions partly because there is no prior knowledge for interchromosomal interactions and the frequency of interchromosomal interactions is much lower while the search space is much larger. With the development of single-cell technologies, the advent of single-cell Hi-C makes interrogating the spatial structure of chromatin at single-cell resolution possible. It also brings a new type of frequency information, the number of single cells with chromatin interactions between two disjoint chromosome regions. RESULTS Considering the lack of computational methods on interchromosomal interactions and the unsurprisingly frequent intrachromosomal interactions along the diagonal of a chromatin contact map, we propose a computational method dedicated to analyzing interchromosomal interactions of single-cell Hi-C with this new frequency information. To the best of our knowledge, our proposed tool is the first to identify regions with statistically frequent interchromosomal interactions at single-cell resolution. We demonstrate that the tool utilizing networks and binomial statistical tests can identify interesting structural regions through visualization, comparison and enrichment analysis and it also supports different configurations to provide users with flexibility. CONCLUSIONS It will be a useful tool for analyzing single-cell Hi-C interchromosomal interactions.
Affiliation(s)
| | - Lu Liu
- North Dakota State University, 1340 Administration Ave, Fargo, 58102, USA.
|
33
|
Oluwadare O, Highsmith M, Turner D, Lieberman Aiden E, Cheng J. GSDB: a database of 3D chromosome and genome structures reconstructed from Hi-C data. BMC Mol Cell Biol 2020; 21:60. [PMID: 32758136 PMCID: PMC7405446 DOI: 10.1186/s12860-020-00304-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/26/2020] [Accepted: 07/29/2020] [Indexed: 11/10/2022] Open
Abstract
Advances in chromosome conformation capture technologies, such as the Hi-C technique, which captures chromosomal interactions on a genome-wide scale, have led to the development of methods that reconstruct three-dimensional chromosome and genome structures from Hi-C data. The three-dimensional genome structure is important because it plays a role in a variety of biological activities such as DNA replication, gene regulation, genome interaction, and gene expression. In recent years, numerous Hi-C datasets have been generated, and likewise, a number of genome structure construction algorithms have been developed. In this work, we outline the construction of a novel Genome Structure Database (GSDB) to create a comprehensive repository that contains 3D structures for Hi-C datasets constructed by a variety of 3D structure reconstruction tools. The GSDB contains over 50,000 structures from 12 state-of-the-art Hi-C data structure prediction algorithms for 32 Hi-C datasets. GSDB functions as a centralized collection of genome structures which will enable the exploration of the dynamic architectures of chromosomes and genomes for biomedical research. GSDB is accessible at http://sysbio.rnet.missouri.edu/3dgenome/GSDB
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, USA
| | - Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
| | - Douglass Turner
- Elastic Image Software LLC, 21 Walnut Street, Lexington, MA, 02421, USA
| | | | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
|
34
|
Kunz T, Rieber L, Mahony S. Assessing relationships between chromatin interactions and regulatory genomic activities using the self-organizing map. Methods 2020; 189:12-21. [PMID: 32652235 DOI: 10.1016/j.ymeth.2020.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2020] [Revised: 06/09/2020] [Accepted: 07/03/2020] [Indexed: 11/24/2022] Open
Abstract
Few existing methods enable the visualization of relationships between regulatory genomic activities and genome organization as captured by Hi-C experimental data. Genome-wide Hi-C datasets are often displayed using "heatmap" matrices, but it is difficult to intuit from these heatmaps which biochemical activities are compartmentalized together. High-dimensional Hi-C data vectors can alternatively be projected onto three-dimensional space using dimensionality reduction techniques. The resulting three-dimensional structures can serve as scaffolds for projecting other forms of genomic information, thereby enabling the exploration of relationships between genome organization and various genome annotations. However, while three-dimensional models are contextually appropriate for chromatin interaction data, some analyses and visualizations may be more intuitively and conveniently performed in two-dimensional space. We present a novel approach to the visualization and analysis of chromatin organization based on the Self-Organizing Map (SOM). The SOM algorithm provides a two-dimensional manifold which adapts to represent the high dimensional chromatin interaction space. The resulting data structure can then be used to assess relationships between regulatory genomic activities and chromatin interactions. For example, given a set of genomic coordinates corresponding to a given biochemical activity, the degree to which this activity is segregated or compartmentalized in chromatin interaction space can be intuitively visualized on the 2D SOM grid and quantified using Lorenz curve analysis. We demonstrate our approach for exploratory analysis of genome compartmentalization in a high-resolution Hi-C dataset from the human GM12878 cell line. Our SOM-based approach provides an intuitive visualization of the large-scale structure of Hi-C data and serves as a platform for integrative analyses of the relationships between various genomic activities and genome organization.
Affiliation(s)
- Timothy Kunz
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
| | - Lila Rieber
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
| | - Shaun Mahony
- Biochemistry & Molecular Biology Department, Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA.
|
35
|
Zhu H, Wang Z. SCL: a lattice-based approach to infer 3D chromosome structures from single-cell Hi-C data. Bioinformatics 2020; 35:3981-3988. [PMID: 30865261 PMCID: PMC6792089 DOI: 10.1093/bioinformatics/btz181] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/26/2018] [Revised: 01/31/2019] [Accepted: 03/12/2019] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the 3D structures of chromosomes based on single-cell Hi-C data. RESULTS We developed single-cell lattice (SCL), a computational method to reconstruct 3D structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis-Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topologically associating domains (TADs), and H3K4me3 enriched TADs by mapping data from previous studies onto the SCL-inferred 3D structures. AVAILABILITY AND IMPLEMENTATION The C++ source code of SCL is freely available at http://dna.cs.miami.edu/SCL/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
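The abstract above names Metropolis-Hastings simulation and simulated annealing on a cubic lattice but does not give the loss function or move set. The following is only a generic sketch of that optimization pattern, not SCL's implementation: the function names, the single-point move of ±1 along one axis, the geometric cooling schedule, and the toy loss are all assumptions made for illustration.

```python
import math
import random


def metropolis_accept(delta, temperature, u):
    """Metropolis rule: always accept improvements; otherwise accept
    with probability exp(-delta / temperature). `u` is a uniform draw in [0, 1)."""
    if delta <= 0:
        return True
    return u < math.exp(-delta / temperature)


def anneal(loss, start, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Minimize `loss` over integer 3D lattice points by simulated annealing.

    Hypothetical toy setup: one point moves by one lattice step per iteration.
    Returns the best point seen and its loss.
    """
    rng = random.Random(seed)
    current = tuple(start)
    current_loss = loss(current)
    best, best_loss = current, current_loss
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Propose a move of one lattice unit along a random axis.
        axis = rng.randrange(3)
        candidate = list(current)
        candidate[axis] += rng.choice((-1, 1))
        candidate = tuple(candidate)
        candidate_loss = loss(candidate)
        if metropolis_accept(candidate_loss - current_loss, t, rng.random()):
            current, current_loss = candidate, candidate_loss
            if current_loss < best_loss:
                best, best_loss = current, current_loss
    return best, best_loss


if __name__ == "__main__":
    target = (3, -2, 1)
    toy_loss = lambda p: sum((a - b) ** 2 for a, b in zip(p, target))
    print(anneal(toy_loss, (0, 0, 0)))
```

In a real chromosome model the state would be a whole chain of beads and the loss would compare bead distances against the contact data; the acceptance rule and cooling loop keep the same shape.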
Affiliation(s)
- Hao Zhu
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
|
36
|
Li FZ, Liu ZE, Li XY, Bu LM, Bu HX, Liu H, Zhang CM. Chromatin 3D structure reconstruction with consideration of adjacency relationship among genomic loci. BMC Bioinformatics 2020; 21:272. [PMID: 32611376 PMCID: PMC7329537 DOI: 10.1186/s12859-020-03612-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2019] [Accepted: 06/18/2020] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Chromatin 3D conformation plays important roles in regulating gene or protein functions. High-throughput chromosome conformation capture (3C)-based technologies, such as Hi-C, have been exploited to acquire the contact frequencies among genomic loci at genome scale. Various computational tools have been proposed to recover the underlying chromatin 3D structures from in situ Hi-C contact map data. As connected residuals in a polymer, neighboring genomic loci have intrinsic mutual dependencies in building a 3D conformation. However, current methods seldom take this feature into account. RESULTS We present a method called ShNeigh, which combines the classical MDS technique with local dependence of neighboring loci modeled by a Gaussian formula, to infer the best 3D structure from noisy and incomplete contact frequency matrices. We validated ShNeigh by comparing it to two typical distance-based algorithms, ShRec3D and ChromSDE. The comparison results on a simulated Hi-C dataset showed that, while keeping the high-speed nature of classical MDS, ShNeigh can recover the true structure better than ShRec3D and ChromSDE. Meanwhile, ShNeigh is more robust to data noise. On the publicly available human GM06990 Hi-C data, we demonstrated that the structures reconstructed by ShNeigh are more reproducible between different restriction enzymes than those reconstructed by ShRec3D and ChromSDE, especially at high resolutions manifested by sparse contact maps, which means ShNeigh is more robust to signal coverage. CONCLUSIONS Our method can recover stable structures in high noise and sparse signal settings. It can also reconstruct similar structures from Hi-C data obtained using different restriction enzymes. Therefore, our method provides a new direction for enhancing the reconstruction quality of chromatin 3D structures.
Affiliation(s)
- Fang-Zhen Li
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China.
| | - Zhi-E Liu
- College of Physics and Electronic Engineering, Qilu Normal University, Jinan, China
| | - Xiu-Yuan Li
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
| | - Li-Mei Bu
- Department of Gastroenterology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Shanghai, China
| | - Hong-Xia Bu
- Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
| | - Hui Liu
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; Digital Media Technology Key Lab of Shandong Province, Jinan, China
| | - Cai-Ming Zhang
- School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China; Digital Media Technology Key Lab of Shandong Province, Jinan, China
|
37
|
Liu T, Wang Z. HiCNN: a very deep convolutional neural network to better enhance the resolution of Hi-C data. Bioinformatics 2020; 35:4222-4228. [PMID: 31056636 PMCID: PMC6821373 DOI: 10.1093/bioinformatics/btz251] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 11/12/2018] [Revised: 03/07/2019] [Accepted: 04/05/2019] [Indexed: 11/29/2022] Open
Abstract
Motivation High-resolution Hi-C data are indispensable for the studies of three-dimensional (3D) genome organization at kilobase level. However, generating high-resolution Hi-C data (e.g. 5 kb) by conducting Hi-C experiments needs millions of mammalian cells, which may eventually generate billions of paired-end reads with a high sequencing cost. Therefore, it will be important and helpful if we can enhance the resolutions of Hi-C data by computational methods. Results We developed a new computational method named HiCNN that used a 54-layer very deep convolutional neural network to enhance the resolutions of Hi-C data. The network contains both global and local residual learning with multiple speedup techniques included resulting in fast convergence. We used mean squared errors and Pearson’s correlation coefficients between real high-resolution and computationally predicted high-resolution Hi-C data to evaluate the method. The evaluation results show that HiCNN consistently outperforms HiCPlus, the only existing tool in the literature, when training and testing data are extracted from the same cell type (i.e. GM12878) and from two different cell types in the same or different species (i.e. GM12878 as training with K562 as testing, and GM12878 as training with CH12-LX as testing). We further found that the HiCNN-enhanced high-resolution Hi-C data are more consistent with real experimental high-resolution Hi-C data than HiCPlus-enhanced data in terms of indicating statistically significant interactions. Moreover, HiCNN can efficiently enhance low-resolution Hi-C data, which eventually helps recover two chromatin loops that were confirmed by 3D-FISH. Availability and implementation HiCNN is freely available at http://dna.cs.miami.edu/HiCNN/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, Coral Gables, FL, USA
|
38
|
Bayesian inference of chromatin structure ensembles from population-averaged contact data. Proc Natl Acad Sci U S A 2020; 117:7824-7830. [PMID: 32193349 DOI: 10.1073/pnas.1910364117] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022] Open
Abstract
Mounting experimental evidence suggests a role for the spatial organization of chromatin in crucial processes of the cell nucleus such as transcription regulation. Chromosome conformation capture techniques allow us to characterize chromatin structure by mapping contacts between chromosomal loci on a genome-wide scale. The most widespread modality is to measure contact frequencies averaged over a population of cells. Single-cell variants exist, but suffer from low contact numbers and have not yet gained the same resolution as population methods. While intriguing biological insights have already been garnered from ensemble-averaged data, information about three-dimensional (3D) genome organization in the underlying individual cells remains largely obscured because the contact maps show only an average over a huge population of cells. Moreover, computational methods for structure modeling of chromatin have mostly focused on fitting a single consensus structure, thereby ignoring any cell-to-cell variability in the model itself. Here, we propose a fully Bayesian method to infer ensembles of chromatin structures and to determine the optimal number of states in a principled, objective way. We illustrate our approach on simulated data and compute multistate models of chromatin from chromosome conformation capture carbon copy (5C) data. Comparison with independent data suggests that the inferred ensembles represent the underlying sample population faithfully. Harnessing the rich information contained in multistate models, we investigate cell-to-cell variability of chromatin organization into topologically associating domains, thus highlighting the ability of our approach to deliver insights into chromatin organization of great biological relevance.
|
39
|
Zhang R, Hu M, Zhu Y, Qin Z, Deng K, Liu JS. Inferring Spatial Organization of Individual Topologically Associated Domains via Piecewise Helical Model. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:647-656. [PMID: 30113897 PMCID: PMC7202374 DOI: 10.1109/tcbb.2018.2865349] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organization and has yielded deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from mature. Most existing methods are highly over-parameterized, lack clear interpretations, and are sensitive to outliers. In this study, we propose a parsimonious, easy to interpret, and robust piecewise helical model for the inference of the 3D chromosomal structure of an individual topologically associated domain from Hi-C data. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.
|
40
|
Hu Y, Zhao H, Zhao Y, Zheng J, Guo Y, Ma J. Characterization of chromosome organization in the differentiation of acute myeloid leukemia cells by all-trans retinoic acid. Life Sci 2020; 249:117479. [PMID: 32119959 DOI: 10.1016/j.lfs.2020.117479] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/06/2019] [Revised: 02/16/2020] [Accepted: 02/26/2020] [Indexed: 12/16/2022]
Abstract
A concomitant change of nucleus shape and chromosome conformation often happens in all-trans retinoic acid (ATRA)-induced differentiation of acute myeloid leukemia cells. However, the relation between the 3D chromosome architecture and the genome-wide epigenetic pattern for transcriptional regulation is poorly understood. In this study, high-throughput chromosome conformation capture (Hi-C) and chromatin immunoprecipitation sequencing (ChIP-seq) were employed to investigate the landscape of distal chromosome interactions and H3K4/27me3 in HL-60 cells treated with ATRA. We observed a general loss of topologically associating domains (TADs) at PTPN11 during the differentiation of HL-60 cells. Furthermore, we observed significantly reduced enrichment of CCCTC-binding factor (CTCF) near the boundary where PTPN11 is located, as well as decreased H3K4me3 and increased H3K27me3 enrichment at PTPN11 upon ATRA treatment. Taken together, our study indicates a regulatory mechanism behind the silencing of PTPN11 during HL-60 cell differentiation.
Affiliation(s)
- Yanping Hu
- Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan 450008, PR China; Henan Key Laboratory of Molecular Pathology, Zhengzhou, Henan 450008, PR China
| | - Hongchao Zhao
- Department of gastroenterology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450008, PR China
| | - Yixun Zhao
- Endoscopic Center, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan 450008, PR China
| | - Jiawen Zheng
- Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan 450008, PR China
| | - Yongjun Guo
- Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan 450008, PR China; Henan Key Laboratory of Molecular Pathology, Zhengzhou, Henan 450008, PR China.
| | - Jie Ma
- Department of Molecular Pathology, The Affiliated Cancer Hospital of Zhengzhou University, Henan Cancer Hospital, Zhengzhou, Henan 450008, PR China; Henan Key Laboratory of Molecular Pathology, Zhengzhou, Henan 450008, PR China.
|
41
|
Abstract
BACKGROUND The genome architecture mapping (GAM) technique can capture genome-wide chromatin interactions. However, besides the known systematic biases in the raw GAM data, we have found a new type of systematic bias. It is necessary to develop and evaluate effective normalization methods to remove all systematic biases in the raw GAM data. RESULTS We have detected a new type of systematic bias, the fragment length bias, in the genome architecture mapping (GAM) data, which is significantly different from the bias of window detection frequency previously mentioned in the paper introducing the GAM method but is similar to the bias of distances between restriction sites existing in raw Hi-C data. We have found that the normalization method (a normalized variant of the linkage disequilibrium) used in the GAM paper is not able to effectively eliminate the new fragment length bias at 1 Mb resolution (slightly better at 30 kb resolution). We have developed an R package named normGAM for eliminating the new fragment length bias together with the other three biases existing in raw GAM data, which are the biases related to window detection frequency, mappability, and GC content. Five normalization methods have been implemented and included in the R package including Knight-Ruiz 2-norm (KR2, newly designed by us), normalized linkage disequilibrium (NLD), vanilla coverage (VC), sequential component normalization (SCN), and iterative correction and eigenvector decomposition (ICE). CONCLUSIONS Based on our evaluations, the five normalization methods can eliminate the four biases existing in raw GAM data, with VC and KR2 performing better than the others. We have observed that the KR2-normalized GAM data have a higher correlation with the KR-normalized Hi-C data on the same cell samples indicating that the KR-related methods are better than the others for keeping the consistency between the GAM and Hi-C experiments. Compared with the raw GAM data, the normalized GAM data are more consistent with the normalized distances from the fluorescence in situ hybridization (FISH) experiments. The source code of normGAM can be freely downloaded from http://dna.cs.miami.edu/normGAM/.
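Of the five normalization methods listed above, vanilla coverage (VC) is the simplest to illustrate. The sketch below uses the form commonly applied to contact matrices, in which each entry is divided by the product of its row sum and column sum. It is written in Python for illustration only; it is not the normGAM R implementation, and the function name and the handling of empty rows are assumptions.

```python
def vanilla_coverage(matrix):
    """Divide each entry by (row sum * column sum).

    `matrix` is a square list of lists of non-negative numbers.
    Entries in rows or columns with zero coverage are left at 0.0
    (an assumed convention for this sketch).
    """
    n = len(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    normalized = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if row_sums[i] > 0 and col_sums[j] > 0:
                normalized[i][j] = matrix[i][j] / (row_sums[i] * col_sums[j])
    return normalized


if __name__ == "__main__":
    raw = [[2, 2], [2, 6]]
    for row in vanilla_coverage(raw):
        print(row)
```

For a symmetric input the output stays symmetric, because the row and column sums coincide. Iterative methods such as ICE or KR repeat a similar rescaling until the row sums balance.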
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA.
|
42
|
Abstract
BACKGROUND: Topologically associating domains (TADs) are genomic regions of varying lengths. Interactions within TADs are more frequent than those between different TADs. TADs or sub-TADs are considered the structural and functional units of mammalian genomes. Although TADs are important for understanding how genomes function, we have limited knowledge about their 3D structural properties.
RESULTS: In this study, we designed and benchmarked three metrics for capturing the three-dimensional and two-dimensional structural signatures of TADs, which can help better understand TADs' structural properties and the relationships between structural properties and genetic and epigenetic features. The first metric, which captures 3D structural properties, is the radius of gyration, used in this study to measure the spatial compactness of TADs. As a novel modification, the mass value of each DNA bead in a 3D structure is defined as one or more genetic or epigenetic feature(s). The second metric is the folding degree. The last metric is the exponent parameter, which is used to capture the 2D structural properties based on TADs' Hi-C contact matrices. In general, we observed significant correlations between the three metrics and the genetic and epigenetic features. We made the same observations when using H3K4me3, transcription start sites, and RNA polymerase II to represent the mass value in the modified radius-of-gyration metric. Moreover, we found that the TADs in the clusters of depleted chromatin states apparently correspond to smaller exponent parameters and larger radii of gyration. In addition, a new objective function of multidimensional scaling for modelling the 3D structures of chromatin or TADs was designed and benchmarked, which can handle DNA bead-pairs with zero Hi-C contact values.
CONCLUSIONS: The web server for reconstructing chromatin 3D structures using multiple different objective functions and the related source code are publicly available at http://dna.cs.miami.edu/3DChrom/.
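The mass-weighted radius of gyration mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook definition, Rg^2 = sum(m_i * |r_i - r_cm|^2) / sum(m_i), where m_i is a per-bead feature value; the authors' exact formula may differ, and the function name, bead coordinates, and mass values below are hypothetical.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of 3D beads (assumed textbook definition)."""
    if masses is None:
        # Unweighted case: every bead counts equally.
        masses = [1.0] * len(coords)
    total = sum(masses)
    if total <= 0:
        raise ValueError("total mass must be positive")
    # Mass-weighted centre of the beads.
    center = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance from that centre.
    msd = sum(
        m * sum((p[k] - center[k]) ** 2 for k in range(3))
        for m, p in zip(masses, coords)
    ) / total
    return math.sqrt(msd)

# Hypothetical TAD made of four beads on a square.
beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
rg_uniform = radius_of_gyration(beads)
# Hypothetical feature signal (e.g. a peak count per bead) used as mass.
rg_weighted = radius_of_gyration(beads, masses=[3.0, 1.0, 0.0, 0.0])
```

In this toy case the weighted value is smaller than the uniform one because the feature signal is concentrated on two neighbouring beads.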
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124 USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124 USA
| |
|
43
|
Fiorillo L, Bianco S, Esposito A, Conte M, Sciarretta R, Musella F, Chiariello AM. A modern challenge of polymer physics: Novel ways to study, interpret, and reconstruct chromatin structure. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2019. [DOI: 10.1002/wcms.1454] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]
Affiliation(s)
- Luca Fiorillo
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Simona Bianco
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Andrea Esposito
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Mattia Conte
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Renato Sciarretta
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Francesco Musella
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| | - Andrea M. Chiariello
- Dipartimento di Fisica Università di Napoli Federico II, and INFN Napoli Complesso Universitario di Monte Sant'Angelo Naples Italy
| |
|
44
|
Liu T, Wang Z. HiCNN2: Enhancing the Resolution of Hi-C Data Using an Ensemble of Convolutional Neural Networks. Genes (Basel) 2019; 10:862. [PMID: 31671634 PMCID: PMC6896157 DOI: 10.3390/genes10110862] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/19/2019] [Accepted: 10/28/2019] [Indexed: 12/17/2022] Open
Abstract
We present a deep-learning package named HiCNN2 to learn the mapping between low-resolution and high-resolution Hi-C (a technique for capturing genome-wide chromatin interactions) data, which can enhance the resolution of Hi-C interaction matrices. The HiCNN2 package includes three methods each with a different deep learning architecture: HiCNN2-1 is based on one single convolutional neural network (ConvNet); HiCNN2-2 consists of an ensemble of two different ConvNets; and HiCNN2-3 is an ensemble of three different ConvNets. Our evaluation results indicate that HiCNN2-enhanced high-resolution Hi-C data achieve smaller mean squared error and higher Pearson’s correlation coefficients with experimental high-resolution Hi-C data compared with existing methods HiCPlus and HiCNN. Moreover, all of the three HiCNN2 methods can recover more significant interactions detected by Fit-Hi-C compared to HiCPlus and HiCNN. Based on our evaluation results, we would recommend using HiCNN2-1 and HiCNN2-3 if recovering more significant interactions from Hi-C data is of interest, and HiCNN2-2 and HiCNN if the goal is to achieve higher reproducibility scores between the enhanced Hi-C matrix and the real high-resolution Hi-C matrix.
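The two evaluation measures named in the abstract, mean squared error and Pearson's correlation coefficient, can be sketched as follows. This is an illustrative example only, not the HiCNN2 evaluation code: the helper names and the two small contact matrices are hypothetical, and the authors may compute the measures over different matrix regions or genomic distances.

```python
import math

def flatten(matrix):
    """Turn a 2D contact matrix into a flat list of values."""
    return [value for row in matrix for value in row]

def mean_squared_error(a, b):
    """Average squared difference between two equally sized matrices."""
    x, y = flatten(a), flatten(b)
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)

def pearson(a, b):
    """Pearson correlation coefficient between two equally sized matrices."""
    x, y = flatten(a), flatten(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    norm_x = math.sqrt(sum((p - mean_x) ** 2 for p in x))
    norm_y = math.sqrt(sum((q - mean_y) ** 2 for q in y))
    return cov / (norm_x * norm_y)

# Hypothetical 3x3 matrices: experimental high-resolution data and an enhanced prediction.
real = [[10.0, 4.0, 1.0], [4.0, 12.0, 5.0], [1.0, 5.0, 9.0]]
enhanced = [[9.0, 5.0, 1.0], [5.0, 11.0, 4.0], [1.0, 4.0, 10.0]]

mse = mean_squared_error(enhanced, real)
correlation = pearson(enhanced, real)
```

A lower `mse` and a `correlation` closer to 1 both indicate that the enhanced matrix is closer to the experimental one.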
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA.
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124, USA.
| |
|
45
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. AN INTERNATIONAL JOURNAL ON INFORMATION FUSION 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 222] [Impact Index Per Article: 44.4] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
| | - Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
| | - Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
| | - Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| | - Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
| |
|
46
|
Briand N, Collas P. Laminopathy-causing lamin A mutations reconfigure lamina-associated domains and local spatial chromatin conformation. Nucleus 2019. [PMID: 29517398 PMCID: PMC5973257 DOI: 10.1080/19491034.2018.1449498] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Indexed: 12/31/2022] Open
Abstract
The nuclear lamina contributes to the regulation of gene expression and to chromatin organization. Mutations in A-type nuclear lamins cause laminopathies, some of which are associated with a loss of heterochromatin at the nuclear periphery. Until recently, however, little if any information has been provided on where and how lamin A interacts with the genome and on how disease-causing lamin A mutations may rearrange genome conformation. Here, we review aspects of nuclear lamin association with the genome. We highlight recent evidence of reorganization of lamin A-chromatin interactions in cellular models of laminopathies, and implications for the 3-dimensional rearrangement of chromatin in these models, including patient cells. We discuss how a hot-spot lipodystrophic lamin A mutation alters chromatin conformation and epigenetic patterns at an anti-adipogenic locus, and conclude with remarks on links between lamin A, Polycomb and the pathophysiology of laminopathies. The recent findings presented here collectively argue for a deregulation of large-scale and local spatial genome organization by a subset of lamin A mutations causing laminopathies.
Affiliation(s)
- Nolwenn Briand
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway
| | - Philippe Collas
- Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Oslo, Norway; Norwegian Center for Stem Cell Research, Department of Immunology and Transfusion Medicine, Oslo University Hospital, Oslo, Norway
| |
|
47
|
Kapilevich V, Seno S, Matsuda H, Takenaka Y. Chromatin 3D Reconstruction from Chromosomal Contacts Using a Genetic Algorithm. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1620-1626. [PMID: 29994156 DOI: 10.1109/tcbb.2018.2814995] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/08/2023]
Abstract
Recent epigenetics research has demonstrated that chromatin conformation plays an important role in various aspects of gene regulation. Chromosome Conformation Capture (3C) technology makes it possible to analyze the spatial organization of chromatin in a cell. Several algorithms for three-dimensional reconstruction of chromatin structure from 3C experimental data have been proposed. Compared to other algorithms, ShRec3D, one of the most advanced algorithms, can reconstruct a chromatin model in the shortest time for high-resolution whole-genome experimental data. However, ShRec3D employs a graph shortest path algorithm, which introduces errors in the resulting model. We propose an improved algorithm that optimizes shortest path distances using a genetic algorithm approach. The proposed algorithm and ShRec3D were compared using in silico 3C experimental data. Compared to ShRec3D, the proposed algorithm produced output significantly more similar to the original model, with a reasonable increase in calculation time.
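The graph shortest-path step that the abstract attributes to ShRec3D can be sketched with the Floyd-Warshall algorithm. This is a minimal illustration under stated assumptions, not the published implementation: the edge length is taken to be the inverse of the contact count (a common conversion, assumed here), the function name and the toy contact matrix are hypothetical, and the genetic-algorithm refinement proposed by the authors is not shown.

```python
import math

def shortest_path_distances(contacts):
    """All-pairs shortest-path distances on a contact graph (Floyd-Warshall)."""
    n = len(contacts)
    # Assumed conversion: edge length = 1 / contact count; no contact means no edge.
    dist = [
        [
            0.0 if i == j
            else (1.0 / contacts[i][j] if contacts[i][j] > 0 else math.inf)
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Relax every pair (i, j) through every intermediate bead k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist

# Hypothetical contact counts for a chain of four beads.
contacts = [
    [0, 4, 0, 0],
    [4, 0, 2, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 0],
]
distances = shortest_path_distances(contacts)
```

Bead pairs without a direct contact, such as beads 0 and 3 here, receive a distance equal to the sum of edge lengths along the best path. Such path sums only approximate spatial distance, which is presumably the kind of error that a later optimization step would try to reduce.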
|
48
|
Li X, An Z, Zhang Z. Comparison of computational methods for 3D genome analysis at single-cell Hi-C level. Methods 2019; 181-182:52-61. [PMID: 31445093 DOI: 10.1016/j.ymeth.2019.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/13/2019] [Revised: 07/09/2019] [Accepted: 08/19/2019] [Indexed: 11/18/2022] Open
Abstract
Hi-C is a high-throughput chromosome conformation capture technology that is becoming routine in the literature. Although the price of sequencing has been dropping dramatically, high-resolution Hi-C data are not always an option for many studies, such as those in single cells. However, the performance of current Hi-C-based computational methods under ultra-sparse data conditions has yet to be fully assessed. Therefore, in this paper, after briefly surveying the primary computational methods for Hi-C data analysis, we assess the performance of representative methods for data normalization and for the identification of compartments, Topologically Associating Domains (TADs) and chromatin loops at ultra-low resolution. We showed that most state-of-the-art methods do not work properly under that condition. Then, we applied the three best-performing methods to real single-cell Hi-C data, and their performance indicates that compartments may be a statistical feature emerging from the cell population, while TADs and chromatin loops may dynamically exist in single cells.
Affiliation(s)
- Xiao Li
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Ziyang An
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; School of Life Science, University of Chinese Academy of Sciences, Beijing, China.
| |
|
49
|
Zhu H, Wang N, Sun JZ, Pandey RB, Wang Z. Inferring the three-dimensional structures of the X-chromosome during X-inactivation. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2019; 16:7384-7404. [PMID: 31698618 PMCID: PMC7772933 DOI: 10.3934/mbe.2019369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/10/2023]
Abstract
The Hi-C experiment can capture the genome-wide spatial proximities of the DNA, based on which it is possible to computationally reconstruct the three-dimensional (3D) structures of chromosomes. The transcripts of the long non-coding RNA (lncRNA) Xist spread throughout the entire X-chromosome and alter the 3D structure of the X-chromosome, which also inactivates one copy of the two X-chromosomes in a cell. Hi-C experiments are expensive and time-consuming to conduct, but the Hi-C data of the active and inactive X-chromosomes are available. However, the Hi-C data of the X-chromosome during the process of X-chromosome inactivation (XCI) are not available. Therefore, the 3D structure of the X-chromosome during XCI remains unknown. We have developed a new approach to reconstruct the 3D structure of the X-chromosome during XCI, in which the chain of DNA beads representing a chromosome is stored and simulated inside a 3D cubic lattice. A 2D Gaussian function is used to model the zero values in the 2D Hi-C contact matrices. By applying simulated annealing and Metropolis-Hastings simulations, we first generated the 3D structures of the X-chromosome before and after XCI. Then, we used Xist localization intensities on the X-chromosome (RAP data) to model the traveling speeds or acceleration between all bead pairs during the process of XCI. The 3D structures of the X-chromosome at 3 hours, 6 hours, and 24 hours after the start of the Xist expression, which initiates the XCI process, have been reconstructed. The source code and the reconstructed 3D structures of the X-chromosome can be downloaded from http://dna.cs.miami.edu/3D-XCI/.
Affiliation(s)
- Hao Zhu
- Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
| | - Nan Wang
- Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
| | - Jonathan Z. Sun
- Department of Computer Science, College of Charleston, Charleston, SC 29424, USA
| | - Ras B. Pandey
- Department of Physics and Astronomy, University of Southern Mississippi, 118 College Drive #5046, Hattiesburg, MS 39406, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
| |
|
50
|
Yildirim A, Feig M. High-resolution 3D models of Caulobacter crescentus chromosome reveal genome structural variability and organization. Nucleic Acids Res 2019. [PMID: 29529244 PMCID: PMC5934669 DOI: 10.1093/nar/gky141] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Indexed: 11/12/2022] Open
Abstract
High-resolution three-dimensional models of Caulobacter crescentus nucleoid structures were generated via a multi-scale modeling protocol. Models were built as a plectonemically supercoiled circular DNA and by incorporating chromosome conformation capture based data to generate an ensemble of base pair resolution models consistent with the experimental data. Significant structural variability was found with different degrees of bending and twisting but with overall similar topologies and shapes that are consistent with C. crescentus cell dimensions. The models allowed a direct mapping of the genomic sequence onto the three-dimensional nucleoid structures. Distinct spatial distributions were found for several genomic elements such as AT-rich sequence elements where nucleoid associated proteins (NAPs) are likely to bind, promoter sites, and some genes with common cellular functions. These findings shed light on the correlation between the spatial organization of the genome and biological functions.
Affiliation(s)
- Asli Yildirim
- Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
| | - Michael Feig
- Department of Biochemistry & Molecular Biology, Michigan State University, MI 48824, USA
| |
|