1
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
  - Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
2
Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat Methods 2022; 19:938-949. [PMID: 35817938 PMCID: PMC9349046 DOI: 10.1038/s41592-022-01527-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/22/2021] [Accepted: 05/18/2022] [Indexed: 02/07/2023]
Abstract
A multitude of sequencing-based and microscopy technologies provide the means to unravel the relationship between the three-dimensional organization of genomes and key regulatory processes of genome function. Here, we develop a multimodal data integration approach to produce populations of single-cell genome structures that are highly predictive for nuclear locations of genes and nuclear bodies, local chromatin compaction and spatial segregation of functionally related chromatin. We demonstrate that multimodal data integration can compensate for systematic errors in some of the data and can greatly increase accuracy and coverage of genome structure models. We also show that alternative combinations of different orthogonal data sources can converge to models with similar predictive power. Moreover, our study reveals the key contributions of low-frequency (‘rare’) interchromosomal contacts to accurately predicting the global nuclear architecture, including the positioning of genes and chromosomes. Overall, our results highlight the benefits of multimodal data integration for genome structure analysis, available through the Integrative Genome Modeling software package. The Integrative Genome Modeling platform is a tool for population-based three-dimensional genome structure modeling and analysis by integrating various experimental data sources.
3
Yildirim A, Boninsegna L, Zhan Y, Alber F. Uncovering the Principles of Genome Folding by 3D Chromatin Modeling. Cold Spring Harb Perspect Biol 2022; 14:a039693. [PMID: 34400556 PMCID: PMC9248826 DOI: 10.1101/cshperspect.a039693] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/25/2022]
Abstract
Our understanding of how genomic DNA is tightly packed inside the nucleus, yet is still accessible for vital cellular processes, has grown dramatically over recent years with advances in microscopy and genomics technologies. Computational methods have played a pivotal role in the structural interpretation of experimental data, which helped unravel some organizational principles of genome folding. Here, we give an overview of current computational efforts in mechanistic and data-driven 3D chromatin structure modeling. We discuss strengths and limitations of different methods and evaluate the added value and benefits of computational approaches to infer the 3D structural and dynamic properties of the genome and its underlying mechanisms at different scales and resolution, ranging from the dynamic formation of chromatin loops and topological associated domains to nuclear compartmentalization of chromatin and nuclear bodies.
Affiliation(s)
- Asli Yildirim
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Lorenzo Boninsegna
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Yuxiang Zhan
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
- Frank Alber
  - Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
  - Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
4
MacKay K, Kusalik A. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data. Brief Funct Genomics 2021; 19:292-308. [PMID: 32353112 PMCID: PMC7388788 DOI: 10.1093/bfgp/elaa004] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 11/19/2019] [Revised: 01/30/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
5
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021] [Indexed: 02/07/2023]
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), the study of the spatial structure of the genomic sequence in the nucleus helps researchers understand the functions of biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. After that, we discuss Hi-C data analysis methods covering genome 3D structure, A/B compartments, TADs (topologically associating domains), and loop detection. We also discuss how to apply genome visualization technologies to the identification of chromosome feature structures. We continue with a review of correlation analysis differences among multi-omics data, and how to apply Hi-C and other omics data analysis to cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
- Yi Yang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Sichen Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Xiaotong Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
6
Bulathsinghalage C, Liu L. Network-based method for regions with statistically frequent interchromosomal interactions at single-cell resolution. BMC Bioinformatics 2020; 21:369. [PMID: 32998686 PMCID: PMC7526258 DOI: 10.1186/s12859-020-03689-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Abstract
BACKGROUND Chromosome conformation capture-based methods, especially Hi-C, enable scientists to detect genome-wide chromatin interactions and study the spatial organization of chromatin, which plays important roles in gene expression regulation, DNA replication and repair etc. Thus, developing computational methods to unravel patterns behind the data becomes critical. Existing computational methods focus on intrachromosomal interactions and ignore interchromosomal interactions partly because there is no prior knowledge for interchromosomal interactions and the frequency of interchromosomal interactions is much lower while the search space is much larger. With the development of single-cell technologies, the advent of single-cell Hi-C makes interrogating the spatial structure of chromatin at single-cell resolution possible. It also brings a new type of frequency information, the number of single cells with chromatin interactions between two disjoint chromosome regions. RESULTS Considering the lack of computational methods on interchromosomal interactions and the unsurprisingly frequent intrachromosomal interactions along the diagonal of a chromatin contact map, we propose a computational method dedicated to analyzing interchromosomal interactions of single-cell Hi-C with this new frequency information. To the best of our knowledge, our proposed tool is the first to identify regions with statistically frequent interchromosomal interactions at single-cell resolution. We demonstrate that the tool utilizing networks and binomial statistical tests can identify interesting structural regions through visualization, comparison and enrichment analysis and it also supports different configurations to provide users with flexibility. CONCLUSIONS It will be a useful tool for analyzing single-cell Hi-C interchromosomal interactions.
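The abstract mentions binomial statistical tests applied to the number of single cells in which two regions on different chromosomes interact. A minimal sketch of that idea follows, assuming a single hypothetical background probability `background_p` and a plain one-sided test with no multiple-testing correction; the function names and example counts are illustrative and are not taken from the authors' tool.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def frequent_pairs(cell_counts, n_cells, background_p, alpha=0.05):
    """Keep region pairs seen in contact in more cells than chance predicts.

    cell_counts maps a (region_a, region_b) pair to the number of single
    cells in which that interchromosomal contact was observed.
    background_p is an assumed per-cell probability of a chance contact.
    """
    hits = {}
    for pair, k in cell_counts.items():
        p_value = binomial_tail(k, n_cells, background_p)
        if p_value < alpha:
            hits[pair] = p_value
    return hits


# Hypothetical counts over 20 single cells.
counts = {
    ("chr1:0-1Mb", "chr2:5-6Mb"): 9,  # seen in 9 of 20 cells
    ("chr1:0-1Mb", "chr3:2-3Mb"): 1,  # seen in 1 of 20 cells
}
significant = frequent_pairs(counts, n_cells=20, background_p=0.1)
```

With these made-up numbers only the first pair passes the threshold. A real analysis would estimate the background rate from the data and correct for the number of pairs tested.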
Affiliation(s)
- Lu Liu
  - North Dakota State University, 1340 Administration Ave, Fargo, 58102, USA
7
Zhu H, Wang Z. SCL: a lattice-based approach to infer 3D chromosome structures from single-cell Hi-C data. Bioinformatics 2020; 35:3981-3988. [PMID: 30865261 PMCID: PMC6792089 DOI: 10.1093/bioinformatics/btz181] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/26/2018] [Revised: 01/31/2019] [Accepted: 03/12/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION In contrast to population-based Hi-C data, single-cell Hi-C data are zero-inflated and do not indicate the frequency of proximate DNA segments. There are a limited number of computational tools that can model the 3D structures of chromosomes based on single-cell Hi-C data. RESULTS We developed single-cell lattice (SCL), a computational method to reconstruct 3D structures of chromosomes based on single-cell Hi-C data. We designed a loss function and a 2D Gaussian function specifically for the characteristics of single-cell Hi-C data. A chromosome is represented as beads-on-a-string and stored in a 3D cubic lattice. Metropolis-Hastings simulation and simulated annealing are used to simulate the structure and minimize the loss function. We evaluated the SCL-inferred 3D structures (at both 500 and 50 kb resolutions) using multiple criteria and compared them with the ones generated by another modeling software program. The results indicate that the 3D structures generated by SCL closely fit single-cell Hi-C data. We also found similar patterns of trans-chromosomal contact beads, Lamin-B1 enriched topologically associating domains (TADs), and H3K4me3 enriched TADs by mapping data from previous studies onto the SCL-inferred 3D structures. AVAILABILITY AND IMPLEMENTATION The C++ source code of SCL is freely available at http://dna.cs.miami.edu/SCL/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
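The combination described above (beads on a cubic lattice, Metropolis-Hastings moves, simulated annealing) can be illustrated with a toy sketch. This is not the SCL loss function or its 2D Gaussian treatment: the loss here is simply the summed squared distance between contacting beads, and the chain-gap limit, self-avoidance rule, cooling schedule and all names are assumptions made for illustration.

```python
import math
import random

# The six unit steps available on a cubic lattice.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def loss(coords, contacts):
    """Toy loss: summed squared distance between beads observed in contact."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        for i, j in contacts
    )


def chain_ok(coords, i, max_gap=2.0):
    """Check that bead i stays within max_gap of its chain neighbours."""
    for j in (i - 1, i + 1):
        if 0 <= j < len(coords) and math.dist(coords[i], coords[j]) > max_gap:
            return False
    return True


def anneal(coords, contacts, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis moves on the lattice under a geometric cooling schedule."""
    rng = random.Random(seed)
    coords = [tuple(c) for c in coords]
    current = loss(coords, contacts)
    best, best_coords = current, list(coords)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(coords))
        old = coords[i]
        dx, dy, dz = rng.choice(MOVES)
        new = (old[0] + dx, old[1] + dy, old[2] + dz)
        if new in coords:  # assumed rule: one bead per lattice site
            continue
        coords[i] = new
        if not chain_ok(coords, i):
            coords[i] = old
            continue
        proposed = loss(coords, contacts)
        # Metropolis criterion: always accept improvements, sometimes accept worse.
        if proposed <= current or rng.random() < math.exp((current - proposed) / temp):
            current = proposed
            if current < best:
                best, best_coords = current, list(coords)
        else:
            coords[i] = old
    return best_coords, best


# Ten beads laid out in a straight line, with two hypothetical contacts.
start = [(x, 0, 0) for x in range(10)]
contacts = [(0, 9), (2, 7)]
final_coords, final_loss = anneal(start, contacts)
```

The high starting temperature lets the chain escape the initial straight conformation, and the falling temperature makes uphill moves progressively rarer, which is the usual motivation for pairing Metropolis sampling with annealing.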
Affiliation(s)
- Hao Zhu
  - School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, MS, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, Coral Gables, FL, USA
8
Li FZ, Liu ZE, Li XY, Bu LM, Bu HX, Liu H, Zhang CM. Chromatin 3D structure reconstruction with consideration of adjacency relationship among genomic loci. BMC Bioinformatics 2020; 21:272. [PMID: 32611376 PMCID: PMC7329537 DOI: 10.1186/s12859-020-03612-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/26/2019] [Accepted: 06/18/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND Chromatin 3D conformation plays important roles in regulating gene or protein functions. High-throughput chromosome conformation capture (3C)-based technologies, such as Hi-C, have been exploited to acquire the contact frequencies among genomic loci at genome-scale. Various computational tools have been proposed to recover the underlying chromatin 3D structures from in situ Hi-C contact map data. As connected residuals in a polymer, neighboring genomic loci have intrinsic mutual dependencies in building a 3D conformation. However, current methods seldom take this feature into account. RESULTS We present a method called ShNeigh, which combines the classical MDS technique with local dependence of neighboring loci modeled by a Gaussian formula, to infer the best 3D structure from noisy and incomplete contact frequency matrices. We validated ShNeigh by comparing it to two typical distance-based algorithms, ShRec3D and ChromSDE. The comparison results on a simulated Hi-C dataset showed that, while keeping the high-speed nature of classical MDS, ShNeigh can recover the true structure better than ShRec3D and ChromSDE. Meanwhile, ShNeigh is more robust to data noise. On the publicly available human GM06990 Hi-C data, we demonstrated that the structures reconstructed by ShNeigh are more reproducible between different restriction enzymes than those reconstructed by ShRec3D and ChromSDE, especially at high resolutions manifested by sparse contact maps, which means ShNeigh is more robust to signal coverage. CONCLUSIONS Our method can recover stable structures in high noise and sparse signal settings. It can also reconstruct similar structures from Hi-C data obtained using different restriction enzymes. Therefore, our method provides a new direction for enhancing the reconstruction quality of chromatin 3D structures.
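Classical MDS, which the abstract names as the base technique, starts by turning a matrix of pairwise distances into a Gram (inner-product) matrix through double centering; coordinates are then read off its leading eigenvectors. The sketch below shows only that double-centering step in plain Python. It does not include ShNeigh's Gaussian neighbor term, and the function name is an illustrative assumption.

```python
def gram_from_distances(dist):
    """Double-center squared distances: B = -1/2 * J * D^2 * J.

    dist is a symmetric list-of-lists of pairwise distances between loci.
    The result holds inner products of the centered coordinates.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [
            -0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Three loci placed on a line at positions 0, 1 and 2.
distances = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
]
gram = gram_from_distances(distances)
```

For this example the centered positions are -1, 0 and 1, so the Gram matrix equals their outer product, which confirms that the distances were converted to inner products without knowing the coordinates in advance.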
Affiliation(s)
- Fang-Zhen Li
  - School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
  - Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Zhi-E Liu
  - College of Physics and Electronic Engineering, Qilu Normal University, Jinan, China
- Xiu-Yuan Li
  - School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
  - Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Li-Mei Bu
  - Department of Gastroenterology, Shanghai Pudong Hospital, Fudan University Pudong Medical Center, Shanghai, China
- Hong-Xia Bu
  - Key Laboratory of Machine Learning and Financial Data Mining in Universities of Shandong, Jinan, China
- Hui Liu
  - School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
  - Digital Media Technology Key Lab of Shandong Province, Jinan, China
- Cai-Ming Zhang
  - School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China
  - Digital Media Technology Key Lab of Shandong Province, Jinan, China
9
Stephenson N, Shane E, Chase J, Rowland J, Ries D, Justice N, Zhang J, Chan L, Cao R. Survey of Machine Learning Techniques in Drug Discovery. Curr Drug Metab 2019; 20:185-193. [DOI: 10.2174/1389200219666180820112457] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 09/06/2017] [Revised: 01/01/2018] [Accepted: 03/19/2018] [Indexed: 12/19/2022]
Abstract
Background: Drug discovery, which is the process of discovering new candidate medications, is very important for pharmaceutical industries. At its current stage, discovering new drugs is still a very expensive and time-consuming process, requiring Phases I, II and III for clinical trials. Recently, machine learning techniques in Artificial Intelligence (AI), especially the deep learning techniques which allow a computational model to generate multiple layers, have been widely applied and achieved state-of-the-art performance in different fields, such as speech recognition, image classification, bioinformatics, etc. One very important application of these AI techniques is in the field of drug discovery. Methods: We did a large-scale literature search on existing scientific websites (e.g., ScienceDirect, arXiv) and startup companies to understand the current status of machine learning techniques in drug discovery. Results: Our experiments demonstrated that there are different patterns in machine learning fields and drug discovery fields. For example, keywords like prediction, brain, discovery, and treatment are usually in drug discovery fields. Also, the total number of papers published in drug discovery fields with machine learning techniques is increasing every year. Conclusion: The main focus of this survey is to understand the current status of machine learning techniques in the drug discovery field within both academic and industrial settings, and discuss its potential future applications. Several interesting patterns for machine learning techniques in drug discovery fields are discussed in this survey.
Affiliation(s)
- Natalie Stephenson
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
- Emily Shane
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
- Jessica Chase
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
- Jason Rowland
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
- David Ries
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
- Nicola Justice
  - Department of Mathematics, Pacific Lutheran University, Tacoma, WA 98447, United States
- Jie Zhang
  - Key Laboratory of Hebei Province for Plant Physiology and Molecular Pathology, College of Life Sciences, Hebei Agricultural University, Baoding, China
- Leong Chan
  - School of Business, Pacific Lutheran University, Tacoma, WA 98447, United States
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, United States
10
Hou J, Wu T, Cao R, Cheng J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins 2019; 87:1165-1178. [PMID: 30985027 PMCID: PMC6800999 DOI: 10.1002/prot.25697] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/16/2019] [Revised: 04/04/2019] [Accepted: 04/12/2019] [Indexed: 12/28/2022]
Abstract
Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
Affiliation(s)
- Jie Hou
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, Washington
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
11
Oluwadare O, Highsmith M, Cheng J. An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data. Biol Proced Online 2019; 21:7. [PMID: 31049033 PMCID: PMC6482566 DOI: 10.1186/s12575-019-0094-0] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 01/15/2019] [Accepted: 04/01/2019] [Indexed: 01/08/2023]
Abstract
Over the past decade, methods for predicting three-dimensional (3-D) chromosome and genome structures have proliferated. This has been primarily due to the development of high-throughput, next-generation chromosome conformation capture (3C) technologies, which have provided next-generation sequencing data about chromosome conformations in order to map the 3-D genome structure. The introduction of the Hi-C technique, a variant of the 3C method, has allowed researchers to extract the interaction frequency (IF) for all loci of a genome at high-throughput and at a genome-wide scale. In this review we describe, categorize, and compare the various methods developed to map chromosome and genome structures from 3C data, particularly Hi-C data. We summarize the improvements introduced by these methods, describe the approach used for method evaluation, and discuss how these advancements shape the future of genome structure construction.
Affiliation(s)
- Oluwatosin Oluwadare
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Max Highsmith
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
  - Informatics Institute, University of Missouri, Columbia, MO 65211 USA
12
Abstract
BACKGROUND Recent advances in genome analysis have established that chromatin has preferred 3D conformations, which bring distant loci into contact. Identifying these contacts is important for us to understand possible interactions between these loci. This has motivated the creation of the Hi-C technology, which detects long-range chromosomal interactions. Distance geometry-based algorithms, such as ChromSDE and ShRec3D, have been able to utilize Hi-C data to infer 3D chromosomal structures. However, these algorithms, being matrix-based, are space- and time-consuming on very large datasets. A human genome at 100-kilobase resolution would involve ∼30,000 loci, requiring gigabytes just to store the matrices. RESULTS We propose a succinct representation of the distance matrices which tremendously reduces the space requirement. We give a complete solution, called SuperRec, for the inference of chromosomal structures from Hi-C data, by iteratively solving the large-scale weighted multidimensional scaling problem. CONCLUSIONS SuperRec runs faster than earlier systems without compromising on result accuracy. The SuperRec package can be obtained from http://www.cs.cityu.edu.hk/~shuaicli/SuperRec.
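The storage claim above can be checked with simple arithmetic. The sketch below assumes dense storage with 8-byte floating-point entries, which the abstract does not specify; the function name and the symmetric-storage option are illustrative.

```python
def dense_matrix_gib(n_loci, bytes_per_entry=8, symmetric=False):
    """Memory needed for one dense n_loci x n_loci matrix, in GiB.

    With symmetric=True only the upper triangle (diagonal included) is
    counted. bytes_per_entry=8 assumes double-precision entries.
    """
    if symmetric:
        entries = n_loci * (n_loci + 1) // 2
    else:
        entries = n_loci * n_loci
    return entries * bytes_per_entry / 2**30


full = dense_matrix_gib(30_000)                   # about 6.7 GiB
half = dense_matrix_gib(30_000, symmetric=True)   # about 3.35 GiB
```

Under these assumptions a single full matrix at ~30,000 loci already needs about 6.7 GiB, and matrix-based methods typically hold more than one such matrix (for example contacts, distances and weights), which is consistent with the gigabyte-scale cost the abstract describes.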
Affiliation(s)
- Yanlin Zhang
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
- Weiwei Liu
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
- Yu Lin
  - Research School of Computer Science, the Australian National University, Canberra, Australia
- Yen Kaow Ng
  - Department of Computer Science, Faculty of Information and Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia
- Shuaicheng Li
  - Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR
13
Hierarchical Reconstruction of High-Resolution 3D Models of Large Chromosomes. Sci Rep 2019; 9:4971. [PMID: 30899036 PMCID: PMC6428844 DOI: 10.1038/s41598-019-41369-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/23/2018] [Accepted: 03/07/2019] [Indexed: 11/08/2022]
Abstract
Eukaryotic chromosomes are often composed of components organized into multiple scales, such as nucleosomes, chromatin fibers, topologically associated domains (TADs), chromosome compartments, and chromosome territories. Therefore, reconstructing detailed 3D models of chromosomes in high resolution is useful for advancing genome research. However, the task of constructing quality high-resolution 3D models is still challenging with existing methods. Hence, we designed a hierarchical algorithm, called Hierarchical3DGenome, to reconstruct 3D chromosome models at high resolution (<=5 Kilobase (KB)). The algorithm first reconstructs high-resolution 3D models at the TAD level. The TAD models are then assembled to form complete high-resolution chromosomal models. The assembly of TAD models is guided by a complete low-resolution chromosome model. The algorithm is successfully used to reconstruct 3D chromosome models at 5 KB resolution for the human B-cell (GM12878). These high-resolution models satisfy Hi-C chromosomal contacts well and are consistent with models built at lower (i.e. 1 MB) resolution, and with the data of fluorescent in situ hybridization experiments. The Java source code of Hierarchical3DGenome and its user manual are available at https://github.com/BDM-Lab/Hierarchical3DGenome.
|
14
|
Liu T, Porter J, Zhao C, Zhu H, Wang N, Sun Z, Mo YY, Wang Z. TADKB: Family classification and a knowledge base of topologically associating domains. BMC Genomics 2019; 20:217. [PMID: 30871473 PMCID: PMC6419456 DOI: 10.1186/s12864-019-5551-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 09/11/2018] [Accepted: 02/21/2019] [Indexed: 01/01/2023] Open
Abstract
Background Topologically associating domains (TADs) are considered the structural and functional units of the genome. However, there is a lack of an integrated resource for TADs in the literature where researchers can obtain family classifications and detailed information about TADs. Results We built an online knowledge base TADKB integrating knowledge for TADs in eleven cell types of human and mouse. For each TAD, TADKB provides the predicted three-dimensional (3D) structures of chromosomes and TADs, and detailed annotations about the protein-coding genes and long non-coding RNAs (lncRNAs) existent in each TAD. Besides the 3D chromosomal structures inferred by population Hi-C, the single-cell haplotype-resolved chromosomal 3D structures of 17 GM12878 cells are also integrated in TADKB. A user can submit query gene/lncRNA ID/sequence to search for the TAD(s) that contain(s) the query gene or lncRNA. We also classified TADs into families. To achieve that, we used the TM-scores between reconstructed 3D structures of TADs as structural similarities and the Pearson’s correlation coefficients between the fold enrichment of chromatin states as functional similarities. All of the TADs in one cell type were clustered based on structural and functional similarities respectively using the spectral clustering algorithm with various predefined numbers of clusters. We have compared the overlapping TADs from structural and functional clusters and found that most of the TADs in the functional clusters with depleted chromatin states are clustered into one or two structural clusters. This novel finding indicates a connection between the 3D structures of TADs and their DNA functions in terms of chromatin states. Conclusion TADKB is available at http://dna.cs.miami.edu/TADKB/. Electronic supplementary material The online version of this article (10.1186/s12864-019-5551-2) contains supplementary material, which is available to authorized users.
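The functional similarity used for clustering in this abstract is a Pearson correlation between chromatin-state fold-enrichment profiles. The sketch below shows that computation on made-up enrichment vectors; the profile values and the variable names are assumptions for illustration, and the TM-score-based structural similarity is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical fold enrichment of four chromatin states in three TADs.
tad_a = [2.0, 0.5, 1.0, 0.1]
tad_b = [4.0, 1.0, 2.0, 0.2]   # same profile shape as tad_a, scaled
tad_c = [0.1, 1.0, 0.5, 2.0]   # roughly the opposite profile

sim_ab = pearson(tad_a, tad_b)
sim_ac = pearson(tad_a, tad_c)
```

A matrix of such pairwise similarities is the kind of input a spectral clustering step would take.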
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124-4245, USA
| | - Jacob Porter
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
| | - Chenguang Zhao
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
| | - Hao Zhu
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
| | - Nan Wang
- Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ, 07305, USA
| | - Zheng Sun
- Department of Electrical and Computer Engineering, California Baptist University, 3739 Adams Street, Riverside, CA, 92504, USA
| | - Yin-Yuan Mo
- Department of Pharmacology and Toxicology, University of Mississippi Medical Center, 2500 N State St, Jackson, MS, 39216, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124-4245, USA.
| |
|
15
|
Liu T, Wang Z. Reconstructing high-resolution chromosome three-dimensional structures by Hi-C complex networks. BMC Bioinformatics 2018; 19:496. [PMID: 30591009 PMCID: PMC6309071 DOI: 10.1186/s12859-018-2464-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023] Open
Abstract
BACKGROUND Hi-C data have been widely used to reconstruct chromosomal three-dimensional (3D) structures. One of the key limitations of Hi-C is the unclear relationship between spatial distance and the number of Hi-C contacts. Many methods used a fixed parameter when converting the number of Hi-C contacts to wish distances. However, a single parameter cannot properly explain the relationship between wish distances and genomic distances or the locations of topologically associating domains (TADs). RESULTS We have addressed one of the key issues of using Hi-C data, that is, the unclear relationship between spatial distances and the number of Hi-C contacts, which is crucial to understand significant biological functions, such as the enhancer-promoter interactions. Specifically, we developed a new method to infer this converting parameter and pairwise Euclidean distances based on the topology of the Hi-C complex network (HiCNet). The inferred distances were modeled by clustering coefficient and multiple other types of constraints. We found that our inferred distances between bead-pairs within the same TAD were apparently smaller than those distances between bead-pairs from different TADs. Our inferred distances had a higher correlation with fluorescence in situ hybridization (FISH) data, fitted the localization patterns of Xist transcripts on DNA, and better matched 156 pairs of protein-enabled long-range chromatin interactions detected by ChIA-PET. Using the inferred distances and another round of optimization, we further reconstructed 40 kb high-resolution 3D chromosomal structures of mouse male ES cells. The high-resolution structures successfully illustrate TADs and DNA loops (peaks in Hi-C contact heatmaps) that usually indicate enhancer-promoter interactions. CONCLUSIONS We developed a novel method to infer the wish distances between DNA bead-pairs from Hi-C contacts. 
High-resolution 3D structures of chromosomes were built based on the newly-inferred wish distances. This whole process has been implemented as a tool named HiCNet, which is publicly available at http://dna.cs.miami.edu/HiCNet/ .
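To make the role of the converting parameter concrete, the sketch below uses the inverse power-law form commonly used to map contact counts to wish distances, d = 1 / c^alpha. This form and the function name `wish_distance` are assumptions for illustration; the abstract does not give HiCNet's formula, and its network-based inference of the parameter is not reproduced here.

```python
def wish_distance(contacts, alpha):
    """Convert a contact count to a wish distance via d = 1 / contacts**alpha."""
    if contacts <= 0:
        return None  # no contact observed: leave the distance unconstrained
    return 1.0 / (contacts ** alpha)


# The same row of contact counts under two different converting parameters.
contact_row = [100, 25, 4, 0]
distances_by_alpha = {
    alpha: [wish_distance(c, alpha) for c in contact_row]
    for alpha in (0.5, 1.0)
}
```

Changing alpha changes the ratio between short and long wish distances, not just their overall scale, so one fixed value imposes a single distance profile on every bead pair.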
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA.
| |
|
16
|
Varoquaux N. Unfolding the Genome: The Case Study of P. falciparum. Int J Biostat 2018; 15:ijb-2017-0061. [PMID: 29878883 DOI: 10.1515/ijb-2017-0061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2017] [Accepted: 05/10/2018] [Indexed: 11/15/2022]
Abstract
The development of new ways to probe samples for the three-dimensional (3D) structure of DNA paves the way for in-depth and systematic analyses of genome architecture. 3C-like methods coupled with high-throughput sequencing can now assess physical interactions between pairs of loci in a genome-wide fashion, thus enabling the creation of genome-by-genome contact maps. The spread of such protocols creates many new opportunities for methodological development: how can we infer 3D models from these contact maps? Can such models help us gain insights into biological processes? Several recent studies applied such protocols to P. falciparum (the deadliest of the five human malaria parasites), assessing its genome organization at different moments of its life cycle. With its small genome size, fairly simple (yet changing) genomic organization during its life cycle and strong correlation between chromatin folding and gene expression, this parasite is the ideal case study for applying and developing methods to infer 3D models and use them for downstream analysis. Here, I review a set of methods used to build and analyse three-dimensional models from contact-map data, with a special highlight on P. falciparum's genome organization.
Affiliation(s)
- Nelle Varoquaux
- Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, California, USA
- Berkeley Institute for Data Science, 190 Doe Library, Berkeley, United States of America
| |
|
17
|
Abstract
Motivation Recent experiments have provided Hi-C data at resolution as high as 1 kbp. However, 3D structural inference from high-resolution Hi-C datasets is often computationally unfeasible using existing methods. Results We have developed miniMDS, an approximation of multidimensional scaling (MDS) that partitions a Hi-C dataset, performs high-resolution MDS separately on each partition, and then reassembles the partitions using low-resolution MDS. miniMDS is faster, more accurate, and uses less memory than existing methods for inferring the human genome at high resolution (10 kbp). Availability and implementation A Python implementation of miniMDS is available on GitHub: https://github.com/seqcode/miniMDS. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lila Rieber
- Department of Biochemistry and Molecular Biology and Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
| | - Shaun Mahony
- Department of Biochemistry and Molecular Biology and Center for Eukaryotic Gene Regulation, The Pennsylvania State University, University Park, PA, USA
| |
|
18
|
Abstract
Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report, and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ∼3 d with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topologically associating domain (TAD)-level resolution.
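The central idea of a population-based model is that a contact probability is a statistic over many structures rather than a property of any single one. The sketch below computes that statistic for a toy population; the distance cutoff, the data, and the function name `contact_probability` are illustrative assumptions and are not part of the PGS interface.

```python
import math


def contact_probability(population, i, j, cutoff):
    """Fraction of structures in which beads i and j are within `cutoff`."""
    hits = sum(
        1 for structure in population
        if math.dist(structure[i], structure[j]) <= cutoff
    )
    return hits / len(population)


# Toy population: three structures with three beads each.
population = [
    [(0, 0, 0), (1, 0, 0), (5, 0, 0)],
    [(0, 0, 0), (1, 0, 0), (1, 1, 0)],
    [(0, 0, 0), (3, 0, 0), (0, 4, 0)],
]
p01 = contact_probability(population, 0, 1, cutoff=1.5)
p02 = contact_probability(population, 0, 2, cutoff=1.5)
```

In this toy population no structure needs to satisfy every contact; only the population-level frequencies are compared with the Hi-C data.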
|
19
|
Li J, Zhang W, Li X. 3D Genome Reconstruction with ShRec3D+ and Hi-C Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:460-468. [PMID: 26955049 DOI: 10.1109/tcbb.2016.2535372] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/05/2023]
Abstract
Hi-C technology, a chromosome conformation capture (3C)-based method, has been developed to capture genome-wide interactions at a given resolution. The next challenge is to computationally reconstruct the 3D structure of the genome from the 3C-derived data. Several existing methods have been proposed to obtain a consensus structure or ensemble structures. These methods can be categorized as probabilistic models or restraint-based models. In this paper, we propose a method, named ShRec3D+, to infer a consensus 3D structure of a genome from Hi-C data. The method is a two-step algorithm based on the ChromSDE and ShRec3D methods. First, it corrects the conversion factor by golden section search for converting interaction frequency data to a distance-weighted graph. Second, it applies a shortest-path algorithm and the multi-dimensional scaling (MDS) algorithm to compute the 3D coordinates of a set of genomic loci from the distance graph. We validate the accuracy of ShRec3D+ on both simulated data and publicly available Hi-C data. Our test results indicate that our method successfully corrects the parameter at a given resolution, is more accurate than ShRec3D, and is more efficient and robust than ChromSDE.
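Two building blocks named in this abstract, golden section search and shortest-path completion of a distance graph, can be sketched as follows. The objective passed to the search is a placeholder quadratic, not the ShRec3D+ or ChromSDE objective, the edge length 1 / f^alpha is an assumed conversion, and the final MDS step is omitted; all function names are hypothetical.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2


def shortest_path_distances(freq, alpha):
    """Use edge lengths 1 / f**alpha, then complete them with Floyd-Warshall."""
    n = len(freq)
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and freq[i][j] > 0:
                dist[i][j] = 1.0 / freq[i][j] ** alpha
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Placeholder objective with its minimum at alpha = 0.7 (illustration only).
best_alpha = golden_section_min(lambda a: (a - 0.7) ** 2, 0.1, 2.0)

# Loci 0 and 2 have no direct contact; their distance comes from the path via 1.
freq = [[0, 4, 0], [4, 0, 4], [0, 4, 0]]
dist = shortest_path_distances(freq, alpha=0.5)
```

The shortest-path step gives every pair of loci a finite distance as long as the contact graph is connected, which is what an MDS step would then need as input.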
|
20
|
Oluwadare O, Zhang Y, Cheng J. A maximum likelihood algorithm for reconstructing 3D structures of human chromosomes from chromosomal contact data. BMC Genomics 2018; 19:161. [PMID: 29471801 PMCID: PMC5824572 DOI: 10.1186/s12864-018-4546-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 05/04/2017] [Accepted: 02/13/2018] [Indexed: 01/07/2023] Open
Abstract
Background The development of chromosomal conformation capture techniques, particularly the Hi-C technique, has made the analysis and study of the spatial conformation of a genome an important topic in bioinformatics and computational biology. Aided by high-throughput next-generation sequencing techniques, the Hi-C technique can generate genome-wide, large-scale intra- and inter-chromosomal interaction data capable of describing in detail the spatial interactions within a genome. These data can be used to reconstruct 3D structures of chromosomes that can be used to study DNA replication, gene regulation, genome interaction, genome folding, and genome function. Results Here, we introduce a maximum likelihood algorithm called 3DMax to construct the 3D structure of a chromosome from Hi-C data. 3DMax employs a maximum likelihood approach to infer the 3D structures of a chromosome, while automatically re-estimating the conversion factor (α) for converting Interaction Frequency (IF) to distance. Our results show that the models generated by 3DMax from a simulated Hi-C dataset match the true models better than most of the existing methods. 3DMax is more robust to structural variability and noise. Compared on a real Hi-C dataset, 3DMax constructs chromosomal models that fit the data better than most methods, and it is faster than all other methods. The models reconstructed by 3DMax were consistent with fluorescence in situ hybridization (FISH) experiments and existing knowledge about the organization of human chromosomes, such as chromosome compartmentalization. Conclusions 3DMax is an effective approach to reconstructing 3D chromosomal models. The results and the models generated for the simulated and real Hi-C datasets are available here: http://sysbio.rnet.missouri.edu/bdm_download/3DMax/. The source code is available here: https://github.com/BDM-Lab/3DMax. A short video demonstrating how to use 3DMax can be found here: https://youtu.be/ehQUFWoHwfo.
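As a rough illustration of scoring a structure by likelihood while choosing the conversion factor α, the sketch below assumes independent Gaussian errors on the wish distances and the conversion d = 1 / IF^α. Both assumptions, the fixed σ, the grid of candidate α values, and all function names are illustrative; the abstract does not state the likelihood or the optimiser that 3DMax uses.

```python
import math


def wish_from_if(interaction_freq, alpha):
    """Map each bead pair's interaction frequency to a wish distance."""
    return {
        pair: 1.0 / (f ** alpha)
        for pair, f in interaction_freq.items()
        if f > 0
    }


def log_likelihood(coords, wish, sigma=1.0):
    """Gaussian log-likelihood of the wish distances given 3D coordinates."""
    total = 0.0
    for (i, j), target in wish.items():
        residual = math.dist(coords[i], coords[j]) - target
        total += -0.5 * (residual / sigma) ** 2
        total -= math.log(sigma * math.sqrt(2 * math.pi))
    return total


# A fixed toy structure: three beads on a line, one unit apart.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
interaction_freq = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.25}

# Pick the candidate alpha whose wish distances best match the structure.
best_alpha = max(
    (0.25, 0.5, 1.0),
    key=lambda a: log_likelihood(coords, wish_from_if(interaction_freq, a)),
)
```

A full method would alternate this kind of α selection with updates to the coordinates themselves; only the scoring side is shown here.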
Affiliation(s)
- Oluwatosin Oluwadare
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Yuxiang Zhang
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| |
|
21
|
Gürsoy G, Xu Y, Kenter AL, Liang J. Computational construction of 3D chromatin ensembles and prediction of functional interactions of alpha-globin locus from 5C data. Nucleic Acids Res 2017; 45:11547-11558. [PMID: 28981716 PMCID: PMC5714131 DOI: 10.1093/nar/gkx784] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 12/07/2016] [Accepted: 08/30/2017] [Indexed: 01/23/2023] Open
Abstract
Conformation capture technologies measure frequencies of interactions between chromatin regions. However, understanding gene regulation requires knowledge of the detailed spatial structures of heterogeneous chromatin in cells. Here we describe the nC-SAC (n-Constrained-Self Avoiding Chromatin) method that transforms experimental interaction frequencies into 3D ensembles of chromatin chains. nC-SAC first distinguishes specific from non-specific interaction frequencies, then generates 3D chromatin ensembles using identified specific interactions as spatial constraints. Application to the α-globin locus shows that these constraints (∼20%) drive the formation of ∼99% of all experimentally captured interactions, in which ∼30% additional to the imposed constraints is found to be specific. Many novel specific spatial contacts not captured by experiments are also predicted. A subset, for which independent ChIA-PET data are available, is validated to be RNAPII-, CTCF-, and RAD21-mediated. Their positioning in the architectural context of imposed specific interactions from nC-SAC is highly important. Our results also suggest the presence of a many-body structural unit involving the α-globin gene, its enhancers, and the POL3RK gene for regulating the expression of α-globin in silent cells.
Affiliation(s)
- Gamze Gürsoy
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Yun Xu
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| | - Amy L Kenter
- Department of Microbiology and Immunology, University of Illinois College of Medicine, Chicago, IL 60612, USA
| | - Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
| |
|
22
|
The Role of Chromatin Density in Cell Population Heterogeneity during Stem Cell Differentiation. Sci Rep 2017; 7:13307. [PMID: 29042584 PMCID: PMC5645312 DOI: 10.1038/s41598-017-13731-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/07/2017] [Accepted: 09/27/2017] [Indexed: 11/20/2022] Open
Abstract
We combine three-dimensional (3D) chromosome conformation (Hi-C) data and single-cell RNA sequencing data with discrete stochastic simulation to explore the role of chromatin reorganization in determining gene expression heterogeneity during development. While previous research has emphasized the importance of chromatin architecture for the activation and suppression of certain regulatory genes and gene networks, our study demonstrates how chromatin remodeling can dictate gene expression distribution by folding into distinct topological domains. We hypothesize that the local DNA density during differentiation accentuates transcriptional bursting due to the crowding effect of chromatin. This phenomenon yields a heterogeneous cell population, thereby increasing the differentiation potential of the stem cells.
|
23
|
Trieu T, Cheng J. 3D genome structure modeling by Lorentzian objective function. Nucleic Acids Res 2017; 45:1049-1058. [PMID: 28180292 PMCID: PMC5430849 DOI: 10.1093/nar/gkw1155] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 05/12/2016] [Revised: 11/01/2016] [Accepted: 11/04/2016] [Indexed: 12/19/2022] Open
Abstract
The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and existing knowledge about the human chromosome and genome, such as chromosome territories and the cluster of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.
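The robustness to noise claimed here is usually attributed to the bounded shape of a Lorentzian function. The sketch below contrasts a generic Lorentzian restraint score with a squared-error penalty on a well-satisfied restraint and on an outlier; the exact form and constants of the published objective are not given in the abstract, so the formula, the width `c`, and the function names are assumptions.

```python
def lorentzian_score(distance, target, c=1.0):
    """Generic Lorentzian score: 1 when satisfied, decaying toward 0 when violated."""
    return c * c / (c * c + (distance - target) ** 2)


def squared_penalty(distance, target):
    """Squared-error penalty, shown for contrast; it grows without bound."""
    return (distance - target) ** 2


target = 2.0
satisfied = lorentzian_score(2.0, target)       # restraint met exactly
mild = lorentzian_score(3.0, target)            # off by one unit
outlier = lorentzian_score(102.0, target)       # grossly inconsistent restraint
outlier_squared = squared_penalty(102.0, target)
```

Because the Lorentzian score stays between 0 and 1, a single inconsistent restraint can lower the total by at most 1, whereas its squared penalty can dominate the whole sum.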
Affiliation(s)
- Tuan Trieu
- Computer Science Department, University of Missouri-Columbia, MO, USA
| | - Jianlin Cheng
- Computer Science Department, University of Missouri-Columbia, MO, USA; Informatics Institute, University of Missouri-Columbia, MO, USA
| |
|
24
|
Li Q, Tjong H, Li X, Gong K, Zhou XJ, Chiolo I, Alber F. The three-dimensional genome organization of Drosophila melanogaster through data integration. Genome Biol 2017; 18:145. [PMID: 28760140 PMCID: PMC5576134 DOI: 10.1186/s13059-017-1264-5] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 12/26/2016] [Accepted: 06/26/2017] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. RESULTS Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. CONCLUSIONS Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.
Affiliation(s)
- Qingjiao Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
| | - Harianto Tjong
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
| | - Xiao Li
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
| | - Ke Gong
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA
| | - Xianghong Jasmine Zhou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, USA
| | - Irene Chiolo
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA.
| | - Frank Alber
- Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, 1050 Childs Way, Los Angeles, CA, 90089, USA.
- Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90089, USA.
| |
|
25
|
Gürsoy G, Xu Y, Liang J. Spatial organization of the budding yeast genome in the cell nucleus and identification of specific chromatin interactions from multi-chromosome constrained chromatin model. PLoS Comput Biol 2017; 13:e1005658. [PMID: 28704374 PMCID: PMC5531658 DOI: 10.1371/journal.pcbi.1005658] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 02/26/2017] [Revised: 07/27/2017] [Accepted: 06/28/2017] [Indexed: 12/22/2022] Open
Abstract
Nuclear landmarks and biochemical factors play important roles in the organization of the yeast genome. The interaction patterns of budding yeast as measured from genome-wide 3C studies are largely recapitulated by model polymer genomes subject to landmark constraints. However, the origin of inter-chromosomal interactions, specific roles of individual landmarks, and the roles of biochemical factors in yeast genome organization remain unclear. Here we describe a multi-chromosome constrained self-avoiding chromatin model (mC-SAC) to gain understanding of the budding yeast genome organization. With significantly improved sampling of genome structures, both intra- and inter-chromosomal interaction patterns from genome-wide 3C studies are accurately captured in our model at higher resolution than previous studies. We show that nuclear confinement is a key determinant of the intra-chromosomal interactions, and centromere tethering is responsible for the inter-chromosomal interactions. In addition, important genomic elements such as fragile sites and tRNA genes are found to be clustered spatially, largely due to centromere tethering. We uncovered previously unknown interactions that were not captured by genome-wide 3C studies, which are found to be enriched with tRNA genes, RNAPIII and TFIIS binding. Moreover, we identified specific high-frequency genome-wide 3C interactions that are unaccounted for by polymer effects under landmark constraints. These interactions are enriched with important genes and likely play biological roles. The architecture of the cell nucleus and the spatial organization of the genome are important in determining nuclear functions. Single-cell imaging techniques and chromosome conformation capture (3C) based methods have provided a wealth of information on the spatial organization of chromosomes. Here we describe a multi-chromosome ensemble model of chromatin chains for understanding the folding principles of the budding yeast genome. 
By overcoming severe challenges in sampling self-avoiding chromatin chains in nuclear confinement, we succeed in generating a large number of model genomes of budding yeast. Our model predicts chromatin interactions that have good correlation with experimental measurements. Our results showed that the spatial confinement of cell nucleus and excluded-volume effect are key determinants of the folding behavior of yeast chromosomes, and largely account for the observed intra-chromosomal interactions. Furthermore, we determined the specific roles of individual nuclear landmarks and biochemical factors, and our analysis showed that centromere tethering largely determines inter-chromosomal interactions. In addition, we were able to infer biological properties from the organization of modeled genomes. We found that the spatial locations of important elements such as fragile sites and tRNA genes are largely determined by the tethering of centromeres to the Spindle Pole Body. We further showed that many of these spatial locations can be predicted by using the genomic distances to the centromeres. Overall, our results revealed important insight into the organizational principles of the budding yeast genome and predicted a number of important biological findings that are fully experimentally testable.
Affiliation(s)
- Gamze Gürsoy
- The Richard and Loan Hill Department of Bioengineering, Program in Bioinformatics, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Yun Xu
- The Richard and Loan Hill Department of Bioengineering, Program in Bioinformatics, University of Illinois at Chicago, Chicago, Illinois, United States of America
| | - Jie Liang
- The Richard and Loan Hill Department of Bioengineering, Program in Bioinformatics, University of Illinois at Chicago, Chicago, Illinois, United States of America
| |
|
26
|
Shukron O, Holcman D. Transient chromatin properties revealed by polymer models and stochastic simulations constructed from Chromosomal Capture data. PLoS Comput Biol 2017; 13:e1005469. [PMID: 28369076 PMCID: PMC5393903 DOI: 10.1371/journal.pcbi.1005469] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 12/07/2016] [Revised: 04/17/2017] [Accepted: 03/20/2017] [Indexed: 12/28/2022] Open
Abstract
Chromatin organization can be probed by Chromosomal Capture (5C) data, from which the encounter probability (EP) between genomic sites is presented in a large matrix. This matrix is averaged over a large cell population, revealing diagonal blocks called Topological Associating Domains (TADs) that represent a sub-chromatin organization. To study the relation between chromatin organization and gene regulation, we introduce a computational procedure to construct a bead-spring polymer model based on the EP matrix. The model permits exploring transient properties constrained by the statistics of the 5C data. To construct the polymer model, we proceed in two steps: first, we introduce a minimal number of random connectors inside restricted regions to account for diagonal blocks. Second, we account for long-range frequent specific genomic interactions. Using the constructed polymer, we compute the first encounter time distribution and the conditional probability of three key genomic sites. By simulating single particle trajectories of loci located on the constructed polymers from 5C data, we found a large variability of the anomalous exponent, used to interpret live cell imaging trajectories. The present polymer construction provides a generic tool to study steady-state and transient properties of chromatin constrained by some physical properties embedded in 5C data. Chromatin organization remains poorly understood and polymer models are used to reconstruct such organization, to reveal hidden structures and to quantify genomic interactions. We use a generalized Rouse model (a linear chain of beads connected by springs) with additional interacting molecules that allow stable loop formation. The polymer models are constructed using the minimal number of binding molecules, positioned according to the encounter probability matrix obtained from experimental chromosomal capture data. 
We determine the conditional encounter probability of 3 key loci regulating gene inactivation from our calibrated polymer model. Using polymer simulations, we generate single particle trajectories and explore their transient properties. The present results suggest that the heterogeneity of anomalous exponents measured in live cell imaging is due to the large combinatorics in reconstructing the chromatin organization from Chromosomal Capture data. The present method and algorithms are generic and can be used to reconstruct a polymer model at a given scale from any Chromosomal Capture data.
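As a rough illustration of the construction described in this abstract, the sketch below builds the connectivity matrix of a linear bead-spring (Rouse) chain and then adds a few random connectors inside one block of beads that stands in for a TAD. The function names, the block boundaries and the number of connectors are illustrative assumptions; this is not the authors' calibrated procedure, which fits the number of connectors to the encounter probability matrix.

```python
import numpy as np

def add_spring(lap, i, j):
    """Add one harmonic spring between beads i and j (modifies lap in place)."""
    lap[i, i] += 1
    lap[j, j] += 1
    lap[i, j] -= 1
    lap[j, i] -= 1

def rouse_connectivity(n_beads):
    """Connectivity (graph Laplacian) matrix of a linear chain of beads."""
    lap = np.zeros((n_beads, n_beads))
    for i in range(n_beads - 1):
        add_spring(lap, i, i + 1)
    return lap

def add_random_connectors(lap, start, end, n_connectors, rng):
    """Link random non-neighbouring bead pairs inside the block [start, end)."""
    added = []
    while len(added) < n_connectors:
        pair = rng.choice(np.arange(start, end), size=2, replace=False)
        i, j = sorted(int(b) for b in pair)
        # Skip pairs already joined by the backbone or by an earlier connector.
        if j - i > 1 and (i, j) not in added:
            add_spring(lap, i, j)
            added.append((i, j))
    return added

# Hypothetical example: 40 beads, 5 connectors in a block spanning beads 10..29.
rng = np.random.default_rng(0)
lap = rouse_connectivity(40)
connectors = add_random_connectors(lap, start=10, end=30, n_connectors=5, rng=rng)
```

The resulting matrix can serve as the spring network of a generalized Rouse simulation; choosing how many connectors to place per block is the part that would be calibrated against 5C data.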
Affiliation(s)
- Ofir Shukron
- Institute of Biology, Ecole Normale Supérieure, Paris, France
| | - David Holcman
- Institute of Biology, Ecole Normale Supérieure, Paris, France
- Mathematical Institute, University of Oxford, Oxford, United Kingdom
| |
|
27
|
Carstens S, Nilges M, Habeck M. Inferential Structure Determination of Chromosomes from Single-Cell Hi-C Data. PLoS Comput Biol 2016; 12:e1005292. [PMID: 28027298 PMCID: PMC5226817 DOI: 10.1371/journal.pcbi.1005292] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 08/22/2016] [Revised: 01/11/2017] [Accepted: 12/07/2016] [Indexed: 11/18/2022]
Abstract
Chromosome conformation capture (3C) techniques have revealed many fascinating insights into the spatial organization of genomes. 3C methods typically provide information about chromosomal contacts in a large population of cells, which makes it difficult to draw conclusions about the three-dimensional organization of genomes in individual cells. Recently it became possible to study single cells with Hi-C, a genome-wide 3C variant, demonstrating a high cell-to-cell variability of genome organization. In principle, restraint-based modeling should allow us to infer the 3D structure of chromosomes from single-cell contact data, but it suffers from the sparsity and low resolution of chromosomal contacts. To address these challenges, we adapt the Bayesian Inferential Structure Determination (ISD) framework, originally developed for NMR structure determination of proteins, to infer statistical ensembles of chromosome structures from single-cell data. Using ISD, we are able to compute structural error bars and estimate model parameters, thereby eliminating potential bias imposed by ad hoc parameter choices. We apply and compare different models for representing the chromatin fiber and for incorporating single-cell contact information. Finally, we extend our approach to the analysis of diploid chromosome data.
Affiliation(s)
- Simeon Carstens
- Unité de Bioinformatique Structurale, Department of Structural Biology and Chemistry, Institut Pasteur, Paris, France
| | - Michael Nilges
- Unité de Bioinformatique Structurale, Department of Structural Biology and Chemistry, Institut Pasteur, Paris, France
| | - Michael Habeck
- Statistical Inverse Problems in Biophysics, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Felix Bernstein Institute for Mathematical Statistics in the Biosciences, University of Göttingen, Göttingen, Germany
| |
|
28
|
Computational inference of physical spatial organization of eukaryotic genomes. QUANTITATIVE BIOLOGY 2016. [DOI: 10.1007/s40484-016-0082-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
29
|
Adhikari B, Trieu T, Cheng J. Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing. BMC Genomics 2016; 17:886. [PMID: 27821047 PMCID: PMC5100196 DOI: 10.1186/s12864-016-3210-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/14/2016] [Accepted: 10/25/2016] [Indexed: 12/03/2022]
Abstract
Background: Reconstructing three-dimensional structures of chromosomes is useful for visualizing their shapes in a cell and interpreting their function. In this work, we reconstruct chromosomal structures from Hi-C data by translating contact counts in Hi-C data into Euclidean distances between chromosomal regions and then satisfying these distances using a structure reconstruction method rigorously tested in the field of protein structure determination. Results: We first evaluate the robustness of the overall reconstruction algorithm on noisy simulated data at various levels of noise by comparing with some of the state-of-the-art reconstruction methods. Then, using simulated data, we validate that Spearman’s rank correlation coefficient between pairwise distances in the reconstructed chromosomal structures and the experimental chromosomal contact counts can be used to find optimum conversion rules for transforming interaction frequencies to wish distances. This strategy is then applied to real Hi-C data at chromosome level for optimal transformation of interaction frequencies to wish distances and for ranking and selecting structures. The chromosomal structures reconstructed from a real-world human Hi-C dataset by our method were validated by the known two-compartment feature of the human chromosome organization. We also show that our method is robust with respect to the change of the granularity of Hi-C data, and consistently produces similar structures at different chromosomal resolutions. Conclusion: Chromosome3D is a robust method of reconstructing chromosome three-dimensional models using distance restraints obtained from Hi-C interaction frequency data. It is available as a web application and as an open source tool at http://sysbio.rnet.missouri.edu/chromosome3d/. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3210-4) contains supplementary material, which is available to authorized users.
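The two ingredients named in this abstract, converting interaction frequencies into target ("wish") distances and scoring a structure by Spearman correlation against the contact counts, can be sketched as follows. The inverse power law d = 1 / IF^alpha is a commonly used conversion and is assumed here for illustration; the function names are hypothetical, the reconstruction step itself is omitted, and the rank computation assumes no tied values.

```python
import numpy as np

def frequencies_to_distances(contacts, alpha):
    """Assumed conversion rule: d = 1 / count**alpha; zero counts give NaN."""
    contacts = np.asarray(contacts, dtype=float)
    dist = np.full(contacts.shape, np.nan)
    mask = contacts > 0
    dist[mask] = 1.0 / contacts[mask] ** alpha
    return dist

def pairwise_distances(coords):
    """Euclidean distance between every pair of 3D points."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def spearman(x, y):
    """Spearman rank correlation (simple version, assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def score_structure(coords, contacts):
    """Correlate model distances with contact counts over observed pairs."""
    upper = np.triu_indices(len(coords), k=1)
    d = pairwise_distances(coords)[upper]
    c = np.asarray(contacts, dtype=float)[upper]
    keep = c > 0
    return spearman(d[keep], c[keep])

# Toy data: five beads on a line, with contacts that decay as 100 / distance.
coords = np.array([[0, 0, 0], [1, 0, 0], [3, 0, 0], [7, 0, 0], [15, 0, 0]], dtype=float)
d_true = pairwise_distances(coords)
off = d_true > 0
contacts = np.zeros_like(d_true)
contacts[off] = 100.0 / d_true[off]

target = frequencies_to_distances(contacts, alpha=1.0)
rho = score_structure(coords, contacts)
```

A strongly negative `rho` means that pairs with many contacts sit close together in the model. In a parameter search, one would reconstruct a structure for each candidate `alpha` and keep the value whose structure gives the strongest negative correlation.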
Affiliation(s)
- Badri Adhikari
- Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Tuan Trieu
- Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, MO, 65211, USA.
| |
|
30
|
Szałaj P, Tang Z, Michalski P, Pietal MJ, Luo OJ, Sadowski M, Li X, Radew K, Ruan Y, Plewczynski D. An integrated 3-Dimensional Genome Modeling Engine for data-driven simulation of spatial genome organization. Genome Res 2016; 26:1697-1709. [PMID: 27789526 PMCID: PMC5131821 DOI: 10.1101/gr.205062.116] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 02/04/2016] [Accepted: 10/20/2016] [Indexed: 02/03/2023]
Abstract
ChIA-PET is a high-throughput mapping technology that reveals long-range chromatin interactions and provides insights into the basic principles of spatial genome organization and gene regulation mediated by specific protein factors. Recently, we showed that a single ChIA-PET experiment provides information at all genomic scales of interest, from the high-resolution locations of binding sites and enriched chromatin interactions mediated by specific protein factors, to the low resolution of nonenriched interactions that reflect topological neighborhoods of higher-order chromosome folding. This multilevel nature of ChIA-PET data offers an opportunity to use multiscale 3D models to study structural-functional relationships at multiple length scales, but doing so requires a structural modeling platform. Here, we report the development of 3D-GNOME (3-Dimensional Genome Modeling Engine), a complete computational pipeline for 3D simulation using ChIA-PET data. 3D-GNOME consists of three integrated components: a graph-distance-based heat map normalization tool, a 3D modeling platform, and an interactive 3D visualization tool. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional reliability of potential biologically plausible topological structures.
Affiliation(s)
- Przemysław Szałaj
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland; Centre for Innovative Research, Medical University of Bialystok, 15-089 Białystok, Poland; I-BioStat, Hasselt University, BE3590 Hasselt, Belgium
| | - Zhonghui Tang
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Paul Michalski
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Michal J Pietal
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Oscar J Luo
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Michał Sadowski
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Xingwang Li
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA
| | - Kamen Radew
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland
| | - Yijun Ruan
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut 06032, USA; Department of Genetics and Genome Sciences, UConn Health, Farmington, Connecticut 06032, USA
| | - Dariusz Plewczynski
- Centre of New Technologies, Warsaw University, 02-097 Warsaw, Poland; Centre for Innovative Research, Medical University of Bialystok, 15-089 Białystok, Poland; Faculty of Pharmacy, Medical University of Warsaw, 02-097 Warsaw, Poland
| |
|
31
|
Szalaj P, Michalski PJ, Wróblewski P, Tang Z, Kadlof M, Mazzocco G, Ruan Y, Plewczynski D. 3D-GNOME: an integrated web service for structural modeling of the 3D genome. Nucleic Acids Res 2016; 44:W288-93. [PMID: 27185892 PMCID: PMC4987952 DOI: 10.1093/nar/gkw437] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/02/2016] [Accepted: 05/07/2016] [Indexed: 11/13/2022]
Abstract
Recent advances in high-throughput chromosome conformation capture (3C) technology, such as Hi-C and ChIA-PET, have demonstrated the importance of 3D genome organization in development, cell differentiation and transcriptional regulation. There is now a widespread need for computational tools to generate and analyze 3D structural models from 3C data. Here we introduce our 3D GeNOme Modeling Engine (3D-GNOME), a web service which generates 3D structures from 3C data and provides tools to visually inspect and annotate the resulting structures, in addition to a variety of statistical plots and heatmaps which characterize the selected genomic region. Users submit a bedpe (paired-end BED format) file containing the locations and strengths of long range contact points, and 3D-GNOME simulates the structure and provides a convenient user interface for further analysis. Alternatively, a user may generate structures using published ChIA-PET data for the GM12878 cell line by simply specifying a genomic region of interest. 3D-GNOME is freely available at http://3dgnome.cent.uw.edu.pl/.
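To make the input described in this abstract concrete, here is a minimal sketch of reading paired-end BED (bedpe) records into typed tuples. The column layout is an assumption made for illustration: two sets of chromosome, start and end, followed by a seventh column holding the contact strength. The layout actually expected by the web service may differ, and `Contact` and `parse_bedpe` are hypothetical names.

```python
from typing import List, NamedTuple

class Contact(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    strength: float  # assumed to be the 7th column

def parse_bedpe(text: str) -> List[Contact]:
    """Parse whitespace-separated bedpe lines, skipping blanks and '#' comments."""
    contacts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            raise ValueError(f"expected at least 7 columns, got {len(fields)}: {line!r}")
        contacts.append(Contact(
            fields[0], int(fields[1]), int(fields[2]),
            fields[3], int(fields[4]), int(fields[5]),
            float(fields[6]),
        ))
    return contacts

# Made-up records for illustration only.
example = (
    "# chrom1 start1 end1 chrom2 start2 end2 strength\n"
    "chr1\t1000\t2000\tchr1\t50000\t51000\t12\n"
    "chr2\t300\t900\tchr2\t80000\t80500\t4\n"
)
contacts = parse_bedpe(example)
```

Each `Contact` pairs two anchor regions with one strength value, which is the kind of long-range contact record the abstract says users submit.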
Affiliation(s)
- Przemyslaw Szalaj
- Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland; Center for Bioinformatics and Data Analysis, Medical University of Bialystok, 15-089 Bialystok, Poland; I-BioStat, Hasselt University, 3500 Hasselt, Belgium
| | - Paul J Michalski
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | | | - Zhonghui Tang
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Michal Kadlof
- Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
| | - Giovanni Mazzocco
- Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
| | - Yijun Ruan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Department of Genetics and Genome Sciences, UConn Health, Farmington, CT 06030-6403, USA
| | - Dariusz Plewczynski
- Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland; Center for Bioinformatics and Data Analysis, Medical University of Bialystok, 15-089 Bialystok, Poland; Faculty of Pharmacy, Medical University of Warsaw, 02-097 Warsaw, Poland
| |
|
32
|
Brackley CA, Johnson J, Kelly S, Cook PR, Marenduzzo D. Simulated binding of transcription factors to active and inactive regions folds human chromosomes into loops, rosettes and topological domains. Nucleic Acids Res 2016; 44:3503-12. [PMID: 27060145 PMCID: PMC4856988 DOI: 10.1093/nar/gkw135] [Citation(s) in RCA: 110] [Impact Index Per Article: 13.8] [Received: 11/17/2015] [Revised: 02/22/2016] [Accepted: 02/24/2016] [Indexed: 01/12/2023]
Abstract
Biophysicists are modeling conformations of interphase chromosomes, often basing the strengths of interactions between segments distant on the genetic map on contact frequencies determined experimentally. Here, instead, we develop a fitting-free, minimal model: bivalent or multivalent red and green 'transcription factors' bind to cognate sites in strings of beads ('chromatin') to form molecular bridges stabilizing loops. In the absence of additional explicit forces, molecular dynamic simulations reveal that bound factors spontaneously cluster-red with red, green with green, but rarely red with green-to give structures reminiscent of transcription factories. Binding of just two transcription factors (or proteins) to active and inactive regions of human chromosomes yields rosettes, topological domains and contact maps much like those seen experimentally. This emergent 'bridging-induced attraction' proves to be a robust, simple and generic force able to organize interphase chromosomes at all scales.
Affiliation(s)
- Chris A Brackley
- SUPA, School of Physics & Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
| | - James Johnson
- SUPA, School of Physics & Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
| | - Steven Kelly
- Department of Plant Sciences, University of Oxford, South Parks Road, Oxford OX1 3RB, UK
| | - Peter R Cook
- Sir William Dunn School of Pathology, University of Oxford, South Parks Road, Oxford, OX1 3RE, UK
| | - Davide Marenduzzo
- SUPA, School of Physics & Astronomy, University of Edinburgh, Peter Guthrie Tait Road, Edinburgh, EH9 3FD, UK
| |
|
33
|
Zou C, Zhang Y, Ouyang Z. HSA: integrating multi-track Hi-C data for genome-scale reconstruction of 3D chromatin structure. Genome Biol 2016; 17:40. [PMID: 26936376 PMCID: PMC4774023 DOI: 10.1186/s13059-016-0896-1] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 11/09/2015] [Accepted: 02/10/2016] [Indexed: 11/24/2022]
Abstract
Genome-wide 3C technologies (Hi-C) are being increasingly employed to study three-dimensional (3D) genome conformations. Existing computational approaches are unable to integrate accumulating data to facilitate studying 3D chromatin structure and function. We present HSA (http://ouyanglab.jax.org/hsa/), a flexible tool that jointly analyzes multiple contact maps to infer 3D chromatin structure at the genome scale. HSA globally searches the latent structure underlying different cleavage footprints. Its robustness and accuracy outperform or rival existing tools on extensive simulations and orthogonal experiment validations. Applying HSA to recent in situ Hi-C data, we found the 3D chromatin structures are highly conserved across various human cell types.
Affiliation(s)
- Chenchen Zou
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA.
| | - Yuping Zhang
- Department of Statistics, University of Connecticut, Storrs, 06269, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, 06030, CT, USA; Institute for Collaboration on Health, Intervention, and Policy, University of Connecticut, Storrs, 06269, CT, USA; Center for Quantitative Medicine, University of Connecticut, Farmington, 06030, CT, USA; The Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, Storrs, 06269, CT, USA.
| | - Zhengqing Ouyang
- The Jackson Laboratory for Genomic Medicine, Farmington, 06032, CT, USA; Institute for Systems Genomics, University of Connecticut, Farmington, 06030, CT, USA; Department of Biomedical Engineering, University of Connecticut, Storrs, 06269, CT, USA; Department of Genetics and Genome Sciences, University of Connecticut, Farmington, 06030, CT, USA.
| |
|
34
|
Nowotny J, Wells A, Oluwadare O, Xu L, Cao R, Trieu T, He C, Cheng J. GMOL: An Interactive Tool for 3D Genome Structure Visualization. Sci Rep 2016; 6:20802. [PMID: 26868282 PMCID: PMC4751627 DOI: 10.1038/srep20802] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/11/2015] [Accepted: 01/12/2016] [Indexed: 02/06/2023]
Abstract
It has been shown that genome spatial structures largely affect both genome activity and DNA function. Knowing this, many researchers are currently attempting to accurately model genome structures. Despite these increased efforts there still exists a shortage of tools dedicated to visualizing the genome. Creating a tool that can accurately visualize the genome can aid researchers by highlighting structural relationships that may not be obvious when examining the sequence information alone. Here we present a desktop application, known as GMOL, designed to effectively visualize genome structures so that researchers may better analyze genomic data. GMOL was developed based upon our multi-scale approach that allows a user to scale between six separate levels within the genome. With GMOL, a user can choose any unit at any scale and scale it up or down to visualize its structure and retrieve corresponding genome sequences. Users can also interactively manipulate and measure the whole genome structure and extract static images and machine-readable data files in PDB format from the multi-scale structure. By using GMOL researchers will be able to better understand and analyze genome structure models and the impact their structural relations have on genome activity and DNA function.
Affiliation(s)
- Jackson Nowotny
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | - Avery Wells
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | | | - Lingfei Xu
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | - Renzhi Cao
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | - Tuan Trieu
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | - Chenfeng He
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO 65211, USA; C.S. Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
| |
|
35
|
Trieu T, Cheng J. MOGEN: a tool for reconstructing 3D models of genomes from chromosomal conformation capturing data. Bioinformatics 2015; 32:1286-92. [PMID: 26722115 DOI: 10.1093/bioinformatics/btv754] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 08/30/2015] [Accepted: 12/19/2015] [Indexed: 12/22/2022]
Abstract
MOTIVATION: The three-dimensional (3D) conformation of chromosomes and genomes plays an important role in cellular processes such as gene regulation, DNA replication and genome methylation. Several methods have been developed to reconstruct 3D structures of individual chromosomes from chromosomal conformation capturing data such as Hi-C data. However, few methods can effectively reconstruct the 3D structures of an entire genome due to the difficulty of handling noisy and inconsistent inter-chromosomal contact data. RESULTS: We generalized a 3D chromosome reconstruction method to make it capable of reconstructing 3D models of genomes from both intra- and inter-chromosomal Hi-C contact data and implemented it as a software tool called MOGEN. We first validated MOGEN on synthetic datasets of a polymer worm-like chain model and a yeast genome, and then applied it to generate an ensemble of 3D structural models of the genome of human B-cells from a Hi-C dataset. These genome models were not only validated by some known structural patterns of the human genome, such as chromosome compartmentalization, chromosome territories, co-localization of small chromosomes in the nucleus center with the exception of chromosome 18, and enriched center-toward inter-chromosomal interactions between elongated or telomere regions of chromosomes, but also demonstrated the intrinsically dynamic orientations between chromosomes. Therefore, MOGEN is a useful tool for converting chromosomal contact data into 3D genome models to provide a better view into the spatial organization of genomes. AVAILABILITY AND IMPLEMENTATION: The software of MOGEN is available at: http://calla.rnet.missouri.edu/mogen/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tuan Trieu
- Computer Science Department, University of Missouri, Columbia, MO 65201, USA
| | - Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, MO 65201, USA
| |
|
36
|
Xu Z, Zhang G, Jin F, Chen M, Furey TS, Sullivan PF, Qin Z, Hu M, Li Y. A hidden Markov random field-based Bayesian method for the detection of long-range chromosomal interactions in Hi-C data. Bioinformatics 2015; 32:650-6. [PMID: 26543175 DOI: 10.1093/bioinformatics/btv650] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 02/13/2015] [Accepted: 10/30/2015] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Advances in chromosome conformation capture and next-generation sequencing technologies are enabling genome-wide investigation of dynamic chromatin interactions. For example, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. One essential task in such studies is peak calling, that is, detecting non-random interactions between loci from the two-dimensional contact frequency matrix. Successful fulfillment of this task has many important implications including identifying long-range interactions that assist interpreting a sizable fraction of the results from genome-wide association studies. The task - distinguishing biologically meaningful chromatin interactions from massive numbers of random interactions - poses great challenges both statistically and computationally. Model-based methods to address this challenge are still lacking. In particular, no statistical model exists that takes the underlying dependency structure into consideration. RESULTS: In this paper, we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring loci pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis. AVAILABILITY AND IMPLEMENTATION: The source code can be downloaded at: http://www.unc.edu/~yunmli/HMRFBayesHiC CONTACT: ming.hu@nyumc.org or yunli@med.unc.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zheng Xu
- Department of Biostatistics, Department of Genetics, Department of Computer Science
| | - Guosheng Zhang
- Department of Computer Science, Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Fulai Jin
- Department of Genetics, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44016
| | - Mengjie Chen
- Department of Biostatistics, Department of Genetics
| | | | - Patrick F Sullivan
- Department of Genetics, Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA and
| | - Ming Hu
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Yun Li
- Department of Biostatistics, Department of Genetics, Department of Computer Science
| |
|
37
|
Nowotny J, Ahmed S, Xu L, Oluwadare O, Chen H, Hensley N, Trieu T, Cao R, Cheng J. Iterative reconstruction of three-dimensional models of human chromosomes from chromosomal contact data. BMC Bioinformatics 2015; 16:338. [PMID: 26493399 PMCID: PMC4619219 DOI: 10.1186/s12859-015-0772-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/30/2015] [Accepted: 10/13/2015] [Indexed: 11/10/2022]
Abstract
Background: The entire collection of genetic information resides within the chromosomes, which themselves reside within almost every cell nucleus of eukaryotic organisms. Each individual chromosome is found to have its own preferred three-dimensional (3D) structure independent of the other chromosomes. The structure of each chromosome plays vital roles in controlling certain genome operations, including gene interaction and gene regulation. As a result, knowing the structure of chromosomes assists in the understanding of how the genome functions. Fortunately, the 3D structure of chromosomes proves possible to construct through computational methods via contact data recorded from the chromosome. We developed a unique computational approach based on optimization procedures known as adaptation, simulated annealing, and genetic algorithm to construct 3D models of human chromosomes, using chromosomal contact data. Results: Our models were evaluated using a percentage-based scoring function. Analysis of the scores of the final 3D models demonstrated their effective construction from our computational approach. Specifically, the models resulting from our approach yielded an average score of 80.41%, with a high of 91%, across models for all chromosomes of a normal human B-cell. Comparisons made with other methods affirmed the effectiveness of our strategy. Particularly, juxtaposition with models generated through the publicly available method Markov chain Monte Carlo 5C (MCMC5C) illustrated the outperformance of our approach, as seen through a higher average score for all chromosomes. Our methodology was further validated using two consistency checking techniques known as convergence testing and robustness checking, which both proved successful. Conclusions: The pursuit of constructing accurate 3D chromosomal structures is fueled by the benefits revealed by the findings as well as any possible future areas of study that arise. This motivation has led to the development of our computational methodology. The implementation of our approach proved effective in constructing 3D chromosome models and proved consistent with, and more effective than, some other methods thereby achieving our goal of creating a tool to help advance certain research efforts. The source code, test data, test results, and documentation of our method, Gen3D, are available at our sourceforge site at: http://sourceforge.net/projects/gen3d/.
Affiliation(s)
- Jackson Nowotny
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Sharif Ahmed
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Lingfei Xu
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Oluwatosin Oluwadare
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Hannah Chen
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Noelan Hensley
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Tuan Trieu
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Renzhi Cao
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| | - Jianlin Cheng
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| |
|
38
|
Shavit Y, Merelli I, Milanesi L, Lio’ P. How computer science can help in understanding the 3D genome architecture. Brief Bioinform 2015; 17:733-44. [DOI: 10.1093/bib/bbv085] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/10/2015] [Indexed: 01/20/2023]
|
39
|
Ay F, Noble WS. Analysis methods for studying the 3D architecture of the genome. Genome Biol 2015; 16:183. [PMID: 26328929 PMCID: PMC4556012 DOI: 10.1186/s13059-015-0745-7] [Citation(s) in RCA: 96] [Impact Index Per Article: 10.7] [Received: 05/11/2015] [Accepted: 08/10/2015] [Indexed: 11/10/2022]
Abstract
The rapidly increasing quantity of genome-wide chromosome conformation capture data presents great opportunities and challenges in the computational modeling and interpretation of the three-dimensional genome. In particular, with recent trends towards higher-resolution high-throughput chromosome conformation capture (Hi-C) data, the diversity and complexity of biological hypotheses that can be tested necessitates rigorous computational and statistical methods as well as scalable pipelines to interpret these datasets. Here we review computational tools to interpret Hi-C data, including pipelines for mapping, filtering, and normalization, and methods for confidence estimation, domain calling, visualization, and three-dimensional modeling.
Affiliation(s)
- Ferhat Ay
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Feinberg School of Medicine, Northwestern University, Chicago, 60661, IL, USA.
| | - William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA; Department of Computer Science and Engineering, University of Washington, Seattle, 98195, WA, USA.
| |
|
40
|
Chromosome dynamics and folding in eukaryotes: Insights from live cell microscopy. FEBS Lett 2015; 589:3014-22. [PMID: 26188544 DOI: 10.1016/j.febslet.2015.07.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/25/2015] [Revised: 07/08/2015] [Accepted: 07/11/2015] [Indexed: 11/24/2022]
Abstract
How chromosomes are folded and how this folding relates to function remain fundamental questions. Answering them is rendered difficult by the stochasticity of chromatin fiber motion which inevitably results in heterogeneity of the populations analyzed. Even if single cell analyses are beginning to yield precious insights, how can we determine whether a snapshot of position is related to function of the probed locus or cell-type? Fluorescence labeling of DNA at single or multiple loci allows determination of their position relative to nuclear landmarks and to each other, enabling us to derive physical parameters of the underlying chromatin fiber. Here I review the contribution of quantitative spatial and temporal analysis of labeled DNA to our understanding of chromosome conformation in different cell types, highlighting live cell imaging techniques and large scale geometrical analysis of multiple loci in 3D.
|
41
|
Diament A, Tuller T. Improving 3D Genome Reconstructions Using Orthologous and Functional Constraints. PLoS Comput Biol 2015; 11:e1004298. [PMID: 26000633 PMCID: PMC4441392 DOI: 10.1371/journal.pcbi.1004298] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 01/01/2015] [Accepted: 04/24/2015] [Indexed: 11/19/2022] Open
Abstract
The study of the 3D architecture of chromosomes has been advancing rapidly in recent years. While a number of methods for 3D reconstruction of genomic models based on Hi-C data were proposed, most of the analyses in the field have been performed on different 3D representation forms (such as graphs). Here, we reproduce most of the previous results on the 3D genomic organization of the eukaryote Saccharomyces cerevisiae using analysis of 3D reconstructions. We show that many of these results can be reproduced in sparse reconstructions, generated from a small fraction of the experimental data (5% of the data), and study the properties of such models. Finally, we propose for the first time a novel approach for improving the accuracy of 3D reconstructions by introducing additional predicted physical interactions to the model, based on orthologous interactions in an evolutionary-related organism and based on predicted functional interactions between genes. We demonstrate that this approach indeed leads to the reconstruction of improved models. Understanding the importance of genome architecture, the arrangement of genes within the genome and how this organization evolved has been intensively studied in recent years. Despite rapid progress in the field, accurate 3D modeling of genome organization remains a challenge. Our proposed approach can facilitate future studies of 3D genome organization via improved models.
Affiliation(s)
- Alon Diament
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
| | - Tamir Tuller
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv, Israel
- The Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
| |
|
42
|
Butyaev A, Mavlyutov R, Blanchette M, Cudré-Mauroux P, Waldispühl J. A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data. Nucleic Acids Res 2015; 43:e103. [PMID: 25990738 PMCID: PMC4652742 DOI: 10.1093/nar/gkv476] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/11/2015] [Accepted: 04/29/2015] [Indexed: 01/19/2023] Open
Abstract
Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We simultaneously introduce a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.
|
43
|
Wang S, Xu J, Zeng J. Inferential modeling of 3D chromatin structure. Nucleic Acids Res 2015; 43:e54. [PMID: 25690896 PMCID: PMC4417147 DOI: 10.1093/nar/gkv100] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/25/2014] [Revised: 10/11/2014] [Accepted: 01/30/2015] [Indexed: 01/01/2023] Open
Abstract
For eukaryotic cells, the biological processes involving regulatory DNA elements play an important role in the cell cycle. Understanding 3D spatial arrangements of chromosomes and revealing long-range chromatin interactions are critical to decipher these biological processes. In recent years, chromosome conformation capture (3C) related techniques have been developed to measure the interaction frequencies between long-range genome loci, which have provided a great opportunity to decode the 3D organization of the genome. In this paper, we develop a new Bayesian framework to derive the 3D architecture of a chromosome from 3C-based data. By modeling each chromosome as a polymer chain, we define the conformational energy based on our current knowledge of polymer physics and use it as prior information in the Bayesian framework. We also propose an expectation-maximization (EM) based algorithm to estimate the unknown parameters of the Bayesian model and infer an ensemble of chromatin structures based on interaction frequency data. We have validated our Bayesian inference approach through cross-validation and verified the computed chromatin conformations using the geometric constraints derived from fluorescence in situ hybridization (FISH) experiments. We have further confirmed the inferred chromatin structures using the known genetic interactions derived from other studies in the literature. Our test results have indicated that our Bayesian framework can compute an accurate ensemble of 3D chromatin conformations that best interpret the distance constraints derived from 3C-based data and also agree with other sources of geometric constraints derived from experimental evidence in the previous studies. The source code of our approach can be found at https://github.com/wangsy11/InfMod3DGen.
Affiliation(s)
- Siyu Wang
- Department of Automation, Tsinghua University, Beijing 100084, P.R. China
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, 6045 S Kenwood, IL 60637, USA
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, P.R. China; MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, P.R. China
| |
|
44
|
Liang J, Cao Y, Gürsoy G, Naveed H, Terebus A, Zhao J. Multiscale Modeling of Cellular Epigenetic States: Stochasticity in Molecular Networks, Chromatin Folding in Cell Nuclei, and Tissue Pattern Formation of Cells. Crit Rev Biomed Eng 2015; 43:323-46. [PMID: 27480462 PMCID: PMC4976639 DOI: 10.1615/critrevbiomedeng.2016016559] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Genome sequences provide the overall genetic blueprint of cells, but cells possessing the same genome can exhibit diverse phenotypes. A multitude of mechanisms control cellular epigenetic states and dictate the behavior of cells. Among these, networks of interacting molecules, often under stochastic control, can have different landscapes of cellular states depending on the specific wirings of molecular components and the physiological conditions. In addition, chromosome folding in three-dimensional space provides another important control mechanism for selective activation and repression of gene expression. Fully differentiated cells with different properties grow, divide, and interact through mechanical forces and communicate through signal transduction, resulting in the formation of complex tissue patterns. Developing quantitative models to study these multi-scale phenomena and to identify opportunities for improving human health requires development of theoretical models, algorithms, and computational tools. Here we review recent progress made in these important directions.
Affiliation(s)
- Jie Liang
- Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL, 60612, USA
| | - Youfang Cao
- Theoretical Biology and Biophysics (T-6) and Center for Nonlinear Studies (CNLS), Los Alamos National Laboratory, Los Alamos, NM, 87545, USA
| | - Gamze Gürsoy
- Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL, 60612, USA
| | - Hammad Naveed
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave. Chicago, Illinois 60637, USA
| | - Anna Terebus
- Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL, 60612, USA
| | - Jieling Zhao
- Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL, 60612, USA
| |
|
45
|
Shavit Y, Hamey FK, Lio P. FisHiCal: an R package for iterative FISH-based calibration of Hi-C data. Bioinformatics 2014; 30:3120-2. [PMID: 25061071 PMCID: PMC4609013 DOI: 10.1093/bioinformatics/btu491] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 04/16/2014] [Revised: 06/26/2014] [Accepted: 07/16/2014] [Indexed: 11/14/2022] Open
Abstract
The fluorescence in situ hybridization (FISH) method has been providing valuable information on physical distances between loci (via image analysis) for several decades. Recently, high-throughput data on nearby chemical contacts between and within chromosomes became available with the Hi-C method. Here, we present FisHiCal, an R package for an iterative FISH-based Hi-C calibration that exploits in full the information coming from these methods. We describe here our calibration model and present 3D inference methods that we have developed for increasing its usability, namely, 3D reconstruction through local stress minimization and detection of spatial inconsistencies. We next confirm our calibration across three human cell lines and explain how the output of our methods could inform our model, defining an iterative calibration pipeline, with applications for quality assessment and meta-analysis. Availability and implementation: FisHiCal v1.1 is available from http://cran.r-project.org/.
Affiliation(s)
- Yoli Shavit
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
| | - Fiona Kathryn Hamey
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
| | - Pietro Lio
- Computer Laboratory, University of Cambridge, Cambridge CB3 0FD and Cambridge Systems Biology Centre, University of Cambridge, Cambridge CB2 1GA, UK
| |
|
46
|
Lesne A, Riposo J, Roger P, Cournac A, Mozziconacci J. 3D genome reconstruction from chromosomal contacts. Nat Methods 2014; 11:1141-3. [PMID: 25240436 DOI: 10.1038/nmeth.3104] [Citation(s) in RCA: 170] [Impact Index Per Article: 17.0] [Received: 03/12/2014] [Accepted: 08/06/2014] [Indexed: 12/29/2022]
Abstract
A computational challenge raised by chromosome conformation capture (3C) experiments is to reconstruct spatial distances and three-dimensional genome structures from observed contacts between genomic loci. We propose a two-step algorithm, ShRec3D, and assess its accuracy using both in silico data and human genome-wide 3C (Hi-C) data. This algorithm avoids convergence issues, accommodates sparse and noisy contact maps, and is orders of magnitude faster than existing methods.
Affiliation(s)
- Annick Lesne
- [1] Laboratoire de Physique Théorique de la Matière Condensée, CNRS UMR 7600, Université Pierre et Marie Curie, Sorbonne Universités, Paris, France. [2] Institut de Génétique Moléculaire de Montpellier, CNRS UMR 5535, Université de Montpellier, Montpellier, France
| | - Julien Riposo
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS UMR 7600, Université Pierre et Marie Curie, Sorbonne Universités, Paris, France
| | - Paul Roger
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS UMR 7600, Université Pierre et Marie Curie, Sorbonne Universités, Paris, France
| | - Axel Cournac
- Institut Pasteur, Group Spatial Regulation of Genomes, Department of Genomes and Genetics, Paris, France
| | - Julien Mozziconacci
- Laboratoire de Physique Théorique de la Matière Condensée, CNRS UMR 7600, Université Pierre et Marie Curie, Sorbonne Universités, Paris, France
| |
|
47
|
Dios F, Barturen G, Lebrón R, Rueda A, Hackenberg M, Oliver JL. DNA clustering and genome complexity. Comput Biol Chem 2014; 53 Pt A:71-8. [PMID: 25182383 DOI: 10.1016/j.compbiolchem.2014.08.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 07/11/2014] [Indexed: 01/08/2023]
Abstract
Early global measures of genome complexity (power spectra, the analysis of fluctuations in DNA walks or compositional segmentation) uncovered a high degree of complexity in eukaryotic genome sequences. The main evolutionary mechanisms leading to increases in genome complexity (i.e. gene duplication and transposon proliferation) can all potentially produce increases in DNA clustering. To quantify such clustering and provide a genome-wide description of the formed clusters, we developed GenomeCluster, an algorithm able to detect clusters of any genome element identified by chromosome coordinates. We obtained a detailed description of clusters for ten categories of human genome elements, including functional (genes, exons, introns), regulatory (CpG islands, TFBSs, enhancers), variant (SNPs) and repeat (Alus, LINE1) elements, as well as DNase hypersensitivity sites. For each category, we located its clusters in the human genome, quantified cluster length and composition, and estimated the clustering level as the proportion of clustered genome elements. On average, we found 27% of elements in clusters, although considerable variation occurs among different categories. Genes form the lowest number of clusters, but these are the longest ones, both in bp and in the average number of components, while the shortest clusters are formed by SNPs. Functional and regulatory elements (genes, CpG islands, TFBSs, enhancers) show the highest clustering level, as compared to DNase sites, repeats (Alus, LINE1) or SNPs. Many of the genome elements we analyzed are known to be composed of clusters of low-level entities. In addition, we found here that the clusters generated by GenomeCluster can in turn be clustered into high-level super-clusters. The observation of 'clusters-within-clusters' parallels the 'domains within domains' phenomenon previously detected through global statistical methods in eukaryotic sequences, and reveals a complex human genome landscape dominated by hierarchical clustering.
Affiliation(s)
- Francisco Dios
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Guillermo Barturen
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Ricardo Lebrón
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - Antonio Rueda
- Plataforma Andaluza de Genómica y Bioinformática (GBPA), Edificio INSUR, Calle Albert Einstein, 41092 Sevilla, Spain
| | - Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain
| | - José L Oliver
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain; Lab. de Bioinformática, Inst. de Biotecnología, Centro de Investigación Biomédica, 18100 Granada, Spain.
| |
|