1
Lainscsek X, Taher L. ENT3C: an entropy-based similarity measure for Hi-C and micro-C derived contact matrices. NAR Genom Bioinform 2024; 6:lqae076. [PMID: 38962256 PMCID: PMC11217677 DOI: 10.1093/nargab/lqae076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 06/05/2024] [Accepted: 06/27/2024] [Indexed: 07/05/2024]
Abstract
Hi-C and micro-C sequencing have shed light on the profound importance of 3D genome organization in cellular function by probing 3D contact frequencies across the linear genome. The resulting contact matrices are extremely sparse and susceptible to technical- and sequence-based biases, making their comparison challenging. The development of reliable, robust and efficient methods for quantifying similarity between contact matrices is crucial for investigating variations in the 3D genome organization in different cell types or under different conditions, as well as evaluating experimental reproducibility. We present a novel method, ENT3C, which measures the change in pattern complexity in the vicinity of contact matrix diagonals to quantify their similarity. ENT3C provides a robust, user-friendly Hi-C or micro-C contact matrix similarity metric and a characteristic entropy signal that can be used to gain detailed biological insights into 3D genome organization.
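As a rough illustration of tracking pattern complexity along the diagonal, the sketch below slides a square window down the main diagonal of a contact matrix, computes an entropy value per window, and compares two matrices by correlating the resulting signals. The window size, the step, the choice of a von Neumann entropy of each window's correlation matrix, and all function names are assumptions made for this example; it is not the ENT3C implementation.

```python
import numpy as np

def window_entropy(sub):
    """Von Neumann entropy of the Pearson correlation matrix of one window."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(np.log1p(sub))    # correlate rows of the log-scaled window
    corr = np.nan_to_num(corr)               # constant rows yield NaN; treat them as 0
    rho = corr / max(np.trace(corr), 1e-12)  # rescale to unit trace
    eig = np.linalg.eigvalsh(rho)
    eig = eig[eig > 1e-12]                   # drop zero or slightly negative values
    return float(-np.sum(eig * np.log(eig)))

def entropy_signal(mat, window=8, step=1):
    """Entropy of each window-by-window block centred on the main diagonal."""
    mat = np.asarray(mat)
    n = mat.shape[0]
    return np.array([window_entropy(mat[i:i + window, i:i + window])
                     for i in range(0, n - window + 1, step)])

def similarity(a, b, window=8, step=1):
    """Pearson correlation between the entropy signals of two matrices."""
    sa = entropy_signal(a, window, step)
    sb = entropy_signal(b, window, step)
    return float(np.corrcoef(sa, sb)[0, 1])
```

Two matrices with the same local structure along the diagonal give signals that rise and fall together, so the score approaches 1; the per-window signal itself can be inspected to see where two samples diverge.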
Affiliation(s)
- Xenia Lainscsek
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
- Leila Taher
  - Institute of Biomedical Informatics, Graz University of Technology, Graz, Austria
2
Sun Y, Xu X, Lin L, Xu K, Zheng Y, Ren C, Tao H, Wang X, Zhao H, Tu W, Bai X, Wang J, Huang Q, Li Y, Chen H, Li H, Bo X. A graph neural network-based interpretable framework reveals a novel DNA fragility-associated chromatin structural unit. Genome Biol 2023; 24:90. [PMID: 37095580 PMCID: PMC10124043 DOI: 10.1186/s13059-023-02916-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Accepted: 03/22/2023] [Indexed: 04/26/2023]
Abstract
BACKGROUND DNA double-strand breaks (DSBs) are among the most deleterious DNA lesions, and they can cause cancer if improperly repaired. Recent chromosome conformation capture techniques, such as Hi-C, have enabled the identification of relationships between the 3D chromatin structure and DSBs, but little is known about how to explain these relationships, especially from global contact maps, or their contributions to DSB formation. RESULTS Here, we propose a framework that integrates a graph neural network (GNN) to unravel the relationship between 3D chromatin structure and DSBs using an advanced interpretability technique, GNNExplainer. We identify a new chromatin structural unit named the DNA fragility-associated chromatin interaction network (FaCIN). FaCIN is a bottleneck-like structure, and it helps to reveal a universal form of how the fragility of a piece of DNA might be affected by the whole genome through chromatin interactions. Moreover, we demonstrate that neck interactions in FaCIN can serve as chromatin structural determinants of DSB formation. CONCLUSIONS Our study provides a more systematic and refined view enabling a better understanding of the mechanisms of DSB formation in the context of the 3D genome.
Affiliation(s)
- Yu Sun
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xiang Xu
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Lin Lin
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Kang Xu
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Yang Zheng
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Chao Ren
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Huan Tao
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xu Wang
  - 4Paradigm Inc, Beijing, China
- Xuemei Bai
  - The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, China
- Junting Wang
  - The First Affiliated Hospital of Harbin Medical University, Harbin, 150001, China
- Qiya Huang
  - State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yaru Li
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Hebing Chen
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Hao Li
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
- Xiaochen Bo
  - Institute of Health Service and Transfusion Medicine, Beijing, 100850, China
3
Guha S, Mitra MK. Multivalent binding proteins can drive collapse and reswelling of chromatin in confinement. Soft Matter 2022; 19:153-163. [PMID: 36484149 DOI: 10.1039/d2sm00612j] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/17/2023]
Abstract
Collapsed conformations of chromatin have long been suspected of being mediated by interactions with multivalent binding proteins, which can bring together distant sections of the chromatin fiber. In this study, we use Langevin dynamics simulations of a coarse-grained chromatin polymer to show that the role of binding proteins can be more nuanced than previously suspected. In particular, for a chromatin polymer in confinement, entropic forces can drive reswelling of collapsed chromatin with increasing binder concentrations, and this reswelling transition happens at physiologically relevant binder concentrations. Both the extent of collapse and that of reswelling depend on the strength of confinement. We also study the kinetics of collapse and reswelling and show that both processes occur on similar timescales. We characterise this reswelling of chromatin in biologically relevant regimes and discuss the non-trivial role of multivalent binding proteins in mediating the spatial organisation of the genome.
Affiliation(s)
- Sougata Guha
  - Department of Physics, Indian Institute of Technology Bombay, Mumbai 400076, India
- Mithun K Mitra
  - Department of Physics, Indian Institute of Technology Bombay, Mumbai 400076, India
4
Xu Z, Lee DS, Chandran S, Le VT, Bump R, Yasis J, Dallarda S, Marcotte S, Clock B, Haghani N, Cho CY, Akdemir K, Tyndale S, Futreal PA, McVicker G, Wahl GM, Dixon JR. Structural variants drive context-dependent oncogene activation in cancer. Nature 2022; 612:564-572. [PMID: 36477537 PMCID: PMC9810360 DOI: 10.1038/s41586-022-05504-4] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 09/21/2020] [Accepted: 11/01/2022] [Indexed: 12/12/2022]
Abstract
Higher-order chromatin structure is important for the regulation of genes by distal regulatory sequences [1,2]. Structural variants (SVs) that alter three-dimensional (3D) genome organization can lead to enhancer-promoter rewiring and human disease, particularly in the context of cancer [3]. However, only a small minority of SVs are associated with altered gene expression [4,5], and it remains unclear why certain SVs lead to changes in distal gene expression and others do not. To address these questions, we used a combination of genomic profiling and genome engineering to identify sites of recurrent changes in 3D genome structure in cancer and determine the effects of specific rearrangements on oncogene activation. By analysing Hi-C data from 92 cancer cell lines and patient samples, we identified loci affected by recurrent alterations to 3D genome structure, including oncogenes such as MYC, TERT and CCND1. By using CRISPR-Cas9 genome engineering to generate de novo SVs, we show that oncogene activity can be predicted by using 'activity-by-contact' models that consider partner region chromatin contacts and enhancer activity. However, activity-by-contact models are only predictive of specific subsets of genes in the genome, suggesting that different classes of genes engage in distinct modes of regulation by distal regulatory elements. These results indicate that SVs that alter 3D genome organization are widespread in cancer genomes and begin to illustrate predictive rules for the consequences of SVs on oncogene activation.
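The 'activity-by-contact' idea mentioned above is commonly written as each element's activity multiplied by its contact frequency with the gene, divided by the same product summed over all candidate elements. The sketch below computes that share for one gene; the function name and the toy inputs are hypothetical, and the abstract does not specify how the authors measured activity or contact.

```python
def abc_scores(activity, contact):
    """Share of total (activity x contact) contributed by each element for one gene.

    activity: enhancer activity per candidate element (assumed non-negative)
    contact:  contact frequency between each element and the gene's promoter
    """
    if len(activity) != len(contact):
        raise ValueError("activity and contact must have the same length")
    weighted = [a * c for a, c in zip(activity, contact)]
    total = sum(weighted)
    if total == 0:
        # No element has both activity and contact; every score is zero.
        return [0.0] * len(weighted)
    return [w / total for w in weighted]

# Toy example: a rearrangement that raises the contact of the third element
# with the gene would raise that element's score.
before = abc_scores([2.0, 1.0, 1.0], [0.5, 1.0, 0.0])
after = abc_scores([2.0, 1.0, 1.0], [0.5, 1.0, 2.0])
```

Because the scores are normalized per gene, bringing a strong enhancer into contact lowers the relative contribution of the existing elements even though their own activity and contact are unchanged.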
Affiliation(s)
- Zhichao Xu
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA. These authors contributed equally
- Dong-Sung Lee
  - Department of Life Sciences, University of Seoul, Seoul, South Korea. These authors contributed equally
- Sahaana Chandran
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Victoria T. Le
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Rosalind Bump
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Jean Yasis
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Sofia Dallarda
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Samantha Marcotte
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Benjamin Clock
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Nicholas Haghani
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Chae Yun Cho
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Kadir Akdemir
  - Department of Genomic Medicine; UT MD Anderson Cancer Center; Houston, TX, 77030; USA
- Selene Tyndale
  - Integrative Biology Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- P. Andrew Futreal
  - Department of Genomic Medicine; UT MD Anderson Cancer Center; Houston, TX, 77030; USA
- Graham McVicker
  - Integrative Biology Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Geoffrey M. Wahl
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA
- Jesse R. Dixon
  - Gene Expression Laboratory; Salk Institute for Biological Studies; La Jolla, CA, 92037; USA. Corresponding author
5
Zhao C, Liu T, Wang Z. Functional Similarities of Protein-Coding Genes in Topologically Associating Domains and Spatially-Proximate Genomic Regions. Genes (Basel) 2022; 13:480. [PMID: 35328034 PMCID: PMC8951421 DOI: 10.3390/genes13030480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2022] [Revised: 02/26/2022] [Accepted: 03/05/2022] [Indexed: 02/01/2023]
Abstract
Topologically associating domains (TADs) are the structural and functional units of the genome. However, the functions of protein-coding genes existing in the same or different TADs have not been fully investigated. We compared the functional similarities of protein-coding genes existing in the same TAD and between different TADs, and also in the same gap region (the region between two consecutive TADs) and between different gap regions. We found that the protein-coding genes from the same TAD or gap region are more likely to share similar protein functions, and this trend is more obvious with TADs than the gap regions. We further created two types of gene–gene spatial interaction networks: the first type is based on Hi-C contacts, whereas the second type is based on both Hi-C contacts and the relationship of being in the same TAD. A graph auto-encoder was applied to learn the network topology, reconstruct the two types of networks, and predict the functions of the central genes/nodes based on the functions of the neighboring genes/nodes. It was found that better performance was achieved with the second type of network. Furthermore, we detected long-range spatially-interactive regions based on Hi-C contacts and calculated the functional similarities of the gene pairs from these regions.
6
Hunt C, Montgomery S, Berkenpas JW, Sigafoos N, Oakley JC, Espinosa J, Justice N, Kishaba K, Hippe K, Si D, Hou J, Ding H, Cao R. Recent Progress of Machine Learning in Gene Therapy. Curr Gene Ther 2021; 22:132-143. [PMID: 34161210 DOI: 10.2174/1566523221666210622164133] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/02/2021] [Revised: 03/15/2021] [Accepted: 04/02/2021] [Indexed: 11/22/2022]
Abstract
With new developments in biomedical technology, it is now a viable therapeutic treatment to alter genes with techniques like CRISPR. At the same time, it is increasingly cheaper to do whole genome sequencing, resulting in rapid advancement in gene therapy and editing in precision medicine. Thus, understanding the current industry and academic applications of gene therapy provides an important backdrop to future scientific developments. Additionally, machine learning and artificial intelligence techniques allow for the reduction of time and money spent in the development of new gene therapy products and techniques. In this paper, we survey the current progress of gene therapy treatments for several diseases and explore machine learning applications in gene therapy. We also discuss the ethical implications of gene therapy and the use of machine learning in precision medicine. Machine learning and gene therapy are both topics gaining popularity in various publications, and we conclude that there is still room for continued research and application of machine learning techniques in the gene therapy field.
Affiliation(s)
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Sandra Montgomery
  - Department of Physics, Pacific Lutheran University, Tacoma, WA, United States
- Noel Sigafoos
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- John Christian Oakley
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Jacob Espinosa
  - Department of Mathematics, Pacific Lutheran University, Tacoma, WA, United States
- Nicola Justice
  - Department of Mathematics, Pacific Lutheran University, Tacoma, WA, United States
- Kiyomi Kishaba
  - Department of Humanities, Pacific Lutheran University, Tacoma, WA, United States
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
- Dong Si
  - Division of Computing Software Systems, University of Washington-Bothell, Bothell, WA, United States
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, MO, United States
- Hui Ding
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, WA, United States
7
Li T, Li R, Dong X, Shi L, Lin M, Peng T, Wu P, Liu Y, Li X, He X, Han X, Kang B, Wang Y, Liu Z, Chen Q, Shen Y, Feng M, Wang X, Wu D, Wang J, Li C. Integrative Analysis of Genome, 3D Genome, and Transcriptome Alterations of Clinical Lung Cancer Samples. Genomics Proteomics & Bioinformatics 2021; 19:741-753. [PMID: 34116262 PMCID: PMC9170781 DOI: 10.1016/j.gpb.2020.05.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2019] [Revised: 03/28/2020] [Accepted: 06/11/2020] [Indexed: 10/31/2022]
Abstract
Genomic studies of cancer cell alterations, such as mutations, copy number variations (CNVs), and translocations, greatly promote our understanding of the genesis and development of cancer. However, the 3D genome architecture of cancers remains less studied due to the complexity of cancer genomes and technical difficulties. To explore the 3D genome structure in clinical lung cancer, we performed Hi-C experiments using paired normal and tumor cells harvested from patients with lung cancer, combined with RNA-seq analysis. We demonstrated the feasibility of studying the 3D genome of clinical lung cancer samples with a small number of cells (1 × 10⁴), compared the genome architecture between clinical samples and cell lines of lung cancer, and identified conserved and changed spatial chromatin structures between normal and cancer samples. We also showed that Hi-C data can be used to infer CNVs and point mutations in cancer. By integrating those different types of cancer alterations, we showed significant associations between CNVs, 3D genome, and gene expression. We propose that the 3D genome mediates the effects of cancer genomic alterations on gene expression through altering regulatory chromatin structures. Our study highlights the importance of analyzing 3D genomes of clinical cancer samples in addition to cancer cell lines and provides an integrative genomic analysis pipeline for future larger-scale studies in lung cancer and other cancers.
Affiliation(s)
- Tingting Li
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China; State Key Laboratory of Proteomics, National Center of Biomedical Analysis, Institute of Basic Medical Sciences, Beijing 100850, China
- Ruifeng Li
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Xuan Dong
  - BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Lin Shi
  - Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China; Fudan University Center for Clinical Bioinformatics, Shanghai 200433, China
- Miao Lin
  - Department of Thoracic Surgery, Zhongshan Hospital of Fudan University, Shanghai 200032, China
- Ting Peng
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Pengze Wu
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Yuting Liu
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Xiaoting Li
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Xuheng He
  - BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Xu Han
  - BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Bin Kang
  - BGI-Shenzhen, Shenzhen 518083, China; China National GeneBank, BGI-Shenzhen, Shenzhen 518083, China
- Yinan Wang
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Zhiheng Liu
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Qing Chen
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
- Yue Shen
  - BGI-Shenzhen, Shenzhen 518083, China; BGI-Qingdao, Qingdao 266426, China; Shenzhen Engineering Laboratory for Innovative Molecular Diagnostics, BGI-Shenzhen, Shenzhen 518083, China
- Mingxiang Feng
  - Department of Thoracic Surgery, Zhongshan Hospital of Fudan University, Shanghai 200032, China
- Xiangdong Wang
  - Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China; Fudan University Center for Clinical Bioinformatics, Shanghai 200433, China
- Duojiao Wu
  - Zhongshan Hospital Institute of Clinical Science, Fudan University, Shanghai Institute of Clinical Bioinformatics, Shanghai 200433, China
- Jian Wang
  - iCarbonX, Shenzhen 518053, China; Digital Life Research Institute, Shenzhen 518110, China
- Cheng Li
  - Center for Bioinformatics, School of Life Sciences, Center for Statistical Science, Peking University, Beijing 100871, China
8
Liu L, Zhang LR, Dao FY, Yang YC, Lin H. A computational framework for identifying the transcription factors involved in enhancer-promoter loop formation. Molecular Therapy. Nucleic Acids 2020; 23:347-354. [PMID: 33425492 PMCID: PMC7779541 DOI: 10.1016/j.omtn.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2020] [Accepted: 11/11/2020] [Indexed: 12/30/2022]
Abstract
The pairwise interaction between transcription factors (TFs) plays an important role in enhancer-promoter loop formation. Although thousands of TFs in the human genome have been found, only a few TF pairs have been demonstrated to be related to loop formation. It is still a challenge to determine which TF pairs could be involved in the enhancer-promoter regulation network. This work describes a computational framework to identify TF pairs in enhancer-promoter regulation. By integrating different levels of data derived from Promoter Capture Hi-C, chromatin immunoprecipitation sequencing (ChIP-seq) of histone marks, RNA-seq, protein-protein interaction (PPI), and TF motif, we identified 361 significant TF pairs and constructed a TF interaction network. From the network, we found several hub-TFs, which may have important roles in the regulation of long-range interactions. Our studies extended TF pairs identified in other experimental and computational approaches. These findings will help the further study of long-range interactions between enhancers and promoters.
Affiliation(s)
- Li Liu
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Li-Rong Zhang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Fu-Ying Dao
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yan-Chao Yang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Hao Lin
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
9
Abstract
BACKGROUND The genome architecture mapping (GAM) technique can capture genome-wide chromatin interactions. However, besides the known systematic biases in the raw GAM data, we have found a new type of systematic bias. It is necessary to develop and evaluate effective normalization methods to remove all systematic biases in the raw GAM data. RESULTS We have detected a new type of systematic bias, the fragment length bias, in the genome architecture mapping (GAM) data, which is significantly different from the bias of window detection frequency previously mentioned in the paper introducing the GAM method but is similar to the bias of distances between restriction sites existing in raw Hi-C data. We have found that the normalization method (a normalized variant of the linkage disequilibrium) used in the GAM paper is not able to effectively eliminate the new fragment length bias at 1 Mb resolution (slightly better at 30 kb resolution). We have developed an R package named normGAM for eliminating the new fragment length bias together with the other three biases existing in raw GAM data, which are the biases related to window detection frequency, mappability, and GC content. Five normalization methods have been implemented and included in the R package including Knight-Ruiz 2-norm (KR2, newly designed by us), normalized linkage disequilibrium (NLD), vanilla coverage (VC), sequential component normalization (SCN), and iterative correction and eigenvector decomposition (ICE). CONCLUSIONS Based on our evaluations, the five normalization methods can eliminate the four biases existing in raw GAM data, with VC and KR2 performing better than the others. We have observed that the KR2-normalized GAM data have a higher correlation with the KR-normalized Hi-C data on the same cell samples indicating that the KR-related methods are better than the others for keeping the consistency between the GAM and Hi-C experiments. 
Compared with the raw GAM data, the normalized GAM data are more consistent with the normalized distances from the fluorescence in situ hybridization (FISH) experiments. The source code of normGAM can be freely downloaded from http://dna.cs.miami.edu/normGAM/.
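Of the normalization methods named above, vanilla coverage (VC) is the simplest to illustrate: each entry is divided by the coverage (row sum) of both loci it connects. The sketch below is a generic version for a symmetric matrix, with a final rescaling so that the total signal matches the input; that rescaling, the handling of empty rows, and the function name are assumptions for this example and are not taken from the normGAM package.

```python
import numpy as np

def vanilla_coverage(mat):
    """Divide each entry by the coverage of its row and its column.

    Assumes a symmetric, non-negative matrix, so row and column sums agree.
    """
    mat = np.asarray(mat, dtype=float)
    cov = mat.sum(axis=1)
    cov[cov == 0] = 1.0                 # empty loci: avoid dividing by zero
    norm = mat / np.outer(cov, cov)     # entry (i, j) divided by cov[i] * cov[j]
    total = norm.sum()
    # Rescale so the normalized matrix keeps the original total signal.
    return norm * (mat.sum() / total) if total > 0 else norm

raw = np.array([[0.0, 2.0],
                [2.0, 4.0]])
balanced = vanilla_coverage(raw)
```

Iterative methods such as ICE and KR repeat a similar correction until the row sums converge, which is why a single VC pass is often described as their one-step approximation.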
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL, 33124, USA
10
Abstract
BACKGROUND Topologically associating domains (TADs) are genomic regions with varying lengths. The interactions within TADs are more frequent than those between different TADs. TADs or sub-TADs are considered the structural and functional units of the mammalian genomes. Although TADs are important for understanding how genomes function, we have limited knowledge about their 3D structural properties. RESULTS In this study, we designed and benchmarked three metrics for capturing the three-dimensional and two-dimensional structural signatures of TADs, which can help better understand TADs' structural properties and the relationships between structural properties and genetic and epigenetic features. The first metric for capturing 3D structural properties is radius of gyration, which in this study is used to measure the spatial compactness of TADs. The mass value of each DNA bead in a 3D structure is newly defined as one or more genetic or epigenetic feature(s). The second metric is folding degree. The last metric is exponent parameter, which is used to capture the 2D structural properties based on TADs' Hi-C contact matrices. In general, we observed significant correlations between the three metrics and the genetic and epigenetic features. We made the same observations when using H3K4me3, transcription start sites, and RNA polymerase II to represent the mass value in the modified radius-of-gyration metric. Moreover, we have found that the TADs in the clusters of depleted chromatin states apparently correspond to smaller exponent parameters and larger radii of gyration. In addition, a new objective function of multidimensional scaling for modelling chromatin or TADs 3D structures was designed and benchmarked, which can handle the DNA bead-pairs with zero Hi-C contact values.
CONCLUSIONS The web server for reconstructing chromatin 3D structures using multiple different objective functions and the related source code are publicly available at http://dna.cs.miami.edu/3DChrom/.
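The radius-of-gyration metric described above can be sketched with the standard mass-weighted definition, where the "mass" of each bead is replaced by a feature value such as a histone-mark signal. The function name and the toy coordinates are hypothetical, and the abstract does not give the exact weighting the authors used, so this is only the textbook formula under that assumption.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of a set of 3D beads.

    coords: list of (x, y, z) bead positions
    masses: per-bead weights (e.g. a feature signal); equal weights if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    if total <= 0:
        raise ValueError("masses must sum to a positive value")
    # Weighted centre of mass.
    centre = [sum(m * p[k] for m, p in zip(masses, coords)) / total
              for k in range(3)]
    # Weighted mean squared distance from the centre.
    msd = sum(m * sum((p[k] - centre[k]) ** 2 for k in range(3))
              for m, p in zip(masses, coords)) / total
    return math.sqrt(msd)

beads = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rg_equal = radius_of_gyration(beads)                 # equal weights
rg_weighted = radius_of_gyration(beads, [3.0, 1.0])  # first bead weighted more
```

A smaller value indicates a more compact domain; weighting by a feature shifts the centre toward beads where that feature is enriched.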
Affiliation(s)
- Tong Liu
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124 USA
- Zheng Wang
  - Department of Computer Science, University of Miami, 1365 Memorial Drive, P.O. Box 248154, Coral Gables, FL 33124 USA
11
Perrakis A, Bita CE, Arhondakis S, Krokida A, Mekkaoui K, Denic D, Blazakis KN, Kaloudas D, Kalaitzis P. Suppression of a Prolyl 4 Hydroxylase Results in Delayed Abscission of Overripe Tomato Fruits. Frontiers in Plant Science 2019; 10:348. [PMID: 30984217 PMCID: PMC6447859 DOI: 10.3389/fpls.2019.00348] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/28/2018] [Accepted: 03/07/2019] [Indexed: 05/03/2023]
Abstract
The tomato pedicel abscission zone (AZ) is considered a model system for flower and fruit abscission development, activation, and progression. O-glycosylated proteins such as the Arabidopsis IDA (INFLORESCENCE DEFICIENT IN ABSCISSION) peptide and Arabinogalactan proteins (AGPs), which undergo proline hydroxylation, were demonstrated to participate in abscission regulation. Considering that the frequency of occurrence of proline hydroxylation might determine the structure as well as the function of such proteins, the expression of a tomato prolyl 4 hydroxylase, SlP4H3 (Solanum lycopersicum Prolyl 4 Hydroxylase 3), was suppressed in order to investigate the physiological significance of this post-translational modification in tomato abscission. Silencing of SlP4H3 resulted in the delay of abscission progression in overripe tomato fruits 90 days after the breaker stage. The cause of this delay was attributed to the downregulation of the expression of cell wall hydrolases such as SlTAPGs (tomato abscission polygalacturonases) and cellulases as well as expansins. In addition, minor changes were observed in the mRNA levels of two SlAGPs and one extensin. Moreover, structural changes were observed in the silenced SlP4H3 AZs. The fracture plane of the AZ was curved and not along a line as in wild type and there was a lack of lignin deposition in the AZs of overripe fruits 30 days after breaker. These results suggest that proline hydroxylation might play a role in the regulation of tomato pedicel abscission.
12
Liu T, Wang Z. Reconstructing high-resolution chromosome three-dimensional structures by Hi-C complex networks. BMC Bioinformatics 2018; 19:496. [PMID: 30591009 PMCID: PMC6309071 DOI: 10.1186/s12859-018-2464-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/15/2023]
Abstract
BACKGROUND Hi-C data have been widely used to reconstruct chromosomal three-dimensional (3D) structures. One of the key limitations of Hi-C is the unclear relationship between spatial distance and the number of Hi-C contacts. Many methods used a fixed parameter when converting the number of Hi-C contacts to wish distances. However, a single parameter cannot properly explain the relationship between wish distances and genomic distances or the locations of topologically associating domains (TADs). RESULTS We have addressed one of the key issues of using Hi-C data, that is, the unclear relationship between spatial distances and the number of Hi-C contacts, which is crucial to understand significant biological functions, such as the enhancer-promoter interactions. Specifically, we developed a new method to infer this converting parameter and pairwise Euclidean distances based on the topology of the Hi-C complex network (HiCNet). The inferred distances were modeled by clustering coefficient and multiple other types of constraints. We found that our inferred distances between bead-pairs within the same TAD were apparently smaller than those distances between bead-pairs from different TADs. Our inferred distances had a higher correlation with fluorescence in situ hybridization (FISH) data, fitted the localization patterns of Xist transcripts on DNA, and better matched 156 pairs of protein-enabled long-range chromatin interactions detected by ChIA-PET. Using the inferred distances and another round of optimization, we further reconstructed 40 kb high-resolution 3D chromosomal structures of mouse male ES cells. The high-resolution structures successfully illustrate TADs and DNA loops (peaks in Hi-C contact heatmaps) that usually indicate enhancer-promoter interactions. CONCLUSIONS We developed a novel method to infer the wish distances between DNA bead-pairs from Hi-C contacts. High-resolution 3D structures of chromosomes were built based on the newly-inferred wish distances. This whole process has been implemented as a tool named HiCNet, which is publicly available at http://dna.cs.miami.edu/HiCNet/.
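The abstract above names the clustering coefficient of the Hi-C contact network as one ingredient of HiCNet but does not give a formula. As a rough illustration only, the following Python sketch computes the standard local clustering coefficient on a network obtained by thresholding a contact matrix. The function name `local_clustering`, the `threshold` parameter, and the toy matrix are assumptions made for this example and are not taken from HiCNet.

```python
def local_clustering(contacts, threshold):
    """Local clustering coefficient of each bead in a thresholded contact network.

    Assumption: two beads are linked when their contact count is at least
    `threshold`. This is an illustrative choice, not HiCNet's rule.
    """
    n = len(contacts)
    # Neighbour sets of the unweighted network derived from the contact matrix.
    nbrs = [
        {j for j in range(n) if j != i and contacts[i][j] >= threshold}
        for i in range(n)
    ]
    coeffs = []
    for i in range(n):
        k = len(nbrs[i])
        if k < 2:
            # Fewer than two neighbours: no triangle is possible.
            coeffs.append(0.0)
            continue
        # Count links that exist between pairs of neighbours of bead i.
        links = sum(1 for a in nbrs[i] for b in nbrs[i] if a < b and b in nbrs[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return coeffs


# Toy symmetric contact matrix for four beads (hypothetical values).
contacts = [
    [0, 5, 5, 5],
    [5, 0, 5, 0],
    [5, 5, 0, 0],
    [5, 0, 0, 0],
]
coeffs = local_clustering(contacts, threshold=1)
```

In this toy network, beads 1 and 2 each have two neighbours that are themselves linked, so their coefficient is 1.0, while bead 3 has a single neighbour and gets 0.0.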
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA
| | - Zheng Wang
- Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, FL, 33124, USA.
| |
|
13
|
Dixon JR, Xu J, Dileep V, Zhan Y, Song F, Le VT, Yardımcı GG, Chakraborty A, Bann DV, Wang Y, Clark R, Zhang L, Yang H, Liu T, Iyyanki S, An L, Pool C, Sasaki T, Rivera-Mulia JC, Ozadam H, Lajoie BR, Kaul R, Buckley M, Lee K, Diegel M, Pezic D, Ernst C, Hadjur S, Odom DT, Stamatoyannopoulos JA, Broach JR, Hardison RC, Ay F, Noble WS, Dekker J, Gilbert DM, Yue F. Integrative detection and analysis of structural variation in cancer genomes. Nat Genet 2018; 50:1388-1398. [PMID: 30202056 PMCID: PMC6301019 DOI: 10.1038/s41588-018-0195-8] [Citation(s) in RCA: 217] [Impact Index Per Article: 36.2] [Received: 04/07/2017] [Accepted: 07/16/2018] [Indexed: 01/19/2023]
Abstract
Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging. Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Furthermore, we observe widespread structural variation events affecting the functions of noncoding sequences, including the deletion of distal regulatory sequences, alteration of DNA replication timing, and the creation of novel three-dimensional chromatin structural domains. Our results indicate that noncoding SVs may be underappreciated mutational drivers in cancer genomes.
Affiliation(s)
- Jesse R Dixon
- Salk Institute for Biological Studies, La Jolla, CA, USA.
| | - Jie Xu
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Vishnu Dileep
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - Ye Zhan
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Fan Song
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, USA
| | - Victoria T Le
- Salk Institute for Biological Studies, La Jolla, CA, USA
| | | | | | - Darrin V Bann
- Division of Otolaryngology, Head & Neck Surgery, Milton S. Hershey Medical Center, Hershey, PA, USA
| | - Yanli Wang
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, USA
| | - Royden Clark
- Penn State College of Medicine, Informatics and Technology, Hershey, PA, USA
| | - Lijun Zhang
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Hongbo Yang
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Tingting Liu
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Sriranga Iyyanki
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Lin An
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, USA
| | - Christopher Pool
- Division of Otolaryngology, Head & Neck Surgery, Milton S. Hershey Medical Center, Hershey, PA, USA
| | - Takayo Sasaki
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | | | - Hakan Ozadam
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Bryan R Lajoie
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Rajinder Kaul
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | | | - Kristen Lee
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Morgan Diegel
- Altius Institute for Biomedical Sciences, Seattle, WA, USA
| | - Dubravka Pezic
- Research Department of Cancer Biology, Cancer Institute, University College London, London, UK
| | - Christina Ernst
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Suzana Hadjur
- Research Department of Cancer Biology, Cancer Institute, University College London, London, UK
| | - Duncan T Odom
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- German Cancer Research Center (DKFZ), Division Signaling and Functional Genomics, Heidelberg, Germany
| | | | - James R Broach
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA
| | - Ross C Hardison
- Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, State College, PA, USA
| | - Ferhat Ay
- La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA.
- School of Medicine, University of California San Diego, La Jolla, CA, USA.
| | | | - Job Dekker
- Program in Systems Biology, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
| | - David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL, USA.
| | - Feng Yue
- Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, PA, USA.
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, USA.
| |
|
14
|
Diament A, Tuller T. Modeling three-dimensional genomic organization in evolution and pathogenesis. Semin Cell Dev Biol 2018; 90:78-93. [PMID: 30030143 DOI: 10.1016/j.semcdb.2018.07.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/28/2018] [Accepted: 07/08/2018] [Indexed: 12/17/2022]
Abstract
The regulation of gene expression is mediated via the complex three-dimensional (3D) conformation of the genetic material and its interactions with various intracellular factors. Various experimental and computational approaches have been developed in recent years for understanding the relation between the 3D conformation of the genome and the phenotypes of cells under normal conditions and in disease. In this review, we will discuss novel approaches for analyzing and modeling the 3D genomic conformation, focusing on deciphering disease-causing mutations that affect gene expression. We conclude that as this is a very challenging mission, an important direction should involve the comparative analysis of various 3D models from various organisms or cells.
Affiliation(s)
- Alon Diament
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Tamir Tuller
- Dept. of Biomedical Engineering, Tel Aviv University, Tel Aviv 6997801, Israel; The Sagol School of Neuroscience, Tel-Aviv University, Tel Aviv 6997801, Israel.
| |
|
15
|
Oluwadare O, Zhang Y, Cheng J. A maximum likelihood algorithm for reconstructing 3D structures of human chromosomes from chromosomal contact data. BMC Genomics 2018; 19:161. [PMID: 29471801 PMCID: PMC5824572 DOI: 10.1186/s12864-018-4546-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 05/04/2017] [Accepted: 02/13/2018] [Indexed: 01/07/2023] Open
Abstract
Background The development of chromosomal conformation capture techniques, particularly, the Hi-C technique, has made the analysis and study of the spatial conformation of a genome an important topic in bioinformatics and computational biology. Aided by high-throughput next generation sequencing techniques, the Hi-C technique can generate genome-wide, large-scale intra- and inter-chromosomal interaction data capable of describing in detail the spatial interactions within a genome. These data can be used to reconstruct 3D structures of chromosomes that can be used to study DNA replication, gene regulation, genome interaction, genome folding, and genome function. Results Here, we introduce a maximum likelihood algorithm called 3DMax to construct the 3D structure of a chromosome from Hi-C data. 3DMax employs a maximum likelihood approach to infer the 3D structures of a chromosome, while automatically re-estimating the conversion factor (α) for converting Interaction Frequency (IF) to distance. Our results show that the models generated by 3DMax from a simulated Hi-C dataset match the true models better than most of the existing methods. 3DMax is more robust to structural variability and noise. Compared on a real Hi-C dataset, 3DMax constructs chromosomal models that fit the data better than most methods, and it is faster than all other methods. The models reconstructed by 3DMax were consistent with fluorescent in situ hybridization (FISH) experiments and existing knowledge about the organization of human chromosomes, such as chromosome compartmentalization. Conclusions 3DMax is an effective approach to reconstructing 3D chromosomal models. The results, and the models generated for the simulated and real Hi-C datasets are available here: http://sysbio.rnet.missouri.edu/bdm_download/3DMax/. The source code is available here: https://github.com/BDM-Lab/3DMax. A short video demonstrating how to use 3DMax can be found here: https://youtu.be/ehQUFWoHwfo.
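The abstract refers to a conversion factor (α) that turns interaction frequency (IF) into distance without stating the formula. A common choice in the reconstruction literature is the inverse power law d = 1/IF^α, and the sketch below assumes that form. The function `if_to_distance`, the handling of zero-contact pairs, and the toy matrix are hypothetical and are not taken from the 3DMax source code.

```python
def if_to_distance(contacts, alpha):
    """Convert a square matrix of interaction frequencies to distances.

    Assumes d_ij = 1 / IF_ij**alpha. Pairs with zero contacts are returned
    as None because no distance can be inferred for them under this rule.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return [
        [1.0 / (freq ** alpha) if freq > 0 else None for freq in row]
        for row in contacts
    ]


# Toy symmetric interaction-frequency matrix for three bins (hypothetical values).
contacts = [
    [0, 4, 1],
    [4, 0, 16],
    [1, 16, 0],
]
distances = if_to_distance(contacts, alpha=0.5)
```

With α = 0.5, a pair with 16 contacts is placed at distance 0.25 and a pair with 4 contacts at 0.5, so more frequent contacts map to shorter distances. A method that re-estimates α, as the abstract describes, would repeat this conversion for each candidate value.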
Affiliation(s)
- Oluwatosin Oluwadare
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Yuxiang Zhang
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Electrical Engineering & Computer Science Department, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| |
|
16
|
Jia R, Chai P, Zhang H, Fan X. Novel insights into chromosomal conformations in cancer. Mol Cancer 2017; 16:173. [PMID: 29149895 PMCID: PMC5693495 DOI: 10.1186/s12943-017-0741-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 04/27/2017] [Accepted: 11/06/2017] [Indexed: 12/20/2022] Open
Abstract
Exploring gene function is critical for understanding the complexity of life. DNA sequences and the three-dimensional organization of chromatin (chromosomal interactions) are considered enigmatic factors underlying gene function, and interactions between two distant fragments can regulate transactivation activity via mediator proteins. Thus, a series of chromosome conformation capture techniques have been developed, including chromosome conformation capture (3C), circular chromosome conformation capture (4C), chromosome conformation capture carbon copy (5C), and high-resolution chromosome conformation capture (Hi-C). The application of these techniques has expanded to various fields, but cancer remains one of the major topics. Interactions mediated by proteins or long noncoding RNAs (lncRNAs) are typically found using 4C-sequencing and chromatin interaction analysis by paired-end tag sequencing (ChIA-PET). Currently, Hi-C is used to identify chromatin loops between cancer risk-associated single-nucleotide polymorphisms (SNPs) found by genome-wide association studies (GWAS) and their target genes. Chromosomal conformations are responsible for altered gene regulation through several typical mechanisms and contribute to the biological behavior and malignancy of different tumors, particularly prostate cancer, breast cancer and hematologic neoplasms. Moreover, different subtypes may exhibit different 3D-chromosomal conformations. Thus, C-tech can be used to help diagnose cancer subtypes and alleviate cancer progression by destroying specific chromosomal conformations. Here, we review the fundamentals and improvements in chromosome conformation capture techniques and their clinical applications in cancer to provide insight for future research.
Affiliation(s)
- Ruobing Jia
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, People's Republic of China
| | - Peiwei Chai
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, People's Republic of China
| | - He Zhang
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, People's Republic of China.
| | - Xianqun Fan
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, People's Republic of China.
| |
|
17
|
Oluwadare O, Cheng J. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data. BMC Bioinformatics 2017; 18:480. [PMID: 29137603 PMCID: PMC5686814 DOI: 10.1186/s12859-017-1931-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/14/2017] [Accepted: 11/06/2017] [Indexed: 11/10/2022] Open
Abstract
Background With the development of chromosomal conformation capturing techniques, particularly, the Hi-C technique, the study of the spatial conformation of a genome is becoming an important topic in bioinformatics and computational biology. The Hi-C technique can generate genome-wide chromosomal interaction (contact) data, which can be used to investigate the higher-level organization of chromosomes, such as Topologically Associated Domains (TADs), i.e., locally packed chromosome regions bound together by intra-chromosomal contacts. The identification of the TADs for a genome is useful for studying gene regulation, genomic interaction, and genome function. Results Here, we formulate the TAD identification problem as an unsupervised machine learning (clustering) problem, and develop a new TAD identification method called ClusterTAD. We introduce a novel method to represent chromosomal contacts as features to be used by the clustering algorithm. Our results show that ClusterTAD can accurately predict the TADs on a simulated Hi-C dataset. Our method is also largely complementary and consistent with existing methods on the real Hi-C datasets of two mouse cells. The validation with the chromatin immunoprecipitation (ChIP) sequencing (ChIP-Seq) data shows that the domain boundaries identified by ClusterTAD have a high enrichment of CTCF binding sites, promoter-related marks, and enhancer-related histone modifications. Conclusions As ClusterTAD is based on a proven clustering approach, it opens a new avenue to apply a large array of clustering methods developed in the machine learning field to the TAD identification problem. The source code, the results, and the TADs generated for the simulated and real Hi-C datasets are available here: https://github.com/BDM-Lab/ClusterTAD.
Affiliation(s)
- Oluwatosin Oluwadare
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
| |
|
18
|
Flyamer IM, Gassler J, Imakaev M, Brandão HB, Ulianov SV, Abdennur N, Razin SV, Mirny LA, Tachibana-Konwalski K. Single-nucleus Hi-C reveals unique chromatin reorganization at oocyte-to-zygote transition. Nature 2017; 544:110-114. [PMID: 28355183 PMCID: PMC5639698 DOI: 10.1038/nature21711] [Citation(s) in RCA: 486] [Impact Index Per Article: 69.4] [Received: 03/13/2016] [Accepted: 02/14/2017] [Indexed: 12/15/2022]
Abstract
Chromatin is reprogrammed after fertilization to produce a totipotent zygote with the potential to generate a new organism. The maternal genome inherited from the oocyte and the paternal genome provided by sperm coexist as separate haploid nuclei in the zygote. How these two epigenetically distinct genomes are spatially organized is poorly understood. Existing chromosome conformation capture-based methods are not applicable to oocytes and zygotes owing to a paucity of material. To study three-dimensional chromatin organization in rare cell types, we developed a single-nucleus Hi-C (high-resolution chromosome conformation capture) protocol that provides greater than tenfold more contacts per cell than the previous method. Here we show that chromatin architecture is uniquely reorganized during the oocyte-to-zygote transition in mice and is distinct in paternal and maternal nuclei within single-cell zygotes. Features of genomic organization including compartments, topologically associating domains (TADs) and loops are present in individual oocytes when averaged over the genome, but the presence of each feature at a locus varies between cells. At the sub-megabase level, we observed stochastic clusters of contacts that can occur across TAD boundaries but average into TADs. Notably, we found that TADs and loops, but not compartments, are present in zygotic maternal chromatin, suggesting that these are generated by different mechanisms. Our results demonstrate that the global chromatin organization of zygote nuclei is fundamentally different from that of other interphase cells. An understanding of this zygotic chromatin 'ground state' could potentially provide insights into reprogramming cells to a state of totipotency.
Affiliation(s)
- Ilya M. Flyamer
- IMBA - Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna Biocenter (VBC), Vienna, Austria
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
- Present address: MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom
| | - Johanna Gassler
- IMBA - Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna Biocenter (VBC), Vienna, Austria
| | - Maxim Imakaev
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Department of Physics, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
| | - Hugo B. Brandão
- Harvard Program in Biophysics, Harvard University, Cambridge, Massachusetts, USA
| | - Sergey V. Ulianov
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Nezar Abdennur
- Computational and Systems Biology Program, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
| | - Sergey V. Razin
- Institute of Gene Biology, Russian Academy of Sciences, Moscow, Russia
- Faculty of Biology, Lomonosov Moscow State University, Moscow, Russia
| | - Leonid A. Mirny
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Department of Physics, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Harvard Program in Biophysics, Harvard University, Cambridge, Massachusetts, USA
| | - Kikuë Tachibana-Konwalski
- IMBA - Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Vienna Biocenter (VBC), Vienna, Austria
| |
|
19
|
Cagnone G, Sirard MA. The embryonic stress response to in vitro culture: insight from genomic analysis. Reproduction 2016; 152:R247-R261. [DOI: 10.1530/rep-16-0391] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 07/19/2016] [Accepted: 09/05/2016] [Indexed: 12/18/2022]
Abstract
Recent genomic studies have shed light on the impact of in vitro culture (IVC) on embryonic homeostasis and the differential gene expression profiles associated with lower developmental competence. Consistently, the embryonic stress responses to IVC conditions correlate with transcriptomic changes in pathways related to energetic metabolism, extracellular matrix remodelling and inflammatory signalling. These changes appear to result from a developmental adaptation that enhances a Warburg-like effect known to occur naturally during blastulation. First discovered in cancer cells, the Warburg effect (increased glycolysis under aerobic conditions) is thought to result from mitochondrial dysfunction. In the case of IVC embryos, culture conditions may interfere with mitochondrial maturation and oxidative phosphorylation, forcing cells to rely on glycolysis in order to maintain energetic homeostasis. While beneficial in the short term, such adaptations may lead to epigenetic changes with potential long-term effects on implantation, foetal growth and post-natal health. We conclude that lessening the detrimental effects of IVC on mitochondrial activity would lead to significantly improved embryo quality.
|
20
|
Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks. Sci Rep 2016; 6:19598. [PMID: 26797014 PMCID: PMC4726425 DOI: 10.1038/srep19598] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Received: 06/10/2015] [Accepted: 12/14/2015] [Indexed: 11/09/2022] Open
Abstract
The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
|
21
|
Cao R, Cheng J. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 2016; 93:84-91. [PMID: 26370280 PMCID: PMC4894840 DOI: 10.1016/j.ymeth.2015.09.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 03/31/2015] [Revised: 09/03/2015] [Accepted: 09/10/2015] [Indexed: 11/30/2022] Open
Abstract
MOTIVATIONS Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene-gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. RESULTS In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein-protein interaction and spatial gene-gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks according to the maximum F-measure.
Affiliation(s)
- Renzhi Cao
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA.
| |
|
22
|
Cao R, Cheng J. Deciphering the association between gene function and spatial gene-gene interactions in 3D human genome conformation. BMC Genomics 2015; 16:880. [PMID: 26511362 PMCID: PMC4625479 DOI: 10.1186/s12864-015-2093-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 04/14/2015] [Accepted: 10/15/2015] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND A number of factors have been investigated in the context of gene function prediction and analysis, such as sequence identity, gene expression, and gene co-evolution. However, the three-dimensional (3D) conformation of the genome has not been tapped to analyse gene function, probably largely due to the lack of genome conformation data until recently. METHODS We construct the genome-wide spatial gene-gene interaction networks for three different human B-cells or cell lines from their chromosomal contact data generated by the Hi-C chromosome conformation capturing technique. G-SESAME and Fast-SemSim are used to calculate function similarity between interacting and non-interacting genes. The Gene Ontology statistics computed from the gene-gene interaction networks are used for gene function prediction. RESULTS We compare the function similarity of gene pairs that do not spatially interact with that of gene pairs that do. We find that genes that have strong spatial interactions tend to have highly similar function in terms of biological process, molecular function and cellular component of the Gene Ontology. Even though the level of gene-gene interactions generally has no or weak correlation with either sequential genomic distance or sequence identity between genes, the interacting genes with high function similarity tend to have stronger interactions, somewhat shorter genomic distance and significantly higher sequence identity. Combining genomic distance or sequence identity with spatial gene-gene interaction information informs gene-gene function similarity much better than using either one of them alone, suggesting that gene-gene interaction information is largely complementary to genomic distance and sequence identity in the context of gene function analysis. We develop and evaluate a new gene function prediction method based on gene-gene interaction networks, which can predict gene function well for a large number of human genes. CONCLUSIONS In this work, we demonstrate that the spatial conformation of the human genome is relevant to gene function similarity and is useful for gene function prediction.
Affiliation(s)
- Renzhi Cao
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA.
| | - Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA; Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA; Christopher S. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA.
| |
|
23
|
Nowotny J, Ahmed S, Xu L, Oluwadare O, Chen H, Hensley N, Trieu T, Cao R, Cheng J. Iterative reconstruction of three-dimensional models of human chromosomes from chromosomal contact data. BMC Bioinformatics 2015; 16:338. [PMID: 26493399 PMCID: PMC4619219 DOI: 10.1186/s12859-015-0772-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/30/2015] [Accepted: 10/13/2015] [Indexed: 11/10/2022] Open
Abstract
Background The entire collection of genetic information resides within the chromosomes, which themselves reside within almost every cell nucleus of eukaryotic organisms. Each individual chromosome is found to have its own preferred three-dimensional (3D) structure independent of the other chromosomes. The structure of each chromosome plays vital roles in controlling certain genome operations, including gene interaction and gene regulation. As a result, knowing the structure of chromosomes assists in the understanding of how the genome functions. Fortunately, the 3D structure of chromosomes proves possible to construct through computational methods via contact data recorded from the chromosome. We developed a unique computational approach based on optimization procedures known as adaptation, simulated annealing, and genetic algorithm to construct 3D models of human chromosomes, using chromosomal contact data. Results Our models were evaluated using a percentage-based scoring function. Analysis of the scores of the final 3D models demonstrated their effective construction from our computational approach. Specifically, the models resulting from our approach yielded an average score of 80.41%, with a high of 91%, across models for all chromosomes of a normal human B-cell. Comparisons made with other methods affirmed the effectiveness of our strategy. Particularly, juxtaposition with models generated through the publicly available method Markov chain Monte Carlo 5C (MCMC5C) illustrated the outperformance of our approach, as seen through a higher average score for all chromosomes. Our methodology was further validated using two consistency checking techniques known as convergence testing and robustness checking, which both proved successful. Conclusions The pursuit of constructing accurate 3D chromosomal structures is fueled by the benefits revealed by the findings as well as any possible future areas of study that arise. This motivation has led to the development of our computational methodology. The implementation of our approach proved effective in constructing 3D chromosome models and proved consistent with, and more effective than, some other methods thereby achieving our goal of creating a tool to help advance certain research efforts. The source code, test data, test results, and documentation of our method, Gen3D, are available at our sourceforge site at: http://sourceforge.net/projects/gen3d/.
Affiliation(s)
- Jackson Nowotny
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Sharif Ahmed
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Lingfei Xu
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Oluwatosin Oluwadare
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Hannah Chen
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Noelan Hensley
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Tuan Trieu
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Renzhi Cao
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO, 65211, USA.
|
24
|
Bergeron KF, Cardinal T, Touré AM, Béland M, Raiwet DL, Silversides DW, Pilon N. Male-biased aganglionic megacolon in the TashT mouse line due to perturbation of silencer elements in a large gene desert of chromosome 10. PLoS Genet 2015; 11:e1005093. [PMID: 25786024 PMCID: PMC4364714 DOI: 10.1371/journal.pgen.1005093] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 10/09/2014] [Accepted: 02/23/2015] [Indexed: 01/13/2023] Open
Abstract
Neural crest cells (NCC) are a transient migratory cell population that generates diverse cell types such as neurons and glia of the enteric nervous system (ENS). Via an insertional mutation screen for loci affecting NCC development in mice, we identified one line—named TashT—that displays a partially penetrant aganglionic megacolon phenotype in a strong male-biased manner. Interestingly, this phenotype is highly reminiscent of human Hirschsprung’s disease, a neurocristopathy with a still unexplained male sex bias. In contrast to the megacolon phenotype, colonic aganglionosis is almost fully penetrant in homozygous TashT animals. The sex bias in megacolon expressivity can be explained by the fact that the male ENS ends, on average, around a “tipping point” of minimal colonic ganglionosis while the female ENS ends, on average, just beyond it. Detailed analysis of embryonic intestines revealed that aganglionosis in homozygous TashT animals is due to slower migration of enteric NCC. The TashT insertional mutation is localized in a gene desert containing multiple highly conserved elements that exhibit repressive activity in reporter assays. RNAseq analyses and 3C assays revealed that the TashT insertion results, at least in part, in NCC-specific relief of repression of the uncharacterized gene Fam162b; an outcome independently confirmed via transient transgenesis. The transcriptional signature of enteric NCC from homozygous TashT embryos is also characterized by the deregulation of genes encoding members of the most important signaling pathways for ENS formation—Gdnf/Ret and Edn3/Ednrb—and, intriguingly, the downregulation of specific subsets of X-linked genes. In conclusion, this study not only allowed the identification of Fam162b coding and regulatory sequences as novel candidate loci for Hirschsprung’s disease but also provides important new insights into its male sex bias. 
Hirschsprung’s disease (also known as aganglionic megacolon) is a severe congenital defect of the enteric nervous system (ENS) resulting in complete failure to pass stools. It is characterized by the absence of neural ganglia (aganglionosis) in the distal gut due to incomplete colonization of the embryonic intestines by neural crest cells (NCC), the ENS precursors. Hirschsprung’s disease has an incidence of 1 in 5000 newborns and a 4:1 male sex bias. Although many genes have been associated with this complex genetic disease, most of its heritability as well as its male sex bias remain unexplained. Here, we describe an insertional mutant mouse line (“TashT”) in which virtually all homozygotes display colonic aganglionosis due to defective migration of enteric NCC, but in which only a subset of homozygotes develops megacolon. Surprisingly, this group is almost exclusively male. The TashT ENS defect stems, at least in part, from the disruption of long-range interactions between evolutionarily conserved elements with silencer activity and Fam162b, resulting in NCC-specific upregulation of this uncharacterized protein coding gene. Global analysis of gene expression further revealed that several hundreds of genes are significantly deregulated in TashT enteric NCC. Interestingly, this dataset includes multiple X-linked candidate genes potentially underlying the male sex bias. Taken together, our data pave the way for a clearer understanding of the intriguing male sex bias of Hirschsprung’s disease.
Affiliation(s)
- Karl-F. Bergeron
- Molecular Genetics of Development Laboratory, Department of Biological Sciences and BioMed Research Center, University of Quebec at Montreal (UQAM), Quebec, Canada
- Tatiana Cardinal
- Molecular Genetics of Development Laboratory, Department of Biological Sciences and BioMed Research Center, University of Quebec at Montreal (UQAM), Quebec, Canada
- Aboubacrine M. Touré
- Molecular Genetics of Development Laboratory, Department of Biological Sciences and BioMed Research Center, University of Quebec at Montreal (UQAM), Quebec, Canada
- Mélanie Béland
- Molecular Genetics of Development Laboratory, Department of Biological Sciences and BioMed Research Center, University of Quebec at Montreal (UQAM), Quebec, Canada
- Diana L. Raiwet
- Veterinary Genetics Laboratory, Faculty of Veterinary Medicine, University of Montreal, Quebec, Canada
- David W. Silversides
- Veterinary Genetics Laboratory, Faculty of Veterinary Medicine, University of Montreal, Quebec, Canada
- Nicolas Pilon
- Molecular Genetics of Development Laboratory, Department of Biological Sciences and BioMed Research Center, University of Quebec at Montreal (UQAM), Quebec, Canada
|
25
|
Merelli I, Tordini F, Drocco M, Aldinucci M, Liò P, Milanesi L. Integrating multi-omic features exploiting Chromosome Conformation Capture data. Front Genet 2015; 6:40. [PMID: 25717338 PMCID: PMC4324155 DOI: 10.3389/fgene.2015.00040] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/30/2014] [Accepted: 01/27/2015] [Indexed: 02/02/2023] Open
Abstract
The representation, integration, and interpretation of omic data is a complex task, particularly given the huge amount of information produced daily in molecular biology laboratories around the world. The reason is that sequencing data on expression profiles, methylation patterns, and chromatin domains are difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture allows the analysis of chromosome organization in the cell’s natural state. When performed genome wide, this technique is usually called Hi–C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi–C data to describe the chromosomal neighborhood starting from information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information for understanding why genes occupy specific positions inside the nucleus and how epigenetic patterns correlate with their expression.
Affiliation(s)
- Ivan Merelli
- Bioinformatics Unit, Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
- Fabio Tordini
- Computer Science Department, University of Torino, Torino, Italy
- Maurizio Drocco
- Computer Science Department, University of Torino, Torino, Italy
- Marco Aldinucci
- Computer Science Department, University of Torino, Torino, Italy
- Pietro Liò
- Computer Laboratory, University of Cambridge, Cambridge, UK
- Luciano Milanesi
- Bioinformatics Unit, Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
|
26
|
Trieu T, Cheng J. Large-scale reconstruction of 3D structures of human chromosomes from chromosomal contact data. Nucleic Acids Res 2014; 42:e52. [PMID: 24465004 PMCID: PMC3985632 DOI: 10.1093/nar/gkt1411] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.9] [Indexed: 12/24/2022] Open
Abstract
Chromosomes are not positioned randomly within a nucleus, but instead, they adopt preferred spatial conformations to facilitate necessary long-range gene–gene interactions and regulation. Thus, obtaining the 3D shape of the chromosomes of a genome is critical for understanding how the genome folds and functions, and how its genes interact and are regulated. Here, we describe a method to reconstruct preferred 3D structures of individual chromosomes of the human genome from chromosomal contact data generated by the Hi-C chromosome conformation capture technique. A novel parameterized objective function was designed for modeling chromosome structures, which was optimized by a gradient descent method to generate chromosomal structural models that could satisfy as many intra-chromosomal contacts as possible. We applied the objective function and the corresponding optimization method to two Hi-C chromosomal data sets of both a healthy and a cancerous human B-cell to construct 3D models of individual chromosomes at resolutions of 1 Mb and 200 kb, respectively. The parameters used with the method were calibrated according to independent fluorescence in situ hybridization experimental data. The structural models generated by our method could satisfy a high percentage of contacts (pairs of loci in interaction) and non-contacts (pairs of loci not in interaction) and were compatible with the known two-compartment organization of human chromatin structures. Furthermore, structural models generated at different resolutions and from randomly permuted data sets were consistent.
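The idea of optimizing an objective by gradient descent so that contacts and non-contacts are both satisfied can be illustrated with a toy sketch. The code below is not the authors' objective function or parameterization: the squared-hinge penalty, the threshold `d_c`, the `margin`, the step size, and the eight-locus chain are all assumptions chosen only to make the principle concrete.

```python
import math
import random


def make_coords(n, seed=0):
    """Random starting 3D coordinates for n loci (hypothetical toy data)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]


def pairs(n):
    """All unordered locus pairs (i, j) with i < j."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)]


def loss_and_grad(coords, contacts, d_c=1.0, margin=0.1):
    """Squared-hinge loss (an assumed form) and its gradient.

    Contact pairs are penalized when farther apart than d_c - margin;
    non-contact pairs are penalized when closer than d_c + margin.
    """
    n = len(coords)
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    loss = 0.0
    for i, j in pairs(n):
        diff = [coords[i][k] - coords[j][k] for k in range(3)]
        d = math.sqrt(sum(c * c for c in diff)) + 1e-12  # avoid division by zero
        if (i, j) in contacts:
            viol, sign = max(0.0, d - (d_c - margin)), 1.0   # pull together
        else:
            viol, sign = max(0.0, (d_c + margin) - d), -1.0  # push apart
        loss += viol * viol
        for k in range(3):
            g = sign * 2.0 * viol * diff[k] / d
            grad[i][k] += g
            grad[j][k] -= g
    return loss, grad


def optimize(coords, contacts, steps=500, lr=0.02):
    """Plain fixed-step gradient descent on a copy of the coordinates."""
    coords = [row[:] for row in coords]
    for _ in range(steps):
        _, grad = loss_and_grad(coords, contacts)
        for i in range(len(coords)):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords


def satisfied_fraction(coords, contacts, d_c=1.0):
    """Fraction of pairs whose distance agrees with their contact label."""
    all_pairs = pairs(len(coords))
    ok = sum((math.dist(coords[i], coords[j]) <= d_c) == ((i, j) in contacts)
             for i, j in all_pairs)
    return ok / len(all_pairs)


# Toy chain of 8 loci: loci at most 2 bins apart are labeled as contacts.
n_loci = 8
contacts = {(i, j) for i, j in pairs(n_loci) if j - i <= 2}
start = make_coords(n_loci, seed=0)
final = optimize(start, contacts)
initial_loss, _ = loss_and_grad(start, contacts)
final_loss, _ = loss_and_grad(final, contacts)
print("loss before/after:", initial_loss, final_loss)
print("satisfied before/after:",
      satisfied_fraction(start, contacts), satisfied_fraction(final, contacts))
```

Scoring a model by the fraction of satisfied contacts and non-contacts mirrors the evaluation described in the abstract, although the exact definition used there may differ from this sketch.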
Affiliation(s)
- Tuan Trieu
- Computer Science Department, University of Missouri-Columbia, MO 65211, USA, Informatics Institute, University of Missouri-Columbia, MO 65211, USA and C. Bond Life Science Center, University of Missouri-Columbia, MO 65211, USA
|
27
|
Hoang SA, Bekiranov S. The network architecture of the Saccharomyces cerevisiae genome. PLoS One 2013; 8:e81972. [PMID: 24349163 PMCID: PMC3857230 DOI: 10.1371/journal.pone.0081972] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/28/2013] [Accepted: 10/18/2013] [Indexed: 11/19/2022] Open
Abstract
We propose a network-based approach for surmising the spatial organization of genomes from high-throughput interaction data. Our strategy is based on methods for inferring architectural features of networks. Specifically, we employ a community detection algorithm to partition networks of genomic interactions. These community partitions represent an intuitive interpretation of genomic organization from interaction data. Furthermore, they are able to recapitulate known aspects of the spatial organization of the Saccharomyces cerevisiae genome, such as the rosette conformation of the genome, the clustering of centromeres, as well as tRNAs, and telomeres. We also demonstrate that simple architectural features of genomic interaction networks, such as cliques, can give meaningful insight into the functional role of the spatial organization of the genome. We show that there is a correlation between inter-chromosomal clique size and replication timing, as well as cohesin enrichment. Together, our network-based approach represents an effective and intuitive framework for interpreting high-throughput genomic interaction data. Importantly, there is a great potential for this strategy, given the rich literature and extensive set of existing tools in the field of network analysis.
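To make the notion of cliques in a genomic interaction network concrete, the sketch below enumerates maximal cliques with the standard Bron–Kerbosch algorithm. This is an illustration only: the abstract does not say which clique or community detection algorithm was used, and the locus labels and edges here are invented rather than taken from the yeast data.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (locus, locus) interactions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend it, x: already explored
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical interaction network; labels are made up for illustration.
edges = [
    ("cenA", "cenB"), ("cenA", "cenC"), ("cenB", "cenC"),
    ("cenC", "tRNA1"),
    ("tRNA1", "tRNA2"), ("tRNA1", "tRNA3"), ("tRNA2", "tRNA3"),
    ("tRNA2", "telX"),
]
adj = build_adjacency(edges)
cliques = maximal_cliques(adj)
for clique in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
    print(len(clique), sorted(clique))
```

Clique size per locus could then be compared against an external annotation such as replication timing, which is the kind of correlation the abstract reports.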
Affiliation(s)
- Stephen A. Hoang
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
- Stefan Bekiranov
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia, United States of America
|
28
|
Hu M, Deng K, Qin Z, Liu JS. Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data. Quantitative Biology 2013; 1:156-174. [PMID: 26124977 DOI: 10.1007/s40484-013-0016-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/29/2022]
Abstract
Understanding how chromosomes fold provides insights into transcriptional regulation and hence the functional state of the cell. Using next-generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of bias, noise and uncertainty in the Hi-C experimental protocol, analyzing and interpreting Hi-C data poses great challenges and requires novel statistical methods. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research.
Affiliation(s)
- Ming Hu
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
- Ke Deng
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA; Mathematical Sciences Center, Tsinghua University, Beijing 100084, China
- Zhaohui Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
|