1
Banerjee A, Zhang S, Bahar I. Genome structural dynamics: insights from Gaussian network analysis of Hi-C data. Brief Funct Genomics 2024:elae014. [PMID: 38654598 DOI: 10.1093/bfgp/elae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 03/11/2024] [Accepted: 04/02/2024] [Indexed: 04/26/2024] Open
Abstract
Characterization of the spatiotemporal properties of the chromatin is essential to gaining insights into the physical bases of gene co-expression, transcriptional regulation and epigenetic modifications. The Gaussian network model (GNM) has proven in recent work to serve as a useful tool for modeling chromatin structural dynamics, using as input high-throughput chromosome conformation capture data. We focus here on the exploration of the collective dynamics of chromosomal structures at hierarchical levels of resolution, from single gene loci to topologically associating domains or entire chromosomes. The GNM permits us to identify long-range interactions between gene loci, shedding light on the role of cross-correlations between distal regions of the chromosomes in regulating gene expression. Notably, GNM analysis performed across diverse cell lines highlights the conservation of the global/cooperative movements of the chromatin across different types of cells. Variations driven by localized couplings between genomic loci, on the other hand, underlie cell differentiation, underscoring the significance of the four-dimensional properties of the genome in defining cellular identity. Finally, we demonstrate the close relation between the cell type-dependent mobility profiles of gene loci and their gene expression patterns, providing a clear demonstration of the role of chromosomal 4D features in defining cell-specific differential expression of genes.
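As an illustrative aside, the core Gaussian network model computation described in this abstract can be sketched in a few lines. This is a minimal sketch, assuming the Kirchhoff (connectivity) matrix is built directly from a symmetric Hi-C contact matrix and that covariances are taken from its pseudo-inverse; the function name `gnm_from_contacts`, the eigenvalue cutoff, and the toy matrix are hypothetical and are not taken from the cited paper.

```python
import numpy as np

def gnm_from_contacts(contacts):
    """Return per-locus mobility and cross-correlations from a contact matrix.

    Assumption: off-diagonal Kirchhoff entries are the negated contact
    counts and each diagonal entry is the corresponding row sum.
    """
    c = np.asarray(contacts, dtype=float)
    c = (c + c.T) / 2.0            # enforce symmetry
    np.fill_diagonal(c, 0.0)       # ignore self-contacts
    kirchhoff = np.diag(c.sum(axis=1)) - c

    # Eigen-decompose and drop the zero mode(s) before inverting.
    vals, vecs = np.linalg.eigh(kirchhoff)
    keep = vals > 1e-8 * vals.max()
    cov = (vecs[:, keep] / vals[keep]) @ vecs[:, keep].T

    msf = np.diag(cov).copy()                      # mobility per locus
    cross_corr = cov / np.sqrt(np.outer(msf, msf)) # normalized to [-1, 1]
    return msf, cross_corr

# Hypothetical 4-locus contact matrix, for illustration only.
toy = np.array([[0, 5, 1, 0],
                [5, 0, 5, 1],
                [1, 5, 0, 5],
                [0, 1, 5, 0]])
msf, cc = gnm_from_contacts(toy)
```

Here `msf` plays the role of the mobility profile and `cc` the cross-correlation map between loci; real analyses would normalize the Hi-C map first and typically restrict the sum to a subset of modes.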
Affiliation(s)
- Anupam Banerjee
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
- She Zhang
  - OpenEye, Cadence Molecular Sciences, Santa Fe, NM 87508, USA
- Ivet Bahar
  - Laufer Center for Physical & Quantitative Biology, Stony Brook University, NY 11794, USA
  - Department of Biochemistry and Cell Biology, Renaissance School of Medicine, Stony Brook University, NY 11794, USA

2
Wang L, Li LL, Chen L, Zhang RG, Zhao SW, Yan H, Gao J, Chen X, Si YJ, Chen Z, Liu H, Xie XM, Zhao W, Han B, Qin X, Jia KH. Telomere-to-telomere and haplotype-resolved genome assembly of the Chinese cork oak (Quercus variabilis). Frontiers in Plant Science 2023; 14:1290913. [PMID: 38023918 PMCID: PMC10652414 DOI: 10.3389/fpls.2023.1290913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023]
Abstract
Quercus variabilis, a deciduous broadleaved tree species, holds significant ecological and economic value. While a chromosome-level genome for this species has been made available, it remains riddled with unanchored sequences and gaps. In this study, we present a nearly complete, comprehensive telomere-to-telomere (T2T) and haplotype-resolved reference genome for Q. variabilis. This was achieved through the integration of ONT ultra-long reads, PacBio HiFi long reads, and Hi-C data. The two resulting haplotype genomes measure 789 Mb and 768 Mb in length, with contig N50 values of 65 Mb and 56 Mb, respectively, and were anchored to 12 allelic chromosomes. Within this T2T haplotype-resolved assembly, we predicted 36,830 and 36,370 protein-coding genes, with 95.9% and 96.0% functional annotation for the respective haplotype genomes. The availability of the T2T and haplotype-resolved reference genome lays a solid foundation, not only for illustrating genome structure and supporting functional genomics studies but also for informing and facilitating genetic breeding and improvement of cultivated Quercus species.
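For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative and do not come from the cited assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Hypothetical contig lengths in Mb, for illustration only.
example_n50 = n50([80, 70, 50, 40, 30, 20])  # 70
```

The same function applies unchanged to scaffold lengths when a scaffold N50 is wanted.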
Affiliation(s)
- Longxin Wang
  - School of Biological Science and Technology, University of Jinan, Jinan, China
- Lei-Lei Li
  - Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan, China
- Li Chen
  - Shandong Saienfu Stem Cell Engineering Group Co., Ltd, Jinan, China
- Ren-Gang Zhang
  - Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan, China
- Shi-Wei Zhao
  - Department of Plant Physiology, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Han Yan
  - The Second Affiliated Hospital of Shandong First Medical University, Taian, China
- Jie Gao
  - Chinese Academy of Sciences (CAS), Key Laboratory of Tropical Forest Ecology, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Menglun, China
- Xue Chen
  - Weifang Academy of Agricultural Sciences, Weifang, China
- Yu-Jun Si
  - Weifang Academy of Agricultural Sciences, Weifang, China
- Zhe Chen
  - InvoGenomics Biotechnology Co., Ltd., Jinan, China
- Haibo Liu
  - Jinan Academy of Landscape and Forestry Science, Jinan, China
- Xiao-Man Xie
  - Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China
- Wei Zhao
  - Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, Umeå, Sweden
- Biao Han
  - Key Laboratory of State Forestry and Grassland Administration Conservation and Utilization of Warm Temperate Zone Forest and Grass Germplasm Resources, Shandong Provincial Center of Forest and Grass Germplasm Resources, Jinan, China
- Xiaochun Qin
  - School of Biological Science and Technology, University of Jinan, Jinan, China
- Kai-Hua Jia
  - Key Laboratory of Crop Genetic Improvement & Ecology and Physiology, Institute of Crop Germplasm Resources, Shandong Academy of Agricultural Sciences, Jinan, China

3
Wang X, Gu WC, Li J, Ma BG. EVRC: reconstruction of chromosome 3D structure models using error-vector resultant algorithm with clustering coefficient. Bioinformatics (Oxford, England) 2023; 39:btad638. [PMID: 37847746 DOI: 10.1093/bioinformatics/btad638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 09/28/2023] [Accepted: 10/16/2023] [Indexed: 10/19/2023]
Abstract
MOTIVATION: Reconstruction of 3D structure models is of great importance for the study of chromosome function. Software tools for this task are highly needed. RESULTS: We present a novel reconstruction algorithm, called EVRC, which utilizes co-clustering coefficients and error-vector resultant for chromosome 3D structure reconstruction. As an update of our previous EVR algorithm, EVRC can now deal with both single and multiple chromosomes in structure modeling. To evaluate the effectiveness and accuracy of the EVRC algorithm, we applied it to simulation datasets and real Hi-C datasets. The results show that the reconstructed structures have high similarity to the original/real structures, indicating the effectiveness and robustness of the EVRC algorithm. Furthermore, we applied the algorithm to the 3D conformation reconstruction of the wild-type and mutant Arabidopsis thaliana chromosomes and demonstrated the differences in structural characteristics between different chromosomes. We also accurately showed the conformational change in the centromere region of the mutant compared with the wild-type of Arabidopsis chromosome 1. Our EVRC algorithm is a valuable software tool for the field of chromatin structure reconstruction, and holds great promise for advancing our understanding of chromosome function. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/mbglab/EVRC.
Affiliation(s)
- Xiao Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wei-Cheng Gu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jie Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Bin-Guang Ma
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

4
Zeng W, Liu Q, Yin Q, Jiang R, Wong WH. HiChIPdb: a comprehensive database of HiChIP regulatory interactions. Nucleic Acids Res 2022; 51:D159-D166. [PMID: 36215037 PMCID: PMC9825415 DOI: 10.1093/nar/gkac859] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/27/2022] [Revised: 09/19/2022] [Accepted: 09/27/2022] [Indexed: 01/29/2023] Open
Abstract
Elucidating the role of 3D architecture of DNA in gene regulation is crucial for understanding cell differentiation, tissue homeostasis and disease development. Among various chromatin conformation capture methods, HiChIP has received increasing attention for its significant improvement over other methods in profiling of regulatory (e.g. H3K27ac) and structural (e.g. cohesin) interactions. To facilitate the studies of 3D regulatory interactions, we developed a HiChIP interactions database, HiChIPdb (http://health.tsinghua.edu.cn/hichipdb/). The current version of HiChIPdb contains ∼262M annotated HiChIP interactions from 200 high-throughput HiChIP samples across 108 cell types. The functionalities of HiChIPdb include: (i) standardized categorization of HiChIP interactions in a hierarchical structure based on organ, tissue and cell line and (ii) comprehensive annotations of HiChIP interactions with regulatory genes and GWAS Catalog SNPs. To the best of our knowledge, HiChIPdb is the first comprehensive database that utilizes a unified pipeline to map the functional interactions across diverse cell types and tissues in different resolutions. We believe this database has the potential to advance cutting-edge research in regulatory mechanisms in development and disease by removing the barrier in data aggregation, preprocessing, and analysis.
Affiliation(s)
- Rui Jiang
  - Correspondence may also be addressed to Rui Jiang. Tel: +86 10 6279 5578;

5
Collins B, Oluwadare O, Brown P. ChromeBat: A Bio-Inspired Approach to 3D Genome Reconstruction. Genes (Basel) 2021; 12:1757. [PMID: 34828363 PMCID: PMC8617892 DOI: 10.3390/genes12111757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Revised: 10/28/2021] [Accepted: 11/01/2021] [Indexed: 11/20/2022] Open
Abstract
With the advent of Next Generation Sequencing and the Hi-C experiment, high quality genome-wide contact data are becoming increasingly available. These data represent an empirical measure of how a genome interacts inside the nucleus. Genome conformation is of particular interest as it has been experimentally shown to be a driving force for many genomic functions from regulation to transcription. Thus, the Three Dimensional-Genome Reconstruction Problem (3D-GRP) seeks to take Hi-C data and produce a complete physical genome structure as it appears in the nucleus for genomic analysis. We propose and develop a novel method to solve the Chromosome and Genome Reconstruction problem based on the Bat Algorithm (BA), which we call ChromeBat. We demonstrate on real Hi-C data that ChromeBat is capable of state-of-the-art performance. Additionally, the domain of Genome Reconstruction has been criticized for lacking algorithmic diversity, and the bio-inspired nature of ChromeBat contributes algorithmic diversity to the problem domain. ChromeBat is an effective approach for solving the Genome Reconstruction Problem.
Affiliation(s)
- Oluwatosin Oluwadare
  - Department of Computer Science, University of Colorado, Colorado Springs, CO 80918, USA; (B.C.); (P.B.)

6
Peart CR, Williams C, Pophaly SD, Neely BA, Gulland FMD, Adams DJ, Ng BL, Cheng W, Goebel ME, Fedrigo O, Haase B, Mountcastle J, Fungtammasan A, Formenti G, Collins J, Wood J, Sims Y, Torrance J, Tracey A, Howe K, Rhie A, Hoffman JI, Johnson J, Jarvis ED, Breen M, Wolf JBW. Hi-C scaffolded short- and long-read genome assemblies of the California sea lion are broadly consistent for syntenic inference across 45 million years of evolution. Mol Ecol Resour 2021; 21:2455-2470. [PMID: 34097816 PMCID: PMC9732816 DOI: 10.1111/1755-0998.13443] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/03/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 12/13/2022]
Abstract
With the advent of chromatin-interaction maps, chromosome-level genome assemblies have become a reality for a wide range of organisms. Scaffolding quality is, however, difficult to judge. To explore this gap, we generated multiple chromosome-scale genome assemblies of an emerging wild animal model for carcinogenesis, the California sea lion (Zalophus californianus). Short-read assemblies were scaffolded with two independent chromatin interaction mapping data sets (Hi-C and Chicago), and long-read assemblies with three data types (Hi-C, optical maps and 10X linked reads) following the "Vertebrate Genomes Project (VGP)" pipeline. In both approaches, 18 major scaffolds recovered the karyotype (2n = 36), with scaffold N50s of 138 and 147 Mb, respectively. Synteny relationships at the chromosome level with other pinniped genomes (2n = 32-36), ferret (2n = 34), red panda (2n = 36) and domestic dog (2n = 78) were consistent across approaches and recovered known fissions and fusions. Comparative chromosome painting and multicolour chromosome tiling with a panel of 264 genome-integrated single-locus canine bacterial artificial chromosome probes provided independent evaluation of genome organization. Broad-scale discrepancies between the approaches were observed within chromosomes, most commonly in translocations centred around centromeres and telomeres, which were better resolved in the VGP assembly. Genomic and cytological approaches agreed on near-perfect synteny of the X chromosome, and in combination allowed detailed investigation of autosomal rearrangements between dog and sea lion. This study presents high-quality genomes of an emerging cancer model and highlights that even highly fragmented short-read assemblies scaffolded with Hi-C can yield reliable chromosome-level scaffolds suitable for comparative genomic analyses.
Affiliation(s)
- Claire R. Peart
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany
- Christina Williams
  - Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
- Saurabh D. Pophaly
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany
  - Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Benjamin A. Neely
  - National Institute of Standards and Technology, NIST Charleston, Charleston, South Carolina, USA
- Frances M. D. Gulland
  - Karen Dryer Wildlife Health Center, University of California Davis, Davis, California, USA
- David J. Adams
  - Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- Bee Ling Ng
  - Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- William Cheng
  - Cytometry Core Facility, Wellcome Sanger Institute, Cambridge, UK
- Michael E. Goebel
  - Institute of Marine Science, University of California Santa Cruz, Santa Cruz, California, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
- Bettina Haase
  - Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
- Giulio Formenti
  - Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, New York, USA
- Joanna Collins
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Jonathan Wood
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Ying Sims
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- James Torrance
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Alan Tracey
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Kerstin Howe
  - Tree of Life Programme, Wellcome Sanger Institute, Cambridge, UK
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland, USA
- Joseph I. Hoffman
  - Department of Animal Behaviour, Bielefeld University, Bielefeld, Germany
  - British Antarctic Survey, Cambridge, UK
- Jeremy Johnson
  - Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, USA
- Erich D. Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York City, New York, USA
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, USA
- Matthew Breen
  - Department of Molecular Biomedical Sciences, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
  - Comparative Medicine Institute, North Carolina State University, Raleigh, North Carolina, USA
- Jochen B. W. Wolf
  - Division of Evolutionary Biology, Faculty of Biology, LMU Munich, Munchen, Germany

7
Hovenga V, Oluwadare O. CBCR: A Curriculum Based Strategy For Chromosome Reconstruction. Int J Mol Sci 2021; 22:ijms22084140. [PMID: 33923653 PMCID: PMC8073114 DOI: 10.3390/ijms22084140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2021] [Revised: 04/12/2021] [Accepted: 04/13/2021] [Indexed: 11/30/2022] Open
Abstract
In this paper, we introduce a novel algorithm that aims to estimate chromosomes’ structure from their Hi-C contact data, called Curriculum Based Chromosome Reconstruction (CBCR). Specifically, our method performs this three dimensional reconstruction using cis-chromosomal interactions from Hi-C data. CBCR takes intra-chromosomal Hi-C interaction frequencies as an input and outputs a set of xyz coordinates that estimate the chromosome’s three dimensional structure in the form of a .pdb file. The algorithm relies on progressively training a distance-restraint-based algorithm with a strategy we refer to as curriculum learning. Curriculum learning divides the Hi-C data into classes based on contact frequency and progressively re-trains the distance-restraint algorithm based on the assumed importance of each curriculum in predicting the underlying chromosome structure. The distance-restraint algorithm relies on a modification of a Gaussian maximum likelihood function that scales probabilities based on the importance of features. We evaluate the performance of CBCR on both simulated and actual Hi-C data and perform validation on FISH, HiChIP, and ChIA-PET data as well. We also compare the performance of CBCR to several current methods. Our analysis shows that the use of curricula affects the rate of convergence of the optimization while decreasing the computational cost of our distance-restraint algorithm. Also, CBCR is more robust to increases in data resolution and therefore yields superior reconstruction accuracy of higher resolution data than all other methods in our comparison.
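The idea of dividing Hi-C data into classes by contact frequency can be pictured with a small sketch. This is only an illustration under stated assumptions: the abstract does not specify how class boundaries are chosen or in which order the classes are used, so the equal-sized, highest-frequency-first split below, the function name `split_into_curricula`, and the toy matrix are all hypothetical and should not be read as the CBCR implementation.

```python
import numpy as np

def split_into_curricula(contacts, n_classes=3):
    """Group locus pairs into classes ordered by decreasing contact frequency.

    Assumption: classes are roughly equal-sized and zero-frequency pairs
    are discarded; the actual CBCR scheme may differ.
    """
    c = np.asarray(contacts, dtype=float)
    i, j = np.triu_indices_from(c, k=1)   # each unordered pair once
    freq = c[i, j]
    observed = freq > 0
    i, j, freq = i[observed], j[observed], freq[observed]

    # Stable sort so ties keep their original (row-major) order.
    order = np.argsort(-freq, kind="stable")
    chunks = np.array_split(order, n_classes)
    return [list(zip(i[ch].tolist(), j[ch].tolist())) for ch in chunks]

# Hypothetical 4-locus intra-chromosomal contact matrix.
toy_contacts = np.array([[0, 5, 1, 0],
                         [5, 0, 5, 1],
                         [1, 5, 0, 5],
                         [0, 1, 5, 0]])
classes = split_into_curricula(toy_contacts, n_classes=2)
```

A distance-restraint optimizer could then be re-trained on one class at a time, or on a growing union of classes, which is one way to read the progressive strategy the abstract describes.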
Affiliation(s)
- Van Hovenga
  - Department of Mathematics, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA;
- Oluwatosin Oluwadare
  - Department of Computer Science, University of Colorado Colorado Springs, Colorado Springs, CO 80918, USA
  - Correspondence:

8
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021] [Indexed: 02/07/2023] Open
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), the study of the spatial structure of the genomic sequence in the nucleus helps researchers understand the functions of biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. After that, we discuss Hi-C data analysis methods for genome 3D structure, A/B compartments, TADs (topologically associating domains), and loop detection. We also discuss how to apply genome visualization technologies to the identification of chromosome feature structures. We continue with a review of correlation analysis differences among multi-omics data, and how to apply Hi-C and other omics data analysis to cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
- Yi Yang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Sichen Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Xiaotong Zhang
  - Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  - Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China

9
Correction to: GSDB: a database of 3D chromosome and genome structures reconstructed from Hi-C data. BMC Mol Cell Biol 2020; 21:62. [PMID: 32811439 PMCID: PMC7436965 DOI: 10.1186/s12860-020-00305-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open