1
Banecki K, Korsak S, Plewczynski D. Advancements and future directions in single-cell Hi-C based 3D chromatin modeling. Comput Struct Biotechnol J 2024; 23:3549-3558. [PMID: 39963420 PMCID: PMC11832020 DOI: 10.1016/j.csbj.2024.09.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 09/27/2024] [Accepted: 09/29/2024] [Indexed: 02/20/2025]
Abstract
Single-cell Hi-C data provides valuable insights into the three-dimensional organization of chromatin within individual cells, yet modeling this data poses significant challenges due to its inherent sparsity and variability. This review comprehensively explores the predominant approaches to reconstructing 3D chromatin structures from single-cell Hi-C data, positioning these methods within the broader contexts of single-cell Hi-C research and bulk Hi-C data modeling. We categorize the modeling strategies based on their objective functions, which are framed in terms of force fields, potentials, cost functions, or likelihood probabilities. Despite their diverse methodologies, these approaches exhibit deep underlying similarities. We further dissect the basic components of these models, such as attractive restraint forces and repulsive forces, and discuss additional terms like fluid viscosity and variation penalties. The review also critically evaluates the current state of model validation, highlighting the inconsistencies across various studies and emphasizing the need for a comprehensive validation framework. We detail common validation techniques, including the comparison of distance matrices and the assessment of contact violations. We argue that the future of single-cell Hi-C modeling lies in integrating multiple data modalities and incorporating cell cycle trajectory information. Such integration could significantly advance our understanding of chromatin conformation dynamics during cell cycle progression and cell differentiation. We also foresee the continued growth of optimization-based and molecular dynamics approaches, supported by general molecular dynamics toolkits.
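The two validation measures named above, comparison of distance matrices and assessment of contact violations, can be sketched in Python as follows. This is a minimal illustration, not code from the review: scoring with the Pearson correlation of upper-triangle distances, and the `threshold` that marks a contact as violated, are assumptions chosen here.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D bead coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def distance_matrix_correlation(coords_a, coords_b):
    """Correlate the upper-triangle entries of two models' distance matrices (assumed metric)."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(da)
    upper = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return pearson([da[i][j] for i, j in upper], [db[i][j] for i, j in upper])

def contact_violations(coords, contacts, threshold):
    """Count input contacts (i, j) whose modelled distance exceeds `threshold` (assumed cutoff)."""
    return sum(1 for i, j in contacts if math.dist(coords[i], coords[j]) > threshold)

model = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)]
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in model]
print(distance_matrix_correlation(model, scaled))  # ~1.0: correlation ignores global scale
print(contact_violations(model, [(0, 3), (1, 2)], threshold=1.5))  # 1
```

Correlating distances rather than comparing them directly makes the score insensitive to the arbitrary overall scale of a reconstructed model, which is why the scaled copy scores as a perfect match.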
Affiliation(s)
- Krzysztof Banecki
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Sevastianos Korsak
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland

2
Kadlof M, Banecki K, Chiliński M, Plewczynski D. Chromatin image-driven modelling. Methods 2024; 226:54-60. [PMID: 38636797 DOI: 10.1016/j.ymeth.2024.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 03/13/2024] [Accepted: 04/05/2024] [Indexed: 04/20/2024]
Abstract
The challenge of modelling the spatial conformation of chromatin remains an open problem. While multiple data-driven approaches have been proposed, each has limitations. This work introduces two image-driven modelling methods based on the Molecular Dynamics Flexible Fitting (MDFF) approach: the force method and the correlational method. Both methods have already been used successfully in protein modelling. We propose a novel way to employ them for building chromatin models directly from 3D images. This approach is termed image-driven modelling. Additionally, we introduce the initial structure generator, a tool designed to generate optimal starting structures for the proposed algorithms. The methods are versatile and can be applied to various data types, with minor modifications to accommodate new generation imaging techniques.
Affiliation(s)
- Michał Kadlof
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
- Krzysztof Banecki
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Mateusz Chiliński
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland; Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Dariusz Plewczynski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; Centre of New Technologies, University of Warsaw, Warsaw, Poland

3
Li Z, Schlick T. Hi-BDiSCO: folding 3D mesoscale genome structures from Hi-C data using brownian dynamics. Nucleic Acids Res 2024; 52:583-599. [PMID: 38015443 PMCID: PMC10810283 DOI: 10.1093/nar/gkad1121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 10/12/2023] [Accepted: 11/22/2023] [Indexed: 11/29/2023]
Abstract
The structure and dynamics of the eukaryotic genome are intimately linked to gene regulation and transcriptional activity. Many chromosome conformation capture experiments like Hi-C have been developed to detect genome-wide contact frequencies and quantify loop/compartment structures for different cellular contexts and time-dependent processes. However, a full understanding of these events requires explicit descriptions of representative chromatin and chromosome configurations. With the exponentially growing amount of data from Hi-C experiments, many methods for deriving 3D structures from contact frequency data have been developed. Yet, most reconstruction methods use polymer models with low resolution to predict overall genome structure. Here we present a Brownian Dynamics (BD) approach termed Hi-BDiSCO for producing 3D genome structures from Hi-C and Micro-C data using our mesoscale-resolution chromatin model based on the Discrete Surface Charge Optimization (DiSCO) model. Our approach integrates reconstruction with chromatin simulations at nucleosome resolution with appropriate biophysical parameters. Following a description of our protocol, we present applications to the NXN, HOXC, HOXA and Fbn2 mouse genes ranging in size from 50 to 100 kb. Such nucleosome-resolution genome structures pave the way for pursuing many biomedical applications related to the epigenomic regulation of chromatin and control of human disease.
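A single Brownian dynamics step can be illustrated with a deliberately simple bead-spring chain. This is not the DiSCO mesoscale model or the Hi-BDiSCO integrator: the harmonic bonds, the parameter values (`k`, `r0`, `diffusion`, `kT`, `dt`) and the omission of hydrodynamic interactions, electrostatics and Hi-C restraints are all simplifying assumptions.

```python
import math
import random

def harmonic_bond_forces(coords, k, r0):
    """Forces from springs U = k/2 * (r - r0)**2 between consecutive beads (toy model)."""
    forces = [[0.0, 0.0, 0.0] for _ in coords]
    for i in range(len(coords) - 1):
        d = [coords[i + 1][c] - coords[i][c] for c in range(3)]
        r = math.sqrt(sum(x * x for x in d))
        if r == 0.0:
            continue
        f = k * (r - r0) / r  # positive when stretched: pulls the two beads together
        for c in range(3):
            forces[i][c] += f * d[c]
            forces[i + 1][c] -= f * d[c]
    return forces

def brownian_step(coords, forces, diffusion, kT, dt, rng):
    """Overdamped update x += (D/kT) F dt + sqrt(2 D dt) * N(0, 1), no hydrodynamics."""
    noise = math.sqrt(2.0 * diffusion * dt)
    return [[x[c] + diffusion / kT * f[c] * dt + noise * rng.gauss(0.0, 1.0) for c in range(3)]
            for x, f in zip(coords, forces)]

def simulate(coords, steps, k=10.0, r0=1.0, diffusion=1.0, kT=1.0, dt=1e-3, seed=0):
    """Run `steps` Brownian steps with illustrative (assumed) reduced units."""
    rng = random.Random(seed)
    for _ in range(steps):
        coords = brownian_step(coords, harmonic_bond_forces(coords, k, r0), diffusion, kT, dt, rng)
    return coords

# A stretched two-bead "chain" relaxes toward the rest length r0 = 1.
start = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
end = simulate(start, steps=500)
print(math.dist(end[0], end[1]))
```

The deterministic drift term pulls the system toward low energy while the random term keeps it sampling thermal fluctuations; restraint forces derived from contact data would simply be added to the force list.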
Affiliation(s)
- Zilong Li
- Department of Chemistry, 100 Washington Square East, Silver Building, New York University, New York, NY 10003, USA
- Simons Center for Computational Physical Chemistry, 24 Waverly Place, Silver Building, New York University, New York, NY 10003, USA
- Tamar Schlick
- Department of Chemistry, 100 Washington Square East, Silver Building, New York University, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, NY 10012, USA
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200122, China
- Simons Center for Computational Physical Chemistry, 24 Waverly Place, Silver Building, New York University, New York, NY 10003, USA

4
Caudai C, Salerno E. Complementing Hi-C information for 3D chromatin reconstruction by ChromStruct. Front Bioinform 2024; 3:1287168. [PMID: 38318534 PMCID: PMC10840501 DOI: 10.3389/fbinf.2023.1287168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 12/20/2023] [Indexed: 02/07/2024]
Abstract
A multiscale method proposed elsewhere for reconstructing plausible 3D configurations of the chromatin in cell nuclei is recalled, based on the integration of contact data from Hi-C experiments and additional information coming from ChIP-seq, RNA-seq and ChIA-PET experiments. Provided that the additional data come from independent experiments, this kind of approach is supposed to leverage them to complement possibly noisy, biased or missing Hi-C records. When the different data sources are mutually concurrent, the resulting solutions are corroborated; otherwise, their validity would be weakened. Here, a problem of reliability arises, entailing an appropriate choice of the relative weights to be assigned to the different informational contributions. A series of experiments is presented that help to quantify the advantages and the limitations offered by this strategy. Whereas the advantages in accuracy are not always significant, the case of missing Hi-C data demonstrates the effectiveness of additional information in reconstructing the highly packed segments of the structure.
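The role of the relative weights can be sketched with a toy objective in which two data sources disagree about one pair of beads. This does not reproduce ChromStruct's cost function; the squared-distance penalty, the target distances and the source labels are hypothetical, and the point is only that the weights decide which candidate structure scores best.

```python
import math

def distance_term(coords, pairs, target):
    """Sum of squared deviations of the listed pair distances from `target`."""
    return sum((math.dist(coords[i], coords[j]) - target) ** 2 for i, j in pairs)

def combined_cost(coords, terms):
    """Weighted sum over data sources; `terms` is a list of (weight, cost_fn) pairs."""
    return sum(weight * cost_fn(coords) for weight, cost_fn in terms)

def hic_term(coords):
    # Hypothetical Hi-C evidence: beads 0 and 2 are far apart (target distance 2.0).
    return distance_term(coords, [(0, 2)], target=2.0)

def extra_term(coords):
    # Hypothetical additional evidence (e.g. a ChIA-PET loop): beads 0 and 2 are close.
    return distance_term(coords, [(0, 2)], target=1.0)

straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]                 # d(0, 2) = 2
folded = [(0, 0, 0), (1, 0, 0), (0.5, math.sqrt(3) / 2, 0)]  # d(0, 2) = 1

for w_hic, w_extra in [(1.0, 0.5), (0.5, 1.0)]:
    terms = [(w_hic, hic_term), (w_extra, extra_term)]
    winner = min(("straight", straight), ("folded", folded),
                 key=lambda s: combined_cost(s[1], terms))
    print(w_hic, w_extra, winner[0])  # straight wins first, folded second
```

When the sources agree, any reasonable weighting selects the same structure; only under conflict does the weight choice become decisive, which is the reliability issue the abstract raises.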
Affiliation(s)
- Claudia Caudai
- Institute of Information Science and Technologies, National Research Council of Italy, Pisa, Italy

5
Liu T, Qiu QT, Hua KJ, Ma BG. Chromosome structure modeling tools and their evaluation in bacteria. Brief Bioinform 2024; 25:bbae044. [PMID: 38385874 PMCID: PMC10883143 DOI: 10.1093/bib/bbae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 12/31/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, the 3D structure reconstruction algorithms have become powerful tools to study bacterial chromosome structure and function. It is highly desired to have a recommendation on the chromosome structure reconstruction tools to facilitate the prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them based on their underlying computational models into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms utilizing 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in the 3D reconstruction algorithms for bacterial chromosomes, primarily focusing on software usability. Finally, we briefly prospect future research directions for bacterial chromosome structure reconstruction algorithms.
Affiliation(s)
- Tong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qin-Tian Qiu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Kang-Jian Hua
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Bin-Guang Ma
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China

6
Li Z, Portillo-Ledesma S, Schlick T. Techniques for and challenges in reconstructing 3D genome structures from 2D chromosome conformation capture data. Curr Opin Cell Biol 2023; 83:102209. [PMID: 37506571 PMCID: PMC10529954 DOI: 10.1016/j.ceb.2023.102209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
Chromosome conformation capture technologies that provide frequency information for contacts between genomic regions have been crucial for increasing our understanding of genome folding and regulation. However, such data do not provide direct evidence of the spatial 3D organization of chromatin. In this opinion article, we discuss the development and application of computational methods to reconstruct chromatin 3D structures from experimental 2D contact data, highlighting how such modeling provides biological insights and can suggest mechanisms anchored to experimental data. By applying different reconstruction methods to the same contact data, we illustrate some state-of-the-art of these techniques and discuss our gene resolution approach based on Brownian dynamics and Monte Carlo sampling.
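Monte Carlo sampling of chain conformations is commonly driven by a Metropolis acceptance rule. The sketch below applies it to a hypothetical harmonic bead chain; the energy function, step size and `kT` are illustrative assumptions, not the gene-resolution model discussed in the article.

```python
import math
import random

def bond_energy(coords, k=10.0, r0=1.0):
    """Hypothetical energy: harmonic bonds k/2 * (r - r0)**2 between consecutive beads."""
    return sum(0.5 * k * (math.dist(coords[i], coords[i + 1]) - r0) ** 2
               for i in range(len(coords) - 1))

def metropolis_step(coords, energy, step_size, kT, rng):
    """Displace one random bead; accept with probability min(1, exp(-dE / kT))."""
    i = rng.randrange(len(coords))
    trial = [list(p) for p in coords]
    trial[i] = [x + rng.uniform(-step_size, step_size) for x in coords[i]]
    d_e = energy(trial) - energy(coords)
    if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
        return trial, True
    return coords, False

def run_mc(coords, steps, kT=1.0, step_size=0.2, seed=0):
    """Run `steps` single-bead moves and report the final state and acceptance rate."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(steps):
        coords, ok = metropolis_step(coords, bond_energy, step_size, kT, rng)
        accepted += ok
    return coords, accepted / steps

chain = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [4.0, 0.0, 0.0]]  # bonds stretched to 2 * r0
final, acceptance = run_mc(chain, steps=2000)
print(bond_energy(chain), bond_energy(final), acceptance)
```

Unlike Brownian dynamics, this scheme samples the equilibrium distribution without following a physical time trajectory, which is why the two are often used for complementary purposes.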
Affiliation(s)
- Zilong Li
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Stephanie Portillo-Ledesma
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Tamar Schlick
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, 10012, NY, USA; New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Room 340, Geography Building, 3663 North Zhongshan Road, Shanghai, 200122, China; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA.

7
Cosma MP, Neguembor MV. The magic of unraveling genome architecture and function. Cell Rep 2023; 42:112361. [PMID: 37059093 DOI: 10.1016/j.celrep.2023.112361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 02/20/2023] [Accepted: 03/22/2023] [Indexed: 04/16/2023]
Abstract
Over the last decades, technological breakthroughs in super-resolution microscopy have allowed us to reach molecular resolution and design experiments of unprecedented complexity. Investigating how chromatin is folded in 3D, from the nucleosome level up to the entire genome, is becoming possible by "magic" (imaging genomic), i.e., the combination of imaging and genomic approaches. This offers endless opportunities to delve into the relationship between genome structure and function. Here, we review recently achieved objectives and the conceptual and technical challenges the field of genome architecture is currently undertaking. We discuss what we have learned so far and where we are heading. We elucidate how the different super-resolution microscopy approaches and, more specifically, live-cell imaging have contributed to the understanding of genome folding. Moreover, we discuss how future technical developments could address remaining open questions.
Affiliation(s)
- Maria Pia Cosma
- Medical Research Institute, Guangdong Provincial People's Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, 106 Zhongshan Er Road, Yuexiu District, 510080 Guangzhou, China; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Dr Aiguader 88, 08003 Barcelona, Spain; ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain.
- Maria Victoria Neguembor
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, 08003 Barcelona, Spain.

8
Varoquaux N, Noble WS, Vert JP. Inference of 3D genome architecture by modeling overdispersion of Hi-C data. Bioinformatics 2023; 39:btac838. [PMID: 36594573 PMCID: PMC9857972 DOI: 10.1093/bioinformatics/btac838] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/25/2022] [Revised: 11/16/2022] [Indexed: 01/04/2023]
Abstract
MOTIVATION We address the challenge of inferring a consensus 3D model of genome architecture from Hi-C data. Existing approaches most often rely on a two-step algorithm: first, convert the contact counts into distances, then optimize an objective function akin to multidimensional scaling (MDS) to infer a 3D model. Other approaches use a maximum likelihood approach, modeling the contact counts between two loci as a Poisson random variable whose intensity is a decreasing function of the distance between them. However, a Poisson model of contact counts implies that the variance of the data is equal to the mean, a relationship that is often too restrictive to properly model count data. RESULTS We first confirm the presence of overdispersion in several real Hi-C datasets, and we show that the overdispersion arises even in simulated datasets. We then propose a new model, called Pastis-NB, where we replace the Poisson model of contact counts by a negative binomial one, which is parametrized by a mean and a separate dispersion parameter. The dispersion parameter allows the variance to be adjusted independently from the mean, thus better modeling overdispersed data. We compare the results of Pastis-NB to those of several previously published algorithms, both MDS-based and statistical methods. We show that the negative binomial inference yields more accurate structures on simulated data, and more robust structures than other models across real Hi-C replicates and across different resolutions. AVAILABILITY AND IMPLEMENTATION A Python implementation of Pastis-NB is available at https://github.com/hiclib/pastis under the BSD license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
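The difference between the Poisson and negative binomial count models can be made concrete with their log-likelihoods. This is a sketch, not the Pastis-NB implementation: the intensity `beta * d**alpha` follows the decreasing-function form described above, but the values `alpha = -3` and `beta = 1` and the mean/dispersion parametrization (variance `mu + mu**2 / r`) are assumptions.

```python
import math

def expected_count(distance, alpha=-3.0, beta=1.0):
    """Contact intensity decreasing with distance, beta * d**alpha (assumed alpha < 0)."""
    return beta * distance ** alpha

def poisson_loglik(count, mu):
    """Poisson log-pmf: the variance is forced to equal the mean mu."""
    return count * math.log(mu) - mu - math.lgamma(count + 1)

def negbin_loglik(count, mu, r):
    """Negative binomial log-pmf with mean mu and dispersion r (variance mu + mu**2 / r)."""
    return (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
            + r * math.log(r / (r + mu)) + count * math.log(mu / (r + mu)))

def structure_loglik(coords, counts, r=None, alpha=-3.0, beta=1.0):
    """Log-likelihood of observed counts {(i, j): c} given 3D coords; Poisson if r is None."""
    total = 0.0
    for (i, j), c in counts.items():
        mu = expected_count(math.dist(coords[i], coords[j]), alpha, beta)
        total += poisson_loglik(c, mu) if r is None else negbin_loglik(c, mu, r)
    return total

# An overdispersed observation (30 counts where the model expects 5) is penalised
# far less by the NB model than by the Poisson model.
print(poisson_loglik(30, 5.0), negbin_loglik(30, 5.0, r=1.0))
```

As `r` grows, the negative binomial log-likelihood converges to the Poisson one, so the Poisson model is the special case with no extra dispersion.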
Affiliation(s)
- Nelle Varoquaux
- TIMC, Université Grenoble Alpes, CNRS, Grenoble INP, Grenoble 38000, France
- William S Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Jean-Philippe Vert
- Brain Team, Google Research, Paris 75009, France
- Centre for Computational Biology, MINES ParisTech, PSL University, Paris 75006, France

9
Vadnais D, Middleton M, Oluwadare O. ParticleChromo3D: a Particle Swarm Optimization algorithm for chromosome 3D structure prediction from Hi-C data. BioData Min 2022; 15:19. [PMID: 36131326 PMCID: PMC9494900 DOI: 10.1186/s13040-022-00305-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/24/2022] [Accepted: 08/31/2022] [Indexed: 11/10/2022]
Abstract
Background
The three-dimensional (3D) structure of chromatin has a massive effect on its function. Because of this, it is desirable to have an understanding of the 3D structural organization of chromatin. To gain greater insight into the spatial organization of chromosomes and genomes and the functions they perform, chromosome conformation capture (3C) techniques, particularly Hi-C, have been developed. The Hi-C technology is widely used and well-known because of its ability to profile interactions for all read pairs in an entire genome. The advent of Hi-C has greatly expanded our understanding of the 3D genome, genome folding, gene regulation and has enabled the development of many 3D chromosome structure reconstruction methods.
Results
Here, we propose a novel approach for 3D chromosome and genome structure reconstruction from Hi-C data using Particle Swarm Optimization (PSO) approach called ParticleChromo3D. This algorithm begins with a grouping of candidate solution locations for each chromosome bin, according to the particle swarm algorithm, and then iterates its position towards a global best candidate solution. While moving towards the optimal global solution, each candidate solution or particle uses its own local best information and a randomizer to choose its path. Using several metrics to validate our results, we show that ParticleChromo3D produces a robust and rigorous representation of the 3D structure for input Hi-C data. We evaluated our algorithm on simulated and real Hi-C data in this work. Our results show that ParticleChromo3D is more accurate than most of the existing algorithms for 3D structure reconstruction.
Conclusions
Our results also show that constructed ParticleChromo3D structures are very consistent, hence indicating that it will always arrive at the global solution at every iteration. The source code for ParticleChromo3D, the simulated and real Hi-C datasets, and the models generated for these datasets are available here: https://github.com/OluwadareLab/ParticleChromo3D
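The particle update described in the Results can be sketched as a generic global-best particle swarm optimizer applied to a toy distance-fitting problem. This is not ParticleChromo3D's code: the inertia and acceleration constants (`w`, `c1`, `c2`), the search `bound` and the squared-error `stress` objective are assumptions.

```python
import math
import random

def stress(flat, targets):
    """Squared error between modelled and target distances; flat = [x0, y0, z0, x1, ...]."""
    pts = [flat[k:k + 3] for k in range(0, len(flat), 3)]
    return sum((math.dist(pts[i], pts[j]) - d) ** 2 for (i, j), d in targets.items())

def pso(cost, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, bound=3.0, seed=0):
    """Minimise `cost` over R^dim with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                     # each particle's own best position
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]  # swarm-wide best position
    history = [gbest_cost]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()  # random weights on the two pulls
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] += vel[k][d]
            c = cost(pos[k])
            if c < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[k][:], c
        history.append(gbest_cost)
    return gbest, gbest_cost, history

# Hypothetical toy target: three beads pairwise 1.0 apart (an equilateral triangle).
targets = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0}
best, best_cost, history = pso(lambda x: stress(x, targets), dim=9)
print(best_cost)
```

The swarm-wide best cost never increases, but a basic swarm like this one carries no guarantee of reaching the global optimum; repeated runs with different seeds are a cheap way to check consistency.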

10
Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat Methods 2022; 19:938-949. [PMID: 35817938 PMCID: PMC9349046 DOI: 10.1038/s41592-022-01527-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 08/22/2021] [Accepted: 05/18/2022] [Indexed: 02/07/2023]
Abstract
A multitude of sequencing-based and microscopy technologies provide the means to unravel the relationship between the three-dimensional organization of genomes and key regulatory processes of genome function. Here, we develop a multimodal data integration approach to produce populations of single-cell genome structures that are highly predictive for nuclear locations of genes and nuclear bodies, local chromatin compaction and spatial segregation of functionally related chromatin. We demonstrate that multimodal data integration can compensate for systematic errors in some of the data and can greatly increase accuracy and coverage of genome structure models. We also show that alternative combinations of different orthogonal data sources can converge to models with similar predictive power. Moreover, our study reveals the key contributions of low-frequency (‘rare’) interchromosomal contacts to accurately predicting the global nuclear architecture, including the positioning of genes and chromosomes. Overall, our results highlight the benefits of multimodal data integration for genome structure analysis, available through the Integrative Genome Modeling software package. The Integrative Genome Modeling platform is a tool for population-based three-dimensional genome structure modeling and analysis by integrating various experimental data sources.

11
Yildirim A, Boninsegna L, Zhan Y, Alber F. Uncovering the Principles of Genome Folding by 3D Chromatin Modeling. Cold Spring Harb Perspect Biol 2022; 14:a039693. [PMID: 34400556 PMCID: PMC9248826 DOI: 10.1101/cshperspect.a039693] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/25/2022]
Abstract
Our understanding of how genomic DNA is tightly packed inside the nucleus, yet is still accessible for vital cellular processes, has grown dramatically over recent years with advances in microscopy and genomics technologies. Computational methods have played a pivotal role in the structural interpretation of experimental data, which helped unravel some organizational principles of genome folding. Here, we give an overview of current computational efforts in mechanistic and data-driven 3D chromatin structure modeling. We discuss strengths and limitations of different methods and evaluate the added value and benefits of computational approaches to infer the 3D structural and dynamic properties of the genome and its underlying mechanisms at different scales and resolution, ranging from the dynamic formation of chromatin loops and topological associated domains to nuclear compartmentalization of chromatin and nuclear bodies.
Affiliation(s)
- Asli Yildirim
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Lorenzo Boninsegna
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Yuxiang Zhan
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA

12
Monaco A, Pantaleo E, Amoroso N, Lacalamita A, Lo Giudice C, Fonzino A, Fosso B, Picardi E, Tangaro S, Pesole G, Bellotti R. A primer on machine learning techniques for genomic applications. Comput Struct Biotechnol J 2021; 19:4345-4359. [PMID: 34429852 PMCID: PMC8365460 DOI: 10.1016/j.csbj.2021.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/07/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/28/2022]
Abstract
High throughput sequencing technologies have enabled the study of complex biological aspects at single nucleotide resolution, opening the big data era. The analysis of large volumes of heterogeneous "omic" data, however, requires novel and efficient computational algorithms based on the paradigm of Artificial Intelligence. In the present review, we introduce and describe the most common machine learning methodologies, and lately deep learning, applied to a variety of genomics tasks, trying to emphasize capabilities, strengths and limitations through a simple and intuitive language. We highlight the power of the machine learning approach in handling big data by means of a real life example, and underline how described methods could be relevant in all cases in which large amounts of multimodal genomic data are available.
Affiliation(s)
- Alfonso Monaco
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
- Ester Pantaleo
- Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
- Nicola Amoroso
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Antonio Lacalamita
- National Institute of Gastroenterology "S. de Bellis", Research Hospital, 70013 Castellana Grotte (Bari), Italy
- Claudio Lo Giudice
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Adriano Fonzino
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
- Bruno Fosso
- Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Ernesto Picardi
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Sabina Tangaro
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari "Aldo Moro", Bari, Via G. Amendola 165, 70125 Bari, Italy
- Graziano Pesole
- Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
- Roberto Bellotti
- Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy

13
Liang J, Perez-Rathke A. Minimalistic 3D chromatin models: Sparse interactions in single cells drive the chromatin fold and form many-body units. Curr Opin Struct Biol 2021; 71:200-214. [PMID: 34399301 DOI: 10.1016/j.sbi.2021.06.017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/05/2021] [Revised: 06/27/2021] [Accepted: 06/29/2021] [Indexed: 11/26/2022]
Abstract
Computational three-dimensional chromatin modeling has helped uncover principles of genome organization. Here, we discuss methods for modeling three-dimensional chromatin structures, with focus on a minimalistic polymer model which inverts population Hi-C into single-cell conformations. Utilizing only basic physical properties, this model reveals that a few specific Hi-C interactions can fold chromatin into conformations consistent with single-cell imaging, Dip-C, and FISH measurements. Aggregated single-cell chromatin conformations also reproduce Hi-C frequencies. This approach allows quantification of structural heterogeneity and discovery of many-body interaction units and has revealed additional insights, including (1) topologically associating domains as a byproduct of folding driven by specific interactions, (2) cell subpopulations with different structural scaffolds are developmental stage dependent, and (3) the functional landscape of many-body units within enhancer-rich regions. We also discuss these findings in relation to the genome structure-function relationship.
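Aggregating single-cell conformations into Hi-C-like frequencies can be sketched by counting, for each bead pair, the fraction of conformations in which the pair lies within a contact cutoff. The `cutoff` criterion is an assumption for illustration; the minimalistic model's own aggregation procedure may differ.

```python
import math

def contact_frequencies(ensemble, cutoff):
    """Fraction of conformations in which each bead pair lies within `cutoff` (assumed criterion)."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for coords in ensemble:
        for i in range(n):
            for j in range(i + 1, n):
                if math.dist(coords[i], coords[j]) <= cutoff:
                    counts[i][j] += 1
                    counts[j][i] += 1
    return [[c / len(ensemble) for c in row] for row in counts]

# Two hypothetical single-cell conformations of a three-bead segment.
extended = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
folded = [(0, 0, 0), (1, 0, 0), (0, 0.5, 0)]
freq = contact_frequencies([extended, folded], cutoff=1.2)
print(freq[0][2])  # 0.5: beads 0 and 2 touch in one of the two cells
```

A population-level frequency such as 0.5 can therefore arise from a mixture of cells in which the contact is either fully present or absent, which is the structural heterogeneity that single-cell ensembles make explicit.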
Affiliation(s)
- Jie Liang
- Center for Bioinformatics and Quantitative Biology & Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, 60612, USA.
- Alan Perez-Rathke
- Center for Bioinformatics and Quantitative Biology & Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL, 60612, USA
14
Jerkovic I, Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nat Rev Mol Cell Biol 2021; 22:511-528. [PMID: 33953379 DOI: 10.1038/s41580-021-00362-w] [Accepted: 03/16/2021]
Abstract
Understanding how chromatin is folded in the nucleus is fundamental to understanding its function. Although 3D genome organization has been historically difficult to study owing to a lack of relevant methodologies, major technological breakthroughs in genome-wide mapping of chromatin contacts and advances in imaging technologies in the twenty-first century considerably improved our understanding of chromosome conformation and nuclear architecture. In this Review, we discuss methods of 3D genome organization analysis, including sequencing-based techniques, such as Hi-C and its derivatives, Micro-C, DamID and others; microscopy-based techniques, such as super-resolution imaging coupled with fluorescence in situ hybridization (FISH), multiplex FISH, in situ genome sequencing and live microscopy methods; and computational and modelling approaches. We describe the most commonly used techniques and their contribution to our current knowledge of nuclear architecture and, finally, we provide a perspective on up-and-coming methods that open possibilities for future major discoveries.
Affiliation(s)
- Ivana Jerkovic
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France
- Giacomo Cavalli
- Institute of Human Genetics, CNRS, University of Montpellier, Montpellier, France.
15
Oliveira Junior AB, Estrada CP, Aiden EL, Contessoto VG, Onuchic JN. Chromosome Modeling on Downsampled Hi-C Maps Enhances the Compartmentalization Signal. J Phys Chem B 2021; 125:8757-8767. [PMID: 34319725 DOI: 10.1021/acs.jpcb.1c04174]
Abstract
The human genome is organized within a nucleus where chromosomes fold into an ensemble of different conformations. Chromosome conformation capture techniques such as Hi-C provide information about genome architecture by creating a 2D heat map. Initially, Hi-C experiments were performed in human interphase cell lines. Recently, efforts have expanded to several different organisms, cell lines, tissues, and cell cycle phases where obtaining high-quality maps is challenging. Poorly sampled Hi-C maps produce highly sparse matrices in which compartments located far from the main diagonal are difficult to observe. Aided by recently developed models for investigating chromatin folding and dynamics, we introduce a framework to enhance the compartment information far from the diagonal in experimental sparse matrices. The simulations were performed using the Open-MiChroM platform with newly trained parameters in the minimal chromatin model (MiChroM) energy function. Simulations optimized on a downsampled experimental map (10% of the original data) predict contact frequencies similar to those of the complete (100%) experimental Hi-C map. The modeling results open a discussion on how simulations and modeling can increase the statistics and help fill in Hi-C regions not captured by poorly sampled experiments. Open-MiChroM simulations allow us to explore the 3D genome organization of different organisms, cell lines, and cell phases that often do not produce high-quality Hi-C maps.
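A downsampled map like the 10% map described above can be mimicked by binomial thinning: every read pair in a count matrix is kept independently with probability 0.1. The sketch below is a hypothetical illustration (the function name `downsample_contacts` and the toy matrix are assumptions), not the procedure used with Open-MiChroM. It assumes a symmetric integer Hi-C count matrix stored as nested lists.

```python
import random

def downsample_contacts(matrix, fraction, seed=0):
    """Binomially thin a symmetric Hi-C count matrix (hypothetical helper).

    Each read pair in the upper triangle (diagonal included) is kept
    independently with probability `fraction`; the lower triangle is
    mirrored so the result stays symmetric.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    n = len(matrix)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count how many of the matrix[i][j] read pairs survive thinning.
            kept = sum(1 for _ in range(matrix[i][j]) if rng.random() < fraction)
            out[i][j] = out[j][i] = kept
    return out

full = [[50, 20, 2], [20, 60, 10], [2, 10, 40]]  # toy counts
sparse = downsample_contacts(full, 0.10)          # keep ~10% of read pairs
```

Thinning read pairs rather than whole matrix entries preserves the expected relative contact frequencies while reproducing the extra sparsity of shallow sequencing, which is why low-count entries far from the diagonal tend to drop to zero first.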
Affiliation(s)
- Cynthia Perez Estrada
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States
- Erez Lieberman Aiden
- The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States
- Vinícius G Contessoto
- Instituto de Biociências, Letras e Ciências Exatas, UNESP - Univ. Estadual Paulista, Departamento de Física, São José do Rio Preto, SP, Brazil
16
Lin X, Qi Y, Latham AP, Zhang B. Multiscale modeling of genome organization with maximum entropy optimization. J Chem Phys 2021; 155:010901. [PMID: 34241389 PMCID: PMC8253599 DOI: 10.1063/5.0044150] [Received: 01/14/2021] [Accepted: 04/28/2021]
Abstract
Three-dimensional (3D) organization of the human genome plays an essential role in all DNA-templated processes, including gene transcription, gene regulation, and DNA replication. Computational modeling can be an effective way of building high-resolution genome structures and improving our understanding of these molecular processes. However, it faces significant challenges as the human genome consists of over 6 × 10⁹ base pairs, a system size that exceeds the capacity of traditional modeling approaches. In this perspective, we review the progress that has been made in modeling the human genome. Coarse-grained models parameterized to reproduce experimental data via the maximum entropy optimization algorithm serve as effective means to study genome organization at various length scales. They have provided insight into the principles of whole-genome organization and enabled de novo predictions of chromosome structures from epigenetic modifications. Applications of these models at a near-atomistic resolution further revealed physicochemical interactions that drive the phase separation of disordered proteins and dictate chromatin stability in situ. We conclude with an outlook on the opportunities and challenges in studying chromosome dynamics.
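The core of maximum entropy parameterization can be shown on a deliberately simplified toy. Assume each bead pair independently has a reference contact probability p0 from an unbiased polymer model, and that adding a bias α·s to its energy (in kT units, with s = 1 for a contact) gives p(α) = p0·e^(−α) / (p0·e^(−α) + 1 − p0). The Lagrange multipliers α are then updated iteratively until model and experimental contact probabilities agree. In real applications p(α) is estimated from simulations and the pairs are coupled, so this independent-pair model, the step size and the function names are all hypothetical.

```python
import math

def biased_contact_prob(p0, alpha):
    """Contact probability of one bead pair after adding alpha * s to its
    energy (kT units), where s = 1 for a contact and 0 otherwise."""
    weight = p0 * math.exp(-alpha)
    return weight / (weight + (1.0 - p0))

def fit_max_ent(p_ref, p_exp, eta=1.0, steps=2000, tol=1e-8):
    """Fit one Lagrange multiplier per pair so that the biased model
    reproduces the experimental contact probabilities p_exp."""
    alphas = [0.0] * len(p_ref)
    for _ in range(steps):
        p_sim = [biased_contact_prob(p0, a) for p0, a in zip(p_ref, alphas)]
        residual = [ps - pe for ps, pe in zip(p_sim, p_exp)]
        if max(abs(r) for r in residual) < tol:
            break
        # A larger alpha penalizes contact, so over-predicted pairs get a larger alpha.
        alphas = [a + eta * r for a, r in zip(alphas, residual)]
    return alphas

p_ref = [0.5, 0.2, 0.05]  # hypothetical unbiased contact probabilities
p_exp = [0.3, 0.4, 0.1]   # hypothetical experimental contact probabilities
alphas = fit_max_ent(p_ref, p_exp)
fitted = [biased_contact_prob(p0, a) for p0, a in zip(p_ref, alphas)]
```

For this independent-pair toy the fixed point also has a closed form, α = ln[p0(1 − p_exp) / (p_exp(1 − p0))], which is a convenient check on the iteration; the iterative form is what carries over to coupled, simulation-based models.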
Affiliation(s)
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Yifeng Qi
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Andrew P. Latham
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bin Zhang
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
17
MacKay K, Kusalik A. Computational methods for predicting 3D genomic organization from high-resolution chromosome conformation capture data. Brief Funct Genomics 2021; 19:292-308. [PMID: 32353112 PMCID: PMC7388788 DOI: 10.1093/bfgp/elaa004] [Received: 11/19/2019] [Revised: 01/30/2020] [Accepted: 02/07/2020]
Abstract
The advent of high-resolution chromosome conformation capture assays (such as 5C, Hi-C and Pore-C) has allowed for unprecedented sequence-level investigations into the structure-function relationship of the genome. In order to comprehensively understand this relationship, computational tools are required that utilize data generated from these assays to predict 3D genome organization (the 3D genome reconstruction problem). Many computational tools have been developed that answer this need, but a comprehensive comparison of their underlying algorithmic approaches has not been conducted. This manuscript provides a comprehensive review of the existing computational tools (from November 2006 to September 2019, inclusive) that can be used to predict 3D genome organizations from high-resolution chromosome conformation capture data. Overall, existing tools were found to use a relatively small set of algorithms from one or more of the following categories: dimensionality reduction, graph/network theory, maximum likelihood estimation (MLE) and statistical modeling. Solutions in each category are far from maturity, and the breadth and depth of various algorithmic categories have not been fully explored. While the tools for predicting 3D structure for a genomic region or single chromosome are diverse, there is a general lack of algorithmic diversity among computational tools for predicting the complete 3D genome organization from high-resolution chromosome conformation capture data.
18
Zha M, Wang N, Zhang C, Wang Z. Inferring Single-Cell 3D Chromosomal Structures Based on the Lennard-Jones Potential. Int J Mol Sci 2021; 22:5914. [PMID: 34072879 PMCID: PMC8199262 DOI: 10.3390/ijms22115914] [Received: 04/22/2021] [Revised: 05/23/2021] [Accepted: 05/28/2021]
Abstract
Reconstructing three-dimensional (3D) chromosomal structures from single-cell Hi-C data is a challenging scientific problem due to the extreme sparseness of single-cell Hi-C data. In this research, we used the Lennard-Jones potential to reconstruct both 500 kb and high-resolution 50 kb chromosomal structures from single-cell Hi-C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads placed on a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse single-cell Hi-C contact matrices. We designed a novel loss function based on the Lennard-Jones potential, in which the ε value, i.e., the well depth, indicates how stable the binding of each pair of beads is. For bead pairs that have single-cell Hi-C contacts, and for their neighboring bead pairs, the loss function assigns stronger binding stability. The Metropolis-Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We demonstrated the correctness and validity of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing them with 3D-FISH data.
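The pipeline summarized above (Gaussian imputation, a Lennard-Jones loss whose well depth ε grows for contacting pairs, and Metropolis-Hastings moves under simulated annealing on a cubic lattice) can be sketched as a toy. This is not the authors' code: the kernel width, the ε values `EPS0` and `EPS_CONTACT`, the geometric cooling schedule, the bond-length limit `MAX_BOND_SQ` and the straight-chain starting conformation are all hypothetical choices.

```python
import math
import random

EPS0 = 0.1         # assumed baseline well depth for every bead pair
EPS_CONTACT = 1.0  # assumed extra well depth, scaled by the imputed contact weight
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
MAX_BOND_SQ = 3    # assumed limit on squared distance between consecutive beads

def gaussian_impute(contacts, n, sigma=1.0, radius=2):
    """Spread each observed contact (i, j) onto nearby bead pairs with a 2D
    Gaussian kernel; overlapping kernels are combined by taking the maximum."""
    w = [[0.0] * n for _ in range(n)]
    for i, j in contacts:
        for di in range(-radius, radius + 1):
            for dj in range(-radius, radius + 1):
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < n and a != b:
                    g = math.exp(-(di * di + dj * dj) / (2.0 * sigma * sigma))
                    w[a][b] = w[b][a] = max(w[a][b], g)
    return w

def pair_energy(p, q, wij, sigma=1.0):
    """Lennard-Jones energy of one bead pair; the well depth grows with the
    imputed contact weight, so contacting pairs bind more strongly."""
    sr6 = (sigma / math.dist(p, q)) ** 6
    eps = EPS0 + EPS_CONTACT * wij
    return 4.0 * eps * (sr6 * sr6 - sr6)

def lj_loss(coords, w):
    n = len(coords)
    return sum(pair_energy(coords[i], coords[j], w[i][j])
               for i in range(n) for j in range(i + 1, n))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def anneal(w, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Metropolis-Hastings single-bead lattice moves under geometric cooling;
    returns the lowest-loss conformation seen and its loss."""
    rng = random.Random(seed)
    n = len(w)
    coords = [(i, 0, 0) for i in range(n)]  # start from a straight chain
    occupied = set(coords)
    energy = lj_loss(coords, w)
    best, best_energy = list(coords), energy
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(n)
        dx, dy, dz = rng.choice(MOVES)
        x, y, z = coords[k]
        new = (x + dx, y + dy, z + dz)
        if new in occupied:
            continue  # excluded volume: one bead per lattice site
        if any(sq_dist(new, coords[m]) > MAX_BOND_SQ
               for m in (k - 1, k + 1) if 0 <= m < n):
            continue  # move would break chain connectivity
        delta = sum(pair_energy(new, coords[m], w[k][m])
                    - pair_energy(coords[k], coords[m], w[k][m])
                    for m in range(n) if m != k)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            occupied.discard(coords[k])
            occupied.add(new)
            coords[k] = new
            energy += delta
            if energy < best_energy:
                best, best_energy = list(coords), energy
    return best, lj_loss(best, w)

contacts = [(0, 9), (2, 7)]  # hypothetical single-cell contacts on 10 beads
w = gaussian_impute(contacts, n=10)
coords, energy = anneal(w)
```

Only the moved bead's pair terms are recomputed for each proposal, so a step costs O(n) rather than O(n²); returning the best conformation seen, rather than the last one, is a common safeguard when the cooling schedule is short.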
Affiliation(s)
- Mengsheng Zha
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Nan Wang
- Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Zheng Wang
- Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
19
Gong H, Yang Y, Zhang S, Li M, Zhang X. Application of Hi-C and other omics data analysis in human cancer and cell differentiation research. Comput Struct Biotechnol J 2021; 19:2070-2083. [PMID: 33995903 PMCID: PMC8086027 DOI: 10.1016/j.csbj.2021.04.016] [Received: 12/10/2020] [Revised: 04/04/2021] [Accepted: 04/04/2021]
Abstract
With the development of 3C (chromosome conformation capture) and its derivative technology Hi-C (high-throughput chromosome conformation capture), the study of the spatial structure of genomic sequences in the nucleus helps researchers understand biological processes such as gene transcription, replication, repair, and regulation. In this paper, we first introduce the research background and purpose of Hi-C data visualization analysis. We then discuss Hi-C data analysis methods for genome 3D structure, A/B compartments, TADs (topologically associated domains), and loop detection. We also discuss how to apply genome visualization technologies to the identification of chromosome feature structures. We continue with a review of correlation analysis differences among multi-omics data, and of how to apply Hi-C and other omics data analysis to cancer and cell differentiation research. Finally, we summarize the various problems in joint analyses based on Hi-C and other multi-omics data. We believe this review can help researchers better understand the progress and applications of 3D genome technology.
Affiliation(s)
- Haiyan Gong
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
- Yi Yang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Sichen Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Minghong Li
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Xiaotong Zhang
- Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
- Shunde Graduate School of University of Science and Technology Beijing, Foshan 528000, China
20
Fatima N, Rueda L. iSOM-GSN: an integrative approach for transforming multi-omic data into gene similarity networks via self-organizing maps. Bioinformatics 2021; 36:4248-4254. [PMID: 32407457 DOI: 10.1093/bioinformatics/btaa500] [Received: 12/18/2019] [Revised: 04/27/2020] [Accepted: 05/07/2020]
Abstract
MOTIVATION One of the main challenges in applying graph convolutional neural networks (CNNs) to gene-interaction data is the lack of understanding of the vector space to which they belong, and also the inherent difficulty of representing those interactions in a significantly lower-dimensional space, viz. Euclidean space. The challenge becomes more pronounced when dealing with various types of heterogeneous data. We introduce a systematic, generalized method, called iSOM-GSN, to transform high-dimensional 'multi-omic' data onto a 2D grid. Afterwards, we apply a CNN to predict disease states of various types. Based on the idea of Kohonen's self-organizing map, we generate a 2D grid for each sample for a given set of genes that represent a gene similarity network. RESULTS We have tested the model to predict breast and prostate cancer using gene expression, DNA methylation and copy number alteration. Prediction accuracies in the 94-98% range were obtained for tumor stages of breast cancer and calculated Gleason scores of prostate cancer with just 14 input genes for both cases. The scheme not only outputs nearly perfect classification accuracy, but also provides an enhanced scheme for representation learning, visualization, dimensionality reduction and interpretation of multi-omic data. AVAILABILITY AND IMPLEMENTATION The source code and sample data are available via a Github project at https://github.com/NaziaFatima/iSOM_GSN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
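iSOM-GSN builds on Kohonen's self-organizing map. As a generic illustration (not the iSOM-GSN code, which lives in the linked repository), the classic Kohonen update can be sketched with the standard library. Each sample pulls its best-matching unit and, more weakly, that unit's grid neighbours toward itself, while the learning rate and neighbourhood radius shrink over training. The linear decay schedules, the 3 × 3 grid and the two-cluster toy data are assumptions.

```python
import math
import random

def best_matching_unit(grid, x):
    """Grid node whose prototype vector is closest (Euclidean) to sample x."""
    return min(grid, key=lambda node: math.dist(grid[node], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a Kohonen self-organizing map: a rows x cols grid of prototype
    vectors pulled toward the samples under a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2.0
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            radius = radius0 * (1.0 - frac) + 1e-3  # shrinking neighbourhood
            bmu = best_matching_unit(grid, x)
            for node, w in grid.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return grid

def quantization_error(grid, data):
    """Mean distance from each sample to its best-matching prototype."""
    return sum(math.dist(grid[best_matching_unit(grid, x)], x) for x in data) / len(data)

data_rng = random.Random(1)  # hypothetical two-cluster, 3-feature toy data
data = ([[0.1 + data_rng.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(20)]
        + [[0.9 + data_rng.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(20)])
som = train_som(data, rows=3, cols=3)
qe = quantization_error(som, data)
```

After training, similar samples map to nearby grid nodes, which is the property that lets a 2D map be treated as an image-like input for a downstream CNN.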
Affiliation(s)
- Nazia Fatima
- School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
- Luis Rueda
- School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada
21
Meluzzi D, Arya G. Computational approaches for inferring 3D conformations of chromatin from chromosome conformation capture data. Methods 2020; 181-182:24-34. [PMID: 31470090 PMCID: PMC7044057 DOI: 10.1016/j.ymeth.2019.08.008] [Received: 04/15/2019] [Revised: 06/24/2019] [Accepted: 08/23/2019]
Abstract
Chromosome conformation capture (3C) and its variants are powerful experimental techniques for probing intra- and inter-chromosomal interactions within cell nuclei at high resolution and in a high-throughput, quantitative manner. The contact maps derived from such experiments provide an avenue for inferring the 3D spatial organization of the genome. This review provides an overview of the various computational methods developed in the past decade for addressing the very important but challenging problem of deducing the detailed 3D structure or structure population of chromosomal domains, chromosomes, and even entire genomes from 3C contact maps.
Affiliation(s)
- Dario Meluzzi
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, United States
- Gaurav Arya
- Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, United States.
22
Oluwadare O, Highsmith M, Turner D, Lieberman Aiden E, Cheng J. GSDB: a database of 3D chromosome and genome structures reconstructed from Hi-C data. BMC Mol Cell Biol 2020; 21:60. [PMID: 32758136 PMCID: PMC7405446 DOI: 10.1186/s12860-020-00304-y] [Received: 03/26/2020] [Accepted: 07/29/2020]
Abstract
Advances in chromosome conformation capture technologies, such as the Hi-C technique, which captures chromosomal interactions on a genome-wide scale, have led to the development of methods for reconstructing three-dimensional chromosome and genome structures from Hi-C data. The three-dimensional genome structure is important because it plays a role in a variety of biological activities such as DNA replication, gene regulation, genome interaction, and gene expression. In recent years, numerous Hi-C datasets have been generated, and likewise, a number of genome structure reconstruction algorithms have been developed. In this work, we outline the construction of a novel Genome Structure Database (GSDB), a comprehensive repository of 3D structures for Hi-C datasets built by a variety of 3D structure reconstruction tools. The GSDB contains over 50,000 structures from 12 state-of-the-art Hi-C data structure prediction algorithms for 32 Hi-C datasets. GSDB functions as a centralized collection of genome structures which will enable the exploration of the dynamic architectures of chromosomes and genomes for biomedical research. GSDB is accessible at http://sysbio.rnet.missouri.edu/3dgenome/GSDB
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, USA
- Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Douglass Turner
- Elastic Image Software LLC, 21 Walnut Street, Lexington, MA, 02421, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA.
23
Salameh TJ, Wang X, Song F, Zhang B, Wright SM, Khunsriraksakul C, Ruan Y, Yue F. A supervised learning framework for chromatin loop detection in genome-wide contact maps. Nat Commun 2020; 11:3428. [PMID: 32647330 PMCID: PMC7347923 DOI: 10.1038/s41467-020-17239-9] [Received: 08/04/2019] [Accepted: 06/18/2020]
Abstract
Accurately predicting chromatin loops from genome-wide interaction matrices such as Hi-C data is critical to deepening our understanding of proper gene regulation. Current approaches are mainly focused on searching for statistically enriched dots on a genome-wide map. However, given the availability of orthogonal data types such as ChIA-PET, HiChIP, Capture Hi-C, and high-throughput imaging, a supervised learning approach could facilitate the discovery of a comprehensive set of chromatin interactions. Here, we present Peakachu, a Random Forest classification framework that predicts chromatin loops from genome-wide contact maps. We compare Peakachu with current enrichment-based approaches, and find that Peakachu identifies a unique set of short-range interactions. We show that our models perform well in different platforms, across different sequencing depths, and across different species. We apply this framework to predict chromatin loops in 56 Hi-C datasets, and release the results at the 3D Genome Browser.
Affiliation(s)
- Tarik J Salameh
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
- Xiaotao Wang
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA.
- Fan Song
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
- Bo Zhang
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
- Sage M Wright
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
- Chachrit Khunsriraksakul
- Bioinformatics and Genomics Program, The Pennsylvania State University, University Park, State College, PA, 16802, USA
- Yijun Ruan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Department of Genetics and Genome Sciences, University of Connecticut Health Center, Farmington, CT, USA
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA.
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, USA.
24
Advances in technologies for 3D genomics research. Sci China Life Sci 2020; 63:811-824. [PMID: 32394244 DOI: 10.1007/s11427-019-1704-2] [Received: 03/03/2020] [Accepted: 04/21/2020]
Abstract
The spatial structure of the orderly organized chromatin in the nucleus has important roles in maintaining normal cell function and in regulating gene expression, and the high-throughput Hi-C and ChIA-PET methods have been widely used in various biological studies for determining potential spatial genome structures and their functions. However, there are still great difficulties and challenges in three-dimensional (3D) genomics research. More efficient, economical, and unbiased approaches to studying 3D genomics need to be developed for more widespread and easier application. Here, we review the most recent studies on new 3D genomics research technologies, such as improvements of the traditional Hi-C and ChIA-PET methods, new approaches based on non-proximal-ligation strategies, and imaging-based methods improved in recent years. In particular, we review CRISPR-based methods for functional validation in 3D genomics, which could be a forthcoming direction. We hope this review offers insights into potential improvements for future 3D genomics.
25
Bayesian inference of chromatin structure ensembles from population-averaged contact data. Proc Natl Acad Sci U S A 2020; 117:7824-7830. [PMID: 32193349 DOI: 10.1073/pnas.1910364117]
Abstract
Mounting experimental evidence suggests a role for the spatial organization of chromatin in crucial processes of the cell nucleus such as transcription regulation. Chromosome conformation capture techniques allow us to characterize chromatin structure by mapping contacts between chromosomal loci on a genome-wide scale. The most widespread modality is to measure contact frequencies averaged over a population of cells. Single-cell variants exist, but suffer from low contact numbers and have not yet gained the same resolution as population methods. While intriguing biological insights have already been garnered from ensemble-averaged data, information about three-dimensional (3D) genome organization in the underlying individual cells remains largely obscured because the contact maps show only an average over a huge population of cells. Moreover, computational methods for structure modeling of chromatin have mostly focused on fitting a single consensus structure, thereby ignoring any cell-to-cell variability in the model itself. Here, we propose a fully Bayesian method to infer ensembles of chromatin structures and to determine the optimal number of states in a principled, objective way. We illustrate our approach on simulated data and compute multistate models of chromatin from chromosome conformation capture carbon copy (5C) data. Comparison with independent data suggests that the inferred ensembles represent the underlying sample population faithfully. Harnessing the rich information contained in multistate models, we investigate cell-to-cell variability of chromatin organization into topologically associating domains, thus highlighting the ability of our approach to deliver insights into chromatin organization of great biological relevance.
26
Belokopytova PS, Nuriddinov MA, Mozheiko EA, Fishman D, Fishman V. Quantitative prediction of enhancer-promoter interactions. Genome Res 2019; 30:72-84. [PMID: 31804952 PMCID: PMC6961579 DOI: 10.1101/gr.249367.119] [Received: 02/13/2019] [Accepted: 11/25/2019]
Abstract
Recent experimental and computational efforts have provided large data sets describing three-dimensional organization of mouse and human genomes and showed the interconnection between the expression profile, epigenetic state, and spatial interactions of loci. These interconnections were utilized to infer the spatial organization of chromatin, including enhancer–promoter contacts, from one-dimensional epigenetic marks. Here, we show that the predictive power of some of these algorithms is overestimated due to peculiar properties of the biological data. We propose an alternative approach, which provides high-quality predictions of chromatin interactions using information on gene expression and CTCF-binding alone. Using multiple metrics, we confirmed that our algorithm could efficiently predict the three-dimensional architecture of both normal and rearranged genomes.
Affiliation(s)
- Polina S Belokopytova
- Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia 630090
- Daniil Fishman
- Novosibirsk State University, Novosibirsk, Russia 630090
- Veniamin Fishman
- Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia 630090
27
Ross BC, Costello JC. Improved inference of chromosome conformation from images of labeled loci. F1000Res 2019; 7. [PMID: 31363407 PMCID: PMC6644830 DOI: 10.12688/f1000research.16252.3] [Accepted: 03/26/2019]
Abstract
We previously published a method that infers chromosome conformation from images of fluorescently-tagged genomic loci, for the case when there are many loci labeled with each distinguishable color. Here we build on our previous work and improve the reconstruction algorithm to address previous limitations. We show that these improvements 1) increase the reconstruction accuracy and 2) allow the method to be used on large-scale problems involving several hundred labeled loci. Simulations indicate that full-chromosome reconstructions at 1/2 Mb resolution are possible using existing labeling and imaging technologies. The updated reconstruction code and the script files used for this paper are available at https://github.com/heltilda/align3d.
Affiliation(s)
- Brian C Ross
- Computational Bioscience Program, Department of Pharmacology, University of Colorado, Anschutz Medical Campus, Aurora, CO, 80045, USA
- James C Costello
- Computational Bioscience Program, Department of Pharmacology, University of Colorado, Anschutz Medical Campus, Aurora, CO, 80045, USA
28
Liu L, Kim MH, Hyeon C. Heterogeneous Loop Model to Infer 3D Chromosome Structures from Hi-C. Biophys J 2019; 117:613-625. [PMID: 31337548 DOI: 10.1016/j.bpj.2019.06.032] [Received: 03/05/2019] [Revised: 05/22/2019] [Accepted: 06/25/2019]
Abstract
Adapting a well-established formalism in polymer physics, we develop a minimalist approach to infer three-dimensional folding of chromatin from Hi-C data. The three-dimensional chromosome structures generated from our heterogeneous loop model (HLM) are used to visualize chromosome organizations that can substantiate the measurements from fluorescence in situ hybridization, chromatin interaction analysis by paired-end tag sequencing, and RNA-seq signals. We demonstrate the utility of the HLM with several case studies. Specifically, the HLM-generated chromosome structures, which reproduce the spatial distribution of topologically associated domains from fluorescence in situ hybridization measurement, show the phase segregation between two types of topologically associated domains explicitly. We discuss the origin of cell-type-dependent gene-expression level by modeling the chromatin globules of α-globin and SOX2 gene loci for two different cell lines. We also use the HLM to discuss how the chromatin folding and gene-expression level of Pax6 loci, associated with mouse neural development, are modulated by interactions with two enhancers. Finally, HLM-generated structures of chromosome 19 of mouse embryonic stem cells, based on single-cell Hi-C data collected over each cell-cycle phase, visualize changes in chromosome conformation across the cell cycle. Given a contact frequency map between chromatin loci supplied from Hi-C, HLM is a computationally efficient and versatile modeling tool to generate chromosome structures that can complement interpreting other experimental data.
Affiliation(s)
- Lei Liu
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Republic of Korea
- Min Hyeok Kim
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Republic of Korea
- Changbong Hyeon
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Republic of Korea.
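The HLM abstract describes generating chromosome structures from a polymer model with heterogeneous loops, given a Hi-C contact map. As a rough illustration of the forward direction only, the sketch below assumes a Gaussian chain with harmonic backbone springs plus extra harmonic springs k_ij between looped loci. It computes mean-square distances from the pseudo-inverse of the spring network's Laplacian and converts them to contact probabilities within an assumed capture radius r_c. The function names, parameters, and spring constants are hypothetical; this is not the authors' code, and fitting k_ij to a measured contact map is omitted.

```python
import math

import numpy as np


def hlm_mean_square_distances(n, loops, k_backbone=1.0, kT=1.0):
    """Mean-square distances <r_ij^2> for a Gaussian chain with extra loop springs.

    Hypothetical sketch of the energy
        H = sum_i (k_backbone / 2) |r_{i+1} - r_i|^2 + sum_(i,j) (k_ij / 2) |r_i - r_j|^2,
    where `loops` maps a locus pair (i, j) to an assumed spring constant k_ij.
    """
    K = np.zeros((n, n))  # Kirchhoff (graph Laplacian) matrix of the spring network

    def add_spring(i, j, k):
        K[i, i] += k
        K[j, j] += k
        K[i, j] -= k
        K[j, i] -= k

    for i in range(n - 1):
        add_spring(i, i + 1, k_backbone)  # polymer backbone
    for (i, j), k_ij in loops.items():
        add_spring(i, j, k_ij)  # heterogeneous "loop" springs
    # The pseudo-inverse drops the zero mode (rigid translation of the whole chain).
    G = np.linalg.pinv(K)
    g = np.diag(G)
    # Each Cartesian axis is an independent Gaussian with covariance kT * G.
    return 3.0 * kT * (g[:, None] + g[None, :] - 2.0 * G)


def contact_probability(msd, r_c):
    """P(|r_ij| < r_c) when r_ij is an isotropic 3D Gaussian with <r_ij^2> = msd > 0."""
    sigma = math.sqrt(msd / 3.0)  # per-axis standard deviation
    x = r_c / sigma
    # Cumulative distribution of the Maxwell (chi with 3 degrees of freedom) law.
    return math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-x * x / 2.0)
```

For a 10-bead chain with unit springs and kT = 1, `hlm_mean_square_distances(10, {})[0, 9]` is about 27 (that is, 3 × 9); adding a single unit loop spring between beads 0 and 9 lowers it to about 2.7, so in this toy model a loop pulls distant loci together and raises their contact probability.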
29
Abbas A, He X, Niu J, Zhou B, Zhu G, Ma T, Song J, Gao J, Zhang MQ, Zeng J. Integrating Hi-C and FISH data for modeling of the 3D organization of chromosomes. Nat Commun 2019; 10:2049. [PMID: 31053705 PMCID: PMC6499832 DOI: 10.1038/s41467-019-10005-6] [Received: 04/26/2018] [Accepted: 04/12/2019]
Abstract
The new advances in various experimental techniques that provide complementary information about the spatial conformations of chromosomes have inspired researchers to develop computational methods to fully exploit the merits of individual data sources and combine them to improve the modeling of chromosome structure. Here we propose GEM-FISH, a method for reconstructing the 3D models of chromosomes through systematically integrating both Hi-C and FISH data with the prior biophysical knowledge of a polymer model. Comprehensive tests on a set of chromosomes, for which both Hi-C and FISH data are available, demonstrate that GEM-FISH can outperform previous chromosome structure modeling methods and accurately capture the higher order spatial features of chromosome conformations. Moreover, our reconstructed 3D models of chromosomes revealed interesting patterns of spatial distributions of super-enhancers which can provide useful insights into understanding the functional roles of these super-enhancers in gene regulation.
Affiliation(s)
- Ahmed Abbas
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Xuan He
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Jing Niu
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- Bin Zhou
- School of Life Science, Tsinghua University, Beijing, 100084, China
- Guangxiang Zhu
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- Tszshan Ma
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- Jiangpeikun Song
- School of Life Science, Tsinghua University, Beijing, 100084, China
- Juntao Gao
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
- Michael Q Zhang
- Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing, 100084, China
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
- Department of Biological Sciences, Center for Systems Biology, the University of Texas at Dallas, Richardson, TX, 75080-3021, USA
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, 100084, China
- MOE Key Laboratory of Bioinformatics; Bioinformatics Division, Center for Synthetic and Systems Biology, BNRist; Department of Automation, Tsinghua University; Center for Synthetic and Systems Biology, Tsinghua University, Beijing, 100084, China
30
Oluwadare O, Highsmith M, Cheng J. An Overview of Methods for Reconstructing 3-D Chromosome and Genome Structures from Hi-C Data. Biol Proced Online 2019; 21:7. [PMID: 31049033 PMCID: PMC6482566 DOI: 10.1186/s12575-019-0094-0] [Received: 01/15/2019] [Accepted: 04/01/2019]
Abstract
Over the past decade, methods for predicting three-dimensional (3-D) chromosome and genome structures have proliferated. This has been primarily due to the development of high-throughput, next-generation chromosome conformation capture (3C) technologies, which have provided next-generation sequencing data about chromosome conformations in order to map the 3-D genome structure. The introduction of the Hi-C technique, a variant of the 3C method, has allowed researchers to extract the interaction frequency (IF) for all loci of a genome at high throughput and at a genome-wide scale. In this review we describe, categorize, and compare the various methods developed to map chromosome and genome structures from 3C data, particularly Hi-C data. We summarize the improvements introduced by these methods, describe the approach used for method evaluation, and discuss how these advancements shape the future of genome structure construction.
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
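Many of the methods surveyed in this review turn interaction frequencies (IF) into target distances and then embed those distances in 3-D. A minimal sketch of one such distance-based pipeline, assuming a power-law conversion d_ij = IF_ij^(-α) with a user-chosen exponent α, shortest-path (Floyd-Warshall) completion of pairs with no observed contacts, and classical multidimensional scaling (MDS). The function names are hypothetical, the contact graph is assumed to be connected, and the pipeline is an illustration rather than any specific method from the review.

```python
import numpy as np


def frequencies_to_distances(IF, alpha=1.0):
    """Convert an IF matrix into target distances d_ij = IF_ij**(-alpha).

    `alpha` is an assumed conversion exponent; pairs with IF_ij == 0 are unknown (inf).
    """
    IF = np.asarray(IF, dtype=float)
    D = np.full(IF.shape, np.inf)
    observed = IF > 0
    D[observed] = IF[observed] ** (-alpha)
    np.fill_diagonal(D, 0.0)
    return D


def complete_by_shortest_paths(D):
    """Fill unknown distances with shortest-path lengths (Floyd-Warshall)."""
    D = D.copy()
    for k in range(D.shape[0]):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D


def classical_mds(D, dim=3):
    """Embed a complete distance matrix in `dim` dimensions by classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * J @ (D ** 2) @ J  # Gram matrix of the centered coordinates
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]  # indices of the largest eigenvalues
    # Negative eigenvalues (non-Euclidean residue) are clipped to zero.
    return V[:, top] * np.sqrt(np.clip(w[top], 0.0, None))


def reconstruct(IF, alpha=1.0, dim=3):
    """IF matrix -> (n, dim) coordinates under the assumptions above."""
    D = complete_by_shortest_paths(frequencies_to_distances(IF, alpha))
    return classical_mds(D, dim)
```

Shortest-path completion keeps unobserved pairs from being treated as arbitrarily far apart and enforces the triangle inequality, although the completed matrix is still not guaranteed to be exactly Euclidean, which is why negative eigenvalues are clipped before taking square roots.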
31
Greenwald WW, Li H, Benaglio P, Jakubosky D, Matsui H, Schmitt A, Selvaraj S, D'Antonio M, D'Antonio-Chronowska A, Smith EN, Frazer KA. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nat Commun 2019; 10:1054. [PMID: 30837461 PMCID: PMC6401380 DOI: 10.1038/s41467-019-08940-5] [Received: 09/18/2018] [Accepted: 02/04/2019]
Abstract
While genetic variation at chromatin loops is relevant for human disease, the relationships between contact propensity (the probability that loci at loops physically interact), genetics, and gene regulation are unclear. We quantitatively interrogate these relationships by comparing Hi-C and molecular phenotype data across cell types and haplotypes. While chromatin loops consistently form across different cell types, they have subtle quantitative differences in contact frequency that are associated with larger changes in gene expression and H3K27ac. For the vast majority of loci with quantitative differences in contact frequency across haplotypes, the changes in magnitude are smaller than those across cell types; however, the proportional relationships between contact propensity, gene expression, and H3K27ac are consistent. These findings suggest that subtle changes in contact propensity have a biologically meaningful role in gene regulation and could be a mechanism by which regulatory genetic variants in loop anchors mediate effects on expression.
Affiliation(s)
- William W Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- He Li
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Paola Benaglio
- Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, CA, 92093, USA
- David Jakubosky
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Department of Biomedical Sciences, University of California, San Diego, La Jolla, CA, 92093, USA
- Hiroko Matsui
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Matteo D'Antonio
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Moores Cancer Center, University of California, San Diego, La Jolla, CA, 92093, USA
- Erin N Smith
- Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, CA, 92093, USA
- Kelly A Frazer
- Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, 92093, USA
- Department of Pediatrics and Rady Children's Hospital, University of California, San Diego, La Jolla, CA, 92093, USA
32
Guilbaud S, Salomé L, Destainville N, Manghi M, Tardin C. Dependence of DNA Persistence Length on Ionic Strength and Ion Type. Phys Rev Lett 2019; 122:028102. [PMID: 30720315 DOI: 10.1103/physrevlett.122.028102] [Received: 09/28/2018]
Abstract
Even though the persistence length L_P of double-stranded DNA plays a pivotal role in cell biology and nanotechnologies, its dependence on ionic strength I lacks a consensual description. Using a high-throughput single-molecule technique and statistical physics modeling, we measure L_P in the presence of monovalent (Li⁺, Na⁺, K⁺) and divalent (Mg²⁺, Ca²⁺) metallic and alkyl ammonium ions, over a large range 0.5 mM ≤ I ≤ 5 M. We show that linear Debye-Hückel-type theories do not describe even part of these data. By contrast, the Netz-Orland and Trizac-Shen formulas, two approximate theories including nonlinear electrostatic effects and the finite DNA radius, fit our data with divalent and monovalent ions, respectively, over the whole I range. Furthermore, the metallic ion type does not influence L_P(I), in contrast to alkyl ammonium monovalent ions at high I.
Affiliation(s)
- Sébastien Guilbaud
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, 31 077 Toulouse, France
- Laurence Salomé
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, 31 077 Toulouse, France
- Nicolas Destainville
- Laboratoire de Physique Théorique (IRSAMC), Université de Toulouse, CNRS, UPS, 31 062 Toulouse, France
- Manoel Manghi
- Laboratoire de Physique Théorique (IRSAMC), Université de Toulouse, CNRS, UPS, 31 062 Toulouse, France
- Catherine Tardin
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, 31 077 Toulouse, France
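The abstract expresses L_P as a function of ionic strength I for both monovalent and divalent salts, so the same molar concentration of a divalent salt corresponds to a larger I. A minimal sketch of that bookkeeping, using the standard definition I = ½ Σ c_i z_i² and the common room-temperature approximation for the Debye screening length in water, κ⁻¹ ≈ 0.304 nm / √I (I in mol/L, assumed 25 °C). The function names are hypothetical, and since the study reports that linear Debye-Hückel-type theories fail to describe L_P, the Debye length here is only the screening scale, not a model of L_P.

```python
import math


def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum_i c_i * z_i**2.

    `ions` is an iterable of (concentration in mol/L, valence z) pairs,
    one entry per ionic species in solution.
    """
    return 0.5 * sum(c * z * z for c, z in ions)


def debye_length_nm(I):
    """Approximate Debye screening length in nm for water at 25 degrees C (I in mol/L)."""
    return 0.304 / math.sqrt(I)


# 10 mM NaCl versus 10 mM MgCl2 (two Cl- per Mg2+).
I_nacl = ionic_strength([(0.01, +1), (0.01, -1)])
I_mgcl2 = ionic_strength([(0.01, +2), (0.02, -1)])
```

Here `I_nacl` is 0.01 M while `I_mgcl2` is 0.03 M, so at equal salt concentration the divalent salt screens more strongly (a shorter Debye length), one reason results are compared at equal I rather than equal concentration.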