1
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. Sci China Life Sci 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique developmental trajectories and molecular features. Exploring how genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA sequencing technology, single-cell sequencing technologies have advanced rapidly. These technologies have expanded horizontally to include the single-cell genome, epigenome, proteome, and metabolome, while vertically they have progressed to integrate multiple omics data and to incorporate additional information, such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advance in the biomedical field, offering profound insights into complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a particular focus on methodology. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China
2
Shi Z, Wu H. CTPredictor: A comprehensive and robust framework for predicting cell types by integrating multi-scale features from single-cell Hi-C data. Comput Biol Med 2024; 173:108336. [PMID: 38513390 DOI: 10.1016/j.compbiomed.2024.108336] [Received: 12/11/2023] [Revised: 03/01/2024] [Accepted: 03/17/2024] [Indexed: 03/23/2024]
Abstract
Single-cell Hi-C (scHi-C) has emerged as a powerful technology for deciphering cell-to-cell variability in three-dimensional (3D) chromatin organization, providing insights into genome-wide chromatin interactions and their correlation with cellular functions. Nevertheless, accurately identifying cell types across different datasets remains a formidable challenge, hindering comprehensive investigations into genome structure. In response, we introduce CTPredictor, an innovative computational method that integrates multi-scale features to accurately predict cell types in various datasets. CTPredictor incorporates three distinct feature sets, namely small intra-domain contact probability (SICP), smoothed small intra-domain contact probability (SSICP), and smoothed bin contact probability (SBCP). The resulting fusion classification model significantly improves the accuracy of cell type prediction from scHi-C data. Rigorous benchmarking against established methods and three conventional machine learning approaches demonstrates the robust performance of CTPredictor, positioning it as an advanced tool for cell type prediction from scHi-C data. Beyond its prediction capabilities, CTPredictor holds promise for illuminating 3D genome structures and their functional significance across a wide array of biological processes.
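Contact-probability features of this kind summarize how a single cell's contacts are distributed over genomic distance. The Python sketch below computes one such profile, the fraction of a cell's intra-chromosomal contacts that fall into log-spaced distance bins, with optional moving-average smoothing. It is a hypothetical illustration only, not CTPredictor's published SICP, SSICP or SBCP definitions; the function name, distance range, bin count and smoothing window are all assumptions.

```python
import numpy as np

def contact_probability_profile(pos1, pos2, min_dist=1_000, max_dist=200_000_000,
                                n_bins=30, smooth_window=3):
    """Fraction of one cell's intra-chromosomal contacts per log-spaced distance bin.

    Hypothetical distance-decay feature; NOT CTPredictor's SICP/SSICP/SBCP definitions.
    pos1, pos2: genomic coordinates (bp) of the two ends of each contact.
    """
    dist = np.abs(np.asarray(pos2) - np.asarray(pos1))
    dist = dist[(dist >= min_dist) & (dist < max_dist)]   # drop very short/long contacts
    edges = np.logspace(np.log10(min_dist), np.log10(max_dist), n_bins + 1)
    counts, _ = np.histogram(dist, bins=edges)
    total = counts.sum()
    prob = counts / total if total > 0 else counts.astype(float)
    if smooth_window > 1:
        # Simple moving average across neighbouring distance bins (assumed smoothing).
        kernel = np.ones(smooth_window) / smooth_window
        prob = np.convolve(prob, kernel, mode="same")
    return prob
```

Stacking one profile per cell yields a cells-by-bins matrix that a standard classifier could consume. The actual feature construction and fusion model used by CTPredictor are those described in the paper.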
Affiliation(s)
- Zhenqi Shi
- School of Software, Shandong University, 250100, Jinan, China
- Hao Wu
- School of Software, Shandong University, 250100, Jinan, China
3
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
4
Li Z, Schlick T. Hi-BDiSCO: folding 3D mesoscale genome structures from Hi-C data using brownian dynamics. Nucleic Acids Res 2024; 52:583-599. [PMID: 38015443 PMCID: PMC10810283 DOI: 10.1093/nar/gkad1121] [Received: 07/06/2023] [Revised: 10/12/2023] [Accepted: 11/22/2023] [Indexed: 11/29/2023]
Abstract
The structure and dynamics of the eukaryotic genome are intimately linked to gene regulation and transcriptional activity. Many chromosome conformation capture experiments like Hi-C have been developed to detect genome-wide contact frequencies and quantify loop/compartment structures for different cellular contexts and time-dependent processes. However, a full understanding of these events requires explicit descriptions of representative chromatin and chromosome configurations. With the exponentially growing amount of data from Hi-C experiments, many methods for deriving 3D structures from contact frequency data have been developed. Yet, most reconstruction methods use polymer models with low resolution to predict overall genome structure. Here we present a Brownian Dynamics (BD) approach termed Hi-BDiSCO for producing 3D genome structures from Hi-C and Micro-C data using our mesoscale-resolution chromatin model based on the Discrete Surface Charge Optimization (DiSCO) model. Our approach integrates reconstruction with chromatin simulations at nucleosome resolution with appropriate biophysical parameters. Following a description of our protocol, we present applications to the NXN, HOXC, HOXA and Fbn2 mouse genes ranging in size from 50 to 100 kb. Such nucleosome-resolution genome structures pave the way for pursuing many biomedical applications related to the epigenomic regulation of chromatin and control of human disease.
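Brownian dynamics integrates an overdamped Langevin equation: each bead moves under the deterministic forces acting on it plus a random thermal displacement. The Python sketch below shows one Euler-Maruyama step for a generic bead-spring chain with harmonic bonds, in reduced units. It is not the DiSCO mesoscale model, whose nucleosome, linker and electrostatic terms are far richer; the force field, parameter names and units are assumptions made for illustration.

```python
import numpy as np

def spring_forces(x, k=1.0, r0=1.0):
    """Harmonic bond forces, U = k/2 * (|b| - r0)^2, between consecutive beads of a chain."""
    bond = x[1:] - x[:-1]                          # (n-1, 3) bond vectors b_i = x_{i+1} - x_i
    length = np.linalg.norm(bond, axis=1, keepdims=True)
    f_next = -k * (length - r0) * bond / length    # force on bead i+1 from bond i
    f = np.zeros_like(x)
    f[1:] += f_next
    f[:-1] -= f_next                               # equal and opposite force on bead i
    return f

def bd_step(x, dt, diff=1.0, kT=1.0, k=1.0, r0=1.0, rng=None, add_noise=True):
    """One Euler-Maruyama step: x += (D / kT) * F * dt + sqrt(2 * D * dt) * xi."""
    drift = (diff / kT) * spring_forces(x, k=k, r0=r0) * dt
    if not add_noise:
        return x + drift
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal(x.shape)              # independent unit Gaussian per coordinate
    return x + drift + np.sqrt(2.0 * diff * dt) * xi
```

Iterating `bd_step` with a small time step samples chain conformations. The published approach additionally couples its chromatin simulations to Hi-C or Micro-C data, which this sketch omits.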
Affiliation(s)
- Zilong Li
- Department of Chemistry, 100 Washington Square East, Silver Building, New York University, New York, NY 10003, USA
- Simons Center for Computational Physical Chemistry, 24 Waverly Place, Silver Building, New York University, New York, NY 10003, USA
- Tamar Schlick
- Department of Chemistry, 100 Washington Square East, Silver Building, New York University, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, NY 10012, USA
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200122, China
- Simons Center for Computational Physical Chemistry, 24 Waverly Place, Silver Building, New York University, New York, NY 10003, USA
5
Liu T, Qiu QT, Hua KJ, Ma BG. Chromosome structure modeling tools and their evaluation in bacteria. Brief Bioinform 2024; 25:bbae044. [PMID: 38385874 PMCID: PMC10883143 DOI: 10.1093/bib/bbae044] [Received: 10/30/2023] [Revised: 12/31/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
The three-dimensional (3D) structure of bacterial chromosomes is crucial for understanding chromosome function. With the growing availability of high-throughput chromosome conformation capture (3C/Hi-C) data, 3D structure reconstruction algorithms have become powerful tools for studying bacterial chromosome structure and function. Recommendations on which chromosome structure reconstruction tools to use are therefore highly desirable for advancing prokaryotic 3D genomics. In this work, we review existing chromosome 3D structure reconstruction algorithms and classify them, based on their underlying computational models, into two categories: constraint-based modeling and thermodynamics-based modeling. We briefly compare these algorithms using 3C/Hi-C datasets and fluorescence microscopy data obtained from Escherichia coli and Caulobacter crescentus, as well as simulated datasets. We discuss current challenges in 3D reconstruction algorithms for bacterial chromosomes, focusing primarily on software usability. Finally, we briefly outline future research directions for bacterial chromosome structure reconstruction algorithms.
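Constraint-based methods typically convert contact frequencies into target distances and then search for coordinates that satisfy them. The Python sketch below illustrates the simplest version of this idea under stated assumptions: an inverse power law d = f^(-alpha), where alpha is a free, hypothetical parameter; shortest-path completion of pairs with no observed contact; and classical multidimensional scaling (MDS) to obtain 3D coordinates. It is a generic textbook pipeline, not any specific tool evaluated in the review, and it requires NumPy and SciPy.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def reconstruct_3d(contacts, alpha=1.0):
    """Estimate bead coordinates from a symmetric contact-frequency matrix.

    Assumes the contact graph is connected; otherwise some distances stay infinite.
    """
    C = np.asarray(contacts, dtype=float)
    n = C.shape[0]
    # 1) Frequency -> target distance via an assumed power law; 0 means "no constraint".
    wish = np.zeros_like(C)
    mask = C > 0
    wish[mask] = C[mask] ** (-alpha)
    np.fill_diagonal(wish, 0.0)
    # 2) Complete unconstrained pairs with graph shortest-path distances.
    D = shortest_path(wish, method="D", directed=False)
    if not np.isfinite(D).all():
        raise ValueError("contact graph is disconnected")
    # 3) Classical MDS: double-centre the squared distances, keep the top 3 eigenpairs.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:3]
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))
```

The result is defined only up to rotation, reflection and translation. On real Hi-C data, the choice of alpha, matrix normalization and noise handling matter considerably, whereas thermodynamics-based methods instead sample polymer conformations from an energy model.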
Affiliation(s)
- Tong Liu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qin-Tian Qiu
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Kang-Jian Hua
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Bin-Guang Ma
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
6
Scalvini B, Mashaghi A. Circuit Topology Analysis of Single-Cell HiC Data. Methods Mol Biol 2024; 2819:27-38. [PMID: 39028500 DOI: 10.1007/978-1-0716-3930-6_2] [Indexed: 07/20/2024]
Abstract
The 3D fold structure of the genome is intricately linked to its function. As a result, descriptors of 3D genome conformation are becoming increasingly important as markers for disease and therapeutic responses. Circuit topology, a theory of folds, formalizes the arrangement of contacts in an entangled chain. It is uniquely suited for the topological description of the cellular genome and changes to genomic architecture during physiological processes like cellular differentiation or pathological and therapeutic alterations. In this discussion, we will explore circuit topology and its ability to extract topological information from single-cell HiC data.
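In circuit topology, two contacts with sites A1 < A2 and B1 < B2 are in series (S) if A1 < A2 < B1 < B2, in parallel (P) if A1 < B1 < B2 < A2, and in cross (X) if A1 < B1 < A2 < B2. The Python sketch below classifies every pair of contacts from one cell in this way and counts the relations. Lumping contacts that share a site into a single "shared" class is a simplification assumed here, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def ct_relation(a, b):
    """Circuit-topology relation between two contacts, each a pair of genomic positions."""
    a1, a2 = sorted(a)
    b1, b2 = sorted(b)
    if (b1, b2) < (a1, a2):              # relabel so that contact a starts first
        (a1, a2), (b1, b2) = (b1, b2), (a1, a2)
    if len({a1, a2, b1, b2}) < 4:
        return "shared"                  # contacts share a site (simplified handling)
    if a2 < b1:
        return "S"                       # series:   a1 < a2 < b1 < b2
    if b2 < a2:
        return "P"                       # parallel: a1 < b1 < b2 < a2
    return "X"                           # cross:    a1 < b1 < a2 < b2

def ct_profile(contacts):
    """Count relations over all contact pairs, e.g. from one cell's scHi-C contacts."""
    return Counter(ct_relation(a, b) for a, b in combinations(contacts, 2))
```

For example, `ct_profile([(1, 4), (5, 8), (2, 3)])` counts two series relations and one parallel relation. The pairwise loop is quadratic in the number of contacts, so per-chromosome or sampled computation would be needed for large cells.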
Affiliation(s)
- Barbara Scalvini
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands
- Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Leiden, The Netherlands
- Alireza Mashaghi
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands
- Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Leiden, The Netherlands
7
Rothörl J, Brems MA, Stevens TJ, Virnau P. Reconstructing diploid 3D chromatin structures from single cell Hi-C data with a polymer-based approach. Front Bioinform 2023; 3:1284484. [PMID: 38148761 PMCID: PMC10750380 DOI: 10.3389/fbinf.2023.1284484] [Received: 08/28/2023] [Accepted: 11/24/2023] [Indexed: 12/28/2023]
Abstract
Detailed understanding of the 3D structure of chromatin is a key ingredient for investigating a variety of processes inside the cell. Since direct methods to experimentally ascertain these structures lack the desired spatial fidelity, computational inference methods based on single-cell Hi-C data have gained significant interest. Here, we develop a progressive simulation protocol that iteratively improves the resolution of predicted interphase structures by maximum-likelihood association of ambiguous Hi-C contacts using lower-resolution predictions. Unlike state-of-the-art methods, our procedure is not limited to haploid cell data and allows us to reach a resolution of up to 5,000 base pairs per bead. High-resolution chromatin models grant access to a multitude of structural phenomena. As examples, we verify the formation of chromosome territories and of holes near aggregated chromocenters, as well as the inversion of the CpG content in rod photoreceptor cells.
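In unphased diploid scHi-C data, a contact between loci i and j cannot be attributed to a particular homolog copy, so there are four candidate pairings (a-a, a-b, b-a, b-b). The Python sketch below shows a simplified version of the maximum-likelihood idea: if contact probability is assumed to decrease monotonically with spatial distance, the most likely pairing is the one whose beads lie closest in a lower-resolution model. The data layout and function name are hypothetical, and the published protocol, which refines structures progressively across resolutions, is considerably more involved.

```python
import numpy as np

def assign_ambiguous_contact(coords, i, j):
    """Choose the homolog pairing for an unphased contact between loci i and j.

    coords: dict mapping homolog label ("a" or "b") to an (n_loci, 3) array taken
    from a lower-resolution model (hypothetical layout). Assumes contact likelihood
    decreases monotonically with distance, so the closest pairing is the ML choice.
    """
    best, best_d = None, np.inf
    for hi in ("a", "b"):
        for hj in ("a", "b"):
            d = np.linalg.norm(coords[hi][i] - coords[hj][j])
            if d < best_d:
                best, best_d = (hi, hj), d
    return best, best_d
```

The chosen pairings would then serve as unambiguous restraints for the next, higher-resolution round of modeling.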
Affiliation(s)
- Jan Rothörl
- Institute of Physics, Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Maarten A. Brems
- Institute of Physics, Johannes Gutenberg-Universität Mainz, Mainz, Germany
- Tim J. Stevens
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Peter Virnau
- Institute of Physics, Johannes Gutenberg-Universität Mainz, Mainz, Germany
8
Zhou Y, Li T, Choppavarapu L, Jin VX. Integration of scHi-C and scRNA-seq data defines distinct 3D-regulated and biological-context dependent cell subpopulations. bioRxiv 2023:2023.09.29.560193. [PMID: 37873257 PMCID: PMC10592853 DOI: 10.1101/2023.09.29.560193] [Indexed: 10/25/2023]
Abstract
An integration of 3D chromatin structure and gene expression at single-cell resolution has yet to be demonstrated. Here, we develop a computational method, the multiomic data integration (MUDI) algorithm, which integrates scHi-C and scRNA-seq data to precisely define 3D-regulated and biological-context-dependent cell subpopulations, or topologically integrated subpopulations (TISPs). We demonstrate its algorithmic utility on publicly available and newly generated scHi-C and scRNA-seq data. We then test and apply MUDI in a breast cancer cell model system to demonstrate its biological-context-dependent utility. We found that the newly defined topologically conserved associating domain (CAD) is the characteristic single-cell 3D chromatin structure and better characterizes chromatin domains at single-cell resolution. We further identify 20 TISPs uniquely characterizing 3D-regulated breast cancer cellular states. We show that two of the TISPs closely resemble high-cycling breast cancer persister cells and that chromatin-modifying enzymes might be functional regulators driving the alteration of 3D chromatin structures. Our comprehensive integration of scHi-C and scRNA-seq data in cancer cells at single-cell resolution provides mechanistic insights into the 3D-regulated heterogeneity of developing drug-tolerant cancer cells.
9
Wang Y, Guo Z, Cheng J. Single-cell Hi-C data enhancement with deep residual and generative adversarial networks. Bioinformatics 2023; 39:btad458. [PMID: 37498561 PMCID: PMC10403428 DOI: 10.1093/bioinformatics/btad458] [Received: 04/19/2023] [Revised: 07/19/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION The spatial genome organization of a eukaryotic cell is important for its function. The development of single-cell technologies for probing 3D genome conformation, especially single-cell chromosome conformation capture techniques, has enabled us to understand genome function better than before. However, owing to the extreme sparsity and high noise of single-cell Hi-C data, it is still difficult to study genome structure and function using the Hi-C data of a single cell. RESULTS In this work, we developed ScHiCEDRN, a deep learning method based on deep residual networks and generative adversarial networks, for the imputation and enhancement of single-cell Hi-C data. In terms of both image evaluation and Hi-C reproducibility metrics, ScHiCEDRN outperforms four deep learning methods (DeepHiC, HiCPlus, HiCSR, and Loopenhance) in enhancing raw single-cell Hi-C data from human and Drosophila. The experiments also show that it generates single-cell Hi-C data better suited to identifying topologically associating domain boundaries and reconstructing 3D chromosome structures than existing methods. Moreover, ScHiCEDRN's performance generalizes well across different single cells and cell types, and it can also be applied to improving population Hi-C data. AVAILABILITY AND IMPLEMENTATION The source code of ScHiCEDRN is available at the GitHub repository: https://github.com/BioinfoMachineLearning/ScHiCEDRN.
Affiliation(s)
- Yanli Wang
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Zhiye Guo
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, United States
- NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, United States
10
Li Z, Portillo-Ledesma S, Schlick T. Techniques for and challenges in reconstructing 3D genome structures from 2D chromosome conformation capture data. Curr Opin Cell Biol 2023; 83:102209. [PMID: 37506571 PMCID: PMC10529954 DOI: 10.1016/j.ceb.2023.102209] [Received: 05/15/2023] [Revised: 06/07/2023] [Accepted: 06/26/2023] [Indexed: 07/30/2023]
Abstract
Chromosome conformation capture technologies that provide frequency information for contacts between genomic regions have been crucial for increasing our understanding of genome folding and regulation. However, such data do not provide direct evidence of the spatial 3D organization of chromatin. In this opinion article, we discuss the development and application of computational methods to reconstruct chromatin 3D structures from experimental 2D contact data, highlighting how such modeling provides biological insights and can suggest mechanisms anchored to experimental data. By applying different reconstruction methods to the same contact data, we illustrate the current state of the art of these techniques and discuss our gene-resolution approach based on Brownian dynamics and Monte Carlo sampling.
Affiliation(s)
- Zilong Li
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Stephanie Portillo-Ledesma
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
- Tamar Schlick
- Department of Chemistry, New York University, 100 Washington Square East, Silver Building, New York, 10003, NY, USA; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St., New York, 10012, NY, USA; New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Room 340, Geography Building, 3663 North Zhongshan Road, Shanghai, 200122, China; Simons Center for Computational Physical Chemistry, New York University, 24 Waverly Place, Silver Building, New York, NY, 10003, USA
11
Habeck M. Bayesian methods in integrative structure modeling. Biol Chem 2023; 404:741-754. [PMID: 37505205 DOI: 10.1515/hsz-2023-0145] [Received: 02/27/2023] [Accepted: 07/07/2023] [Indexed: 07/29/2023]
Abstract
There is a growing interest in characterizing the structure and dynamics of large biomolecular assemblies and their interactions within the cellular environment. A diverse array of experimental techniques allows us to study biomolecular systems on a variety of length and time scales. These techniques range from imaging with light, X-rays or electrons, to spectroscopic methods, cross-linking mass spectrometry and functional genomics approaches, and are complemented by AI-assisted protein structure prediction methods. A challenge is to integrate all of these data into a model of the system and its functional dynamics. This review focuses on Bayesian approaches to integrative structure modeling. We sketch the principles of Bayesian inference, highlight recent applications to integrative modeling and conclude with a discussion of current challenges and future perspectives.
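For orientation, the generic form of Bayesian inference in integrative modeling combines a prior over structures with one likelihood per data source. Writing x for the model parameters (for example, coordinates), D_1 to D_K for the experimental data sets and I for background information such as a force field, and assuming that the data sets are conditionally independent given x (an assumption of this sketch, not a statement from the review), the posterior is:

```latex
p(x \mid D_1, \dots, D_K, I)
  = \frac{p(x \mid I)\,\prod_{k=1}^{K} p(D_k \mid x, I)}{p(D_1, \dots, D_K \mid I)}
  \;\propto\; p(x \mid I)\,\prod_{k=1}^{K} p(D_k \mid x, I)
```

Sampling this posterior, rather than optimizing a single structure, yields an ensemble whose spread reflects how tightly the combined data constrain the model.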
Affiliation(s)
- Michael Habeck
- Microscopic Image Analysis Group, Jena University Hospital, D-07743 Jena, Germany
- Max Planck Institute for Multidisciplinary Sciences, D-37077 Göttingen, Germany
12
Integrative genome modeling platform reveals essentiality of rare contact events in 3D genome organizations. Nat Methods 2022; 19:938-949. [PMID: 35817938 PMCID: PMC9349046 DOI: 10.1038/s41592-022-01527-x] [Received: 08/22/2021] [Accepted: 05/18/2022] [Indexed: 02/07/2023]
Abstract
A multitude of sequencing-based and microscopy technologies provide the means to unravel the relationship between the three-dimensional organization of genomes and key regulatory processes of genome function. Here, we develop a multimodal data integration approach to produce populations of single-cell genome structures that are highly predictive for nuclear locations of genes and nuclear bodies, local chromatin compaction and spatial segregation of functionally related chromatin. We demonstrate that multimodal data integration can compensate for systematic errors in some of the data and can greatly increase accuracy and coverage of genome structure models. We also show that alternative combinations of different orthogonal data sources can converge to models with similar predictive power. Moreover, our study reveals the key contributions of low-frequency (‘rare’) interchromosomal contacts to accurately predicting the global nuclear architecture, including the positioning of genes and chromosomes. Overall, our results highlight the benefits of multimodal data integration for genome structure analysis, available through the Integrative Genome Modeling software package. The Integrative Genome Modeling platform is a tool for population-based three-dimensional genome structure modeling and analysis by integrating various experimental data sources.
13
Yildirim A, Boninsegna L, Zhan Y, Alber F. Uncovering the Principles of Genome Folding by 3D Chromatin Modeling. Cold Spring Harb Perspect Biol 2022; 14:a039693. [PMID: 34400556 PMCID: PMC9248826 DOI: 10.1101/cshperspect.a039693] [Indexed: 11/25/2022]
Abstract
Our understanding of how genomic DNA is tightly packed inside the nucleus, yet is still accessible for vital cellular processes, has grown dramatically over recent years with advances in microscopy and genomics technologies. Computational methods have played a pivotal role in the structural interpretation of experimental data, which helped unravel some organizational principles of genome folding. Here, we give an overview of current computational efforts in mechanistic and data-driven 3D chromatin structure modeling. We discuss strengths and limitations of different methods and evaluate the added value and benefits of computational approaches to infer the 3D structural and dynamic properties of the genome and its underlying mechanisms at different scales and resolutions, ranging from the dynamic formation of chromatin loops and topologically associating domains to nuclear compartmentalization of chromatin and nuclear bodies.
Affiliation(s)
- Asli Yildirim
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Lorenzo Boninsegna
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Yuxiang Zhan
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, California 90095, USA
- Quantitative and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, USA
14
HiCImpute: A Bayesian hierarchical model for identifying structural zeros and enhancing single cell Hi-C data. PLoS Comput Biol 2022; 18:e1010129. [PMID: 35696429 PMCID: PMC9232133 DOI: 10.1371/journal.pcbi.1010129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 06/24/2022] [Accepted: 04/21/2022] [Indexed: 11/19/2022]
Abstract
Single-cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating the matter further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros); others are indeed due to insufficient sequencing depth (sampling zeros or dropouts), especially for loci that interact infrequently. Differentiating between structural zeros and dropouts is important since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes spatial dependencies of the 2D scHi-C data structure into account while also borrowing information from similar single cells and bulk data, when these are available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types compared to using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cells of L4 and L5 in the prefrontal cortex.
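The difference between structural zeros and dropouts can be illustrated with a toy zero-inflated Poisson model. This is not HiCImpute's Bayesian hierarchical model, only a minimal sketch of the underlying idea, assuming a hypothetical prior probability `pi` that a locus pair never interacts and a hypothetical expected contact count `lam` (for example, borrowed from similar cells or bulk data) if it does. The function name `prob_structural_zero` is illustrative; it returns the posterior probability that an observed zero is structural.

```python
import math


def prob_structural_zero(pi: float, lam: float) -> float:
    """P(structural zero | observed count 0) under a toy zero-inflated Poisson model.

    pi  -- prior probability that the locus pair never interacts (structural zero)
    lam -- expected contact count (Poisson rate) if the pair does interact
    """
    if not (0.0 <= pi <= 1.0) or lam < 0:
        raise ValueError("pi must be in [0, 1] and lam must be >= 0")
    # Probability of an interacting pair yielding zero reads (a dropout)
    p_dropout = (1.0 - pi) * math.exp(-lam)
    return pi / (pi + p_dropout)


# The same observed zero, for a pair expected to interact often vs. rarely
high_rate = prob_structural_zero(pi=0.2, lam=5.0)
low_rate = prob_structural_zero(pi=0.2, lam=0.1)
print(f"{high_rate:.3f} {low_rate:.3f}")
```

An observed zero is far more likely to be structural when `lam` is high, because a truly interacting pair would rarely produce zero reads; at low `lam`, most zeros are plausibly dropouts.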
15
Segal MR. Can 3D diploid genome reconstruction from unphased Hi-C data be salvaged? NAR Genom Bioinform 2022; 4:lqac038. [PMID: 35571676 PMCID: PMC9097817 DOI: 10.1093/nargab/lqac038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 03/31/2022] [Accepted: 04/29/2022] [Indexed: 11/13/2022]
Abstract
The three-dimensional (3D) configuration of chromatin impacts numerous cellular processes. However, directly observing chromatin architecture at high resolution is challenging. Accordingly, inferring 3D structure utilizing chromatin conformation capture assays, notably Hi-C, has received considerable attention, with a multitude of reconstruction algorithms advanced. While these have enhanced appreciation of chromatin organization, most suffer from a serious shortcoming when faced with diploid genomes: inability to disambiguate contacts between corresponding loci on homologous chromosomes, making attendant reconstructions potentially meaningless. Three recent proposals offer a computational way forward at the expense of strong assumptions. Here, we show that making plausible assumptions about the components of homologous chromosome contacts provides a basis for rescuing conventional consensus-based, unphased reconstruction. This would be consequential since not only are the assumptions needed for diploid reconstruction considerable, but the sophistication of select unphased algorithms affords substantive advantages with regard to resolution and folding complexity. Rather than presuming that the requisite salvaging assumptions are met, we exploit a recent imaging technology, in situ genome sequencing (IGS), to comprehensively evaluate their reasonableness. We analogously use IGS to assess assumptions underpinning diploid reconstruction algorithms. Results convincingly demonstrate that, in all instances, assumptions are not met, making further algorithm development, potentially informed by IGS data, essential.
Affiliation(s)
- Mark R Segal
- Department of Epidemiology and Biostatistics, University of California, 550 16th Street, San Francisco, CA 94143-0560, USA
16
Mapping nucleosome and chromatin architectures: A survey of computational methods. Comput Struct Biotechnol J 2022; 20:3955-3962. [PMID: 35950186 PMCID: PMC9340519 DOI: 10.1016/j.csbj.2022.07.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Revised: 07/22/2022] [Accepted: 07/22/2022] [Indexed: 11/21/2022]
Abstract
With ever-growing genomic sequencing data, the data variabilities and the underlying biases of the sequencing technologies pose significant computational challenges, ranging from the need to accurately detect nucleosome positioning or chromatin interactions to the need to develop normalization methods that eliminate systematic biases. This review mainly surveys the computational methods for mapping higher-resolution nucleosome and higher-order chromatin architectures. While a detailed discussion of the underlying algorithms is beyond the scope of our survey, we discuss the methods and tools that can detect nucleosomes in the genome, and then present computational methods for identifying 3D chromatin domains and interactions. We further illustrate computational approaches for integrating multi-omics data with Hi-C data and the advances in single-cell (sc)Hi-C data analysis. Our survey provides a comprehensive and valuable resource for biomedical scientists interested in studying nucleosome organization and chromatin structures, as well as for computational scientists who are interested in improving upon them.
17
Boninsegna L, Yildirim A, Zhan Y, Alber F. Integrative approaches in genome structure analysis. Structure 2021; 30:24-36. [PMID: 34963059 DOI: 10.1016/j.str.2021.12.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/17/2021] [Revised: 11/13/2021] [Accepted: 12/01/2021] [Indexed: 12/17/2022]
Abstract
New technological advances in integrated imaging, sequencing-based assays, and computational analysis have revolutionized our view of genomes in terms of their structure and dynamics in space and time. These advances promise a deeper understanding of genome functions and mechanistic insights into how the nucleus is spatially organized and functions. This wide array of complementary data provides an opportunity to produce quantitative integrative models of nuclear organization. In this article, we highlight recent key developments and discuss the outlook for these fields.
Affiliation(s)
- Lorenzo Boninsegna
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Asli Yildirim
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Yuxiang Zhan
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
18
Wu H, Wu Y, Jiang Y, Zhou B, Zhou H, Chen Z, Xiong Y, Liu Q, Zhang H. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2021; 23:6374065. [PMID: 34553746 DOI: 10.1093/bib/bbab396] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of chromosomes between cells. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data, and how to quickly and effectively discriminate cell types has become an active research topic. However, existing computational methods for predicting cell types from Hi-C data have low accuracy. Therefore, we propose a high-accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. In our work, we first improve the existing data preprocessing method for single-cell Hi-C data, which allows the generated cell embeddings to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that, compared with the cell embedding generated by the previously published method scHiCluster, the cell embedding generated by our data preprocessing method improves the Acc, MCC, F1 and Precision confidence intervals by 0.23, 1.22, 1.46 and 1.61%, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves on scHiCluster by 13.33, 19, 19.27 and 14.5 in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
Affiliation(s)
- Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; School of Software, Shandong University, Jinan, 250101, Shandong, China
- Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yuhong Jiang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zhongli Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
19
Zha M, Wang N, Zhang C, Wang Z. Inferring Single-Cell 3D Chromosomal Structures Based on the Lennard-Jones Potential. Int J Mol Sci 2021; 22:ijms22115914. [PMID: 34072879 PMCID: PMC8199262 DOI: 10.3390/ijms22115914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Revised: 05/23/2021] [Accepted: 05/28/2021] [Indexed: 11/16/2022]
Abstract
Reconstructing three-dimensional (3D) chromosomal structures based on single-cell Hi-C data is a challenging scientific problem due to the extreme sparseness of single-cell Hi-C data. In this research, we used the Lennard-Jones potential to reconstruct both 500 kb and high-resolution 50 kb chromosomal structures based on single-cell Hi-C data. A chromosome was represented by a string of 500 kb or 50 kb DNA beads and put into a 3D cubic lattice for simulations. A 2D Gaussian function was used to impute the sparse single-cell Hi-C contact matrices. We designed a novel loss function based on the Lennard-Jones potential, in which the ε value, i.e., the well depth, indicates how stable the binding of each pair of beads is. The loss function assigns stronger binding stability to bead pairs that have single-cell Hi-C contacts and to their neighboring bead pairs. The Metropolis-Hastings algorithm was used to try different locations for the DNA beads, and simulated annealing was used to optimize the loss function. We demonstrated the correctness and validity of the reconstructed 3D structures by evaluating the models according to multiple criteria and comparing them with 3D-FISH data.
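The pipeline described in this abstract (Gaussian imputation, contact-dependent well depths, Metropolis-Hastings moves with simulated annealing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel width and radius, the ε range, the geometric cooling schedule, the straight-line starting conformation, and all function names (`gaussian_impute`, `well_depths`, `anneal`) are hypothetical choices. The sketch also omits any chain-connectivity constraint between consecutive beads. σ is set to 2^(-1/6) so that the potential's minimum, of depth ε, falls at one lattice spacing.

```python
import math
import random

SIGMA = 2 ** (-1 / 6)  # places the Lennard-Jones minimum at r = 1 lattice unit
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def lj(r, eps):
    """Lennard-Jones potential 4*eps*((sigma/r)**12 - (sigma/r)**6); well depth is eps."""
    s6 = (SIGMA / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


def gaussian_impute(contacts, s=1.0, radius=2):
    """Spread each observed contact (a, b) onto nearby cells with a 2D Gaussian kernel."""
    n = len(contacts)
    out = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if not contacts[a][b]:
                continue
            for i in range(max(0, a - radius), min(n, a + radius + 1)):
                for j in range(max(0, b - radius), min(n, b + radius + 1)):
                    out[i][j] += math.exp(-((i - a) ** 2 + (j - b) ** 2) / (2 * s * s))
    return out


def well_depths(smoothed, eps_min=0.1, eps_max=1.0):
    """Map smoothed contact strength to epsilon: stronger contacts get deeper wells."""
    top = max(max(row) for row in smoothed) or 1.0
    return [[eps_min + (eps_max - eps_min) * v / top for v in row] for row in smoothed]


def bead_energy(k, pos, eps):
    """Sum of Lennard-Jones terms between bead k and every other bead."""
    return sum(lj(math.dist(pos[k], pos[j]), eps[k][j]) for j in range(len(pos)) if j != k)


def total_energy(pos, eps):
    """Loss over all bead pairs (each pair counted once)."""
    n = len(pos)
    return sum(lj(math.dist(pos[i], pos[j]), eps[i][j])
               for i in range(n) for j in range(i + 1, n))


def anneal(contacts, steps=20000, t0=1.0, t_end=0.01, seed=0):
    """Metropolis-Hastings lattice moves with a geometric cooling schedule.

    Assumes `contacts` is symmetric, so the epsilon matrix is symmetric too.
    """
    rng = random.Random(seed)
    n = len(contacts)
    eps = well_depths(gaussian_impute(contacts))
    pos = [(i, 0, 0) for i in range(n)]  # start from a straight line on the lattice
    occupied = set(pos)
    for step in range(steps):
        t = t0 * (t_end / t0) ** (step / steps)
        k = rng.randrange(n)
        dx, dy, dz = rng.choice(MOVES)
        old = pos[k]
        new = (old[0] + dx, old[1] + dy, old[2] + dz)
        if new in occupied:
            continue  # at most one bead per lattice site
        e_old = bead_energy(k, pos, eps)
        pos[k] = new
        delta = bead_energy(k, pos, eps) - e_old  # only pairs involving bead k change
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            occupied.discard(old)  # accept the move
            occupied.add(new)
        else:
            pos[k] = old  # reject and restore
    return pos, eps


n = 10
contacts = [[0] * n for _ in range(n)]
for i, j in [(0, 9), (2, 7)]:  # toy single-cell contacts, stored symmetrically
    contacts[i][j] = contacts[j][i] = 1
structure, eps = anneal(contacts, steps=5000)
print(structure)
print(total_energy(structure, eps))
```

Because a single move only changes pair distances involving the moved bead, the energy difference is computed from that bead's terms alone, which keeps each Metropolis step linear in the number of beads.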
Affiliation(s)
- Mengsheng Zha
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Nan Wang
- Department of Computer Science, New Jersey City University, 2039 Kennedy Blvd, Jersey City, NJ 07305, USA
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, University of Southern Mississippi, 118 College Dr, Hattiesburg, MS 39406, USA
- Zheng Wang
- Department of Computer Science, University of Miami, 1364 Memorial Drive, Coral Gables, FL 33124, USA
20
Soto CJ, Zhao PA, Klein KN, Gilbert DM, Srivastava A. Statistical comparisons of chromosomal shape populations. Proceedings. IEEE International Symposium on Biomedical Imaging 2021; 2021:788-791. [PMID: 35165532 PMCID: PMC8840943 DOI: 10.1109/isbi48211.2021.9433812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
This paper develops statistical tools for testing differences in shapes of chromosomes resulting from certain gene knockouts (KO), specifically RIF1 gene KO (RKO) and the cohesin subunit RAD21 gene KO (CKO). It utilizes a two-sample test for comparing shapes of KO chromosomes with wild type (WT) at two levels: (1) Coarse shape analysis, where one compares shapes of full or large parts of chromosomes, and (2) Fine shape analysis, where chromosomes are first segmented into (TAD-based) pieces and then the corresponding pieces are compared across populations. The shape comparisons - coarse and fine - are based on an elastic shape metric for comparing shapes of 3D curves. The experiments show that the KO populations, RKO and CKO, have statistically significant differences from WT at both coarse and fine levels. Furthermore, this framework highlights local regions where these differences are most prominent.
Affiliation(s)
- Carlos J Soto
- Department of Statistics, Pennsylvania State University, State College, PA, USA
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Peiyao A Zhao
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Kyle N Klein
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
- David M Gilbert
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, FL, USA
21
Soto C, Bryner D, Neretti N, Srivastava A. Toward a Three-Dimensional Chromosome Shape Alphabet. J Comput Biol 2021; 28:601-618. [PMID: 33720766 DOI: 10.1089/cmb.2020.0383] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
Abstract
The study of the three-dimensional (3D) structure of chromosomes, the largest macromolecules in biology, is one of the most challenging problems to date in structural biology. Here, we develop a novel representation of 3D chromosome structures as sequences of shape letters from a finite shape alphabet, which provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to the analysis of texts in a language by using letters. We construct a Chromosome Shape Alphabet from an ensemble of chromosome 3D structures inferred from Hi-C data (via SIMBA3D or other methods) by segmenting curves based on topologically associating domain (TAD) boundaries, and by clustering all TADs' 3D structures into groups of similar shapes. The median shapes of these groups, with some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof of concept for these CSLs by reconstructing independent test curves using only CSLs (and corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize shapes in an ensemble of chromosome 3D structures by using generalized sequence logos.
Affiliation(s)
- Carlos Soto
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
- Darshan Bryner
- Naval Surface Warfare Center Panama City Division, Panama City, Florida, USA
- Nicola Neretti
- Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, Rhode Island, USA
- Anuj Srivastava
- Department of Statistics, Florida State University, Tallahassee, Florida, USA
22
Tuzhilina E, Hastie TJ, Segal MR. Principal curve approaches for inferring 3D chromatin architecture. Biostatistics 2020; 23:626-642. [PMID: 33221831 DOI: 10.1093/biostatistics/kxaa046] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/14/2020] [Revised: 09/26/2020] [Accepted: 09/29/2020] [Indexed: 11/13/2022]
Abstract
Three-dimensional (3D) genome spatial organization is critical for numerous cellular processes, including transcription, while certain conformation-driven structural alterations are frequently oncogenic. Genome architecture had been notoriously difficult to elucidate, but the advent of the suite of chromatin conformation capture assays, notably Hi-C, has transformed understanding of chromatin structure and provided downstream biological insights. Although many findings have flowed from direct analysis of the pairwise proximity data produced by these assays, there is added value in generating corresponding 3D reconstructions, deriving from the ability to superpose genomic features on the reconstruction. Accordingly, many methods for inferring 3D architecture from proximity data have been advanced. However, none of these approaches exploit the fact that single chromosome solutions constitute a one-dimensional (1D) curve in 3D. Rather, this aspect has either been addressed by imposition of constraints, which is both computationally burdensome and cell-type specific, or ignored, with contiguity imposed after the fact. Here, we target finding a 1D curve by extending principal curve methodology to the metric scaling problem. We illustrate how this approach yields a sequence of candidate solutions, indexed by an underlying smoothness or degrees-of-freedom parameter, and propose methods for selection from this sequence. We apply the methodology to Hi-C data obtained on IMR90 cells and so are positioned to evaluate reconstruction accuracy by referencing orthogonal imaging data. The results indicate the utility and reproducibility of our principal curve approach in the face of underlying structural variation.
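The metric scaling problem referred to in this abstract can be made concrete with classical (Torgerson) multidimensional scaling, the standard baseline that turns a matrix of pairwise distances into 3D coordinates. The sketch below shows only that baseline, not the principal-curve extension proposed in the paper. It assumes distances have already been derived from Hi-C contacts (how contacts are converted to distances is outside the sketch), and the function name `classical_mds` and its parameters are illustrative. It uses power iteration with deflation so that it needs only the Python standard library; power iteration returns the largest-magnitude eigenvalue, so the sketch assumes the double-centred matrix is close to positive semidefinite and clamps any negative estimate to zero.

```python
import math
import random


def classical_mds(dist, dim=3, iters=500, seed=0):
    """Classical (Torgerson) MDS: embed an n x n distance matrix in `dim` dimensions.

    Assumes the double-centred matrix B is (close to) positive semidefinite,
    as it is for distances that are exactly Euclidean.
    """
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand_mean = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        # Power iteration for the dominant eigenpair of the current B
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(sum(v[i] * bv[i] for i in range(n)), 0.0)  # Rayleigh quotient, clamped
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflation: remove the found component before seeking the next eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


points = [(0, 0, 0), (4, 0, 0), (0, 2, 0), (0, 0, 1), (4, 2, 1), (2, 1, 0.5), (1, 0, 1)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist)
```

For exactly Euclidean input, the recovered coordinates reproduce the original pairwise distances up to a rotation, reflection, and translation. Note that nothing in this baseline forces consecutive loci to lie along a smooth 1D curve, which is the gap the principal-curve approach described above targets.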
Affiliation(s)
- Elena Tuzhilina
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
- Trevor J Hastie
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
- Mark R Segal
- Department of Statistics, Stanford University, Stanford, CA 94305, USA and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94143, USA
23
Oluwadare O, Highsmith M, Turner D, Lieberman Aiden E, Cheng J. GSDB: a database of 3D chromosome and genome structures reconstructed from Hi-C data. BMC Mol Cell Biol 2020; 21:60. [PMID: 32758136 PMCID: PMC7405446 DOI: 10.1186/s12860-020-00304-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/26/2020] [Accepted: 07/29/2020] [Indexed: 11/10/2022]
Abstract
Advances in the study of chromosome conformation capture technologies, such as the Hi-C technique, which captures chromosomal interactions on a genome-wide scale, have led to the development of methods for reconstructing three-dimensional chromosome and genome structures from Hi-C data. The three-dimensional genome structure is important because it plays a role in a variety of biological activities such as DNA replication, gene regulation, genome interaction, and gene expression. In recent years, numerous Hi-C datasets have been generated, and likewise, a number of genome structure reconstruction algorithms have been developed. In this work, we outline the construction of a novel Genome Structure Database (GSDB), a comprehensive repository that contains 3D structures for Hi-C datasets constructed by a variety of 3D structure reconstruction tools. The GSDB contains over 50,000 structures from 12 state-of-the-art Hi-C data structure prediction algorithms for 32 Hi-C datasets. GSDB functions as a centralized collection of genome structures that will enable the exploration of the dynamic architectures of chromosomes and genomes for biomedical research. GSDB is accessible at http://sysbio.rnet.missouri.edu/3dgenome/GSDB
Affiliation(s)
- Oluwatosin Oluwadare
- Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918, USA
- Max Highsmith
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Douglass Turner
- Elastic Image Software LLC, 21 Walnut Street, Lexington, MA, 02421, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA