1
龚 海, 麻 付, 张 晓. [Advances in methods and applications of single-cell Hi-C data analysis]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2023; 40:1033-1039. [PMID: 37879935 PMCID: PMC10600426 DOI: 10.7507/1001-5515.202303046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 08/29/2023] [Indexed: 10/27/2023]
Abstract
The three-dimensional structure of chromatin plays a key role in cell function and gene regulation. Single-cell Hi-C techniques capture genome structure at the level of individual cells, which makes it possible to study how genome structure differs between cell types. Several computational methods have recently been developed for single-cell Hi-C data analysis. This paper first reviews the available methods, covering preprocessing of single-cell Hi-C data, multi-scale structure recognition based on single-cell Hi-C data, generation of bulk-like Hi-C contact matrices from single-cell Hi-C data sets, pseudo-time series analysis, and cell classification. It then describes applications of single-cell Hi-C data to cell differentiation and structural variation. Finally, it discusses future directions for single-cell Hi-C data analysis.
Affiliation(s)
- 海燕 龚
- 北京科技大学 新材料技术研究院 (北京 100083); Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- 北京科技大学 计算机与通信工程学院 (北京 100083); School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
- 付强 麻
- 北京科技大学 新材料技术研究院 (北京 100083); Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- 晓彤 张
- 北京科技大学 新材料技术研究院 (北京 100083); Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing 100083, P. R. China
- 北京科技大学 计算机与通信工程学院 (北京 100083); School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, P. R. China
2
Fan S, Dang D, Ye Y, Zhang SW, Gao L, Zhang S. scHi-CSim: a flexible simulator that generates high-fidelity single-cell Hi-C data for benchmarking. J Mol Cell Biol 2023; 15:mjad003. [PMID: 36708167 PMCID: PMC10308180 DOI: 10.1093/jmcb/mjad003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2022] [Revised: 09/18/2022] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Abstract
Single-cell Hi-C technology provides an unprecedented opportunity to reveal chromatin structure in individual cells. However, high sequencing cost impedes the generation of biological Hi-C data with high sequencing depths and multiple replicates for downstream analysis. Here, we developed a single-cell Hi-C simulator (scHi-CSim) that generates high-fidelity data for benchmarking. scHi-CSim merges neighboring cells to overcome the sparseness of data, samples interactions in distance-stratified chromosomes to maintain the heterogeneity of single cells, and estimates the empirical distribution of restriction fragments to generate simulated data. We demonstrated that scHi-CSim generates high-fidelity data by comparing single-cell clustering and the detection of higher-order chromosomal structures on simulated data against the same analyses on raw data. Furthermore, scHi-CSim can flexibly vary the sequencing depth and the number of simulated replicates. We showed that increasing sequencing depth can improve the accuracy of detecting topologically associating domains. We also used scHi-CSim to generate a series of simulated datasets with different sequencing depths to benchmark scHi-C clustering methods.
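The idea of sampling interactions within distance strata can be illustrated with a minimal sketch. This is not the scHi-CSim implementation: the function name `stratified_downsample`, the representation of contacts as `(bin_i, bin_j)` pairs, and the rule of keeping a fixed fraction per stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_downsample(contacts, fraction, seed=0):
    """Downsample (bin_i, bin_j) contacts separately per genomic distance.

    Hypothetical helper, not taken from scHi-CSim. Each stratum, defined by
    |bin_i - bin_j|, keeps round(fraction * n) of its n contacts, so the
    distance-decay profile of the input is roughly preserved.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)

    # Group contacts by the distance between their two bins.
    strata = defaultdict(list)
    for bin_i, bin_j in contacts:
        strata[abs(bin_i - bin_j)].append((bin_i, bin_j))

    # Sample without replacement inside each stratum.
    sampled = []
    for distance in sorted(strata):
        pairs = strata[distance]
        keep = round(fraction * len(pairs))
        sampled.extend(rng.sample(pairs, keep))
    return sampled
```

Sampling per stratum, rather than over the pooled contact list, keeps rare long-range contacts from being dropped disproportionately when the target depth is low.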
Affiliation(s)
- Shichen Fan
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Dachang Dang
- School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an 710072, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
- Key Laboratory of Systems Health Science of Zhejiang Province, School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
3
Ye Y, Zhang S, Gao L, Zhu Y, Zhang J. Deciphering Hierarchical Chromatin Domains and Preference of Genomic Position Forming Boundaries in Single Mouse Embryonic Stem Cells. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2205162. [PMID: 36658736 PMCID: PMC10015865 DOI: 10.1002/advs.202205162] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/07/2022] [Revised: 12/15/2022] [Indexed: 06/17/2023]
Abstract
The exploration of single-cell 3D genome maps reveals that chromatin domains are indeed physical structures present in single cells, and that domain boundaries vary from cell to cell. However, a systematic analysis of how regulatory factor binding and regulatory elements relate to the formation of chromatin domains in single cells has not yet emerged. To this end, a hierarchical chromatin domain structure identification algorithm (named HiCS) is first developed for individual single-cell Hi-C maps, with superior performance in both accuracy and efficiency. The results suggest that, in addition to the known CTCF-cohesin complex, Polycomb, TrxG, pluripotent protein families, and multiple other factors also contribute to shaping chromatin domain boundaries in single embryonic stem cells. Different cooperation patterns of these regulatory factors drive genomic position categories with differing preferences for forming boundaries, and the six most extensive types of retrotransposons are differentially distributed across these genomic position categories, each with a preferential localization. These results suggest that the different retrotransposons within genomic regions interplay with regulatory factors to steer the preference of genomic positions for forming boundaries, driving the formation of higher-order chromatin structures and thus regulating cell functions in single mouse embryonic stem cells.
Affiliation(s)
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, P. R. China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P. R. China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P. R. China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, P. R. China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi 710071, P. R. China
- Yuqing Zhu
- Center for Stem Cell and Translational Medicine, School of Life Sciences, Anhui University, Hefei, Anhui 230601, P. R. China
- Jin Zhang
- Center for Stem Cell and Regenerative Medicine, Department of Basic Medical Sciences, and Bone Marrow Transplantation Center of the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, Zhejiang 310003, P. R. China
- Zhejiang Laboratory for Systems and Precision Medicine, Zhejiang University Medical Center, Hangzhou, Zhejiang 311121, P. R. China
- Institute of Hematology, Zhejiang University, Hangzhou, Zhejiang 310058, P. R. China
- Center of Gene/Cell Engineering and Genome Medicine, Hangzhou, Zhejiang 310058, P. R. China
4
Lyu H, Liu E, Wu Z, Li Y, Liu Y, Yin X. scHiCPTR: unsupervised pseudotime inference through dual graph refinement for single-cell Hi-C data. Bioinformatics 2022; 38:5151-5159. [PMID: 36205615 DOI: 10.1093/bioinformatics/btac670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 08/25/2022] [Accepted: 10/05/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: The emerging single-cell Hi-C technology provides opportunities to study the dynamics of chromosomal organization. Constructing a pseudotime path from single-cell Hi-C contact matrices to order cells along a developmental trajectory is challenging: the matrices are inherently high-dimensional and sparse, they suffer from noise and biases, and the topology of the underlying trajectory may be diverse. RESULTS: We present scHiCPTR, an unsupervised graph-based pipeline to infer pseudotime from single-cell Hi-C contact matrices. It provides a workflow consisting of imputation and embedding, graph construction, dual graph refinement, pseudotime calculation and result visualization. Going beyond the few existing methods, scHiCPTR tries to optimize the graph structure with two parallel graph-pruning procedures, which help reduce spurious cell links resulting from noise and determine a global developmental directionality. In addition, it can handle developmental trajectories with multiple topologies, including linear, bifurcated and circular ones, and is competitive with methods developed for single-cell RNA-seq data. The comparative results show that scHiCPTR achieves higher performance in pseudotime inference, and that the inferred developmental trajectory exhibits reasonable biological significance. AVAILABILITY AND IMPLEMENTATION: scHiCPTR is freely available at https://github.com/lhqxinghun/scHiCPTR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
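The graph-construction and pseudotime-calculation steps of such a workflow can be sketched generically as follows. This is not the scHiCPTR code and it omits imputation, embedding and the dual graph refinement entirely. It assumes cells are already embedded as low-dimensional points, and the function names `knn_graph` and `pseudotime` are hypothetical.

```python
import heapq
import math


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbour graph: {node: {neighbour: distance}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def pseudotime(graph, root):
    """Shortest-path (Dijkstra) distance from the root cell, scaled to [0, 1].

    Cells that cannot be reached from the root are left out of the result.
    """
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    longest = max(dist.values()) or 1.0
    return {node: d / longest for node, d in dist.items()}
```

In this simplified form, spurious edges in the neighbour graph create shortcuts that distort the ordering, which is the kind of problem a graph-pruning step is meant to reduce.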
Affiliation(s)
- Hongqiang Lyu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Erhu Liu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Zhifang Wu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Yao Li
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Yuan Liu
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Shaanxi 710049, China
- Xiaoran Yin
- Department of Oncology, The Second Affiliated Hospital of Xi'an Jiaotong University, Shaanxi 710004, China
5
Integrated investigation of DNA methylation, gene expression and immune cell population revealed immune cell infiltration associated with atherosclerotic plaque formation. BMC Med Genomics 2022; 15:108. [PMID: 35534881 PMCID: PMC9082837 DOI: 10.1186/s12920-022-01259-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Accepted: 05/03/2022] [Indexed: 11/30/2022]
Abstract
Background: The clinical consequences of atherosclerosis are a significant source of morbidity and mortality throughout the world, while the molecular mechanisms of its pathogenesis remain largely unknown. Methods: In this study, we integrated DNA methylation and gene expression data from atherosclerotic plaque samples to decipher the underlying association between epigenetic and transcriptional regulation. Immune cell classification was performed on the basis of the expression pattern of detected genes. Finally, we selected ten genes with dysregulated methylation and expression levels for RT-qPCR validation. Results: The global DNA methylation profile showed obvious changes between normal aortic and atherosclerotic lesion tissues. We found that differentially methylated genes (DMGs) and differentially expressed genes (DEGs) were highly associated with atherosclerosis, being enriched in pathways related to atherosclerotic plaque formation, including cell adhesion and extracellular matrix organization. Immune cell fraction analysis revealed that a large number of immune cells, especially macrophages, activated mast cells, NK cells, and Tfh cells, were specifically enriched in the plaque. DEGs associated with changes in immune cell fraction were mainly related to the levels of macrophages, monocytes, resting NK cells, activated CD4 memory T cells, and gamma delta T cells. These genes were highly enriched in multiple pathways of atherosclerotic plaque formation, including blood vessel remodeling, collagen fiber organization, cell adhesion, collagen catabolic process, extracellular matrix assembly, and platelet activation. We also validated the expression alteration of ten genes associated with infiltrating immune cells in atherosclerosis.
Conclusions: These findings provide new evidence for understanding the mechanisms of atherosclerotic plaque formation, and suggest a new and valuable research direction based on immune cell infiltration. Supplementary Information: The online version contains supplementary material available at 10.1186/s12920-022-01259-z.
6
Boninsegna L, Yildirim A, Zhan Y, Alber F. Integrative approaches in genome structure analysis. Structure 2021; 30:24-36. [PMID: 34963059 DOI: 10.1016/j.str.2021.12.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/17/2021] [Revised: 11/13/2021] [Accepted: 12/01/2021] [Indexed: 12/17/2022]
Abstract
New technological advances in integrated imaging, sequencing-based assays, and computational analysis have revolutionized our view of genomes in terms of their structure and dynamics in space and time. These advances promise a deeper understanding of genome functions and mechanistic insights into how the nucleus is spatially organized and functions. These wide arrays of complementary data provide an opportunity to produce quantitative integrative models of nuclear organization. In this article, we highlight recent key developments and discuss the outlook for these fields.
Affiliation(s)
- Lorenzo Boninsegna
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Asli Yildirim
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA
- Yuxiang Zhan
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Frank Alber
- Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
7
Wu H, Wu Y, Jiang Y, Zhou B, Zhou H, Chen Z, Xiong Y, Liu Q, Zhang H. scHiCStackL: a stacking ensemble learning-based method for single-cell Hi-C classification using cell embedding. Brief Bioinform 2021; 23:6374065. [PMID: 34553746 DOI: 10.1093/bib/bbab396] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Single-cell Hi-C data are a common data source for studying differences in the three-dimensional structure of chromosomes between cells. The development of single-cell Hi-C technology makes it possible to obtain batches of single-cell Hi-C data, and how to discriminate cell types quickly and effectively has become an active research topic. However, existing computational methods that predict cell types from Hi-C data have low accuracy. Therefore, we propose a high-accuracy cell classification algorithm, called scHiCStackL, based on single-cell Hi-C data. We first improve the existing preprocessing method for single-cell Hi-C data, which allows the generated cell embedding to better represent cells. Then, we construct a two-layer stacking ensemble model for classifying cells. Experimental results show that the cell embedding generated by our preprocessing method improves on the cell embedding generated by the previously published method scHiCluster by 0.23, 1.22, 1.46 and 1.61% in terms of the Acc, MCC, F1 and Precision confidence intervals, respectively, on the task of classifying human cells in the ML1 and ML3 datasets. When using the two-layer stacking ensemble framework with the cell embedding, scHiCStackL improves by 13.33, 19, 19.27 and 14.5 over scHiCluster in terms of the Acc, ARI, NMI and F1 confidence intervals, respectively. In summary, scHiCStackL achieves superior performance in predicting cell types using single-cell Hi-C data. The webserver and source code of scHiCStackL are freely available at http://hww.sdu.edu.cn:8002/scHiCStackL/ and https://github.com/HaoWuLab-Bioinformatics/scHiCStackL, respectively.
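A two-layer stacking ensemble in general works as follows: first-layer models produce out-of-fold predictions, and a second-layer model learns from those predictions. The sketch below illustrates that pattern only. The abstract does not name the learners used in scHiCStackL, so the nearest-centroid and 1-nearest-neighbour base models, the vote-table meta-learner, and all class and function names here are stand-in assumptions.

```python
import math
from collections import Counter, defaultdict


class NearestCentroid:
    """Base learner: assign the label of the closest class centroid."""

    def fit(self, X, y):
        counts, sums = Counter(y), {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += value
        self.centroids = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))
            for x in X
        ]


class OneNearestNeighbour:
    """Base learner: assign the label of the closest training point."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        return [
            self.y[min(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))]
            for x in X
        ]


class VoteTable:
    """Meta-learner: map each tuple of base predictions to its most common true label."""

    def fit(self, Z, y):
        table = defaultdict(Counter)
        for z, label in zip(Z, y):
            table[tuple(z)][label] += 1
        self.table = {z: c.most_common(1)[0][0] for z, c in table.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, Z):
        return [self.table.get(tuple(z), self.default) for z in Z]


def fit_stack(X, y, base_classes, n_folds=2):
    """Layer 1: out-of-fold base predictions. Layer 2: meta-learner on them."""
    n = len(X)
    Z = [[None] * len(base_classes) for _ in range(n)]
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        for b, cls in enumerate(base_classes):
            model = cls().fit([X[i] for i in train], [y[i] for i in train])
            for i, pred in zip(held_out, model.predict([X[i] for i in held_out])):
                Z[i][b] = pred
    meta = VoteTable().fit(Z, y)
    bases = [cls().fit(X, y) for cls in base_classes]  # refit on all data
    return bases, meta


def predict_stack(bases, meta, X):
    Z = list(zip(*(base.predict(X) for base in bases)))
    return meta.predict(Z)
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for cells the base models have already seen, keeps the second layer from simply trusting whichever base model overfits most.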
Affiliation(s)
- Hao Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China; School of Software, Shandong University, Jinan, 250101, Shandong, China
- Yingfu Wu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yuhong Jiang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Bing Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Haoru Zhou
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Zhongli Chen
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 200240, Shanghai, China
- Quanzhong Liu
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
- Hongming Zhang
- College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
8
Todorov H, Cannoodt R, Saelens W, Saeys Y. TinGa: fast and flexible trajectory inference with Growing Neural Gas. Bioinformatics 2021; 36:i66-i74. [PMID: 32657409 PMCID: PMC7355244 DOI: 10.1093/bioinformatics/btaa463] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/03/2023]
Abstract
MOTIVATION: During the last decade, trajectory inference (TI) methods have emerged as a novel framework to model cell developmental dynamics, most notably in the area of single-cell transcriptomics. At present, more than 70 TI methods have been published, and recent benchmarks showed that even state-of-the-art methods only perform well for certain trajectory types but not others. RESULTS: In this work, we present TinGa, a new TI model that is fast and flexible, and that is based on Growing Neural Gas. We performed an extensive comparison of TinGa to five state-of-the-art methods for TI on a set of 250 datasets, including both synthetic and real datasets. Overall, TinGa improves on the state of the art by producing accurate models (comparable to or better than the state of the art) across the whole spectrum of data complexity, from the simplest linear datasets to the most complex disconnected graphs. In addition, TinGa obtained the fastest execution times, showing that our method is one of the most versatile methods to date. AVAILABILITY AND IMPLEMENTATION: R scripts for running TinGa, comparing it to top existing methods and generating the figures of this article are available at https://github.com/Helena-todd/TinGa.
Affiliation(s)
- Helena Todorov
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent 9052, Belgium; Centre International de recherche en Infectiologie, Université de Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, 69007 Lyon, France
- Robrecht Cannoodt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent 9052, Belgium
- Wouter Saelens
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent 9052, Belgium
- Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent 9000, Belgium; Data Mining and Modeling for Biomedicine, VIB Center for Inflammation Research, Ghent 9052, Belgium