1
|
Arthur TD, Nguyen JP, D'Antonio-Chronowska A, Matsui H, Silva NS, Joshua IN, Luchessi AD, Greenwald WWY, D'Antonio M, Pera MF, Frazer KA. Complex regulatory networks influence pluripotent cell state transitions in human iPSCs. Nat Commun 2024; 15:1664. [PMID: 38395976 PMCID: PMC10891157 DOI: 10.1038/s41467-024-45506-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2023] [Accepted: 01/26/2024] [Indexed: 02/25/2024] Open
Abstract
Stem cells exist in vitro in a spectrum of interconvertible pluripotent states. Analyzing hundreds of hiPSCs derived from different individuals, we show the proportions of these pluripotent states vary considerably across lines. We discover 13 gene network modules (GNMs) and 13 regulatory network modules (RNMs), which are highly correlated with each other suggesting that the coordinated co-accessibility of regulatory elements in the RNMs likely underlie the coordinated expression of genes in the GNMs. Epigenetic analyses reveal that regulatory networks underlying self-renewal and pluripotency are more complex than previously realized. Genetic analyses identify thousands of regulatory variants that overlapped predicted transcription factor binding sites and are associated with chromatin accessibility in the hiPSCs. We show that the master regulator of pluripotency, the NANOG-OCT4 Complex, and its associated network are significantly enriched for regulatory variants with large effects, suggesting that they play a role in the varying cellular proportions of pluripotency states between hiPSCs. Our work bins tens of thousands of regulatory elements in hiPSCs into discrete regulatory networks, shows that pluripotency and self-renewal processes have a surprising level of regulatory complexity, and suggests that genetic factors may contribute to cell state transitions in human iPSC lines.
Collapse
Affiliation(s)
- Timothy D Arthur
- Biomedical Sciences Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Jennifer P Nguyen
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | | | - Hiroko Matsui
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - Nayara S Silva
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
| | - Isaac N Joshua
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | - André D Luchessi
- Northeast Biotechnology Network (RENORBIO), Graduate Program in Biotechnology, Federal University of Rio Grande do Norte, Natal, Brazil
- Department of Clinical and Toxicological Analysis, Federal University of Rio Grande do Norte, Natal, Brazil
| | - William W Young Greenwald
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA, 92093, USA
| | - Matteo D'Antonio
- Division of Biomedical Informatics, University of California, San Diego, La Jolla, CA, 92093, USA
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA
| | | | - Kelly A Frazer
- Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.
- Institute of Genomic Medicine, University of California San Diego, 9500 Gilman Dr, La Jolla, CA, 92093, USA.
| |
Collapse
|
2
|
Li C, Chen X, Chen S, Jiang R, Zhang X. simCAS: an embedding-based method for simulating single-cell chromatin accessibility sequencing data. Bioinformatics 2023; 39:btad453. [PMID: 37494428 PMCID: PMC10394124 DOI: 10.1093/bioinformatics/btad453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2023] [Revised: 06/25/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION Single-cell chromatin accessibility sequencing (scCAS) technology provides an epigenomic perspective to characterize gene regulatory mechanisms at single-cell resolution. With an increasing number of computational methods proposed for analyzing scCAS data, a powerful simulation framework is desirable for evaluation and validation of these methods. However, existing simulators generate synthetic data by sampling reads from real data or mimicking existing cell states, which is inadequate to provide credible ground-truth labels for method evaluation. RESULTS We present simCAS, an embedding-based simulator, for generating high-fidelity scCAS data from both cell- and peak-wise embeddings. We demonstrate simCAS outperforms existing simulators in resembling real data and show that simCAS can generate cells of different states with user-defined cell populations and differentiation trajectories. Additionally, simCAS can simulate data from different batches and encode user-specified interactions of chromatin regions in the synthetic data, which provides ground-truth labels more than cell states. We systematically demonstrate that simCAS facilitates the benchmarking of four core tasks in downstream analysis: cell clustering, trajectory inference, data integration, and cis-regulatory interaction inference. We anticipate simCAS will be a reliable and flexible simulator for evaluating the ongoing computational methods applied on scCAS data. AVAILABILITY AND IMPLEMENTATION simCAS is freely available at https://github.com/Chen-Li-17/simCAS.
Collapse
Affiliation(s)
- Chen Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xiaoyang Chen
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
| | - Rui Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Center for Synthetic and Systems Biology, School of Life Sciences and School of Medicine, Tsinghua University, Beijing 100084, China
| |
Collapse
|
3
|
scEpiLock: A Weakly Supervised Learning Framework for cis-Regulatory Element Localization and Variant Impact Quantification for Single-Cell Epigenetic Data. Biomolecules 2022; 12:biom12070874. [PMID: 35883430 PMCID: PMC9312957 DOI: 10.3390/biom12070874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2022] [Revised: 06/16/2022] [Accepted: 06/16/2022] [Indexed: 02/04/2023] Open
Abstract
Recent advances in single-cell transposase-accessible chromatin using a sequencing assay (scATAC-seq) allow cellular heterogeneity dissection and regulatory landscape reconstruction with an unprecedented resolution. However, compared to bulk-sequencing, its ultra-high missingness remarkably reduces usable reads in each cell type, resulting in broader, fuzzier peak boundary definitions and limiting our ability to pinpoint functional regions and interpret variant impacts precisely. We propose a weakly supervised learning method, scEpiLock, to directly identify core functional regions from coarse peak labels and quantify variant impacts in a cell-type-specific manner. First, scEpiLock uses a multi-label classifier to predict chromatin accessibility via a deep convolutional neural network. Then, its weakly supervised object detection module further refines the peak boundary definition using gradient-weighted class activation mapping (Grad-CAM). Finally, scEpiLock provides cell-type-specific variant impacts within a given peak region. We applied scEpiLock to various scATAC-seq datasets and found that it achieves an area under receiver operating characteristic curve (AUC) of ~0.9 and an area under precision recall (AUPR) above 0.7. Besides, scEpiLock’s object detection condenses coarse peaks to only ⅓ of their original size while still reporting higher conservation scores. In addition, we applied scEpiLock on brain scATAC-seq data and reported several genome-wide association studies (GWAS) variants disrupting regulatory elements around known risk genes for Alzheimer’s disease, demonstrating its potential to provide cell-type-specific biological insights in disease studies.
Collapse
|
4
|
Shan X, Chen J, Dong K, Zhou W, Zhang S. Deciphering the Spatial Modular Patterns of Tissues by Integrating Spatial and Single-Cell Transcriptomic Data. J Comput Biol 2022; 29:650-663. [PMID: 35727094 DOI: 10.1089/cmb.2021.0617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) provides a powerful tool to analyze the expression level of tissues at a cellular resolution. However, it could not capture the spatial organization of cells in a tissue. The spatially resolved transcriptomics technologies (ST) have been developed to address this issue. However, the emerging STs are still inefficient at single-cell resolution and/or fail to capture the sufficient reads. To this end, we adopted a partial least squares-based method (spatial modular patterns [SpaMOD]) to simultaneously integrate the two data modalities, as well as the networks related to cells and spots, to identify the cell-spot comodules for deciphering the SpaMOD of tissues. We applied SpaMOD to three paired scRNA-seq and ST datasets, derived from the mouse brain, granuloma, and pancreatic ductal adenocarcinoma, respectively. The identified cell-spot comodules provide detailed biological insights into the spatial relationships between cell populations and their spatial locations in the tissue.
Collapse
Affiliation(s)
- Xu Shan
- Department of Software Engineering, Yunnan University, Kunming, China
| | - Jinyu Chen
- College of Statistics and Data Science, Faculty of Science, Beijing University of Technology, Beijing, China
| | - Kangning Dong
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Wei Zhou
- Department of Software Engineering, Yunnan University, Kunming, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China.,Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.,Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
| |
Collapse
|
5
|
Shen Q, Zhang S. Approximate distance correlation for selecting highly interrelated genes across datasets. PLoS Comput Biol 2021; 17:e1009548. [PMID: 34752449 PMCID: PMC8604336 DOI: 10.1371/journal.pcbi.1009548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Revised: 11/19/2021] [Accepted: 10/10/2021] [Indexed: 12/13/2022] Open
Abstract
With the rapid accumulation of biological omics datasets, decoding the underlying relationships of cross-dataset genes becomes an important issue. Previous studies have attempted to identify differentially expressed genes across datasets. However, it is hard for them to detect interrelated ones. Moreover, existing correlation-based algorithms can only measure the relationship between genes within a single dataset or two multi-modal datasets from the same samples. It is still unclear how to quantify the strength of association of the same gene across two biological datasets with different samples. To this end, we propose Approximate Distance Correlation (ADC) to select interrelated genes with statistical significance across two different biological datasets. ADC first obtains the k most correlated genes for each target gene as its approximate observations, and then calculates the distance correlation (DC) for the target gene across two datasets. ADC repeats this process for all genes and then performs the Benjamini-Hochberg adjustment to control the false discovery rate. We demonstrate the effectiveness of ADC with simulation data and four real applications to select highly interrelated genes across two datasets. These four applications including 21 cancer RNA-seq datasets of different tissues; six single-cell RNA-seq (scRNA-seq) datasets of mouse hematopoietic cells across six different cell types along the hematopoietic cell lineage; five scRNA-seq datasets of pancreatic islet cells across five different technologies; coupled single-cell ATAC-seq (scATAC-seq) and scRNA-seq data of peripheral blood mononuclear cells (PBMC). Extensive results demonstrate that ADC is a powerful tool to uncover interrelated genes with strong biological implications and is scalable to large-scale datasets. Moreover, the number of such genes can serve as a metric to measure the similarity between two datasets, which could characterize the relative difference of diverse cell types and technologies.
Collapse
Affiliation(s)
- Qunlun Shen
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, China
| |
Collapse
|
6
|
Pandey N, Omkar Chandra, Mishra S, Kumar V. Improving Chromatin-Interaction Prediction Using Single-Cell Open-Chromatin Profiles and Making Insight Into the Cis-Regulatory Landscape of the Human Brain. Front Genet 2021; 12:738194. [PMID: 34691152 PMCID: PMC8533004 DOI: 10.3389/fgene.2021.738194] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open
Abstract
Single-cell open-chromatin profiles have the potential to reveal the pattern of chromatin-interaction in a cell type. However, currently available cis-regulatory network prediction methods using single-cell open-chromatin profiles focus more on local chromatin interactions despite the fact that long-range interactions among genomic sites play a significant role in gene regulation. Here, we propose a method that predicts both short and long-range interactions among genomic sites using single-cell open chromatin profiles. Our method, termed as single-cell epigenome based chromatin-interaction analysis (scEChIA) exploits signal imputation and refined L1 regularization. For a few single-cell open-chromatin profiles, scEChIA outperformed other tools even in terms of accuracy of prediction. Using scEChIA, we predicted almost 0.7 million interactions among genomic sites across seven cell types in the human brain. Further analysis revealed cell type for connection between genes and expression quantitative trait locus (eQTL) in the human brain and making insight about target genes of human-accelerated-elements and disease-associated mutations. Our analysis enabled by scEChIA also hints about the possible action of a few transcription factors (TFs), especially through long-range interaction in brain endothelial cells.
Collapse
Affiliation(s)
- Neetesh Pandey
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Omkar Chandra
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Shreya Mishra
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| | - Vibhor Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
| |
Collapse
|
7
|
Li Y, Ma A, Mathé EA, Li L, Liu B, Ma Q. Elucidation of Biological Networks across Complex Diseases Using Single-Cell Omics. Trends Genet 2020; 36:951-966. [PMID: 32868128 DOI: 10.1016/j.tig.2020.08.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2020] [Revised: 07/29/2020] [Accepted: 08/04/2020] [Indexed: 12/14/2022]
Abstract
Single-cell multimodal omics (scMulti-omics) technologies have made it possible to trace cellular lineages during differentiation and to identify new cell types in heterogeneous cell populations. The derived information is especially promising for computing cell-type-specific biological networks encoded in complex diseases and improving our understanding of the underlying gene regulatory mechanisms. The integration of these networks could, therefore, give rise to a heterogeneous regulatory landscape (HRL) in support of disease diagnosis and drug therapeutics. In this review, we provide an overview of this field and pay particular attention to how diverse biological networks can be inferred in a specific cell type based on integrative methods. Then, we discuss how HRL can advance our understanding of regulatory mechanisms underlying complex diseases and aid in the prediction of prognosis and therapeutic responses. Finally, we outline challenges and future trends that will be central to bringing the field of HRL in complex diseases forward.
Collapse
Affiliation(s)
- Yang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Ewy A Mathé
- Division of Preclinical Innovation, National Center for Advancing Translational Sciences, National Institutes of Health (NIH), Rockville, MD, 20892, USA
| | - Lang Li
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
| | - Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA.
| |
Collapse
|