1. Cardamone F, Piva A, Löser E, Eichenberger B, Romero-Mulero MC, Zenk F, Shields EJ, Cabezas-Wallscheid N, Bonasio R, Tiana G, Zhan Y, Iovino N. Chromatin landscape at cis-regulatory elements orchestrates cell fate decisions in early embryogenesis. Nat Commun 2025; 16:3007. [PMID: 40148291 PMCID: PMC11950382 DOI: 10.1038/s41467-025-57719-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2024] [Accepted: 03/03/2025] [Indexed: 03/29/2025] Open
Abstract
The establishment of germ layers during early development is crucial for body formation. The Drosophila zygote serves as a model for investigating these transitions in relation to the chromatin landscape. However, the cellular heterogeneity of the blastoderm embryo poses a challenge for gaining mechanistic insights. Using 10× Multiome, we simultaneously analyzed the in vivo epigenomic and transcriptomic states of wild-type, E(z)-, and CBP-depleted embryos during zygotic genome activation at single-cell resolution. We found that pre-zygotic H3K27me3 safeguards tissue-specific gene expression by modulating cis-regulatory elements. Furthermore, we demonstrate that CBP is essential for cell fate specification, functioning as a transcriptional activator by stabilizing transcription factor binding at key developmental genes. Surprisingly, while CBP depletion leads to transcriptional arrest, chromatin accessibility continues to progress independently through the retention of stalled RNA Polymerase II. Our study reveals fundamental principles of chromatin-mediated gene regulation essential for establishing and maintaining cellular identities during early embryogenesis.
Affiliation(s)
- Francesco Cardamone
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
  - International Max Planck Research School of Immunobiology, Epigenetics and Metabolism (IMPRS-IEM), Freiburg, Germany
- Annamaria Piva
  - Department of Experimental Oncology, European Institute of Oncology, IRCCS, Milan, Italy
- Eva Löser
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
- Bastian Eichenberger
  - Department of Experimental Oncology, European Institute of Oncology, IRCCS, Milan, Italy
- Mari Carmen Romero-Mulero
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
  - Faculty of Biology, University of Freiburg, Freiburg, Germany
- Fides Zenk
  - Epigenomics of Neurodevelopment, Brain Mind Institute, School of Life Sciences, EPFL - École Polytechnique Fédérale de Lausanne, Ecublens, Switzerland
- Emily J Shields
  - Epigenetics Institute, Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Urology and Institute of Neuropathology, Medical Center-University of Freiburg, Freiburg, Germany
- Nina Cabezas-Wallscheid
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
  - Laboratory of Stem Cell Biology and Ageing, Department of Health Sciences and Technology, Swiss Federal Institute of Technology (ETH Zürich), Zürich, Switzerland
  - Centre for Integrative Biological Signalling Studies (CIBSS), Freiburg, Germany
- Roberto Bonasio
  - Epigenetics Institute, Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Cell and Developmental Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Guido Tiana
  - Università degli Studi di Milano and INFN, Milan, Italy
- Yinxiu Zhan
  - Department of Experimental Oncology, European Institute of Oncology, IRCCS, Milan, Italy
- Nicola Iovino
  - Max Planck Institute of Immunobiology and Epigenetics, Freiburg, Germany
2. Braccioli L, van den Brand T, Alonso Saiz N, Fountas C, Celie PHN, Kazokaitė-Adomaitienė J, de Wit E. Identifying cross-lineage dependencies of cell-type-specific regulators in mouse gastruloids. Dev Cell 2025:S1534-5807(25)00118-2. [PMID: 40101716 DOI: 10.1016/j.devcel.2025.02.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/09/2024] [Accepted: 02/21/2025] [Indexed: 03/20/2025]
Abstract
Correct gene expression levels are crucial for normal development. Advances in genomics enable the inference of gene regulatory programs active during development but cannot capture the complex multicellular interactions occurring during mammalian embryogenesis in utero. In vitro models of mammalian development, like gastruloids, can overcome this limitation. Using time-resolved single-cell chromatin accessibility analysis, we delineated the regulatory profile during mouse gastruloid development, identifying critical drivers of developmental transitions. Gastruloids develop from bipotent progenitor cells driven by the transcription factors (TFs) OCT4, SOX2, and TBXT, differentiating into the mesoderm (characterized by the mesogenin 1 [MSGN1]) and spinal cord (characterized by CDX2). ΔCDX gastruloids fail to form spinal cord, while Msgn1 ablation inhibits paraxial mesoderm and spinal cord development. Chimeric gastruloids with ΔMSGN1 and wild-type cells formed both tissues, indicating that inter-tissue communication is necessary for spinal cord formation. Our work has important implications for studying inter-tissue communication and gene regulatory programs in development.
Affiliation(s)
- Luca Braccioli
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Teun van den Brand
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Noemi Alonso Saiz
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Charis Fountas
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
- Patrick H N Celie
  - Protein Facility, Division of Biochemistry, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands; Oncode Institute, 3521 AL Utrecht, the Netherlands
- Justina Kazokaitė-Adomaitienė
  - Protein Facility, Division of Biochemistry, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands; Oncode Institute, 3521 AL Utrecht, the Netherlands
- Elzo de Wit
  - Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, the Netherlands
3. Liu C, Li X, Hu Q, Jia Z, Ye Q, Wang X, Zhao K, Liu L, Wang M. Decoding the blueprints of embryo development with single-cell and spatial omics. Semin Cell Dev Biol 2025; 167:22-39. [PMID: 39889540 DOI: 10.1016/j.semcdb.2025.01.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 01/18/2025] [Accepted: 01/18/2025] [Indexed: 02/03/2025]
Abstract
Embryonic development is a complex and intricately regulated process that encompasses precise control over cell differentiation, morphogenesis, and the underlying gene expression changes. Recent years have witnessed a remarkable acceleration in the development of single-cell and spatial omic technologies, enabling high-throughput profiling of transcriptomic and other multi-omic information at the individual cell level. These innovations offer fresh and multifaceted perspectives for investigating the intricate cellular and molecular mechanisms that govern embryonic development. In this review, we provide an in-depth exploration of the latest technical advancements in single-cell and spatial multi-omic methodologies and compile a systematic catalog of their applications in the field of embryonic development. We deconstruct the research strategies employed by recent studies that leverage single-cell sequencing techniques and underscore the unique advantages of spatial transcriptomics. Furthermore, we delve into the current applications, the data analysis algorithms, and the untapped potential of these technologies in advancing our understanding of embryonic development. With the continuous evolution of multi-omic technologies, we anticipate their widespread adoption and profound contributions to unraveling the intricate molecular foundations underpinning embryo development in the foreseeable future.
Affiliation(s)
- Chang Liu
  - BGI Research, Hangzhou 310030, China; BGI Research, Shenzhen 518083, China; Shanxi Medical University-BGI Collaborative Center for Future Medicine, Shanxi Medical University, Taiyuan 030001, China; Shenzhen Proof-of-Concept Center of Digital Cytopathology, BGI Research, Shenzhen 518083, China
- Qinan Hu
  - Shenzhen Key Laboratory of Gene Regulation and Systems Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518005, China; Department of Pharmacology, School of Medicine, Southern University of Science and Technology, Shenzhen 518005, China
- Zihan Jia
  - BGI Research, Hangzhou 310030, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Qing Ye
  - BGI Research, Hangzhou 310030, China; China Jiliang University, Hangzhou 310018, China
- Kaichen Zhao
  - College of Biomedicine and Health, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, Hubei 430070, China
- Longqi Liu
  - BGI Research, Hangzhou 310030, China; Shanxi Medical University-BGI Collaborative Center for Future Medicine, Shanxi Medical University, Taiyuan 030001, China
- Mingyue Wang
  - BGI Research, Hangzhou 310030, China; Key Laboratory of Spatial Omics of Zhejiang Province, BGI Research, Hangzhou 310030, China
4. Shen Y, Liu K, Liu J, Shen J, Ye T, Zhao R, Zhang R, Song Y. TBP bookmarks and preserves neural stem cell fate memory by orchestrating local chromatin architecture. Mol Cell 2025; 85:413-429.e10. [PMID: 39662469 DOI: 10.1016/j.molcel.2024.11.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Revised: 09/30/2024] [Accepted: 11/14/2024] [Indexed: 12/13/2024]
Abstract
Mitotic bookmarking has been posited as an important strategy for cells to faithfully propagate their fate memory through cell generations. However, the physiological significance and regulatory mechanisms of mitotic bookmarking in neural development remain unexplored. Here, we identified TATA-binding protein (TBP) as a crucial mitotic bookmarker for preserving the fate memory of Drosophila neural stem cells (NSCs). Phosphorylation by the super elongation complex (SEC) is important for TBP to be retained as discrete foci on mitotic chromosomes of NSCs and to effectively transmit their fate memory. TBP depletion leads to drastic NSC loss, whereas TBP overexpression enhances the ability of SEC to induce neural progenitor dedifferentiation and tumorigenesis. Importantly, TBP achieves its mitotic retention by recruiting the chromatin remodeler EP400, which in turn increases local chromatin accessibility by depositing H2A.Z. Thus, local chromatin remodeling ensures mitotic bookmarking, which may represent a general principle underlying the preservation of cell fate memory.
Affiliation(s)
- Yuying Shen
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Kun Liu
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Jie Liu
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Jingwen Shen
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Tongtong Ye
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Runxiang Zhao
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Rulan Zhang
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China
- Yan Song
  - Ministry of Education Key Laboratory of Cell Proliferation and Differentiation, School of Life Sciences, Peking University, Beijing 100871, China; Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
5. Amiri EE, Tenger-Trolander A, Li M, Thomas Julian A, Kasan K, Sanders SA, Blythe S, Schmidt-Ott U. Conservation of symmetry breaking at the level of chromatin accessibility between fly species with unrelated anterior determinants. bioRxiv 2025:2025.01.13.632851. [PMID: 39868093 PMCID: PMC11760685 DOI: 10.1101/2025.01.13.632851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Establishing the anterior-posterior body axis is a fundamental process during embryogenesis, and the fruit fly, Drosophila melanogaster, provides one of the best-known case studies of this process. In Drosophila, localized mRNA of bicoid serves as anterior determinant (AD). Bicoid engages in a concentration-dependent competition with nucleosomes and initiates symmetry-breaking along the AP axis by promoting chromatin accessibility at the loci of transcription factor (TF) genes that are expressed in the anterior of the embryo. However, ADs of other fly species are unrelated and structurally distinct, and little is known about how they function. We addressed this question in the moth fly, Clogmia albipunctata, in which a maternally expressed transcript isoform of the pair-rule segmentation gene odd-paired is localized in the anterior egg and has been co-opted as AD. We provide a de novo assembly and annotation of the Clogmia genome and describe how knockdown of zelda and maternal odd-paired transcript affect chromatin accessibility and expression of TF-encoding loci. The results of these experiments suggest direct roles of Cal-Zld in opening and closing chromatin during nuclear cleavage cycles and show that Clogmia's maternal odd-paired activity promotes chromatin accessibility and anterior expression during the early phase of zygotic genome activation at Clogmia's homeobrain and sloppy-paired loci. We conclude that unrelated dipteran ADs initiate anterior-posterior axis-specification at the level of enhancer accessibility and that homeobrain and sloppy-paired homologs may serve a more widely conserved role in the initiation of anterior pattern formation given their early anterior expression and function in head development in other insects.
Affiliation(s)
- Ezra E. Amiri
  - The University of Chicago, Department of Organismal Biology and Anatomy, 1027 East 57 Street, Chicago, Illinois 60637, USA
- Ayse Tenger-Trolander
  - The University of Chicago, Department of Organismal Biology and Anatomy, 1027 East 57 Street, Chicago, Illinois 60637, USA
- Muzi Li
  - The University of Chicago, Department of Organismal Biology and Anatomy, 1027 East 57 Street, Chicago, Illinois 60637, USA
- Alexander Thomas Julian
  - Illinois Institute of Technology, Department of Biology, 3105 South Dearborn Street, Chicago, Illinois 60616, USA
- Koray Kasan
  - The University of Chicago, Department of Organismal Biology and Anatomy, 1027 East 57 Street, Chicago, Illinois 60637, USA
- Sheri A. Sanders
  - Notre Dame University, 252 Galvin Life Science Center/Freimann Life Science Center, Notre Dame, Indiana 46556, USA
- Shelby Blythe
  - Northwestern University, Department of Molecular Biosciences, 2205 Tech Drive, Evanston, Illinois 60208, USA
  - Northwestern University and The University of Chicago, National Institute for Theory and Mathematics in Biology, 875 North Michigan Avenue, Suite 3500, Chicago, Illinois 60611, USA
- Urs Schmidt-Ott
  - The University of Chicago, Department of Organismal Biology and Anatomy, 1027 East 57 Street, Chicago, Illinois 60637, USA
6. Nichols RV, Rylaarsdam LE, O'Connell BL, Shipony Z, Iremadze N, Acharya SN, Adey AC. Atlas-scale single-cell DNA methylation profiling with sciMETv3. Cell Genomics 2025; 5:100726. [PMID: 39719707 PMCID: PMC11770211 DOI: 10.1016/j.xgen.2024.100726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 10/25/2024] [Accepted: 11/26/2024] [Indexed: 12/26/2024]
Abstract
Single-cell methods to assess DNA methylation have not achieved the same level of cell throughput per experiment compared to other modalities, with large-scale datasets requiring extensive automation, time, and other resources. Here, we describe sciMETv3, a combinatorial indexing-based technique that enables atlas-scale libraries to be produced in a single experiment. To reduce the sequencing burden, we demonstrate the compatibility of sciMETv3 with capture techniques to enrich regulatory regions, as well as the ability to leverage enzymatic conversion, which can yield higher library diversity. We showcase the throughput of sciMETv3 by producing a >140,000 cell library from human middle frontal gyrus split across four multiplexed individuals using both Illumina and Ultima sequencing instrumentation. Finally, we introduce sciMET+ATAC to enable high-throughput exploration of the interplay between chromatin accessibility and DNA methylation within the same cell.
Affiliation(s)
- Ruth V Nichols
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Lauren E Rylaarsdam
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Brendan L O'Connell
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA; Cancer Early Detection Advanced Research Institute, Oregon Health & Science University, Portland, OR, USA
- Sonia N Acharya
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Andrew C Adey
  - Department of Molecular & Medical Genetics, Oregon Health & Science University, Portland, OR, USA; Cancer Early Detection Advanced Research Institute, Oregon Health & Science University, Portland, OR, USA; Knight Cardiovascular Institute, Oregon Health & Science University, Portland, OR, USA; Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
7. Miao Z, Wang J, Park K, Kuang D, Kim J. Depth-corrected multi-factor dissection of chromatin accessibility for scATAC-seq data with PACS. Nat Commun 2025; 16:401. [PMID: 39757254 DOI: 10.1038/s41467-024-55580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 12/10/2024] [Indexed: 01/07/2025] Open
Abstract
Single cell ATAC-seq (scATAC-seq) experimental designs have become increasingly complex, with multiple factors that might affect chromatin accessibility, including genotype, cell type, tissue of origin, sample location, batch, etc., whose compound effects are difficult to test by existing methods. In addition, current scATAC-seq data present statistical difficulties due to their sparsity and variations in individual sequence capture. To address these problems, we present a zero-adjusted statistical model, Probability model of Accessible Chromatin of Single cells (PACS), that allows complex hypothesis testing of accessibility-modulating factors while accounting for sparse and incomplete data. For differential accessibility analysis, PACS controls the false positive rate and achieves a 17% to 122% higher power on average than existing tools. We demonstrate the effectiveness of PACS through several analysis tasks, including supervised cell type annotation, compound hypothesis testing, batch effect correction, and spatiotemporal modeling. We apply PACS to datasets from various tissues and show its ability to reveal previously undiscovered insights in scATAC-seq data.
Affiliation(s)
- Zhen Miao
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Jianqiao Wang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Statistics and Data Science, Tsinghua University, Beijing, China
- Kernyu Park
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Da Kuang
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Junhyong Kim
  - Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
8. Kudron M, Gevirtzman L, Victorsen A, Lear BC, Gao J, Xu J, Samanta S, Frink E, Tran-Pearson A, Huynh C, Vafeados D, Hammonds A, Fisher W, Wall M, Wesseling G, Hernandez V, Lin Z, Kasparian M, White K, Allada R, Gerstein M, Hillier L, Celniker SE, Reinke V, Waterston RH. Binding profiles for 961 Drosophila and C. elegans transcription factors reveal tissue-specific regulatory relationships. Genome Res 2024; 34:2319-2334. [PMID: 39438113 PMCID: PMC11694743 DOI: 10.1101/gr.279037.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 10/17/2024] [Indexed: 10/25/2024]
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here, we present the culmination of the efforts of the modENCODE (model organism Encyclopedia of DNA Elements) and modERN (model organism Encyclopedia of Regulatory Networks) consortia to systematically assay TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). These data sets comprise 605 TFs identifying 3.6 M sites in the fly and 356 TFs identifying 0.9 M sites in the worm, and represent the majority of the regulatory space in each genome. We demonstrate that TFs associate with chromatin in clusters termed "metapeaks," that larger metapeaks have characteristics of high-occupancy target (HOT) regions, and that the importance of consensus sequence motifs bound by TFs depends on metapeak size and complexity. Combining ChIP-seq data with single-cell RNA-seq data in a machine-learning model identifies TFs with a prominent role in promoting target gene expression in specific cell types, even differentiating between parent-daughter cells during embryogenesis. These data are a rich resource for the community that should fuel and guide future investigations into TF function. To facilitate data accessibility and utility, all strains expressing green fluorescent protein (GFP)-tagged TFs are available at the stock centers for each organism. The chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center, GEO, and through a direct interface that provides rapid access to processed data sets and summary analyses, as well as widgets to probe the cell-type-specific TF-target relationships.
Affiliation(s)
- Michelle Kudron
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Louis Gevirtzman
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Alec Victorsen
  - Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Bridget C Lear
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Jiahao Gao
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Jinrui Xu
  - Department of Biology, Howard University, Washington, District of Columbia 20059, USA
  - Center for Applied Data Science and Analytics, Howard University, Washington, District of Columbia 20059, USA
- Swapna Samanta
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Emily Frink
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Adri Tran-Pearson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Chau Huynh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Dionne Vafeados
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Ann Hammonds
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- William Fisher
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Martha Wall
  - Institute for Genomics and Systems Biology, University of Chicago, Chicago, Illinois 60637, USA
  - Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
- Greg Wesseling
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Vanessa Hernandez
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Zhichun Lin
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Mary Kasparian
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Kevin White
  - Department of Biochemistry and Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, 117597 Singapore
- Ravi Allada
  - Department of Neurobiology, Northwestern University, Evanston, Illinois 60208, USA
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
  - Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
- LaDeana Hillier
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Susan E Celniker
  - Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Valerie Reinke
  - Department of Genetics, Yale University, New Haven, Connecticut 06520, USA
- Robert H Waterston
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
9. Zhao Y, Yu ZM, Cui T, Li LD, Li YY, Qian FC, Zhou LW, Li Y, Fang QL, Huang XM, Zhang QY, Cai FH, Dong FJ, Shang DS, Li CQ, Wang QY. scBlood: A comprehensive single-cell accessible chromatin database of blood cells. Comput Struct Biotechnol J 2024; 23:2746-2753. [PMID: 39050785 PMCID: PMC11266868 DOI: 10.1016/j.csbj.2024.06.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2024] [Revised: 06/17/2024] [Accepted: 06/18/2024] [Indexed: 07/27/2024] Open
Abstract
The advent of single cell transposase-accessible chromatin sequencing (scATAC-seq) technology enables us to explore the genomic characteristics and chromatin accessibility of blood cells at the single-cell level. To fully make sense of the roles and regulatory complexities of blood cells, it is critical to collect and analyze these rapidly accumulating scATAC-seq datasets at a system level. Here, we present scBlood (https://bio.liclab.net/scBlood/), a comprehensive single-cell accessible chromatin database of blood cells. The current version of scBlood catalogs 770,907 blood cells and 452,247 non-blood cells from ∼400 high-quality scATAC-seq samples covering 30 tissues and 21 disease types. All data hosted on scBlood have undergone preprocessing from raw fastq files and multiple standards of quality control. Furthermore, we conducted comprehensive downstream analyses, including multi-sample integration analysis, cell clustering and annotation, differential chromatin accessibility analysis, functional enrichment analysis, co-accessibility analysis, gene activity score calculation, and transcription factor (TF) enrichment analysis. In summary, scBlood provides a user-friendly interface for searching, browsing, analyzing, visualizing, and downloading scATAC-seq data of interest. This platform facilitates insights into the functions and regulatory mechanisms of blood cells, as well as their involvement in blood-related diseases.
Affiliation(s)
- Yu Zhao
  - The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Zheng-Min Yu
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Ting Cui
  - The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Li-Dong Li
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Yan-Yu Li
  - School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing 163319, China
- Feng-Cui Qian
  - The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Li-Wei Zhou
  - State Key Laboratory of Stem Cell and Reproductive Biology, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Ye Li
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Qiao-Li Fang
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Xue-Mei Huang
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Qin-Yi Zhang
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Fu-Hong Cai
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- Fu-Juan Dong
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
- De-Si Shang
  - The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Chun-Quan Li
  - The First Affiliated Hospital & MOE Key Lab of Rare Pediatric Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - The First Affiliated Hospital, Cardiovascular Lab of Big Data and Imaging Artificial Intelligence, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
  - The First Affiliated Hospital, Institute of Cardiovascular Disease, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Qiu-Yu Wang
  - School of Computer, University of South China, Hengyang, Hunan 421001, China
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Key Laboratory of Multi-omics And Artificial Intelligence of Cardiovascular Diseases, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
- Hunan Provincial Maternal and Child Health Care Hospital, National Health Commission Key Laboratory of Birth Defect Research and Prevention, Hengyang Medical School, University of South China, Hengyang, Hunan 421001, China
| |
|
10
|
Hu H, Quon G. scPair: Boosting single cell multimodal analysis by leveraging implicit feature selection and single cell atlases. Nat Commun 2024; 15:9932. [PMID: 39548084 PMCID: PMC11568318 DOI: 10.1038/s41467-024-53971-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 10/25/2024] [Indexed: 11/17/2024] Open
Abstract
Multimodal single-cell assays profile multiple sets of features in the same cells and are widely used for identifying and mapping cell states between chromatin and mRNA and linking regulatory elements to target genes. However, the high dimensionality of input features and shallow sequencing depth compared to unimodal assays pose challenges in data analysis. Here we present scPair, a multimodal single-cell data framework that overcomes these challenges by employing an implicit feature selection approach. scPair uses dual encoder-decoder structures trained on paired data to align cell states across modalities and predict features from one modality to another. We demonstrate that scPair outperforms existing methods in accuracy and execution time, and facilitates downstream tasks such as trajectory inference. We further show scPair can augment smaller multimodal datasets with larger unimodal atlases to increase statistical power to identify groups of transcription factors active during different stages of neural differentiation.
Affiliation(s)
- Hongru Hu
- Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, USA.
- Genome Center, University of California, Davis, CA, USA.
| | - Gerald Quon
- Genome Center, University of California, Davis, CA, USA.
- Department of Molecular and Cellular Biology, University of California, Davis, CA, USA.
| |
|
11
|
Zeng Y, Ma Q, Chen J, Kong X, Chen Z, Liu H, Liu L, Qian Y, Wang X, Lu S. Single-cell sequencing: Current applications in various tuberculosis specimen types. Cell Prolif 2024; 57:e13698. [PMID: 38956399 PMCID: PMC11533074 DOI: 10.1111/cpr.13698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 05/21/2024] [Accepted: 06/07/2024] [Indexed: 07/04/2024] Open
Abstract
Tuberculosis (TB) is a chronic disease caused by Mycobacterium tuberculosis (M.tb) and responsible for millions of deaths worldwide each year. It has a complex pathogenesis that primarily affects the lungs but can also impact systemic organs. In recent years, single-cell sequencing technology has been utilized to characterize the composition and proportion of immune cell subpopulations associated with the pathogenesis of TB disease since it has a high resolution that surpasses conventional techniques. This paper reviews the current use of single-cell sequencing technologies in TB research and their application in analysing specimens from various sources of TB, primarily peripheral blood and lung specimens. The focus is on how these technologies can reveal dynamic changes in immune cell subpopulations, genes and proteins during disease progression after M.tb infection. Based on the current findings, single-cell sequencing has significant potential clinical value in the field of TB research. Next, we will focus on the real-world applications of the potential targets identified through single-cell sequencing for diagnostics, therapeutics and the development of effective vaccines.
Affiliation(s)
- Yuqin Zeng
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Quan Ma
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Jinyun Chen
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Xingxing Kong
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Zhanpeng Chen
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Huazhen Liu
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Lanlan Liu
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Yan Qian
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Xiaomin Wang
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| | - Shuihua Lu
- National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, Shenzhen, Guangdong Province, China
| |
|
12
|
Chen S, Keleş S. GEEES: inferring cell-specific gene-enhancer interactions from multi-modal single-cell data. Bioinformatics 2024; 40:btae638. [PMID: 39468737 PMCID: PMC11549018 DOI: 10.1093/bioinformatics/btae638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 10/17/2024] [Accepted: 10/25/2024] [Indexed: 10/30/2024] Open
Abstract
MOTIVATION Gene-enhancer interactions are central to transcriptional regulation. Current multi-modal single-cell datasets that profile transcriptome and chromatin accessibility simultaneously in a single cell are yielding opportunities to infer gene-enhancer associations in a cell type specific manner. Computational efforts for such multi-modal single-cell datasets thus far focused on methods for identification and refinement of cell types and trajectory construction. While initial attempts for inferring gene-enhancer interactions have emerged, these have not been evaluated against benchmark datasets that materialized from bulk genomic experiments. Furthermore, existing approaches are limited to inferring gene-enhancer associations at the level of grouped cells as opposed to individual cells, thereby ignoring regulatory heterogeneity among the cells. RESULTS We present a new approach, GEEES for "Gene EnhancEr IntEractions from Multi-modal Single Cell Data," for inferring gene-enhancer associations at the single-cell level using multi-modal single-cell transcriptome and chromatin accessibility data. We evaluated GEEES alongside several multivariate regression-based alternatives we devised and state-of-the-art methods using a large number of benchmark datasets, providing a comprehensive assessment of current approaches. This analysis revealed significant discrepancies between gold-standard interactions and gene-enhancer associations derived from multi-modal single-cell data. Notably, incorporating gene-enhancer distance into the analysis markedly improved performance across all methods, positioning GEEES as a leading approach in this domain. While the overall improvement in performance metrics by GEEES is modest, it provides enhanced cell representation learning which can be leveraged for more effective downstream analysis. Furthermore, our review of existing experimentally driven benchmark datasets uncovers their limited concordance, underscoring the necessity for new high-throughput experiments to validate gene-enhancer interactions inferred from single-cell data. AVAILABILITY AND IMPLEMENTATION https://github.com/keleslab/GEEES.
Affiliation(s)
- Shuyang Chen
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, United States
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706, United States
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53706, United States
| |
|
13
|
Ciabrelli F, Atinbayeva N, Pane A, Iovino N. Epigenetic inheritance and gene expression regulation in early Drosophila embryos. EMBO Rep 2024; 25:4131-4152. [PMID: 39285248 PMCID: PMC11467379 DOI: 10.1038/s44319-024-00245-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/13/2024] [Accepted: 08/21/2024] [Indexed: 10/12/2024] Open
Abstract
Precise spatiotemporal regulation of gene expression is of paramount importance for eukaryotic development. The maternal-to-zygotic transition (MZT) during early embryogenesis in Drosophila involves the gradual replacement of maternally contributed mRNAs and proteins by zygotic gene products. The zygotic genome is transcriptionally activated during the first 3 hours of development, in a process known as "zygotic genome activation" (ZGA), by the orchestrated activities of a few pioneer factors. Their decisive role during ZGA has been characterized in detail, whereas the contribution of chromatin factors to this process has been historically overlooked. In this review, we aim to summarize the current knowledge of how chromatin regulation impacts the first stages of Drosophila embryonic development. In particular, we will address the following questions: how chromatin factors affect ZGA and transcriptional silencing, and how genome architecture promotes the integration of these processes early during development. Remarkably, certain chromatin marks can be intergenerationally inherited, and their presence in the early embryo becomes critical for the regulation of gene expression at later stages. Finally, we speculate on the possible roles of these chromatin marks as carriers of epialleles during transgenerational epigenetic inheritance (TEI).
Affiliation(s)
- Filippo Ciabrelli
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg im Breisgau, Germany
| | - Nazerke Atinbayeva
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg im Breisgau, Germany
| | - Attilio Pane
- Institute of Biomedical Sciences/UFRJ, 21941902, Rio de Janeiro, Brazil
| | - Nicola Iovino
- Max Planck Institute of Immunobiology and Epigenetics, 79108, Freiburg im Breisgau, Germany.
| |
|
14
|
Morrissey A, Shi J, James DQ, Mahony S. Accurate allocation of multimapped reads enables regulatory element analysis at repeats. Genome Res 2024; 34:937-951. [PMID: 38986578 PMCID: PMC11293539 DOI: 10.1101/gr.278638.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Accepted: 06/14/2024] [Indexed: 07/12/2024]
Abstract
Transposable elements (TEs) and other repetitive regions have been shown to contain gene regulatory elements, including transcription factor binding sites. However, regulatory elements harbored by repeats have proven difficult to characterize using short-read sequencing assays such as ChIP-seq or ATAC-seq. Most regulatory genomics analysis pipelines discard "multimapped" reads that align equally well to multiple genomic locations. Because multimapped reads arise predominantly from repeats, current analysis pipelines fail to detect a substantial portion of regulatory events that occur in repetitive regions. To address this shortcoming, we developed Allo, a new approach to allocate multimapped reads in an efficient, accurate, and user-friendly manner. Allo combines probabilistic mapping of multimapped reads with a convolutional neural network that recognizes the read distribution features of potential peaks, offering enhanced accuracy in multimapping read assignment. Allo also provides read-level output in the form of a corrected alignment file, making it compatible with existing regulatory genomics analysis pipelines and downstream peak-finders. In a demonstration application on CTCF ChIP-seq data, we show that Allo results in the discovery of thousands of new CTCF peaks. Many of these peaks contain the expected cognate motif and/or serve as TAD boundaries. We additionally apply Allo to a diverse collection of ENCODE ChIP-seq data sets, resulting in multiple previously unidentified interactions between transcription factors and repetitive element families. Finally, we show that Allo may be particularly beneficial in identifying ChIP-seq peaks at centromeres, near segmentally duplicated genes, and in younger TEs, enabling new regulatory analyses in these regions.
Affiliation(s)
- Alexis Morrissey
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Jeffrey Shi
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Daniela Q James
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| |
|
15
|
Otto DJ, Jordan C, Dury B, Dien C, Setty M. Quantifying cell-state densities in single-cell phenotypic landscapes using Mellon. Nat Methods 2024; 21:1185-1195. [PMID: 38890426 DOI: 10.1038/s41592-024-02302-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 05/08/2024] [Indexed: 06/20/2024]
Abstract
Cell-state density characterizes the distribution of cells along phenotypic landscapes and is crucial for unraveling the mechanisms that drive diverse biological processes. Here, we present Mellon, an algorithm for estimation of cell-state densities from high-dimensional representations of single-cell data. We demonstrate Mellon's efficacy by dissecting the density landscape of differentiating systems, revealing a consistent pattern of high-density regions corresponding to major cell types intertwined with low-density, rare transitory states. We present evidence implicating enhancer priming and the activation of master regulators in emergence of these transitory states. Mellon offers the flexibility to perform temporal interpolation of time-series data, providing a detailed view of cell-state dynamics during developmental processes. Mellon facilitates density estimation across various single-cell data modalities, scaling linearly with the number of cells. Our work underscores the importance of cell-state density in understanding the differentiation processes, and the potential of Mellon to provide insights into mechanisms guiding biological trajectories.
Affiliation(s)
- Dominik J Otto
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Cailin Jordan
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
| | - Brennan Dury
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Christine Dien
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
| | - Manu Setty
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA.
| |
|
16
|
Chen H, Ryu J, Vinyard ME, Lerer A, Pinello L. SIMBA: single-cell embedding along with features. Nat Methods 2024; 21:1003-1013. [PMID: 37248389 PMCID: PMC11166568 DOI: 10.1038/s41592-023-01899-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/04/2022] [Accepted: 04/26/2023] [Indexed: 05/31/2023]
Abstract
Most current single-cell analysis pipelines are limited to cell embeddings and rely heavily on clustering, while lacking the ability to explicitly model interactions between different feature types. Furthermore, these methods are tailored to specific tasks, as distinct single-cell problems are formulated differently. To address these shortcomings, here we present SIMBA, a graph embedding method that jointly embeds single cells and their defining features, such as genes, chromatin-accessible regions and DNA sequences, into a common latent space. By leveraging the co-embedding of cells and features, SIMBA allows for the study of cellular heterogeneity, clustering-free marker discovery, gene regulation inference, batch effect removal and omics data integration. We show that SIMBA provides a single framework that allows diverse single-cell problems to be formulated in a unified way and thus simplifies the development of new analyses and extension to new single-cell modalities. SIMBA is implemented as a comprehensive Python library ( https://simba-bio.readthedocs.io ).
Affiliation(s)
- Huidong Chen
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
| | - Jayoung Ryu
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Michael E Vinyard
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Boston, MA, USA
- Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA
| | - Adam Lerer
- Facebook AI Research, New York, NY, USA.
| | - Luca Pinello
- Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Boston, MA, USA.
- Department of Pathology, Harvard Medical School, Boston, MA, USA.
- Broad Institute of Harvard and MIT, Cambridge, MA, USA.
| |
|
17
|
Singh R, Wu AP, Mudide A, Berger B. Causal gene regulatory analysis with RNA velocity reveals an interplay between slow and fast transcription factors. Cell Syst 2024; 15:462-474.e5. [PMID: 38754366 DOI: 10.1016/j.cels.2024.04.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 08/25/2023] [Accepted: 04/18/2024] [Indexed: 05/18/2024]
Abstract
Single-cell expression dynamics, from differentiation trajectories or RNA velocity, have the potential to reveal causal links between transcription factors (TFs) and their target genes in gene regulatory networks (GRNs). However, existing methods either overlook these expression dynamics or necessitate that cells be ordered along a linear pseudotemporal axis, which is incompatible with branching trajectories. We introduce Velorama, an approach to causal GRN inference that represents single-cell differentiation dynamics as a directed acyclic graph of cells, constructed from pseudotime or RNA velocity measurements. Additionally, Velorama enables the estimation of the speed at which TFs influence target genes. Applying Velorama, we uncover evidence that the speed of a TF's interactions is tied to its regulatory function. For human corticogenesis, we find that slow TFs are linked to gliomas, while fast TFs are associated with neuropsychiatric diseases. We expect Velorama to become a critical part of the RNA velocity toolkit for investigating the causal drivers of differentiation and disease.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA.
| | - Alexander P Wu
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Anish Mudide
- Phillips Exeter Academy, Exeter, NH 03883, USA; Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory and Department of Mathematics, MIT, Cambridge, MA 02139, USA.
| |
|
18
|
Abnizova I, Stapel C, Boekhorst RT, Lee JTH, Hemberg M. Integrative analysis of transcriptomic and epigenomic data reveals distinct patterns for developmental and housekeeping gene regulation. BMC Biol 2024; 22:78. [PMID: 38600550 PMCID: PMC11005181 DOI: 10.1186/s12915-024-01869-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 03/14/2024] [Indexed: 04/12/2024] Open
Abstract
BACKGROUND Regulation of transcription is central to the emergence of new cell types during development, and it often involves activation of genes via proximal and distal regulatory regions. The activity of regulatory elements is determined by transcription factors (TFs) and epigenetic marks, but despite extensive mapping of such patterns, the extraction of regulatory principles remains challenging. RESULTS Here we study differentially and similarly expressed genes along with their associated epigenomic profiles, chromatin accessibility and DNA methylation, during lineage specification at gastrulation in mice. Comparison of the three lineages allows us to identify genomic and epigenomic features that distinguish the two classes of genes. We show that differentially expressed genes are primarily regulated by distal elements, while similarly expressed genes are controlled by proximal housekeeping regulatory programs. Differentially expressed genes are relatively isolated within topologically associated domains, while similarly expressed genes tend to be located in gene clusters. Transcription of differentially expressed genes is associated with differentially open chromatin at distal elements including enhancers, while that of similarly expressed genes is associated with ubiquitously accessible chromatin at promoters. CONCLUSION Based on these associations of (linearly) distal genes' transcription start sites (TSSs) and putative enhancers for developmental genes, our findings allow us to link putative enhancers to their target promoters and to infer lineage-specific repertoires of putative driver transcription factors, within which we define subgroups of pioneers and co-operators.
Affiliation(s)
- Irina Abnizova
- Epigenetics Programme, Babraham Institute, Cambridge, UK
- Wellcome Sanger Institute, Hinxton, UK
| | - Carine Stapel
- Epigenetics Programme, Babraham Institute, Cambridge, UK
| | | | | | - Martin Hemberg
- Wellcome Sanger Institute, Hinxton, UK.
- The Gene Lay Institute of Immunology and Inflammation Brigham & Women's Hospital and Harvard Medical School, Boston, USA.
| |
|
19
|
Pollex T, Rabinowitz A, Gambetta MC, Marco-Ferreres R, Viales RR, Jankowski A, Schaub C, Furlong EEM. Enhancer-promoter interactions become more instructive in the transition from cell-fate specification to tissue differentiation. Nat Genet 2024; 56:686-696. [PMID: 38467791 PMCID: PMC11018526 DOI: 10.1038/s41588-024-01678-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/06/2022] [Accepted: 01/31/2024] [Indexed: 03/13/2024]
Abstract
To regulate expression, enhancers must come in proximity to their target gene. However, the relationship between the timing of enhancer-promoter (E-P) proximity and activity remains unclear, with examples of uncoupled, anticorrelated and correlated interactions. To assess this, we selected 600 characterized enhancers or promoters with tissue-specific activity in Drosophila embryos and performed Capture-C in FACS-purified myogenic or neurogenic cells during specification and tissue differentiation. This enabled direct comparison between E-P proximity and activity transitioning from OFF-to-ON and ON-to-OFF states across developmental conditions. This showed remarkably similar E-P topologies between specified muscle and neuronal cells, which are uncoupled from activity. During tissue differentiation, many new distal interactions emerge where changes in E-P proximity reflect changes in activity. The mode of E-P regulation therefore appears to change as embryogenesis proceeds, from largely permissive topologies during cell-fate specification to more instructive regulation during terminal tissue differentiation, when E-P proximity is coupled to activation.
Affiliation(s)
- Tim Pollex
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- European Molecular Biology Laboratory (EMBL), Directors' Research Unit, Heidelberg, Germany
| | - Adam Rabinowitz
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Maria Cristina Gambetta
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Raquel Marco-Ferreres
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Rebecca R Viales
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Aleksander Jankowski
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
| | - Christoph Schaub
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
| | - Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany.
| |
|
20
|
Zhang G, Fu Y, Yang L, Ye F, Zhang P, Zhang S, Ma L, Li J, Wu H, Han X, Wang J, Guo G. Construction of single-cell cross-species chromatin accessibility landscapes with combinatorial-hybridization-based ATAC-seq. Dev Cell 2024; 59:793-811.e8. [PMID: 38330939 DOI: 10.1016/j.devcel.2024.01.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 11/03/2023] [Accepted: 01/18/2024] [Indexed: 02/10/2024]
Abstract
Despite recent advances in single-cell genomics, the lack of maps for single-cell candidate cis-regulatory elements (cCREs) in non-mammal species has limited our exploration of conserved regulatory programs across vertebrates and invertebrates. Here, we developed a combinatorial-hybridization-based method for single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) named CH-ATAC-seq, enabling the construction of single-cell accessible chromatin landscapes for zebrafish, Drosophila, and earthworms (Eisenia andrei). By integrating scATAC censuses of humans, monkeys, and mice, we systematically identified 152 distinct main cell types and around 0.8 million cell-type-specific cCREs. Our analysis provided insights into the conservation of neural, muscle, and immune lineages across species, while epithelial cells exhibited a higher organ-origin heterogeneity. Additionally, a large-scale gene regulatory network (GRN) was constructed in four vertebrates by integrating scRNA-seq censuses. Overall, our study provides a valuable resource for comparative epigenomics, identifying the evolutionary conservation and divergence of gene regulation across different species.
Affiliation(s)
- Guodong Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
| | - Yuting Fu
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Lei Yang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Fang Ye
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China
- Peijing Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Shuang Zhang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Lifeng Ma
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Jiaqi Li
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Hanyu Wu
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China
- Xiaoping Han
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou 310058, China.
- Jingjing Wang
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China.
- Guoji Guo
- Bone Marrow Transplantation Center of the First Affiliated Hospital, and Center for Stem Cell and Regenerative Medicine, Zhejiang University School of Medicine, Hangzhou 310000, China; Liangzhu Laboratory, Zhejiang University, Hangzhou 311121, China; Zhejiang Provincial Key Laboratory for Tissue Engineering and Regenerative Medicine, Dr. Li Dak Sum & Yip Yio Chin Center for Stem Cell and Regenerative Medicine, Hangzhou 310058, China; Institute of Hematology, Zhejiang University, Hangzhou, China.
21
Miao Z, Wang J, Park K, Kuang D, Kim J. PACS allows comprehensive dissection of multiple factors governing chromatin accessibility from snATAC-seq data. bioRxiv 2024:2023.07.30.551108. [PMID: 37577623 PMCID: PMC10418058 DOI: 10.1101/2023.07.30.551108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Single nucleus ATAC-seq (snATAC-seq) experimental designs have become increasingly complex with multiple factors that might affect chromatin accessibility, including genotype, cell type, tissue of origin, sample location, batch, etc., whose compound effects are difficult to test by existing methods. In addition, current snATAC-seq data present statistical difficulties due to their sparsity and variations in individual sequence capture. To address these problems, we present a zero-adjusted statistical model, Probability model of Accessible Chromatin of Single cells (PACS), that can allow complex hypothesis testing of factors that affect accessibility while accounting for sparse and incomplete data. For differential accessibility analysis, PACS controls the false positive rate and achieves on average a 17% to 122% higher power than existing tools. We demonstrate the effectiveness of PACS through several analysis tasks including supervised cell type annotation, compound hypothesis testing, batch effect correction, and spatiotemporal modeling. We apply PACS to several datasets from a variety of tissues and show its ability to reveal previously undiscovered insights in snATAC-seq data.
Affiliation(s)
- Zhen Miao
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Jianqiao Wang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Kernyu Park
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Da Kuang
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
- Junhyong Kim
- Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
22
Zhang H, Mulqueen RM, Iannuzo N, Farrera DO, Polverino F, Galligan JJ, Ledford JG, Adey AC, Cusanovich DA. txci-ATAC-seq: a massive-scale single-cell technique to profile chromatin accessibility. Genome Biol 2024; 25:78. [PMID: 38519979 PMCID: PMC10958877 DOI: 10.1186/s13059-023-03150-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Accepted: 12/20/2023] [Indexed: 03/25/2024] Open
Abstract
We develop a large-scale single-cell ATAC-seq method by combining Tn5-based pre-indexing with 10× Genomics barcoding, enabling the indexing of up to 200,000 nuclei across multiple samples in a single reaction. We profile 449,953 nuclei across diverse tissues, including the human cortex, mouse brain, human lung, mouse lung, mouse liver, and lung tissue from a club cell secretory protein knockout (CC16-/-) model. Our study of CC16-/- nuclei uncovers previously underappreciated technical artifacts derived from remnant 129 mouse strain genetic material, which cause profound cell-type-specific changes in regulatory elements near many genes, thereby confounding the interpretation of this commonly referenced mouse model.
Affiliation(s)
- Hao Zhang
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Ryan M Mulqueen
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA
- Natalie Iannuzo
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Dominique O Farrera
- Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, USA
- Francesca Polverino
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Division of Pulmonary, Allergy, Critical Care, and Sleep Medicine, University of Arizona, Tucson, AZ, USA
- Banner - University Medicine North, Pulmonary - Clinic F, Tucson, AZ, USA
- James J Galligan
- Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, USA
- Julie G Ledford
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA
- Andrew C Adey
- Department of Molecular and Medical Genetics, Oregon Health & Science University, Portland, OR, USA.
- Cancer Early Detection Advanced Research Center, Oregon Health & Science University, Portland, OR, USA.
- Oregon Health & Science University, Knight Cancer Institute, Portland, OR, USA.
- Oregon Health & Science University, Knight Cardiovascular Institute, Portland, OR, USA.
- Darren A Cusanovich
- Department of Cellular and Molecular Medicine, University of Arizona, Tucson, AZ, USA.
- Asthma & Airway Disease Research Center, University of Arizona, Tucson, AZ, USA.
23
Halblander FN, Meng FW, Murphy PJ. Anp32e protects against accumulation of H2A.Z at Sox motif containing promoters during zebrafish gastrulation. Dev Biol 2024; 507:34-43. [PMID: 38159623 PMCID: PMC10922954 DOI: 10.1016/j.ydbio.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 12/04/2023] [Accepted: 12/23/2023] [Indexed: 01/03/2024]
Abstract
Epigenetic regulation of chromatin states is crucial for proper gene expression programs and progression during development, but precise mechanisms by which epigenetic factors influence differentiation remain poorly understood. Here we find that the histone variant H2A.Z accumulates at Sox motif-containing promoters during zebrafish gastrulation while neighboring genes become transcriptionally active. These changes coincide with reduced expression of anp32e, the H2A.Z histone removal chaperone, suggesting that loss of Anp32e may lead to increases in H2A.Z binding during differentiation. Remarkably, genetic removal of Anp32e in embryos leads to H2A.Z accumulation prior to gastrulation and developmental genes become precociously active. Accordingly, H2A.Z accumulation occurs most extensively at Sox motif-associated genes, including many which are normally activated following gastrulation. Altogether, our results provide compelling evidence for a mechanism in which Anp32e preferentially restricts H2A.Z accumulation at Sox motifs to regulate the initial phases of developmental differentiation in zebrafish.
Affiliation(s)
- Fabian N Halblander
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester, NY, 14642, USA
- Fanju W Meng
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester, NY, 14642, USA.
- Patrick J Murphy
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester, NY, 14642, USA.
24
Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, Nichols EK, Taylor ML, Fulton O, O'Day DR, Gomes AR, Ilcisin S, Srivatsan S, Deng X, Disteche CM, Noble WS, Hamazaki N, Moens CB, Kimelman D, Cao J, Schier AF, Spielmann M, Murray SA, Trapnell C, Shendure J. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 2024; 626:1084-1093. [PMID: 38355799 PMCID: PMC10901739 DOI: 10.1038/s41586-024-07069-w] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 04/05/2023] [Accepted: 01/15/2024] [Indexed: 02/16/2024]
Abstract
The house mouse (Mus musculus) is an exceptional model system, combining genetic tractability with close evolutionary affinity to humans. Mouse gestation lasts only 3 weeks, during which the genome orchestrates the astonishing transformation of a single-cell zygote into a free-living pup composed of more than 500 million cells. Here, to establish a global framework for exploring mammalian development, we applied optimized single-cell combinatorial indexing to profile the transcriptional states of 12.4 million nuclei from 83 embryos, precisely staged at 2- to 6-hour intervals spanning late gastrulation (embryonic day 8) to birth (postnatal day 0). From these data, we annotate hundreds of cell types and explore the ontogenesis of the posterior embryo during somitogenesis and of kidney, mesenchyme, retina and early neurons. We leverage the temporal resolution and sampling depth of these whole-embryo snapshots, together with published data from earlier timepoints, to construct a rooted tree of cell-type relationships that spans the entirety of prenatal development, from zygote to birth. Throughout this tree, we systematically nominate genes encoding transcription factors and other proteins as candidate drivers of the in vivo differentiation of hundreds of cell types. Remarkably, the most marked temporal shifts in cell states are observed within one hour of birth and presumably underlie the massive physiological adaptations that must accompany the successful transition of a mammalian fetus to life outside the womb.
Affiliation(s)
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Beth K Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Riza M Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Truc-Mai Le
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Xingfan Huang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Eva K Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Megan L Taylor
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Olivia Fulton
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Diana R O'Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Saskia Ilcisin
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
- Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Christine M Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Cecilia B Moens
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- David Kimelman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Junyue Cao
- Laboratory of Single-Cell Genomics and Population dynamics, The Rockefeller University, New York, NY, USA
- Alexander F Schier
- Biozentrum, University of Basel, Basel, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Malte Spielmann
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
- DZHK (German Centre for Cardiovascular Research), Partner Site Hamburg, Lübeck, Kiel, Lübeck, Germany
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Seattle Hub for Synthetic Biology, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, Seattle, WA, USA.
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA.
- Seattle Hub for Synthetic Biology, Seattle, WA, USA.
25
Sabarís G, Ortíz DM, Laiker I, Mayansky I, Naik S, Cavalli G, Stern DL, Preger-Ben Noon E, Frankel N. The Density of Regulatory Information Is a Major Determinant of Evolutionary Constraint on Noncoding DNA in Drosophila. Mol Biol Evol 2024; 41:msae004. [PMID: 38364113 PMCID: PMC10871701 DOI: 10.1093/molbev/msae004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Revised: 11/26/2023] [Accepted: 01/05/2024] [Indexed: 02/18/2024] Open
Abstract
Evolutionary analyses have estimated that ∼60% of nucleotides in intergenic regions of the Drosophila melanogaster genome are functionally relevant, suggesting that regulatory information may be encoded more densely in intergenic regions than has been revealed by most functional dissections of regulatory DNA. Here, we approached this issue through a functional dissection of the regulatory region of the gene shavenbaby (svb). Most of the ∼90 kb of this large regulatory region is highly conserved in the genus Drosophila, though characterized enhancers occupy a small fraction of this region. By analyzing the regulation of svb in different contexts of Drosophila development, we found that the regulatory information that drives svb expression in the abdominal pupal epidermis is organized in a different way than the elements that drive svb expression in the embryonic epidermis. While in the embryonic epidermis svb is activated by compact enhancers separated by large inactive DNA regions, svb expression in the pupal epidermis is driven by regulatory information distributed over broader regions of svb cis-regulatory DNA. In the same vein, we observed that other developmental genes also display a dense distribution of putative regulatory elements in their regulatory regions. Furthermore, we found that a large percentage of conserved noncoding DNA of the Drosophila genome is contained within regions of open chromatin. These results suggest that part of the evolutionary constraint on noncoding DNA of Drosophila is explained by the density of regulatory information, which may be greater than previously appreciated.
Affiliation(s)
- Gonzalo Sabarís
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
- Institute of Human Genetics, UMR 9002 CNRS-Université de Montpellier, Montpellier, France
- Daniela M Ortíz
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
- Ian Laiker
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
- Ignacio Mayansky
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
- Sujay Naik
- Department of Genetics and Developmental Biology, The Rappaport Faculty of Medicine and Research Institute, Technion—Israel Institute of Technology, Haifa 3109601, Israel
- Giacomo Cavalli
- Institute of Human Genetics, UMR 9002 CNRS-Université de Montpellier, Montpellier, France
- David L Stern
- Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA 20147, USA
- Ella Preger-Ben Noon
- Department of Genetics and Developmental Biology, The Rappaport Faculty of Medicine and Research Institute, Technion—Israel Institute of Technology, Haifa 3109601, Israel
- Nicolás Frankel
- Instituto de Fisiología, Biología Molecular y Neurociencias (IFIBYNE), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
- Departamento de Ecología, Genética y Evolución, Facultad de Ciencias Exactas y Naturales (FCEN), Universidad de Buenos Aires (UBA), Buenos Aires 1428, Argentina
26
de Almeida BP, Schaub C, Pagani M, Secchia S, Furlong EEM, Stark A. Targeted design of synthetic enhancers for selected tissues in the Drosophila embryo. Nature 2024; 626:207-211. [PMID: 38086418 PMCID: PMC10830412 DOI: 10.1038/s41586-023-06905-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Received: 06/19/2023] [Accepted: 11/28/2023] [Indexed: 01/19/2024]
Abstract
Enhancers control gene expression and have crucial roles in development and homeostasis. However, the targeted de novo design of enhancers with tissue-specific activities has remained challenging. Here we combine deep learning and transfer learning to design tissue-specific enhancers for five tissues in the Drosophila melanogaster embryo: the central nervous system, epidermis, gut, muscle and brain. We first train convolutional neural networks using genome-wide single-cell assay for transposase-accessible chromatin with sequencing (ATAC-seq) datasets and then fine-tune the convolutional neural networks with smaller-scale data from in vivo enhancer activity assays, yielding models with 13% to 76% positive predictive value according to cross-validation. We designed and experimentally assessed 40 synthetic enhancers (8 per tissue) in vivo, of which 31 (78%) were active and 27 (68%) functioned in the target tissue (100% for central nervous system and muscle). The strategy of combining genome-wide and small-scale functional datasets by transfer learning is generally applicable and should enable the design of tissue-, cell type- and cell state-specific enhancers in any system.
Affiliation(s)
- Bernardo P de Almeida
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- InstaDeep, Paris, France
- Christoph Schaub
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Michaela Pagani
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria
- Stefano Secchia
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Eileen E M Furlong
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany
- Alexander Stark
- Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Vienna, Austria.
- Medical University of Vienna, Vienna BioCenter (VBC), Vienna, Austria.
27
Lu C, Wei Y, Abbas M, Agula H, Wang E, Meng Z, Zhang R. Application of Single-Cell Assay for Transposase-Accessible Chromatin with High Throughput Sequencing in Plant Science: Advances, Technical Challenges, and Prospects. Int J Mol Sci 2024; 25:1479. [PMID: 38338756 PMCID: PMC10855595 DOI: 10.3390/ijms25031479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] Open
Abstract
The Single-cell Assay for Transposase-Accessible Chromatin with high throughput sequencing (scATAC-seq) has gained increasing popularity in recent years, allowing for chromatin accessibility to be deciphered and gene regulatory networks (GRNs) to be inferred at single-cell resolution. This cutting-edge technology now enables the genome-wide profiling of chromatin accessibility at the cellular level and the capturing of cell-type-specific cis-regulatory elements (CREs) that are masked by cellular heterogeneity in bulk assays. Additionally, it can also facilitate the identification of rare and new cell types based on differences in chromatin accessibility and the charting of cellular developmental trajectories within lineage-related cell clusters. Due to technical challenges and limitations, the data generated from scATAC-seq exhibit unique features, often characterized by high sparsity and noise, even within the same cell type. To address these challenges, various bioinformatic tools have been developed. Furthermore, the application of scATAC-seq in plant science is still in its infancy, with most research focusing on root tissues and model plant species. In this review, we provide an overview of recent progress in scATAC-seq and its application across various fields. We first conduct scATAC-seq in plant science. Next, we highlight the current challenges of scATAC-seq in plant science and major strategies for cell type annotation. Finally, we outline several future directions to exploit scATAC-seq technologies to address critical challenges in plant science, ranging from plant ENCODE (The Encyclopedia of DNA Elements) project construction to GRN inference, to deepen our understanding of the roles of CREs in plant biology.
Affiliation(s)
- Chao Lu
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
- Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yunxiao Wei
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
- Mubashir Abbas
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
- Hasi Agula
- Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Edwin Wang
- Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
- Zhigang Meng
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
- Rui Zhang
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; (C.L.); (Y.W.)
28
Yu J, Leng J, Hou Z, Sun D, Wu LY. Incorporating network diffusion and peak location information for better single-cell ATAC-seq data analysis. Brief Bioinform 2024; 25:bbae093. [PMID: 38493346 PMCID: PMC10944575 DOI: 10.1093/bib/bbae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 12/22/2023] [Accepted: 02/20/2024] [Indexed: 03/18/2024] Open
Abstract
Single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data provided new insights into the understanding of epigenetic heterogeneity and transcriptional regulation. With the increasing abundance of dataset resources, there is an urgent need to extract more useful information through high-quality data analysis methods specifically designed for scATAC-seq. However, analyzing scATAC-seq data poses challenges due to its near binarization, high sparsity and ultra-high dimensionality properties. Here, we proposed a novel network diffusion-based computational method to comprehensively analyze scATAC-seq data, named Single-Cell ATAC-seq Analysis via Network Refinement with Peaks Location Information (SCARP). SCARP formulates the Network Refinement diffusion method under the graph theory framework to aggregate information from different network orders, effectively compensating for missing signals in the scATAC-seq data. By incorporating distance information between adjacent peaks on the genome, SCARP also contributes to depicting the co-accessibility of peaks. These two innovations empower SCARP to obtain lower-dimensional representations for both cells and peaks more effectively. We have demonstrated through sufficient experiments that SCARP facilitated superior analyses of scATAC-seq data. Specifically, SCARP exhibited outstanding cell clustering performance, enabling better elucidation of cell heterogeneity and the discovery of new biologically significant cell subpopulations. Additionally, SCARP was also instrumental in portraying co-accessibility relationships of accessible regions and providing new insight into transcriptional regulation. Consequently, SCARP identified genes that were involved in key Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to diseases and predicted reliable cis-regulatory interactions. To sum up, our studies suggested that SCARP is a promising tool to comprehensively analyze the scATAC-seq data.
Affiliation(s)
- Jiating Yu
- School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing 210044, China
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jiacheng Leng
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Zhejiang Lab, Hangzhou 311121, China
- Zhichao Hou
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Duanchen Sun
- School of Mathematics, Shandong University, Jinan 250100, China
- Ling-Yun Wu
- IAM, MADIS, NCMIS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
29
Kudron M, Gevirtzman L, Victorsen A, Lear BC, Gao J, Xu J, Samanta S, Frink E, Tran-Pearson A, Huynh C, Vafeados D, Hammonds A, Fisher W, Wall M, Wesseling G, Hernandez V, Lin Z, Kasparian M, White K, Allada R, Gerstein M, Hillier L, Celniker SE, Reinke V, Waterston RH. Binding profiles for 954 Drosophila and C. elegans transcription factors reveal tissue specific regulatory relationships. bioRxiv 2024:2024.01.18.576242. [PMID: 38293065 PMCID: PMC10827215 DOI: 10.1101/2024.01.18.576242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
A catalog of transcription factor (TF) binding sites in the genome is critical for deciphering regulatory relationships. Here we present the culmination of the modERN (model organism Encyclopedia of Regulatory Networks) consortium that systematically assayed TF binding events in vivo in two major model organisms, Drosophila melanogaster (fly) and Caenorhabditis elegans (worm). We describe key features of these datasets, comprising 604 TFs identifying 3.6M sites in the fly and 350 TFs identifying 0.9 M sites in the worm. Applying a machine learning model to these data identifies sets of TFs with a prominent role in promoting target gene expression in specific cell types. TF binding data are available through the ENCODE Data Coordinating Center and at https://epic.gs.washington.edu/modERNresource, which provides access to processed and summary data, as well as widgets to probe cell type-specific TF-target relationships. These data are a rich resource that should fuel investigations into TF function during development.
Affiliation(s)
- Michelle Kudron
- Department of Genetics, Yale University, New Haven, Connecticut 06520
| | - Louis Gevirtzman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Alec Victorsen
- Department of Laboratory Medicine & Pathology, University of Minnesota, Minneapolis, MN 55455
| | - Bridget C. Lear
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Jiahao Gao
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
| | - Jinrui Xu
- Department of Biology, Howard University, Washington, District of Columbia 20059, USA
- Center for Applied Data Science and Analytics, Howard University, Washington, District of Columbia 20059, USA
| | - Swapna Samanta
- Department of Genetics, Yale University, New Haven, Connecticut 06520
| | - Emily Frink
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Adri Tran-Pearson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Chau Huynh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Dionne Vafeados
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Ann Hammonds
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
| | - William Fisher
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
| | - Martha Wall
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
| | - Greg Wesseling
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Vanessa Hernandez
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Zhichun Lin
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Mary Kasparian
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Kevin White
- Department of Biochemistry and Precision Medicine Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597
| | - Ravi Allada
- Department of Neurobiology, Northwestern University, Evanston IL 60208
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06520, USA
| | - LaDeana Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
| | - Susan E. Celniker
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
| | - Valerie Reinke
- Department of Genetics, Yale University, New Haven, Connecticut 06520
| | - Robert H. Waterston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
|
30
|
Aragones DG, Palomino-Segura M, Sicilia J, Crainiciuc G, Ballesteros I, Sánchez-Cabo F, Hidalgo A, Calvo GF. Variable selection for nonlinear dimensionality reduction of biological datasets through bootstrapping of correlation networks. Comput Biol Med 2024; 168:107827. [PMID: 38086138 DOI: 10.1016/j.compbiomed.2023.107827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2023] [Revised: 11/15/2023] [Accepted: 12/04/2023] [Indexed: 01/10/2024]
Abstract
Identifying the most relevant variables or features in massive datasets for dimensionality reduction can lead to improved and more informative display, faster computation times, and more explainable models of complex systems. Despite significant advances and available algorithms, this task generally remains challenging, especially in unsupervised settings. In this work, we propose a method that constructs correlation networks using all intervening variables and then selects the most informative ones based on network bootstrapping. The method can be applied in both supervised and unsupervised scenarios. We demonstrate its functionality by applying Uniform Manifold Approximation and Projection for dimensionality reduction to several high-dimensional biological datasets, derived from 4D live imaging recordings of hundreds of morpho-kinetic variables, describing the dynamics of thousands of individual leukocytes at sites of prominent inflammation. We compare our method with other standard ones in the field, such as Principal Component Analysis and Elastic Net, showing that it outperforms them. The proposed method can be employed in a wide range of applications, encompassing data analysis and machine learning.
Affiliation(s)
- David G Aragones
- Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain
| | - Miguel Palomino-Segura
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain; Immunophysiology Research Group, Instituto Universitario de Investigación Biosanitaria de Extremadura (INUBE), Badajoz, Spain; Department of Physiology, Faculty of Sciences, University of Extremadura, Badajoz, Spain
| | - Jon Sicilia
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Georgiana Crainiciuc
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Iván Ballesteros
- Area of Cell and Developmental Biology, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Fátima Sánchez-Cabo
- Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid, Spain
| | - Andrés Hidalgo
- Vascular Biology and Therapeutics Program and Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA
| | - Gabriel F Calvo
- Department of Mathematics & MOLAB-Mathematical Oncology Laboratory, Universidad de Castilla-La Mancha, Ciudad Real, Spain.
|
31
|
Halblander FN, Meng FW, Murphy PJ. Anp32e protects against accumulation of H2A.Z at Sox motif containing promoters during zebrafish gastrulation. bioRxiv 2023:2023.12.18.572196. [PMID: 38187710 PMCID: PMC10769258 DOI: 10.1101/2023.12.18.572196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Epigenetic regulation of chromatin states is crucial for proper gene expression programs and progression during development, but precise mechanisms by which epigenetic factors influence differentiation remain poorly understood. Here we find that the histone variant H2A.Z accumulates at Sox motif-containing promoters during zebrafish gastrulation while neighboring genes become transcriptionally active. These changes coincide with reduced expression of anp32e, the H2A.Z histone removal chaperone, suggesting that loss of Anp32e may lead to increases in H2A.Z during differentiation. Remarkably, genetic removal of Anp32e in embryos leads to H2A.Z accumulation prior to gastrulation, and precocious developmental transcription of Sox motif associated genes. Altogether, our results provide compelling evidence for a mechanism in which Anp32e restricts H2A.Z accumulation at Sox motif-containing promoters, and subsequent down-regulation of Anp32e enables temporal up-regulation of Sox motif associated genes.
Affiliation(s)
- Fabian N. Halblander
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester NY, 14642, USA
| | - Fanju W. Meng
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester NY, 14642, USA
| | - Patrick J. Murphy
- Department of Biomedical Genetics, University of Rochester Medical Center, Rochester NY, 14642, USA
|
32
|
Wilkinson AL, Zorzan I, Rugg-Gunn PJ. Epigenetic regulation of early human embryo development. Cell Stem Cell 2023; 30:1569-1584. [PMID: 37858333 DOI: 10.1016/j.stem.2023.09.010] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 08/02/2023] [Revised: 09/18/2023] [Accepted: 09/25/2023] [Indexed: 10/21/2023]
Abstract
Studies of mammalian development have advanced our understanding of the genetic, epigenetic, and cellular processes that orchestrate embryogenesis and have uncovered new insights into the unique aspects of human embryogenesis. Recent studies have now produced the first epigenetic maps of early human embryogenesis, stimulating new ideas about epigenetic reprogramming, cell fate control, and the potential mechanisms underpinning developmental plasticity in human embryos. In this review, we discuss these new insights into the epigenetic regulation of early human development and the importance of these processes for safeguarding development. We also highlight unanswered questions and key challenges that remain to be addressed.
Affiliation(s)
| | - Irene Zorzan
- Epigenetics Programme, Babraham Institute, Cambridge, UK
| | - Peter J Rugg-Gunn
- Epigenetics Programme, Babraham Institute, Cambridge, UK; Centre for Trophoblast Research, University of Cambridge, Cambridge, UK; Wellcome-MRC Cambridge Stem Cell Institute, Cambridge, UK.
|
33
|
Persad S, Choo ZN, Dien C, Sohail N, Masilionis I, Chaligné R, Nawy T, Brown CC, Sharma R, Pe'er I, Setty M, Pe'er D. SEACells infers transcriptional and epigenomic cellular states from single-cell genomics data. Nat Biotechnol 2023; 41:1746-1757. [PMID: 36973557 PMCID: PMC10713451 DOI: 10.1038/s41587-023-01716-9] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Received: 04/02/2022] [Accepted: 02/20/2023] [Indexed: 03/29/2023]
Abstract
Metacells are cell groupings derived from single-cell sequencing data that represent highly granular, distinct cell states. Here we present single-cell aggregation of cell states (SEACells), an algorithm for identifying metacells that overcome the sparsity of single-cell data while retaining heterogeneity obscured by traditional cell clustering. SEACells outperforms existing algorithms in identifying comprehensive, compact and well-separated metacells in both RNA and assay for transposase-accessible chromatin (ATAC) modalities across datasets with discrete cell types and continuous trajectories. We demonstrate the use of SEACells to improve gene-peak associations, compute ATAC gene scores and infer the activities of critical regulators during differentiation. Metacell-level analysis scales to large datasets and is particularly well suited for patient cohorts, where per-patient aggregation provides more robust units for data integration. We use our metacells to reveal expression dynamics and gradual reconfiguration of the chromatin landscape during hematopoietic differentiation and to uniquely identify CD4 T cell differentiation and activation states associated with disease onset and severity in a Coronavirus Disease 2019 (COVID-19) patient cohort.
Affiliation(s)
- Sitara Persad
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
| | - Zi-Ning Choo
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Christine Dien
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA
| | - Noor Sohail
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Ignas Masilionis
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Ronan Chaligné
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Tal Nawy
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Chrysothemis C Brown
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Roshan Sharma
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Itsik Pe'er
- Department of Computer Science, Fu Foundation School of Engineering & Applied Science, Columbia University, New York, NY, USA
| | - Manu Setty
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA.
- Computational Biology Program, Public Health Sciences Division and Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle, WA, USA.
| | - Dana Pe'er
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
- Howard Hughes Medical Institute, New York, NY, USA.
|
34
|
Devens HR, Davidson PL, Byrne M, Wray GA. Hybrid Epigenomes Reveal Extensive Local Genetic Changes to Chromatin Accessibility Contribute to Divergence in Embryonic Gene Expression Between Species. Mol Biol Evol 2023; 40:msad222. [PMID: 37823438 PMCID: PMC10638671 DOI: 10.1093/molbev/msad222] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Revised: 06/14/2023] [Accepted: 07/27/2023] [Indexed: 10/13/2023] Open
Abstract
Chromatin accessibility plays an important role in shaping gene expression, yet little is known about the genetic and molecular mechanisms that influence the evolution of chromatin configuration. Both local (cis) and distant (trans) genetic influences can in principle influence chromatin accessibility and are based on distinct molecular mechanisms. We, therefore, sought to characterize the role that each of these plays in altering chromatin accessibility in 2 closely related sea urchin species. Using hybrids of Heliocidaris erythrogramma and Heliocidaris tuberculata, and adapting a statistical framework previously developed for the analysis of cis and trans influences on the transcriptome, we examined how these mechanisms shape the regulatory landscape at 3 important developmental stages, and compared our results to similar analyses of the transcriptome. We found extensive cis- and trans-based influences on evolutionary changes in chromatin, with cis effects generally larger in effect. Evolutionary changes in accessibility and gene expression are correlated, especially when expression has a local genetic basis. Maternal influences appear to have more of an effect on chromatin accessibility than on gene expression, persisting well past the maternal-to-zygotic transition. Chromatin accessibility near gene regulatory network genes appears to be distinctly regulated, with trans factors appearing to play an outsized role in the configuration of chromatin near these genes. Together, our results represent the first attempt to quantify cis and trans influences on evolutionary divergence in chromatin configuration in an outbred natural study system and suggest that chromatin regulation is more genetically complex than was previously appreciated.
Affiliation(s)
| | | | - Maria Byrne
- School of Medical Science, The University of Sydney, Sydney, New South Wales, Australia
- School of Life and Environmental Science, The University of Sydney, Sydney, New South Wales, Australia
| | - Gregory A Wray
- Department of Biology, Duke University, Durham, NC, USA
- Center for Genomic and Computational Biology, Duke University, Durham, NC, USA
|
35
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023; 24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA.
- Department of Statistics, University of Chicago, Chicago, IL, USA.
|
36
|
Brennan KJ, Weilert M, Krueger S, Pampari A, Liu HY, Yang AWH, Morrison JA, Hughes TR, Rushlow CA, Kundaje A, Zeitlinger J. Chromatin accessibility in the Drosophila embryo is determined by transcription factor pioneering and enhancer activation. Dev Cell 2023; 58:1898-1916.e9. [PMID: 37557175 PMCID: PMC10592203 DOI: 10.1016/j.devcel.2023.07.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/28/2022] [Revised: 05/09/2023] [Accepted: 07/13/2023] [Indexed: 08/11/2023]
Abstract
Chromatin accessibility is integral to the process by which transcription factors (TFs) read out cis-regulatory DNA sequences, but it is difficult to differentiate between TFs that drive accessibility and those that do not. Deep learning models that learn complex sequence rules provide an unprecedented opportunity to dissect this problem. Using zygotic genome activation in Drosophila as a model, we analyzed high-resolution TF binding and chromatin accessibility data with interpretable deep learning and performed genetic validation experiments. We identify a hierarchical relationship between the pioneer TF Zelda and the TFs involved in axis patterning. Zelda consistently pioneers chromatin accessibility proportional to motif affinity, whereas patterning TFs augment chromatin accessibility in sequence contexts where they mediate enhancer activation. We conclude that chromatin accessibility occurs in two tiers: one through pioneering, which makes enhancers accessible but not necessarily active, and the second when the correct combination of TFs leads to enhancer activation.
Affiliation(s)
- Kaelan J Brennan
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Melanie Weilert
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Sabrina Krueger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Anusri Pampari
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA
| | - Hsiao-Yun Liu
- Department of Biology, New York University, New York, NY 10003, USA
| | - Ally W H Yang
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | - Jason A Morrison
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
| | - Timothy R Hughes
- Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada
| | | | - Anshul Kundaje
- Department of Computer Science, Stanford University, Palo Alto, CA 94305, USA; Department of Genetics, Stanford University, Palo Alto, CA 94305, USA
| | - Julia Zeitlinger
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Department of Pathology & Laboratory Medicine, The University of Kansas Medical Center, Kansas City, KS 66160, USA.
|
37
|
Gamache J, Gingerich D, Shwab EK, Barrera J, Garrett ME, Hume C, Crawford GE, Ashley-Koch AE, Chiba-Falek O. Integrative single-nucleus multi-omics analysis prioritizes candidate cis and trans regulatory networks and their target genes in Alzheimer's disease brains. Cell Biosci 2023; 13:185. [PMID: 37789374 PMCID: PMC10546724 DOI: 10.1186/s13578-023-01120-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2023] [Accepted: 08/30/2023] [Indexed: 10/05/2023] Open
Abstract
BACKGROUND: The genetic underpinnings of late-onset Alzheimer's disease (LOAD) are yet to be fully elucidated. Although numerous LOAD-associated loci have been discovered, the causal variants and their target genes remain largely unknown. Since the brain is composed of heterogeneous cell subtypes, it is imperative to study the brain on a cell subtype-specific level to explore the biological processes underlying LOAD.
METHODS: Here, we present the largest parallel single-nucleus (sn) multi-omics study to simultaneously profile gene expression (snRNA-seq) and chromatin accessibility (snATAC-seq) to date, using nuclei from 12 normal and 12 LOAD brains. We identified cell subtype clusters based on gene expression and chromatin accessibility profiles and characterized cell subtype-specific LOAD-associated differentially expressed genes (DEGs), differentially accessible peaks (DAPs) and cis co-accessibility networks (CCANs).
RESULTS: Integrative analysis defined disease-relevant CCANs in multiple cell subtypes and discovered LOAD-associated cell subtype-specific candidate cis regulatory elements (cCREs), their candidate target genes, and trans-interacting transcription factors (TFs), some of which, including ELK1, JUN, and SMAD4 in excitatory neurons, were also LOAD-DEGs. Finally, we focused on a subset of cell subtype-specific CCANs that overlap known LOAD-GWAS regions and catalogued putative functional SNPs changing the affinities of TF motifs within LOAD-cCREs linked to LOAD-DEGs, including APOE and MYO1E in a specific subtype of microglia and BIN1 in a subpopulation of oligodendrocytes.
CONCLUSIONS: To our knowledge, this study represents the most comprehensive systematic interrogation to date of regulatory networks and the impact of genetic variants on gene dysregulation in LOAD at a cell subtype resolution. Our findings reveal crosstalk between epigenetic, genomic, and transcriptomic determinants of LOAD pathogenesis and define catalogues of candidate genes, cCREs, and variants involved in LOAD genetic etiology and the cell subtypes in which they act to exert their pathogenic effects. Overall, these results suggest that cell subtype-specific cis-trans interactions between regulatory elements and TFs, and the genes dysregulated by these networks contribute to the development of LOAD.
Affiliation(s)
- Julia Gamache
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
| | - Daniel Gingerich
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
| | - E Keats Shwab
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
| | - Julio Barrera
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
| | - Melanie E Garrett
- Duke Molecular Physiology Institute, Duke University Medical Center, DUMC Box 104775, Durham, NC, 27701, USA
| | - Cordelia Hume
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA
| | - Gregory E Crawford
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA.
- Department of Pediatrics, Division of Medical Genetics, Duke University Medical Center, DUMC Box 3382, Durham, NC, 27708, USA.
- Center for Advanced Genomic Technologies, Duke University Medical Center, Durham, NC, 27708, USA.
| | - Allison E Ashley-Koch
- Duke Molecular Physiology Institute, Duke University Medical Center, DUMC Box 104775, Durham, NC, 27701, USA.
- Department of Medicine, Duke University Medical Center, Durham, NC, 27708, USA.
| | - Ornit Chiba-Falek
- Division of Translational Brain Sciences, Department of Neurology, Duke University Medical Center, DUMC Box 2900, Durham, NC, 27710, USA.
- Center for Genomic and Computational Biology, Duke University Medical Center, Durham, NC, 27708, USA.
|
38
|
Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. bioRxiv 2023:2023.03.03.531029. [PMID: 36945441 PMCID: PMC10028846 DOI: 10.1101/2023.03.03.531029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
Abstract
Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Affiliation(s)
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Research Computing Center, University of Chicago, Chicago, IL, USA
| | - Kaixuan Luo
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
| | - Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Vesalius Therapeutics, Cambridge, MA, USA
| | - Anthony Hung
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Karl Tayeb
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Committee on Genetics, Genomics and Systems Biology, University of Chicago, Chicago, IL, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
| | - Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Statistics, University of Chicago, Chicago, IL, USA
|
39
|
Mendieta JP, Sangra A, Yan H, Minow MAA, Schmitz RJ. Exploring plant cis-regulatory elements at single-cell resolution: overcoming biological and computational challenges to advance plant research. Plant J 2023; 115:1486-1499. [PMID: 37309871 PMCID: PMC10598807 DOI: 10.1111/tpj.16351] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Revised: 06/06/2023] [Accepted: 06/08/2023] [Indexed: 06/14/2023]
Abstract
Cis-regulatory elements (CREs) are important sequences for gene expression and for plant biological processes such as development, evolution, domestication, and stress response. However, studying CREs in plant genomes has been challenging. The totipotent nature of plant cells, coupled with the inability to maintain plant cell types in culture and the inherent technical challenges posed by the cell wall has limited our understanding of how plant cell types acquire and maintain their identities and respond to the environment via CRE usage. Advances in single-cell epigenomics have revolutionized the field of identifying cell-type-specific CREs. These new technologies have the potential to significantly advance our understanding of plant CRE biology, and shed light on how the regulatory genome gives rise to diverse plant phenomena. However, there are significant biological and computational challenges associated with analyzing single-cell epigenomic datasets. In this review, we discuss the historical and foundational underpinnings of plant single-cell research, challenges, and common pitfalls in the analysis of plant single-cell epigenomic data, and highlight biological challenges unique to plants. Additionally, we discuss how the application of single-cell epigenomic data in various contexts stands to transform our understanding of the importance of CREs in plant genomes.
Affiliation(s)
- Ankush Sangra
- Department of Genetics, University of Georgia, Athens, 30602, Georgia, USA
- Haidong Yan
- Department of Genetics, University of Georgia, Athens, 30602, Georgia, USA
- Mark A A Minow
- Department of Genetics, University of Georgia, Athens, 30602, Georgia, USA
- Robert J Schmitz
- Department of Genetics, University of Georgia, Athens, 30602, Georgia, USA
|
40
|
Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P, Cousin P, Iseli C, Restrepo S, Guex N, McCabe BD, Jankowski A, Levine MS, Gambetta MC. Chromosome-level organization of the regulatory genome in the Drosophila nervous system. Cell 2023; 186:3826-3844.e26. [PMID: 37536338 PMCID: PMC10529364 DOI: 10.1016/j.cell.2023.07.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 10/16/2022] [Revised: 03/31/2023] [Accepted: 07/06/2023] [Indexed: 08/05/2023]
Abstract
Previous studies have identified topologically associating domains (TADs) as basic units of genome organization. We present evidence of a previously unreported level of genome folding, where distant TAD pairs, megabases apart, interact to form meta-domains. Within meta-domains, gene promoters and structural intergenic elements present in distant TADs are specifically paired. The associated genes encode neuronal determinants, including those engaged in axonal guidance and adhesion. These long-range associations occur in a large fraction of neurons but support transcription in only a subset of neurons. Meta-domains are formed by diverse transcription factors that are able to pair over long and flexible distances. We present evidence that two such factors, GAF and CTCF, play direct roles in this process. The relative simplicity of higher-order meta-domain interactions in Drosophila, compared with those previously described in mammals, allowed the demonstration that genomes can fold into highly specialized cell-type-specific scaffolds that enable megabase-scale regulatory associations.
Affiliation(s)
- Giriram Mohana
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Julien Dorier
- Bioinformatics Competence Center, University of Lausanne, 1015 Lausanne, Switzerland; Bioinformatics Competence Center, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Xiao Li
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Marion Mouginot
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Rebecca C Smith
- Brain Mind Institute, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Héléna Malek
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Marion Leleu
- Bioinformatics Competence Center, University of Lausanne, 1015 Lausanne, Switzerland; Bioinformatics Competence Center, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Daniel Rodriguez
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Jenisha Khadka
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Patrycja Rosa
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
- Pascal Cousin
- Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
- Christian Iseli
- Bioinformatics Competence Center, University of Lausanne, 1015 Lausanne, Switzerland; Bioinformatics Competence Center, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Simon Restrepo
- Arcoris bio AG, Lüssirainstrasse 52, 6300 Zug, Switzerland
- Nicolas Guex
- Bioinformatics Competence Center, University of Lausanne, 1015 Lausanne, Switzerland; Bioinformatics Competence Center, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Brian D McCabe
- Brain Mind Institute, Swiss Federal Institute of Technology Lausanne, 1015 Lausanne, Switzerland
- Aleksander Jankowski
- Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, 02-097 Warsaw, Poland
- Michael S Levine
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
|
41
|
Armendariz DA, Sundarrajan A, Hon GC. Breaking enhancers to gain insights into developmental defects. eLife 2023; 12:e88187. [PMID: 37497775 PMCID: PMC10374278 DOI: 10.7554/elife.88187] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023] Open
Abstract
Despite ground-breaking genetic studies that have identified thousands of risk variants for developmental diseases, how these variants lead to molecular and cellular phenotypes remains a gap in knowledge. Many of these variants are non-coding and occur at enhancers, which orchestrate key regulatory programs during development. The prevailing paradigm is that non-coding variants alter the activity of enhancers, impacting gene expression programs, and ultimately contributing to disease risk. A key obstacle to progress is the systematic functional characterization of non-coding variants at scale, especially since enhancer activity is highly specific to cell type and developmental stage. Here, we review the foundational studies of enhancers in developmental disease and current genomic approaches to functionally characterize developmental enhancers and their variants at scale. In the coming decade, we anticipate systematic enhancer perturbation studies to link non-coding variants to molecular mechanisms, changes in cell state, and disease phenotypes.
Affiliation(s)
- Daniel A Armendariz
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Anjana Sundarrajan
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Gary C Hon
- Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, United States
- Hamon Center for Regenerative Science and Medicine, University of Texas Southwestern Medical Center, Dallas, United States
- Lyda Hill Department of Bioinformatics, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, United States
|
42
|
Yang Q, Xu Z, Zhou W, Wang P, Jiang Q, Juan L. An interpretable single-cell RNA sequencing data clustering method based on latent Dirichlet allocation. Brief Bioinform 2023; 24:bbad199. [PMID: 37225419 PMCID: PMC10359080 DOI: 10.1093/bib/bbad199] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/26/2023] [Revised: 05/04/2023] [Accepted: 05/08/2023] [Indexed: 05/26/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) detects whole transcriptome signals for large amounts of individual cells and is powerful for determining cell-to-cell differences and investigating the functional characteristics of various cell types. scRNA-seq datasets are usually sparse and highly noisy. Many steps in the scRNA-seq analysis workflow, including reasonable gene selection, cell clustering and annotation, as well as discovering the underlying biological mechanisms from such datasets, are difficult. In this study, we proposed an scRNA-seq analysis method based on the latent Dirichlet allocation (LDA) model. The LDA model estimates a series of latent variables, i.e. putative functions (PFs), from the input raw cell-gene data. Thus, we incorporated the 'cell-function-gene' three-layer framework into scRNA-seq analysis, as this framework is capable of discovering latent and complex gene expression patterns via a built-in model approach and obtaining biologically meaningful results through a data-driven functional interpretation process. We compared our method with four classic methods on seven benchmark scRNA-seq datasets. The LDA-based method performed best in the cell clustering test in terms of both accuracy and purity. By analysing three complex public datasets, we demonstrated that our method could distinguish cell types with multiple levels of functional specialization, and precisely reconstruct cell development trajectories. Moreover, the LDA-based method accurately identified the representative PFs and the representative genes for the cell types/cell stages, enabling data-driven cell cluster annotation and functional interpretation. According to the literature, most of the previously reported marker/functionally relevant genes were recognized.
Affiliation(s)
- Qi Yang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Zhaochun Xu
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin 150001, China
|
43
|
Otto D, Jordan C, Dury B, Dien C, Setty M. Quantifying Cell-State Densities in Single-Cell Phenotypic Landscapes using Mellon. bioRxiv 2023:2023.07.09.548272. [PMID: 37502954 PMCID: PMC10369887 DOI: 10.1101/2023.07.09.548272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Cell-state density characterizes the distribution of cells along phenotypic landscapes and is crucial for unraveling the mechanisms that drive cellular differentiation, regeneration, and disease. Here, we present Mellon, a novel computational algorithm for high-resolution estimation of cell-state densities from single-cell data. We demonstrate Mellon's efficacy by dissecting the density landscape of various differentiating systems, revealing a consistent pattern of high-density regions corresponding to major cell types intertwined with low-density, rare transitory states. Utilizing hematopoietic stem cell fate specification to B-cells as a case study, we present evidence implicating enhancer priming and the activation of master regulators in the emergence of these transitory states. Mellon offers the flexibility to perform temporal interpolation of time-series data, providing a detailed view of cell-state dynamics during the inherently continuous developmental processes. Scalable and adaptable, Mellon facilitates density estimation across various single-cell data modalities, scaling linearly with the number of cells. Our work underscores the importance of cell-state density in understanding the differentiation processes, and the potential of Mellon to provide new insights into the regulatory mechanisms guiding cellular fate decisions.
Affiliation(s)
- Dominik Otto
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA
- Computational Biology Program, Public Health Sciences Division, Seattle WA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
- Cailin Jordan
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA
- Computational Biology Program, Public Health Sciences Division, Seattle WA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
- Molecular and Cellular Biology Program, University of Washington, Seattle WA
- Brennan Dury
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA
- Computational Biology Program, Public Health Sciences Division, Seattle WA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
- Christine Dien
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA
- Computational Biology Program, Public Health Sciences Division, Seattle WA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
- Manu Setty
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle WA
- Computational Biology Program, Public Health Sciences Division, Seattle WA
- Translational Data Science IRC, Fred Hutchinson Cancer Center, Seattle WA
|
44
|
Raimundo F, Prompsy P, Vert JP, Vallot C. A benchmark of computational pipelines for single-cell histone modification data. Genome Biol 2023; 24:143. [PMID: 37340307 PMCID: PMC10280832 DOI: 10.1186/s13059-023-02981-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Accepted: 06/07/2023] [Indexed: 06/22/2023] Open
Abstract
BACKGROUND: Single-cell histone post-translational modification (scHPTM) assays such as scCUT&Tag or scChIP-seq allow single-cell mapping of diverse epigenomic landscapes within complex tissues and are likely to unlock our understanding of various mechanisms involved in development or diseases. Running scHPTM experiments and analyzing the data produced remains challenging since few consensus guidelines currently exist regarding good practices for experimental design and data analysis pipelines. RESULTS: We perform a computational benchmark to assess the impact of experimental parameters and data analysis pipelines on the ability of the cell representation to recapitulate known biological similarities. We run more than ten thousand experiments to systematically study the impact of coverage and number of cells, of the count matrix construction method, of feature selection and normalization, and of the dimension reduction algorithm used. This allows us to identify key experimental parameters and computational choices to obtain a good representation of single-cell HPTM data. We show in particular that the count matrix construction step has a strong influence on the quality of the representation and that using fixed-size bin counts outperforms annotation-based binning. Dimension reduction methods based on latent semantic indexing outperform others, and feature selection is detrimental, while keeping only high-quality cells has little influence on the final representation as long as enough cells are analyzed. CONCLUSIONS: This benchmark provides a comprehensive study on how experimental parameters and computational choices affect the representation of single-cell HPTM data. We propose a series of recommendations regarding matrix construction, feature and cell selection, and dimensionality reduction algorithms.
Affiliation(s)
- Félix Raimundo
- Google Research, Brain team, 75009, Paris, France
- Translational Research Department, Institut Curie, PSL Research University, 75005, Paris, France
- Pacôme Prompsy
- Translational Research Department, Institut Curie, PSL Research University, 75005, Paris, France
- CNRS UMR3244, Institut Curie, PSL Research University, 75005, Paris, France
- Jean-Philippe Vert
- Google Research, Brain team, 75009, Paris, France
- Owkin, Inc, NY, New York, USA
- Céline Vallot
- Translational Research Department, Institut Curie, PSL Research University, 75005, Paris, France
- CNRS UMR3244, Institut Curie, PSL Research University, 75005, Paris, France
|
45
|
Van de Sande B, Lee JS, Mutasa-Gottgens E, Naughton B, Bacon W, Manning J, Wang Y, Pollard J, Mendez M, Hill J, Kumar N, Cao X, Chen X, Khaladkar M, Wen J, Leach A, Ferran E. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 2023; 22:496-520. [PMID: 37117846 PMCID: PMC10141847 DOI: 10.1038/s41573-023-00688-4] [Citation(s) in RCA: 138] [Impact Index Per Article: 69.0] [Accepted: 03/10/2023] [Indexed: 04/30/2023]
Abstract
Single-cell technologies, particularly single-cell RNA sequencing (scRNA-seq) methods, together with associated computational tools and the growing availability of public data resources, are transforming drug discovery and development. New opportunities are emerging in target identification owing to improved disease understanding through cell subtyping, and highly multiplexed functional genomics screens incorporating scRNA-seq are enhancing target credentialling and prioritization. ScRNA-seq is also aiding the selection of relevant preclinical disease models and providing new insights into drug mechanisms of action. In clinical development, scRNA-seq can inform decision-making via improved biomarker identification for patient stratification and more precise monitoring of drug response and disease progression. Here, we illustrate how scRNA-seq methods are being applied in key steps in drug discovery and development, and discuss ongoing challenges for their implementation in the pharmaceutical industry.
Affiliation(s)
- Bart Naughton
- Computational Neurobiology, Eisai, Cambridge, MA, USA
- Wendi Bacon
- EMBL-EBI, Wellcome Genome Campus, Hinxton, UK
- The Open University, Milton Keynes, UK
- Yong Wang
- Precision Bioinformatics, Prometheus Biosciences, San Diego, CA, USA
- Melissa Mendez
- Genomic Sciences, GlaxoSmithKline, Collegeville, PA, USA
- Jon Hill
- Global Computational Biology and Digital Sciences, Boehringer Ingelheim Pharmaceuticals Inc., Ridgefield, CT, USA
- Namit Kumar
- Informatics & Predictive Sciences, Bristol Myers Squibb, San Diego, CA, USA
- Xiaohong Cao
- Genomic Research Center, AbbVie Inc., Cambridge, MA, USA
- Xiao Chen
- Magnet Biomedicine, Cambridge, MA, USA
- Mugdha Khaladkar
- Human Genetics and Computational Biology, GlaxoSmithKline, Collegeville, PA, USA
- Ji Wen
- Oncology Research and Development Unit, Pfizer, La Jolla, CA, USA
|
46
|
Huynh K, Smith BR, Macdonald SJ, Long AD. Genetic variation in chromatin state across multiple tissues in Drosophila melanogaster. PLoS Genet 2023; 19:e1010439. [PMID: 37146087 PMCID: PMC10191298 DOI: 10.1371/journal.pgen.1010439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 05/17/2023] [Accepted: 04/20/2023] [Indexed: 05/07/2023] Open
Abstract
We use ATAC-seq to examine chromatin accessibility for four different tissues in Drosophila melanogaster: adult female brain, ovaries, and both wing and eye-antennal imaginal discs from males. Each tissue is assayed in eight different inbred strain genetic backgrounds, seven associated with a reference quality genome assembly. We develop a method for the quantile normalization of ATAC-seq fragments and test for differences in coverage among genotypes, tissues, and their interaction at 44099 peaks throughout the euchromatic genome. For the strains with reference quality genome assemblies, we correct ATAC-seq profiles for read mis-mapping due to nearby polymorphic structural variants (SVs). Comparing coverage among genotypes without accounting for SVs results in a highly elevated rate (55%) of identifying false positive differences in chromatin state between genotypes. After SV correction, we identify 1050, 30383, and 4508 regions whose peak heights are polymorphic among genotypes, among tissues, or exhibit genotype-by-tissue interactions, respectively. Finally, we identify 3988 candidate causative variants that explain at least 80% of the variance in chromatin state at nearby ATAC-seq peaks.
Affiliation(s)
- Khoi Huynh
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California, United States of America
- Brittny R. Smith
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Stuart J. Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
- Center for Computational Biology, University of Kansas, Lawrence, Kansas, United States of America
- Anthony D. Long
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California, United States of America
|
47
|
Qiu Y, Yan C, Zhao P, Zou Q. SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data. Brief Bioinform 2023; 24:7147025. [PMID: 37122068 DOI: 10.1093/bib/bbad149] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023] Open
Abstract
MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology attracts extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types based on unsupervised clustering. Data imputation and dimension reduction are conducted before clustering because scRNA-seq has a high 'dropout' rate, noise and linear inseparability. However, performing dimension reduction, imputation and clustering independently cannot fully characterize the pattern of the scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that utilizes a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate the cell annotation as prior information, then transform the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI has a faster convergence speed, better dimensionality reduction performance and a more accurate cell clustering performance than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses are also conducted to validate the biological significance of our method, including pseudotime analysis, gene ontology and survival analysis. We believe that we are among the first to introduce imputation, partial label information, dimension reduction and clustering to the single-cell field. AVAILABILITY AND IMPLEMENTATION: The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.
Affiliation(s)
- Yushan Qiu
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Chang Yan
- College of Mathematics and Statistics, Shenzhen University, 518000, Guangdong, China
- Pu Zhao
- College of Life and Health Sciences, Northeastern University, Shenyang, 110169, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610056, China
|
48
|
Bera BS, Thompson TV, Sosa E, Nomaru H, Reynolds D, Dubin RA, Maqbool SB, Zheng D, Morrow BE, Greally JM, Suzuki M. An optimized approach for multiplexing single-nuclear ATAC-seq using oligonucleotide-conjugated antibodies. Epigenetics Chromatin 2023; 16:14. [PMID: 37118773 PMCID: PMC10142415 DOI: 10.1186/s13072-023-00486-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 04/13/2023] [Indexed: 04/30/2023] Open
Abstract
BACKGROUND: Single-cell technologies to analyze transcription and chromatin structure have been widely used in many research areas to reveal the functions and molecular properties of cells at single-cell resolution. Sample multiplexing techniques are valuable when performing single-cell analysis, reducing technical variation and permitting cost efficiencies. Several commercially available methods have been used in many scRNA-seq studies. On the other hand, while several methods have been published, multiplexing techniques for single nuclear assay for transposase-accessible chromatin (snATAC)-seq assays remain under development. We developed a simple nucleus hashing method using oligonucleotide-conjugated antibodies recognizing nuclear pore complex proteins, NuHash, to perform snATAC-seq library preparations by multiplexing. RESULTS: We performed multiplexing snATAC-seq analyses on a mixture of human and mouse cell samples (two samples, 2-plex, and four samples, 4-plex) using NuHash. The analyses on nuclei with at least 10,000 read counts showed that the demultiplexing accuracy of NuHash was high, and only ten out of 9144 nuclei (2-plex) and 150 of 12,208 nuclei (4-plex) had discordant classifications between NuHash demultiplexing and discrimination using reference genome alignments. The differential open chromatin region (OCR) analysis between female and male samples revealed that male-specific OCRs were enriched in chromosome Y (four out of nine). We also found that five out of 20 female-specific OCRs were on chromosome X. A comparative analysis between snATAC-seq and deeply sequenced bulk ATAC-seq on the same samples revealed that the bulk ATAC-seq signal intensity was positively correlated with the number of cell clusters detected in snATAC-seq. Moreover, when we categorized snATAC-seq peaks based on the number of cell clusters in which the peak was present, we observed different distributions over different genomic features between the groups. This result suggests that the peak intensities of bulk ATAC-seq can be used to identify different types of functional loci. CONCLUSIONS: Our multiplexing method using oligo-conjugated anti-nuclear pore complex proteins, NuHash, permits high-accuracy demultiplexing of samples. The NuHash protocol is straightforward, works on frozen samples, and requires no modifications for snATAC-seq library preparation.
Affiliation(s)
- Betelehem Solomon Bera
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA
- Taylor V Thompson
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Eric Sosa
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Hiroko Nomaru
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Thinkcyte Inc., Tokyo, Japan
- David Reynolds
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Robert A Dubin
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Shahina B Maqbool
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Deyou Zheng
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Bernice E Morrow
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- John M Greally
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Masako Suzuki
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Department of Nutrition, Texas A&M University, College Station, TX, USA
|
49
|
Qiu C, Martin BK, Welsh IC, Daza RM, Le TM, Huang X, Nichols EK, Taylor ML, Fulton O, O’Day DR, Gomes AR, Ilcisin S, Srivatsan S, Deng X, Disteche CM, Noble WS, Hamazaki N, Moens CB, Kimelman D, Cao J, Schier AF, Spielmann M, Murray SA, Trapnell C, Shendure J. A single-cell transcriptional timelapse of mouse embryonic development, from gastrula to pup. bioRxiv 2023:2023.04.05.535726. [PMID: 37066300 PMCID: PMC10104014 DOI: 10.1101/2023.04.05.535726] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 04/18/2023]
Abstract
The house mouse, Mus musculus, is an exceptional model system, combining genetic tractability with close homology to human biology. Gestation in mouse development lasts just under three weeks, a period during which its genome orchestrates the astonishing transformation of a single cell zygote into a free-living pup composed of >500 million cells. Towards a global framework for exploring mammalian development, we applied single cell combinatorial indexing (sci-*) to profile the transcriptional states of 12.4 million nuclei from 83 precisely staged embryos spanning late gastrulation (embryonic day 8 or E8) to birth (postnatal day 0 or P0), with 2-hr temporal resolution during somitogenesis, 6-hr resolution through to birth, and 20-min resolution during the immediate postpartum period. From these data (E8 to P0), we annotate dozens of trajectories and hundreds of cell types and perform deeper analyses of the unfolding of the posterior embryo during somitogenesis as well as the ontogenesis of the kidney, mesenchyme, retina, and early neurons. Finally, we leverage the depth and temporal resolution of these whole embryo snapshots, together with other published data, to construct and curate a rooted tree of cell type relationships that spans mouse development from zygote to pup. Throughout this tree, we systematically nominate sets of transcription factors (TFs) and other genes as candidate drivers of the in vivo differentiation of hundreds of mammalian cell types. Remarkably, the most dramatic shifts in transcriptional state are observed in a restricted set of cell types in the hours immediately following birth, and presumably underlie the massive changes in physiology that must accompany the successful transition of a placental mammal to extrauterine life.
Affiliation(s)
- Chengxiang Qiu
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Beth K. Martin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Riza M. Daza
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Truc-Mai Le
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Xingfan Huang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
| | - Eva K. Nichols
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Megan L. Taylor
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Olivia Fulton
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Diana R. O’Day
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | | | - Saskia Ilcisin
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
| | - Sanjay Srivatsan
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Medical Scientist Training Program, University of Washington, Seattle, WA, USA
| | - Xinxian Deng
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Christine M. Disteche
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Department of Medicine, University of Washington, Seattle, WA, USA
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Nobuhiko Hamazaki
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- Cecilia B. Moens
- Division of Basic Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- David Kimelman
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Junyue Cao
- Laboratory of Single-cell genomics and Population dynamics, The Rockefeller University, New York, NY, USA
- Alexander F. Schier
- Biozentrum, University of Basel, Basel, Switzerland
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Malte Spielmann
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Institute of Human Genetics, University Hospitals Schleswig-Holstein, University of Lübeck and Kiel University, Lübeck, Kiel, Germany
- DZHK (German Centre for Cardiovascular Research), partner site Hamburg, Lübeck, Kiel, Lübeck, Germany
- Cole Trapnell
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Jay Shendure
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
|
50
|
Jiang S, Huang Z, Li Y, Yu C, Yu H, Ke Y, Jiang L, Liu J. Single-cell chromatin accessibility and transcriptome atlas of mouse embryos. Cell Rep 2023; 42:112210. [PMID: 36881507 DOI: 10.1016/j.celrep.2023.112210] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/07/2022] [Revised: 11/08/2022] [Accepted: 02/16/2023] [Indexed: 03/08/2023]
Abstract
Cis-regulatory elements regulate gene expression and lineage specification. However, the potential regulation of cis-elements on mammalian embryogenesis remains largely unexplored. To address this question, we perform single-cell assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA-seq in embryonic day 7.5 (E7.5) and E13.5 mouse embryos. We construct the chromatin accessibility landscapes with cell spatial information in E7.5 embryos, showing the spatial patterns of cis-elements and the spatial distribution of potentially functional transcription factors (TFs). We further show that many germ-layer-specific cis-elements and TFs in E7.5 embryos are maintained in the cell types derived from the corresponding germ layers at later stages, suggesting that these cis-elements and TFs are important during cell differentiation. We also find a potential progenitor for Sertoli and granulosa cells in gonads. Interestingly, both Sertoli and granulosa cells exist in male gonads and female gonads during gonad development. Collectively, we provide a valuable resource to understand organogenesis in mammals.
Affiliation(s)
- Shan Jiang
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Zheng Huang
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Yun Li
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Chengwei Yu
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; College of Future Technology College, University of Chinese Academy of Sciences, Beijing 100049, China
- Hao Yu
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; College of Future Technology College, University of Chinese Academy of Sciences, Beijing 100049, China
- Yuwen Ke
- College of Biological Science, China Agricultural University, Beijing 100193, China
- Lan Jiang
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; College of Future Technology College, University of Chinese Academy of Sciences, Beijing 100049, China; Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China.
- Jiang Liu
- China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China; College of Future Technology College, University of Chinese Academy of Sciences, Beijing 100049, China; CAS Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China.
|