1
Subedi S, Sumida TS, Park YP. A scalable approach to topic modelling in single-cell data by approximate pseudobulk projection. Life Sci Alliance 2024; 7:e202402713. [PMID: 39107066 PMCID: PMC11303850 DOI: 10.26508/lsa.202402713] [Received: 03/12/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/09/2024] [Open Access]
Abstract
Probabilistic topic modelling has become essential in many types of single-cell data analysis. Based on probabilistic topic assignments in each cell, we identify the latent representation of cellular states. A dictionary matrix, consisting of topic-specific gene frequency vectors, provides interpretable bases to be compared with known cell type-specific marker genes and other pathway annotations. However, fitting a topic model on a large number of cells would require heavy computational resources: specialized computing units, computing time, and memory. Here, we present a scalable approximation method customized for single-cell RNA-seq data analysis, termed ASAP, short for Annotating a Single-cell data matrix by Approximate Pseudobulk estimation. Our approach is more accurate than existing methods but requires orders of magnitude less computing time, with much lower memory consumption. We also show that our approach is widely applicable for atlas-scale data analysis; our method seamlessly integrates single-cell and bulk data in joint analysis, not requiring additional preprocessing or feature selection steps.
Affiliation(s)
- Sishir Subedi
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, Canada (https://ror.org/03rmrcq20)
  - BC Cancer Research, Vancouver, Canada
- Tomokazu S Sumida
  - Neurology, Program for Neuroinflammation, Yale School of Medicine, New Haven, CT, USA
- Yongjin P Park
  - BC Cancer Research, Vancouver, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
  - Department of Statistics, University of British Columbia, Vancouver, Canada
2
Hingerl JC, Martens LD, Karollus A, Manz T, Buenrostro JD, Theis FJ, Gagneur J. scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution. bioRxiv 2024:2024.09.19.613754. [PMID: 39345504 PMCID: PMC11429888 DOI: 10.1101/2024.09.19.613754] [Indexed: 10/01/2024]
Abstract
Understanding how regulatory DNA elements shape gene expression across individual cells is a fundamental challenge in genomics. Joint RNA-seq and epigenomic profiling provides opportunities to build unifying models of gene regulation capturing sequence determinants across steps of gene expression. However, current models, developed primarily for bulk omics data, fail to capture the cellular heterogeneity and dynamic processes revealed by single-cell multi-modal technologies. Here, we introduce scooby, the first model to predict scRNA-seq coverage and scATAC-seq insertion profiles along the genome from sequence at single-cell resolution. For this, we leverage the pre-trained multi-omics profile predictor Borzoi as a foundation model, equip it with a cell-specific decoder, and fine-tune its sequence embeddings. Specifically, we condition the decoder on the cell position in a precomputed single-cell embedding resulting in strong generalization capability. Applied to a hematopoiesis dataset, scooby recapitulates cell-specific expression levels of held-out genes and cells, and identifies regulators and their putative target genes through in silico motif deletion. Moreover, accurate variant effect prediction with scooby allows for breaking down bulk eQTL effects into single-cell effects and delineating their impact on chromatin accessibility and gene expression. We anticipate scooby to aid unraveling the complexities of gene regulation at the resolution of individual cells.
3
Johnston KG, Grieco SF, Nie Q, Theis FJ, Xu X. Small data methods in omics: the power of one. Nat Methods 2024; 21:1597-1602. [PMID: 39174710 DOI: 10.1038/s41592-024-02390-8] [Received: 08/19/2023] [Accepted: 07/24/2024] [Indexed: 08/24/2024]
Abstract
Over the last decade, biology has begun utilizing 'big data' approaches, resulting in large, comprehensive atlases in modalities ranging from transcriptomics to neural connectomics. However, these approaches must be complemented and integrated with 'small data' approaches to efficiently utilize data from individual labs. Integration of smaller datasets with major reference atlases is critical to provide context to individual experiments, and approaches toward integration of large and small data have been a major focus in many fields in recent years. Here we discuss progress in integration of small data with consortium-sized atlases across multiple modalities, and its potential applications. We then examine promising future directions for utilizing the power of small data to maximize the information garnered from small-scale experiments. We envision that, in the near future, international consortia comprising many laboratories will work together to collaboratively build reference atlases and foundation models using small data methods.
Affiliation(s)
- Kevin G Johnston
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
  - Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA, USA
- Steven F Grieco
  - Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA, USA
  - Center for Neural Circuit Mapping, University of California, Irvine, Irvine, CA, USA
- Qing Nie
  - Department of Mathematics, University of California, Irvine, Irvine, CA, USA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Fabian J Theis
  - Helmholtz Center Munich-German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany
  - Department of Mathematics, Technical University of Munich, Munich, Germany
- Xiangmin Xu
  - Department of Anatomy and Neurobiology, School of Medicine, University of California, Irvine, Irvine, CA, USA
  - Center for Neural Circuit Mapping, University of California, Irvine, Irvine, CA, USA
4
Metzner E, Southard KM, Norman TM. Multiome Perturb-seq unlocks scalable discovery of integrated perturbation effects on the transcriptome and epigenome. bioRxiv 2024:2024.07.26.605307. [PMID: 39091800 PMCID: PMC11291144 DOI: 10.1101/2024.07.26.605307] [Indexed: 08/04/2024]
Abstract
Single-cell CRISPR screens link genetic perturbations to transcriptional states, but high-throughput methods connecting these induced changes to their regulatory foundations are limited. Here we introduce Multiome Perturb-seq, extending single-cell CRISPR screens to simultaneously measure perturbation-induced changes in gene expression and chromatin accessibility. We apply Multiome Perturb-seq in a CRISPRi screen of 13 chromatin remodelers in human RPE-1 cells, achieving efficient assignment of sgRNA identities to single nuclei via an improved method for capturing barcode transcripts from nuclear RNA. We organize expression and accessibility measurements into coherent programs describing the integrated effects of perturbations on cell state, finding that ARID1A and SUZ12 knockdowns induce programs enriched for developmental features. Pseudotime analysis of perturbations connects accessibility changes to changes in gene expression, highlighting the value of multimodal profiling. Overall, our method provides a scalable and simply implemented system to dissect the regulatory logic underpinning cell state.
Affiliation(s)
- Eli Metzner
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Kaden M. Southard
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Thomas M. Norman
  - Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
5
Loers JU, Vermeirssen V. A single-cell multimodal view on gene regulatory network inference from transcriptomics and chromatin accessibility data. Brief Bioinform 2024; 25:bbae382. [PMID: 39207727 PMCID: PMC11359808 DOI: 10.1093/bib/bbae382] [Received: 04/05/2024] [Revised: 06/27/2024] [Accepted: 07/23/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
Eukaryotic gene regulation is a combinatorial, dynamic, and quantitative process that plays a vital role in development and disease and can be modeled at a systems level in gene regulatory networks (GRNs). The wealth of multi-omics data measured on the same samples and even on the same cells has lifted the field of GRN inference to the next stage. Combinations of (single-cell) transcriptomics and chromatin accessibility allow the prediction of fine-grained regulatory programs that go beyond mere correlation of transcription factor and target gene expression, with enhancer GRNs (eGRNs) modeling molecular interactions between transcription factors, regulatory elements, and target genes. In this review, we highlight the key components for successful (e)GRN inference from (sc)RNA-seq and (sc)ATAC-seq data exemplified by state-of-the-art methods as well as open challenges and future developments. Moreover, we address preprocessing strategies, metacell generation and computational omics pairing, transcription factor binding site detection, and linear and three-dimensional approaches to identify chromatin interactions as well as dynamic and causal eGRN inference. We believe that the integration of transcriptomics together with epigenomics data at a single-cell level is the new standard for mechanistic network inference, and that it can be further advanced with integrating additional omics layers and spatiotemporal data, as well as with shifting the focus towards more quantitative and causal modeling strategies.
Affiliation(s)
- Jens Uwe Loers
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
- Vanessa Vermeirssen
  - Lab for Computational Biology, Integromics and Gene Regulation (CBIGR), Cancer Research Institute Ghent (CRIG), Corneel Heymanslaan 10, 9000 Ghent, Belgium
  - Department of Biomedical Molecular Biology, Ghent University, Zwijnaarde-Technologiepark 71, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, 9000 Ghent, Belgium
6
Soroczynski J, Anderson LJ, Yeung JL, Rendleman JM, Oren DA, Konishi HA, Risca VI. OpenTn5: Open-Source Resource for Robust and Scalable Tn5 Transposase Purification and Characterization. bioRxiv 2024:2024.07.11.602973. [PMID: 39026714 PMCID: PMC11257509 DOI: 10.1101/2024.07.11.602973] [Indexed: 07/20/2024]
Abstract
Tagmentation combines DNA fragmentation and sequencing adapter addition by leveraging the transposition activity of the bacterial cut-and-paste Tn5 transposase to enable efficient sequencing library preparation. Here we present an open-source protocol for the generation of multi-purpose hyperactive Tn5 transposase, including its benchmarking in CUT&Tag, bulk ATAC-seq, and single-cell ATAC-seq. The OpenTn5 protocol yields multi-milligram quantities of pG-Tn5 (E54K, L372P) protein per liter of E. coli culture, sufficient for thousands of tagmentation reactions, and the enzyme retains activity in storage for more than a year.
Affiliation(s)
- Jan Soroczynski
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
- Lauren J. Anderson
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
- Joanna L. Yeung
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
- Justin M. Rendleman
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
- Deena A. Oren
  - Structural Biology Resource Center, The Rockefeller University, New York, NY
- Hide A. Konishi
  - Laboratory of Chromosome and Cell Biology, The Rockefeller University, New York, NY
- Viviana I. Risca
  - Laboratory of Genome Architecture and Dynamics, The Rockefeller University, New York, NY
7
Huey JD, Abdennur N. Bigtools: a high-performance BigWig and BigBed library in Rust. Bioinformatics 2024; 40:btae350. [PMID: 38837370 PMCID: PMC11167208 DOI: 10.1093/bioinformatics/btae350] [Received: 03/07/2024] [Revised: 05/20/2024] [Accepted: 05/29/2024] [Indexed: 06/07/2024] [Open Access]
Abstract
MOTIVATION: The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continue to grow, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications.
RESULTS: Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python.
AVAILABILITY AND IMPLEMENTATION: Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io, Bioconda, and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.
Affiliation(s)
- Jack D Huey
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, United States
  - Diabetes Center of Excellence, UMass Chan Medical School, Worcester, MA 01605, United States
- Nezar Abdennur
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA 01605, United States
  - Department of Systems Biology, UMass Chan Medical School, Worcester, MA 01605, United States
8
Zhang S, Shu H, Zhou J, Rubin-Sigler J, Yang X, Liu Y, Cooper-Knock J, Monte E, Zhu C, Tu S, Li H, Tong M, Ecker JR, Ichida JK, Shen Y, Zeng J, Tsao PS, Snyder MP. Deconvolution of polygenic risk score in single cells unravels cellular and molecular heterogeneity of complex human diseases. bioRxiv 2024:2024.05.14.594252. [PMID: 38798507 PMCID: PMC11118500 DOI: 10.1101/2024.05.14.594252] [Indexed: 05/29/2024]
Abstract
Polygenic risk scores (PRSs) are commonly used for predicting an individual's genetic risk of complex diseases. Yet, their implication for disease pathogenesis remains largely limited. Here, we introduce scPRS, a geometric deep learning model that constructs single-cell-resolved PRS leveraging reference single-cell chromatin accessibility profiling data to enhance biological discovery as well as disease prediction. Real-world applications across multiple complex diseases, including type 2 diabetes (T2D), hypertrophic cardiomyopathy (HCM), and Alzheimer's disease (AD), showcase the superior prediction power of scPRS compared to traditional PRS methods. Importantly, scPRS not only predicts disease risk but also uncovers disease-relevant cells, such as hormone-high alpha and beta cells for T2D, cardiomyocytes and pericytes for HCM, and astrocytes, microglia and oligodendrocyte progenitor cells for AD. Facilitated by a layered multi-omic analysis, scPRS further identifies cell-type-specific genetic underpinnings, linking disease-associated genetic variants to gene regulation within corresponding cell types. We substantiate the disease relevance of scPRS-prioritized HCM genes and demonstrate that the suppression of these genes in HCM cardiomyocytes is rescued by Mavacamten treatment. Additionally, we establish a novel microglia-specific regulatory relationship between the AD risk variant rs7922621 and its target genes ANXA11 and TSPAN14. We further illustrate the detrimental effects of suppressing these two genes on microglia phagocytosis. Our work provides a multi-tasking, interpretable framework for precise disease prediction and systematic investigation of the genetic, cellular, and molecular basis of complex diseases, laying the methodological foundation for single-cell genetics.
Affiliation(s)
- Sai Zhang
  - Department of Epidemiology, University of Florida, Gainesville, FL, USA
  - Departments of Biostatistics & Biomedical Engineering, Genetics Institute, McKnight Brain Institute, University of Florida, Gainesville, FL, USA
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - These authors contributed equally: Sai Zhang, Hantao Shu, and Jingtian Zhou
- Hantao Shu
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  - These authors contributed equally: Sai Zhang, Hantao Shu, and Jingtian Zhou
- Jingtian Zhou
  - Arc Institute, Palo Alto, CA, USA
  - Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - These authors contributed equally: Sai Zhang, Hantao Shu, and Jingtian Zhou
- Jasper Rubin-Sigler
  - Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, University of Southern California, Los Angeles, CA, USA
- Xiaoyu Yang
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Yuxi Liu
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Johnathan Cooper-Knock
  - Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield, UK
- Emma Monte
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Chenchen Zhu
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Sharon Tu
  - Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, University of Southern California, Los Angeles, CA, USA
- Han Li
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Mingming Tong
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Joseph R. Ecker
  - Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
  - Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA, USA
- Justin K. Ichida
  - Department of Stem Cell Biology and Regenerative Medicine, Eli and Edythe Broad Center for Regenerative Medicine and Stem Cell Research, University of Southern California, Los Angeles, CA, USA
- Yin Shen
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
  - Department of Neurology, Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA, USA
- Jianyang Zeng
  - School of Engineering, Research Center for Industries of the Future, Westlake University, Hangzhou, Zhejiang, China
- Philip S. Tsao
  - VA Palo Alto Healthcare System, Palo Alto, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Michael P. Snyder
  - Department of Genetics, Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
9
Li S, Li Y, Sun Y, Li Y, Chen X, Tang S, Chen S. EpiCarousel: memory- and time-efficient identification of metacells for atlas-level single-cell chromatin accessibility data. Bioinformatics 2024; 40:btae191. [PMID: 38588573 PMCID: PMC11037479 DOI: 10.1093/bioinformatics/btae191] [Received: 08/28/2023] [Revised: 03/02/2024] [Accepted: 04/05/2024] [Indexed: 04/10/2024] [Open Access]
Abstract
SUMMARY: Recent technical advancements in single-cell chromatin accessibility sequencing (scCAS) have brought new insights to the characterization of epigenetic heterogeneity. As single-cell genomics experiments scale up to hundreds of thousands of cells, the demand for computational resources for downstream analysis grows intractably large and exceeds the capabilities of most researchers. Here, we propose EpiCarousel, a tailored Python package based on lazy loading, parallel processing, and community detection for memory- and time-efficient identification of metacells, i.e. the emergence of homogenous cells, in large-scale scCAS data. Through comprehensive experiments on five datasets of various protocols, sample sizes, dimensions, number of cell types, and degrees of cell-type imbalance, EpiCarousel outperformed baseline methods in systematic evaluation of memory usage, computational time, and multiple downstream analyses including cell type identification. Moreover, EpiCarousel executes preprocessing and downstream cell clustering on the atlas-level dataset with 707 043 cells and 1 154 611 peaks within 2 h consuming <75 GB of RAM and provides superior performance for characterizing cell heterogeneity than state-of-the-art methods.
AVAILABILITY AND IMPLEMENTATION: The EpiCarousel software is well-documented and freely available at https://github.com/biox-nku/epicarousel. It can be seamlessly interoperated with extensive scCAS analysis toolkits.
Affiliation(s)
- Sijie Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yuxi Li
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Yu Sun
  - Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Yaru Li
  - Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Xiaoyang Chen
  - MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, Beijing 100084, China
- Songming Tang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Shengquan Chen
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
10
Huey JD, Abdennur N. Bigtools: a high-performance BigWig and BigBed library in Rust. bioRxiv 2024:2024.02.06.579187. [PMID: 38370777 PMCID: PMC10871241 DOI: 10.1101/2024.02.06.579187] [Indexed: 02/20/2024]
Abstract
The BigWig and BigBed file formats were originally designed for the visualization of next-generation sequencing data through a genome browser. Due to their versatility, these formats have long since become ubiquitous for the storage of processed sequencing data and regularly serve as the basis for downstream data analysis. As the number and size of sequencing experiments continue to grow, there is an increasing demand to efficiently generate and query BigWig and BigBed files in a scalable and robust manner, and to efficiently integrate these functionalities into data analysis environments and third-party applications. Here, we present Bigtools, a feature-complete, high-performance, and integrable software library for generating and querying both BigWig and BigBed files. Bigtools is written in the Rust programming language and includes a flexible suite of command line tools as well as bindings to Python. Bigtools is cross-platform and released under the MIT license. It is distributed on Crates.io and the Python Package Index, and the source code is available at https://github.com/jackh726/bigtools.
Affiliation(s)
- Jack D Huey
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, 01605, USA
- Nezar Abdennur
  - Department of Genomics and Computational Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
  - Department of Systems Biology, UMass Chan Medical School, Worcester, MA, 01605, USA
11
Lu C, Wei Y, Abbas M, Agula H, Wang E, Meng Z, Zhang R. Application of Single-Cell Assay for Transposase-Accessible Chromatin with High Throughput Sequencing in Plant Science: Advances, Technical Challenges, and Prospects. Int J Mol Sci 2024; 25:1479. [PMID: 38338756 PMCID: PMC10855595 DOI: 10.3390/ijms25031479] [Received: 11/28/2023] [Revised: 01/16/2024] [Accepted: 01/23/2024] [Indexed: 02/12/2024] [Open Access]
Abstract
The Single-cell Assay for Transposase-Accessible Chromatin with high throughput sequencing (scATAC-seq) has gained increasing popularity in recent years, allowing for chromatin accessibility to be deciphered and gene regulatory networks (GRNs) to be inferred at single-cell resolution. This cutting-edge technology now enables the genome-wide profiling of chromatin accessibility at the cellular level and the capturing of cell-type-specific cis-regulatory elements (CREs) that are masked by cellular heterogeneity in bulk assays. Additionally, it can also facilitate the identification of rare and new cell types based on differences in chromatin accessibility and the charting of cellular developmental trajectories within lineage-related cell clusters. Due to technical challenges and limitations, the data generated from scATAC-seq exhibit unique features, often characterized by high sparsity and noise, even within the same cell type. To address these challenges, various bioinformatic tools have been developed. Furthermore, the application of scATAC-seq in plant science is still in its infancy, with most research focusing on root tissues and model plant species. In this review, we provide an overview of recent progress in scATAC-seq and its application across various fields. We first conduct scATAC-seq in plant science. Next, we highlight the current challenges of scATAC-seq in plant science and major strategies for cell type annotation. Finally, we outline several future directions to exploit scATAC-seq technologies to address critical challenges in plant science, ranging from plant ENCODE (The Encyclopedia of DNA Elements) project construction to GRN inference, to deepen our understanding of the roles of CREs in plant biology.
Affiliation(s)
- Chao Lu
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
  - Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Yunxiao Wei
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Mubashir Abbas
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hasi Agula
  - Key Laboratory of Herbage & Endemic Crop Biology, Ministry of Education, School of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Edwin Wang
  - Cumming School of Medicine, University of Calgary, Calgary, AB T2N 4N1, Canada
- Zhigang Meng
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Rui Zhang
  - Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China