1
Kim MC, Gate R, Lee DS, Tolopko A, Lu A, Gordon E, Shifrut E, Garcia-Nieto PE, Marson A, Ntranos V, Ye CJ. Method of moments framework for differential expression analysis of single-cell RNA sequencing data. Cell 2024; 187:6393-6410.e16. [PMID: 39454576 DOI: 10.1016/j.cell.2024.09.044] [Received: 10/22/2022] [Revised: 03/06/2024] [Accepted: 09/26/2024] [Indexed: 10/28/2024]
Abstract
Differential expression analysis of single-cell RNA sequencing (scRNA-seq) data is central for characterizing how experimental factors affect the distribution of gene expression. However, distinguishing between biological and technical sources of cell-cell variability and assessing the statistical significance of quantitative comparisons between cell groups remain challenging. We introduce Memento, a tool for robust and efficient differential analysis of mean expression, variability, and gene correlation from scRNA-seq data, scalable to millions of cells and thousands of samples. We applied Memento to 70,000 tracheal epithelial cells to identify interferon-responsive genes, 160,000 CRISPR-Cas9 perturbed T cells to reconstruct gene-regulatory networks, 1.2 million peripheral blood mononuclear cells (PBMCs) to map cell-type-specific quantitative trait loci (QTLs), and the 50-million-cell CELLxGENE Discover corpus to compare arbitrary cell groups. In all cases, Memento identified more significant and reproducible differences in mean expression compared with existing methods. It also identified differences in variability and gene correlation that suggest distinct transcriptional regulation mechanisms imparted by perturbations.
Affiliation(s)
- Min Cheol Kim
- Medical Scientist Training Program, University of California, San Francisco, San Francisco, CA, USA; UC Berkeley-UCSF Graduate Program in Bioengineering, San Francisco, CA, USA; Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Rachel Gate
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- David S Lee
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Andrew Lu
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Erin Gordon
- Division of Pulmonary and Critical Care, University of California, San Francisco, San Francisco, CA, USA
- Eric Shifrut
- Diabetes Center, University of California, San Francisco, San Francisco, CA, USA; Division of Infectious Diseases, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Alexander Marson
- Division of Infectious Diseases, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA; Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA
- Vasilis Ntranos
- Diabetes Center, University of California, San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Chun Jimmie Ye
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA; Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Gladstone-UCSF Institute of Genomic Immunology, San Francisco, CA, USA; Division of Rheumatology, Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
2
Petrany A, Chen R, Zhang S, Chen Y. Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data. Genome Res 2024; 34:1636-1650. [PMID: 39406498 PMCID: PMC11529838 DOI: 10.1101/gr.278843.123] [Received: 12/11/2023] [Accepted: 09/10/2024] [Indexed: 11/01/2024]
Abstract
High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DEGseq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions.
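To make the idea of a difference of two negative binomial variables concrete, here is a minimal sketch that computes the probability mass function of D = X - Y for independent X and Y by direct summation, plus a simple two-sided P-value for a single observed difference. It is not the DEGage implementation: the (size, probability) parameterization, the truncation limits `k_max` and `support`, the function names, and the "sum all outcomes no more likely than the observed one" P-value convention are all assumptions made for illustration. DEGage itself is described as using the mean of sample-wise differences as its test statistic.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for a negative binomial with size r > 0 and success probability 0 < p < 1."""
    if k < 0:
        return 0.0
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def diff_nb_pmf(d, r1, p1, r2, p2, k_max=400):
    """P(X - Y = d) for independent X ~ NB(r1, p1) and Y ~ NB(r2, p2).

    The infinite sum over Y = k is truncated at k_max (an assumption that
    is only safe when both distributions have negligible mass beyond it).
    """
    start = max(0, -d)
    return sum(nb_pmf(k + d, r1, p1) * nb_pmf(k, r2, p2)
               for k in range(start, k_max))


def two_sided_p(d_obs, r1, p1, r2, p2, support=100):
    """Total probability of differences that are no more likely than d_obs."""
    if abs(d_obs) > support:
        raise ValueError("d_obs lies outside the evaluated support")
    pmf = {d: diff_nb_pmf(d, r1, p1, r2, p2)
           for d in range(-support, support + 1)}
    threshold = pmf[d_obs] * (1.0 + 1e-9)  # tolerate floating-point ties
    return min(1.0, sum(v for v in pmf.values() if v <= threshold))
```

For example, `two_sided_p(30, 5, 0.5, 5, 0.5)` evaluates how surprising a difference of 30 counts would be when both groups share the same distribution. The mean of D equals the difference of the two negative binomial means, which gives a quick sanity check on the truncation.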
Affiliation(s)
- Alicia Petrany
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, New Jersey 08028, USA
- Ruoyu Chen
- Moorestown High School, Moorestown, New Jersey 08057, USA
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, New Jersey 08028, USA
3
Ghosh T, Baxter RM, Seal S, Lui VG, Rudra P, Vu T, Hsieh EW, Ghosh D. cytoKernel: Robust kernel embeddings for assessing differential expression of single cell data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.16.608287. [PMID: 39229233 PMCID: PMC11370373 DOI: 10.1101/2024.08.16.608287] [Indexed: 09/05/2024]
Abstract
High-throughput sequencing of single-cell data can be used to rigorously evaluate cell specification and enable the detection of intricate variations between groups or conditions. Many popular existing methods for differential expression target differences in aggregate measurements (mean, median, sum) and limit their approaches to detect only global differential changes. We present a robust method for differential expression of single-cell data using a kernel-based score test, cytoKernel. cytoKernel is specifically designed to assess the differential expression of single cell RNA sequencing and high-dimensional flow or mass cytometry data using the full probability distribution pattern. cytoKernel is based on kernel embeddings, which employ the probability distributions of the single cell data by calculating the pairwise divergence/distance between distributions of subjects. It can detect both patterns involving aggregate changes and more elusive variations that are often overlooked due to the multimodal characteristics of single cell data. We performed extensive benchmarks across both simulated and real data sets from mass cytometry data and single-cell RNA sequencing. The cytoKernel procedure effectively controls the False Discovery Rate (FDR) and shows favourable performance compared to existing methods. The method is able to identify more differential patterns than existing approaches. We apply cytoKernel to assess gene expression and protein marker expression differences from cell subpopulations in various publicly available single-cell RNAseq and mass cytometry data sets. The methods described in this paper are implemented in the open-source R package cytoKernel, which is freely available from Bioconductor at http://bioconductor.org/packages/cytoKernel.
Affiliation(s)
- Tusharkanti Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Ryan M Baxter
- Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Souvik Seal
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC, USA
- Victor G Lui
- Center for Translational Immunology, Benaroya Research Institute at Virginia Mason, Seattle, WA, USA
- Pratyaydipta Rudra
- Department of Statistics, Oklahoma State University, Stillwater, OK, USA
- Thao Vu
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Elena Wy Hsieh
- Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
4
Yan J, Zeng Q, Wang X. RankCompV3: a differential expression analysis algorithm based on relative expression orderings and applications in single-cell RNA transcriptomics. BMC Bioinformatics 2024; 25:259. [PMID: 39112940 PMCID: PMC11304794 DOI: 10.1186/s12859-024-05889-1] [Received: 12/05/2023] [Accepted: 07/30/2024] [Indexed: 08/11/2024] Open
Abstract
BACKGROUND Effective identification of differentially expressed genes (DEGs) has been challenging for single-cell RNA sequencing (scRNA-seq) profiles. Many existing algorithms have high false positive rates (FPRs) and often fail to identify weak biological signals. RESULTS We present a novel method for identifying DEGs in scRNA-seq data called RankCompV3. It is based on the comparison of relative expression orderings (REOs) of gene pairs which are determined by comparing the expression levels of a pair of genes in a set of single-cell profiles. The numbers of genes with consistently higher or lower expression levels than the gene of interest are counted in two groups in comparison, respectively, and the result is tabulated in a 3 × 3 contingency table which is tested by McCullagh's method to determine if the gene is dysregulated. In both simulated and real scRNA-seq data, RankCompV3 tightly controlled the FPR and demonstrated high accuracy, outperforming 11 other common single-cell DEG detection algorithms. Analysis with either regular single-cell or synthetic pseudo-bulk profiles produced highly concordant DEGs with the ground-truth. In addition, RankCompV3 demonstrates higher sensitivity to weak biological signals than other methods. The algorithm was implemented using Julia and can be called in R. The source code is available at https://github.com/pathint/RankCompV3.jl . CONCLUSIONS The REOs-based algorithm is a valuable tool for analyzing single-cell RNA profiles and identifying DEGs with high accuracy and sensitivity.
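The tabulation step described in this abstract can be sketched as follows. This is only an illustration of how a 3 × 3 table of relative expression orderings might be built for one gene of interest; it is not the RankCompV3 code, which is written in Julia. The stability rule (an ordering counts as consistent when it holds in at least `min_fraction` of cells), the function names, and the input layout are assumptions, and the McCullagh test that the abstract applies to the resulting table is not implemented here.

```python
def reo_state(gene_values, other_values, min_fraction=0.9):
    """Classify the ordering of the gene of interest against one other gene.

    Returns '>' or '<' when that ordering holds in at least min_fraction
    of the cells (a hypothetical stability rule), otherwise '~'.
    """
    n = len(gene_values)
    higher = sum(g > o for g, o in zip(gene_values, other_values))
    lower = sum(g < o for g, o in zip(gene_values, other_values))
    if higher / n >= min_fraction:
        return ">"
    if lower / n >= min_fraction:
        return "<"
    return "~"


def reo_contingency_table(expr_a, expr_b, gene, min_fraction=0.9):
    """Count partner genes by their ordering state in group A and in group B.

    expr_a and expr_b map gene name -> list of per-cell expression values
    and must contain the same genes. Keys of the result are
    (state in A, state in B) pairs, giving the nine cells of a 3 x 3 table.
    """
    states = ("<", "~", ">")
    table = {(sa, sb): 0 for sa in states for sb in states}
    for other in expr_a:
        if other == gene:
            continue
        sa = reo_state(expr_a[gene], expr_a[other], min_fraction)
        sb = reo_state(expr_b[gene], expr_b[other], min_fraction)
        table[(sa, sb)] += 1
    return table
```

Off-diagonal counts such as `table[(">", "<")]` record partner genes whose ordering relative to the gene of interest flips between the two groups, which is the kind of asymmetry a test on the table would look for.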
Affiliation(s)
- Jing Yan
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
- Qiuhong Zeng
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China
- Xianlong Wang
- Department of Bioinformatics, Fujian Key Laboratory of Medical Bioinformatics, School of Medical Technology and Engineering, Fujian Medical University, Fuzhou, 350122, China.
- The Second Affiliated Hospital, Fujian Medical University, Quanzhou, 362000, China.
5
Biswas B, Kumar N, Sugimoto M, Hoque MA. scHD4E: Novel ensemble learning-based differential expression analysis method for single-cell RNA-sequencing data. Comput Biol Med 2024; 178:108769. [PMID: 38897145 DOI: 10.1016/j.compbiomed.2024.108769] [Received: 04/01/2024] [Revised: 05/14/2024] [Accepted: 06/15/2024] [Indexed: 06/21/2024]
Abstract
Differential expression (DE) analysis between cell types for scRNA-seq data by capturing its complicated features is crucial. Recently, different methods have been developed for scRNA-seq data analysis based on different modeling frameworks, assumptions, strategies, and test statistics that consider various data features. scDEA is a recently developed ensemble learning-based DE analysis method that yields p-values using Lancaster's combination of the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, using only the 4 top-performing methods. The 4 top-performing methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate our proposed method based on sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, identified matched DE genes, and semantic similarity measurement between methods. We also performed similar analyses (except the last 3 terms) and computed performance measures such as accuracy, F1 score, and Matthews correlation coefficient for a simulated data set. The results show that scHD4E performs better than all the individual methods and scDEA in all the above perspectives. We expect that scHD4E will serve modern data scientists in detecting DEGs in scRNA-seq data analysis. To implement our proposed method, a GitHub R package, scHD4E, and its Shiny application have been developed and are available at the following links: https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
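As an illustration of how p-values from several DE methods can be merged into one, the sketch below implements Fisher's method, which is the special case of Lancaster's combination in which every method receives the same weight of 2 degrees of freedom. It is not the scDEA or scHD4E implementation: the abstract does not state the weights used, the function names are hypothetical, and the chi-square reference distribution assumes independent p-values, which p-values from different methods run on the same data generally are not.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(Chi2_df > x) for a positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def combine_p_values(p_values):
    """Fisher's method: Lancaster's combination with 2 degrees of freedom per p-value."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must be non-empty and lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))
```

A call such as `combine_p_values([0.01, 0.02, 0.03, 0.04])` returns one combined p-value for a gene from four per-method p-values. Unequal weights, as in the general Lancaster procedure, would need the inverse chi-square distribution function for arbitrary degrees of freedom, which is left out of this sketch.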
Affiliation(s)
- Biplab Biswas
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh; Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Nishith Kumar
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh.
- Masahiro Sugimoto
- Institute for Advanced Biosciences, Keio University 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan.
- Md Aminul Hoque
- Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
6
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
7
Missarova A, Dann E, Rosen L, Satija R, Marioni J. Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE. Genome Biol 2024; 25:189. [PMID: 39026254 PMCID: PMC11256449 DOI: 10.1186/s13059-024-03334-3] [Received: 09/01/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024] Open
Abstract
Single-cell RNA-sequencing enables testing for differential expression (DE) between conditions at a cell type level. While powerful, one of the limitations of such approaches is that the sensitivity of DE testing is dictated by the sensitivity of clustering, which is often suboptimal. To overcome this, we present miloDE, a cluster-free framework for DE testing (available as an open-source R package). We illustrate the performance of miloDE on both simulated and real data. Using miloDE, we identify a transient hemogenic endothelia-like state in mouse embryos lacking Tal1 and detect distinct programs during macrophage activation in idiopathic pulmonary fibrosis.
Affiliation(s)
- Alsu Missarova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Emma Dann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Leah Rosen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Rahul Satija
- Center for Genomics and Systems Biology, NYU, New York, USA.
- New York Genome Center, New York, USA.
- John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK.
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK.
8
Ozier-Lafontaine A, Fourneaux C, Durif G, Arsenteva P, Vallot C, Gandrillon O, Gonin-Giraud S, Michel B, Picard F. Kernel-based testing for single-cell differential analysis. Genome Biol 2024; 25:114. [PMID: 38702740 PMCID: PMC11069218 DOI: 10.1186/s13059-024-03255-1] [Received: 07/25/2023] [Accepted: 04/22/2024] [Indexed: 05/06/2024] Open
Abstract
Single-cell technologies offer insights into molecular feature distributions, but comparing them poses challenges. We propose a kernel-testing framework for non-linear cell-wise distribution comparison, analyzing gene expression and epigenomic modifications. Our method allows feature-wise and global transcriptome/epigenome comparisons, revealing cell population heterogeneities. Using a classifier based on embedding variability, we identify transitions in cell states, overcoming limitations of traditional single-cell analysis. Applied to single-cell ChIP-Seq data, our approach identifies untreated breast cancer cells with an epigenomic profile resembling persister cells. This demonstrates the effectiveness of kernel testing in uncovering subtle population variations that might be missed by other methods.
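The abstract does not name the test statistic it uses, so the following is a generic sketch of one common kernel two-sample test rather than the authors' procedure: the squared maximum mean discrepancy (MMD) with a Gaussian kernel, calibrated by permutation, applied to the expression values of a single feature in two cell groups. The kernel choice, the bandwidth, the number of permutations, and all function names are assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar observations."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between two one-dimensional samples."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in x for b in x) / len(x) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in y for b in y) / len(y) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2.0 * kxy


def permutation_p_value(x, y, n_perm=200, bandwidth=1.0, seed=0):
    """P-value for the observed MMD under random relabelling of the pooled cells."""
    rng = random.Random(seed)
    observed = mmd2(x, y, bandwidth)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(x)], pooled[len(x):], bandwidth) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (1 + hits) / (1 + n_perm)
```

Because the kernel compares whole distributions, a statistic of this kind can respond to changes in shape or modality as well as to shifts in the mean, which is the motivation the abstract gives for kernel testing.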
Affiliation(s)
- A Ozier-Lafontaine
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France.
- C Fourneaux
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- G Durif
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- P Arsenteva
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France
- C Vallot
- CNRS UMR3244, Institut Curie, PSL University, Paris, France
- Translational Research Department, Institut Curie, PSL University, Paris, France
- O Gandrillon
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- S Gonin-Giraud
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- B Michel
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France.
- F Picard
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France.
9
Wu Q, Gu Z, Shang B, Wan D, Zhang Q, Zhang X, Xie P, Cheng S, Zhang W, Zhang K. Circulating tumor cell clustering modulates RNA splicing and polyadenylation to facilitate metastasis. Cancer Lett 2024; 588:216757. [PMID: 38417668 DOI: 10.1016/j.canlet.2024.216757] [Received: 10/11/2023] [Revised: 02/07/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024]
Abstract
Circulating tumor cell (CTC) clusters exhibit significantly higher metastatic potential compared to single CTCs. However, the underlying mechanism behind this phenomenon remains unclear, and the role of posttranscriptional RNA regulation in CTC clusters has not been explored. Here, we conducted a comparative analysis of alternative splicing (AS) and alternative polyadenylation (APA) profiles between single CTCs and CTC clusters. We identified 994 and 836 AS events in single CTCs and CTC clusters, respectively, with ∼20% of AS events showing differential regulation between the two cell types. A key event in this differential splicing was observed in SRSF6, which disrupted AS profiles and contributed to the increased malignancy of CTC clusters. Regarding APA, we found a global lengthening of 3' UTRs in CTC clusters compared to single CTCs. This alteration was primarily governed by 14 core APA factors, particularly PPP1CA. The modified APA profiles facilitated the cell cycle progression of CTC clusters and indicated their reduced susceptibility to oxidative stress. Further investigation revealed that the proportion of H2AFY mRNA with long 3' UTR instead of short 3' UTR was higher in CTC clusters than single CTCs. The AU-rich elements (AREs) within the long 3' UTR of H2AFY mRNA enhance mRNA stability and translation activity, resulting in promoting cell proliferation and invasion, which potentially facilitate the establishment and rapid formation of metastatic tumors mediated by CTC clusters. These findings provide new insights into the mechanisms driving CTC cluster metastasis.
Affiliation(s)
- Quanyou Wu
- Division of Abdominal Cancer, Department of Medical Oncology, Cancer Center and Laboratory of Molecular Targeted Therapy in Oncology, West China Hospital, Sichuan University, Chengdu, Sichuan Province, 610041, China; State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Zhaoru Gu
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Bingqing Shang
- Department of Urology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Duo Wan
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Qi Zhang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Xiaoli Zhang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Peipei Xie
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Shujun Cheng
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
- Wen Zhang
- Department of Immunology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
- Kaitai Zhang
- State Key Laboratory of Molecular Oncology, Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
10
Guo X, Ning J, Chen Y, Liu G, Zhao L, Fan Y, Sun S. Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies. Brief Funct Genomics 2024; 23:95-109. [PMID: 37022699 DOI: 10.1093/bfgp/elad011] [Received: 07/08/2022] [Revised: 12/09/2022] [Accepted: 03/10/2023] [Indexed: 04/07/2023]
Abstract
Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike traditional bulk RNA-seq, DE analysis for scRNA-seq or SRT data has unique characteristics that may contribute to the difficulty of detecting DE genes. However, the plethora of DE tools that work with various assumptions makes it difficult to choose an appropriate one. Furthermore, a comprehensive review on detecting DE genes for scRNA-seq data or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge such a gap, here, we first focus on the challenges of DE detection, then highlight potential opportunities that facilitate further progress in scRNA-seq or SRT analysis, and finally provide insights and guidance in selecting appropriate DE tools or developing new computational DE methods.
Affiliation(s)
- Xiya Guo
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Jin Ning
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yuanze Chen
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Guoliang Liu
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Liyan Zhao
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Yue Fan
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Shiquan Sun
- School of Public Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
- Key Laboratory of Trace Elements and Endemic Diseases, Center for Single Cell Omics and Health, Xi'an Jiaotong University, Xi'an, Shaanxi 710061, P.R. China
11
Ding J, Liu R, Wen H, Tang W, Li Z, Venegas J, Su R, Molho D, Jin W, Wang Y, Lu Q, Li L, Zuo W, Chang Y, Xie Y, Tang J. DANCE: a deep learning library and benchmark platform for single-cell analysis. Genome Biol 2024; 25:72. [PMID: 38504331 PMCID: PMC10949782 DOI: 10.1186/s13059-024-03211-z] [Received: 02/07/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
DANCE is the first standard, generic, and extensible benchmark platform for accessing and evaluating computational methods across the spectrum of benchmark datasets for numerous single-cell analysis tasks. Currently, DANCE supports 3 modules and 8 popular tasks with 32 state-of-the-art methods on 21 benchmark datasets. Users can easily reproduce the results of supported algorithms across major benchmark datasets with minimal effort, such as a single command line. In addition, DANCE provides an ecosystem of deep learning architectures and tools for researchers to facilitate their own model development. DANCE is an open-source Python package that welcomes all kinds of contributions.
Affiliation(s)
- Jiayuan Ding
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
- Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Hongzhi Wen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Wenzhuo Tang
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Zhaoheng Li
- Department of Biostatistics, University of Washington, Seattle, USA
- Julian Venegas
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Runze Su
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
- Dylan Molho
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Wei Jin
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
- Yixin Wang
- Department of Bioengineering, Stanford University, Palo Alto, USA
- Qiaolin Lu
- School of Artificial Intelligence, Jilin University, Jilin, China
- Lingxiao Li
- Department of Computer Science, Boston University, Boston, USA
- Wangyang Zuo
- Department of Computer Science, Zhejiang University of Technology, Zhejiang, China
- Yi Chang
- School of Artificial Intelligence, Jilin University, Jilin, China
- Yuying Xie
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA.
- Department of Statistics and Probability, Michigan State University, East Lansing, USA.
- Jiliang Tang
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
12
Lin KZ, Qiu Y, Roeder K. eSVD-DE: cohort-wide differential expression in single-cell RNA-seq data using exponential-family embeddings. BMC Bioinformatics 2024; 25:113. [PMID: 38486150 PMCID: PMC10941434 DOI: 10.1186/s12859-024-05724-7] [Received: 12/06/2023] [Accepted: 02/28/2024] [Indexed: 03/17/2024]
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA) datasets are becoming increasingly popular in clinical and cohort studies, but there is a lack of methods to investigate differentially expressed (DE) genes among such datasets with numerous individuals. While numerous methods exist to find DE genes for scRNA data from limited individuals, differential-expression testing for large cohorts of case and control individuals using scRNA data poses unique challenges due to substantial effects of human variation, i.e., individual-level confounding covariates that are difficult to account for in the presence of sparsely-observed genes. RESULTS We develop the eSVD-DE, a matrix factorization that pools information across genes and removes confounding covariate effects, followed by a novel two-sample test in mean expression between case and control individuals. In general, differential testing after dimension reduction yields an inflation of Type-1 errors. However, we overcome this by testing for differences between the case and control individuals' posterior mean distributions via a hierarchical model. In previously published datasets of various biological systems, eSVD-DE has more accuracy and power compared to other DE methods typically repurposed for analyzing cohort-wide differential expression. CONCLUSIONS eSVD-DE proposes a novel and powerful way to test for DE genes among cohorts after performing a dimension reduction. Accurate identification of differential expression on the individual level, instead of the cell level, is important for linking scRNA-seq studies to our understanding of the human population.
Affiliation(s)
- Kevin Z Lin
- Department of Biostatistics, University of Washington, Seattle, WA, USA.
- Yixuan Qiu
- School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, People's Republic of China
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
13
Yoshitake R, Mori H, Ha D, Wu X, Wang J, Wang X, Saeki K, Chang G, Shim HJ, Chan Y, Chen S. Molecular features of luminal breast cancer defined through spatial and single-cell transcriptomics. Clin Transl Med 2024; 14:e1548. [PMID: 38282415 PMCID: PMC10823285 DOI: 10.1002/ctm2.1548] [Received: 04/26/2023] [Revised: 12/28/2023] [Accepted: 01/06/2024] [Indexed: 01/30/2024]
Abstract
BACKGROUND Intratumour heterogeneity is a hallmark of most solid tumours, including breast cancers. We applied spatial transcriptomics and single-cell RNA-sequencing on patient-derived xenografts (PDXs) to profile spatially resolved cell populations within oestrogen receptor-positive (ER+) breast cancer and to elucidate their importance in oestrogen-dependent tumour growth. METHODS Two PDXs of 'ER-high' breast cancers with opposite oestrogen-mediated growth responses were investigated: oestrogen-suppressed GS3 (80-100% ER) and oestrogen-dependent SC31 (40-90% ER) models. The observation was validated via single-cell analyses on an 'ER-low' PDX, GS1 (5% ER). The results from our spatial and single-cell analyses were further supported by a public ER+ breast cancer single-cell dataset and protein-based dual immunohistochemistry (IHC) of SC31 examining important luminal cancer markers (i.e., ER, progesterone receptor and Ki67). The translational implication of our findings was assessed by clinical outcome analyses on publicly available cohorts. RESULTS Our space-gene-function study revealed four spatially distinct compartments within ER+ breast cancers. These compartments showed functional diversity (oestrogen-responsive, proliferative, hypoxia-induced and inflammation-related). The 'proliferative' population, rather than the 'oestrogen-responsive' compartment, was crucial for oestrogen-dependent tumour growth, leading to the acquisition of luminal B-like features. The cells expressing typical oestrogen-responsive genes like PGR were not directly linked to oestrogen-dependent proliferation. Dual IHC analyses demonstrated the distinct contribution of the Ki67+ proliferative cells toward oestrogen-mediated growth and their response to a CDK4/6 inhibitor. The gene signatures derived from the proliferative, hypoxia-induced and inflammation-related compartments were significantly correlated with worse clinical outcomes, while patients with the oestrogen-responsive signature showed better prognoses, suggesting that this compartment would not be directly associated with oestrogen-dependent tumour progression. CONCLUSIONS Our study identified the gene signature in our 'proliferative' compartment as an important determinant of luminal cancer subtypes. This 'proliferative' cell population is a causative feature of luminal B breast cancer, contributing toward its aggressive behaviours.
Affiliation(s)
- Ryohei Yoshitake
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Hitomi Mori
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Department of Surgery and Oncology, Graduate School of Medicine, Kyushu University, Fukuoka, Japan
- Desiree Ha
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Xiwei Wu
- Integrative Genomics Core, Beckman Research Institute of City of Hope, Monrovia, California, USA
- Jinhui Wang
- Integrative Genomics Core, Beckman Research Institute of City of Hope, Monrovia, California, USA
- Xiaoqiang Wang
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Kohei Saeki
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Faculty of Veterinary Medicine, Okayama University of Science, Imabari, Ehime, Japan
- Gregory Chang
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Hyun Jeong Shim
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Yin Chan
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
- Shiuan Chen
- Department of Cancer Biology and Molecular Medicine, Beckman Research Institute of City of Hope, Duarte, California, USA
14
Motomura K, Matsuzaka T, Shichino S, Ogawa T, Pan H, Nakajima T, Asano Y, Okayama T, Takeuchi T, Ohno H, Han SI, Miyamoto T, Takeuchi Y, Sekiya M, Sone H, Yahagi N, Nakagawa Y, Oda T, Ueha S, Ikeo K, Ogura A, Matsushima K, Shimano H. Single-Cell Transcriptome Profiling of Pancreatic Islets From Early Diabetic Mice Identifies Anxa10 for Ca2+ Allostasis Toward β-Cell Failure. Diabetes 2024; 73:75-92. [PMID: 37871012 PMCID: PMC10784657 DOI: 10.2337/db23-0212] [Received: 03/18/2023] [Accepted: 10/10/2023] [Indexed: 10/25/2023]
Abstract
Type 2 diabetes is a progressive disorder denoted by hyperglycemia and impaired insulin secretion. Although a decrease in β-cell function and mass is a well-known trigger for diabetes, the comprehensive mechanism is still unidentified. Here, we performed single-cell RNA sequencing of pancreatic islets from prediabetic and diabetic db/db mice, an animal model of type 2 diabetes. We discovered a diabetes-specific transcriptome landscape of endocrine and nonendocrine cell types with subpopulations of β- and α-cells. We recognized a new prediabetic gene, Anxa10, that was induced by and regulated Ca2+ influx from metabolic stresses. Anxa10-overexpressed β-cells displayed suppression of glucose-stimulated intracellular Ca2+ elevation and potassium-induced insulin secretion. Pseudotime analysis of β-cells predicted that this Ca2+-surge responder cluster would proceed to mitochondria dysfunction and endoplasmic reticulum stress. Other trajectories comprised dedifferentiation and transdifferentiation, emphasizing acinar-like cells in diabetic islets. Altogether, our data provide a new insight into Ca2+ allostasis and β-cell failure processes. ARTICLE HIGHLIGHTS The transcriptome of single-islet cells from healthy, prediabetic, and diabetic mice was studied. Distinct β-cell heterogeneity and islet cell-cell network in prediabetes and diabetes were found. A new prediabetic β-cell marker, Anxa10, regulates intracellular Ca2+ and insulin secretion. Diabetes triggers β-cell to acinar cell transdifferentiation.
Affiliation(s)
- Kaori Motomura
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Takashi Matsuzaka
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Transborder Medical Research Center, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Shigeyuki Shichino
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Tatsuro Ogawa
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Hao Pan
- Department of Bio-Science, Nagahama Institute of BioScience and Technology, Nagahama, Shiga, Japan
- Takuya Nakajima
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Yasuhito Asano
- Faculty of Information Networking for Innovation and Design, Toyo University, Tokyo, Japan
- Toshitsugu Okayama
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Tomoyo Takeuchi
- Tsukuba Human Tissue Biobank Center, University of Tsukuba Hospital, Ibaraki, Japan
- Hiroshi Ohno
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Song-iee Han
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Takafumi Miyamoto
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Yoshinori Takeuchi
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Motohiro Sekiya
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Hirohito Sone
- Department of Hematology, Endocrinology and Metabolism, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Naoya Yahagi
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Yoshimi Nakagawa
- Division of Complex Biosystem Research, Department of Research and Development, Institute of Natural Medicine, University of Toyama, Toyama, Japan
- Tatsuya Oda
- Department of Gastrointestinal and Hepatobiliary Pancreatic Surgery, Faculty of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- Satoshi Ueha
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Kazuho Ikeo
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
- Atsushi Ogura
- Department of Bio-Science, Nagahama Institute of BioScience and Technology, Nagahama, Shiga, Japan
- Kouji Matsushima
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Noda, Japan
- Hitoshi Shimano
- Department of Endocrinology and Metabolism, Institute of Medicine, University of Tsukuba, Tsukuba, Ibaraki, Japan
- International Institute for Integrative Sleep Medicine (WPI-IIIS), University of Tsukuba, Ibaraki, Japan
- Life Science Center of Tsukuba Advanced Research Alliance, University of Tsukuba, Tsukuba, Ibaraki, Japan
15
Gilis J, Perin L, Malfait M, Van den Berge K, Takele Assefa A, Verbist B, Risso D, Clement L. Differential detection workflows for multi-sample single-cell RNA-seq data. bioRxiv 2023:2023.12.17.572043. [PMID: 38187695 PMCID: PMC10769270 DOI: 10.1101/2023.12.17.572043] [Indexed: 01/09/2024]
Abstract
In single-cell transcriptomics, differential gene expression (DE) analyses typically focus on testing differences in the average expression of genes between cell types or conditions of interest. Single-cell transcriptomics, however, also has the promise to prioritise genes for which the expression differs in other aspects of the distribution. Here we develop a workflow for assessing differential detection (DD), which tests for differences in the average fraction of samples or cells in which a gene is detected. After benchmarking eight different DD data analysis strategies, we provide a unified workflow for jointly assessing DE and DD. Using simulations and two case studies, we show that DE and DD analysis provide complementary information, both in terms of the individual genes they report and in the functional interpretation of those genes.
Affiliation(s)
- Jeroen Gilis
- These authors contributed equally
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
- Data Mining and Modeling for Biomedicine, VIB Flemish Institute for Biotechnology, Ghent, 9000, Belgium
- Laura Perin
- These authors contributed equally
- Department of Statistical Sciences, University of Padova, Padova, Italy
- Milan Malfait
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Koen Van den Berge
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Alemu Takele Assefa
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Bie Verbist
- Statistics and Decision Sciences, Johnson and Johnson Innovative Medicine, Beerse, Belgium
- Davide Risso
- Department of Statistical Sciences, University of Padova, Padova, Italy
- Padua Center for Network Medicine, University of Padova, Padova, Italy
- Lieven Clement
- Applied Mathematics, Computer science and Statistics, Ghent University, Ghent, 9000, Belgium
- Bioinformatics Institute, Ghent University, Ghent, 9000, Belgium
16
Qu Y, Lim JJY, An O, Yang H, Toh YC, Chua JJE. FEZ1 participates in human embryonic brain development by modulating neuronal progenitor subpopulation specification and migrations. iScience 2023; 26:108497. [PMID: 38213789 PMCID: PMC10783620 DOI: 10.1016/j.isci.2023.108497] [Received: 11/28/2022] [Revised: 09/13/2023] [Accepted: 11/17/2023] [Indexed: 01/13/2024]
Abstract
Mutations in the human fasciculation and elongation protein zeta 1 (FEZ1) gene are found in schizophrenia and Jacobsen syndrome patients. Here, using human cerebral organoids (hCOs), we show that FEZ1 expression is turned on early during brain development and is detectable in both neuroprogenitor subtypes and immature neurons. FEZ1 deletion disrupts expression of neuronal and synaptic development genes. Using single-cell RNA sequencing, we detected abnormal expansion of homeodomain-only protein homeobox (HOPX)- outer radial glia (oRG), concurrent with a reduction of HOPX+ oRG, in FEZ1-null hCOs. HOPX- oRGs show higher cell mobility as compared to HOPX+ oRGs. Ectopic localization of neuroprogenitors to the outer layer is seen in FEZ1-null hCOs. Anomalous encroachment of TBR2+ intermediate progenitors into CTIP2+ deep layer neurons further indicated abnormalities in cortical layer formation in these hCOs. Collectively, our findings highlight the involvement of FEZ1 in early cortical brain development and how it contributes to neurodevelopmental disorders.
Affiliation(s)
- Yinghua Qu
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Department of Biomedical Engineering, National University of Singapore, Singapore 117583, Singapore
- Jonathan Jun-Yong Lim
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Healthy Longevity Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- LSI Neurobiology Programme, National University of Singapore, Singapore 117456, Singapore
- Omer An
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Henry Yang
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599, Singapore
- Yi-Chin Toh
- Department of Biomedical Engineering, National University of Singapore, Singapore 117583, Singapore
- School of Mechanical, Medical and Process Engineering, Queensland University of Technology, Brisbane, QLD 4059, Australia
- Centre for Biomedical Technologies, Queensland University of Technology, Brisbane, QLD 4059, Australia
- John Jia En Chua
- Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- Healthy Longevity Translational Research Program, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117456, Singapore
- LSI Neurobiology Programme, National University of Singapore, Singapore 117456, Singapore
- Institute for Molecular and Cell Biology, A∗STAR, Singapore 138473, Singapore
17
Huang H, Liu C, Wagle MM, Yang P. Evaluation of deep learning-based feature selection for single-cell RNA sequencing data analysis. Genome Biol 2023; 24:259. [PMID: 37950331 PMCID: PMC10638755 DOI: 10.1186/s13059-023-03100-x] [Received: 12/09/2022] [Accepted: 10/24/2023] [Indexed: 11/12/2023]
Abstract
BACKGROUND Feature selection is an essential task in single-cell RNA-seq (scRNA-seq) data analysis and can be critical for gene dimension reduction and downstream analyses, such as gene marker identification and cell type classification. Most popular methods for feature selection from scRNA-seq data are based on the concept of differential distribution wherein a statistical model is used to detect changes in gene expression among cell types. Recent development of deep learning-based feature selection methods provides an alternative approach compared to traditional differential distribution-based methods in that the importance of a gene is determined by neural networks. RESULTS In this work, we explore the utility of various deep learning-based feature selection methods for scRNA-seq data analysis. We sample from Tabula Muris and Tabula Sapiens atlases to create scRNA-seq datasets with a range of data properties and evaluate the performance of traditional and deep learning-based feature selection methods for cell type classification, feature selection reproducibility and diversity, and computational time. CONCLUSIONS Our study provides a reference for future development and application of deep learning-based feature selection methods for single-cell omics data analyses.
Affiliation(s)
- Hao Huang
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
- Chunlei Liu
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
- Manoj M Wagle
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia
- Pengyi Yang
- Computational Systems Biology Unit, Faculty of Medicine and Health, Children's Medical Research Institute, University of Sydney, Westmead, NSW, 2145, Australia.
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, Camperdown, NSW, 2006, Australia.
- Sydney Precision Data Science Centre, University of Sydney, Camperdown, NSW, 2006, Australia.
- Charles Perkins Centre, University of Sydney, Camperdown, NSW, 2006, Australia.
18
Zheng H, Vijg J, Fard AT, Mar JC. Measuring cell-to-cell expression variability in single-cell RNA-sequencing data: a comparative analysis and applications to B cell aging. Genome Biol 2023; 24:238. [PMID: 37864221 PMCID: PMC10588274 DOI: 10.1186/s13059-023-03036-2] [Received: 11/27/2022] [Accepted: 08/11/2023] [Indexed: 10/22/2023]
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) technologies enable the capture of gene expression heterogeneity and consequently facilitate the study of cell-to-cell variability at the cell type level. Although different methods have been proposed to quantify cell-to-cell variability, it is unclear what the optimal statistical approach is, especially in light of challenging data structures that are unique to scRNA-seq data like zero inflation. RESULTS We systematically evaluate the performance of 14 different variability metrics that are commonly applied to transcriptomic data for measuring cell-to-cell variability. Leveraging simulations and real datasets, we benchmark the metric performance based on data-specific features, sparsity and sequencing platform, biological properties, and the ability to recapitulate true levels of biological variability based on known gene sets. Next, we use scran, the metric with the strongest all-round performance, to investigate changes in cell-to-cell variability that occur during B cell differentiation and the aging processes. The analysis of primary cell types from hematopoietic stem cells (HSCs) and B lymphopoiesis reveals unique gene signatures with consistent patterns of variable and stable expression profiles during B cell differentiation which highlights the significance of these methods. Identifying differentially variable genes between young and old cells elucidates the regulatory changes that may be overlooked by solely focusing on mean expression changes and we investigate this in the context of regulatory networks. CONCLUSIONS We highlight the importance of capturing cell-to-cell gene expression variability in a complex biological process like differentiation and aging and emphasize the value of these findings at the level of individual cell types.
Affiliation(s)
- Huiwen Zheng
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
|
19
|
Liu Y, Zhao J, Adams TS, Wang N, Schupp JC, Wu W, McDonough JE, Chupp GL, Kaminski N, Wang Z, Yan X. iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects. BMC Bioinformatics 2023; 24:318. [PMID: 37608264 PMCID: PMC10463720 DOI: 10.1186/s12859-023-05432-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/30/2022] [Accepted: 07/18/2023] [Indexed: 08/24/2023] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Because environmental exposure and genetic background differ across subjects, subject effect is the major source of variation in scRNA-seq data with multiple subjects and severely confounds cell type specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data and lead to an excessive number of zeros, which further complicates DE analysis. RESULTS We developed iDESC to detect cell type specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to account for both subject effect and dropouts. The prevalence of dropout events (dropout rate) was shown to depend on gene expression level, and this dependence is modeled by pooling information across genes. Subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power compared to the existing methods. Applying the methods with well-controlled type I error to three real scRNA-seq datasets from the same tissue and disease showed that iDESC achieved the best consistency between datasets and the best disease relevance. CONCLUSIONS iDESC achieved more accurate and robust DE analysis results by separating subject effect from disease effect while accounting for dropouts, suggesting the importance of considering subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.
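As a rough illustration of the model family named in the abstract above, the following sketch evaluates a zero-inflated negative binomial probability mass function whose log-mean includes a subject-level random intercept. The parameter names (`pi_dropout`, `theta`, `beta0`, `beta_group`, `subject_effect`) and the fixed dropout probability are assumptions made for this example only; iDESC models the dropout rate as a function of expression level, and its exact parameterization is not given in the abstract.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and size (inverse dispersion) theta."""
    log_p = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, pi_dropout):
    """Zero-inflated NB: a point mass at zero mixed with an NB component."""
    base = (1.0 - pi_dropout) * nb_pmf(y, mu, theta)
    return pi_dropout + base if y == 0 else base


def subject_mean(beta0, beta_group, group, subject_effect):
    """Mean of the NB component: log(mu) = beta0 + beta_group * group + b_subject."""
    return math.exp(beta0 + beta_group * group + subject_effect)


# Hypothetical values: one subject in the disease group (group = 1).
mu = subject_mean(beta0=0.5, beta_group=0.7, group=1, subject_effect=-0.2)
p_zero = zinb_pmf(0, mu, theta=2.0, pi_dropout=0.3)
```

In this sketch the zero probability has two sources, the dropout mass `pi_dropout` and the NB component's own zeros, which is why a subject-specific shift in `mu` changes the observed zero fraction even when the dropout probability is held fixed.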
Affiliation(s)
- Yunqing Liu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
| | - Jiayi Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
| | - Taylor S Adams
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
| | - Ningya Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
| | - Jonas C Schupp
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
- Department of Respiratory Medicine, Hannover Medical School and Biomedical Research in End-Stage and Obstructive Lung Disease Hannover, German Center for Lung Research (DZL), Hannover, Germany
| | - Weimiao Wu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA
- Meta Platforms, Inc, Cambridge, USA
| | - John E McDonough
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
| | - Geoffrey L Chupp
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
| | - Naftali Kaminski
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA.
| | - Xiting Yan
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, 06520, USA.
- Section of Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, 06520, USA.
|
20
|
Vandenbon A, Diez D. A universal tool for predicting differentially active features in single-cell and spatial genomics data. Sci Rep 2023; 13:11830. [PMID: 37481581 PMCID: PMC10363154 DOI: 10.1038/s41598-023-38965-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 07/18/2023] [Indexed: 07/24/2023] Open
Abstract
With the growing complexity of single-cell and spatial genomics data, unbiased and efficient exploratory data analysis tools are becoming increasingly important. One common exploratory data analysis step is the prediction of genes with different levels of activity in a subset of cells or locations inside a tissue. We previously developed singleCellHaystack, a method for predicting differentially expressed genes from single-cell transcriptome data, without relying on comparisons between clusters of cells. Here we present an update to singleCellHaystack, which is now a universally applicable method for predicting differentially active features: (1) singleCellHaystack now accepts continuous features that can be RNA or protein expression, chromatin accessibility or module scores from single-cell, spatial and even bulk genomics data, and (2) it can handle 1D trajectories, 2-3D spatial coordinates, as well as higher-dimensional latent spaces as input coordinates. Performance has been drastically improved, with up to a tenfold reduction in computational time and scalability to millions of cells, making singleCellHaystack a suitable tool for exploratory analysis of atlas-level datasets. singleCellHaystack is available as packages in both R and Python.
Affiliation(s)
- Alexis Vandenbon
- Institute for Life and Medical Sciences, Kyoto University, 53 Shougoin Kawahara-cho, Sakyo-ku, Kyoto, 606-8507, Japan.
- Institute for Liberal Arts and Sciences, Kyoto University, Yoshidanihonmatsu-cho, Sakyo-ku, Kyoto, 606-8501, Japan.
| | - Diego Diez
- Immunology Frontier Research Center, Osaka University, 3-1, Yamada-oka, Suita, Osaka, 565-0871, Japan
- Open and Transdisciplinary Research Institute (OTRI), Osaka University, 1-1, Yamada-oka, Suita, Osaka, 565-0871, Japan
|
21
|
Pan Y, Landis JT, Moorad R, Wu D, Marron JS, Dittmer DP. The Poisson distribution model fits UMI-based single-cell RNA-sequencing data. BMC Bioinformatics 2023; 24:256. [PMID: 37330471 PMCID: PMC10276395 DOI: 10.1186/s12859-023-05349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 05/24/2023] [Indexed: 06/19/2023] Open
Abstract
BACKGROUND Modeling of single-cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy because aggregation at those two levels is too crude. RESULTS We avoid the crude approximations entailed by such aggregation by proposing an independent Poisson distribution (IPD) at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. CONCLUSIONS This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.
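The point that many zeros can arise from small Poisson parameters, without a separate zero-inflation term, can be checked with a short calculation: for X ~ Poisson(λ), P(X = 0) = exp(−λ). The sketch below is an illustration under assumed values, not the scpoisson implementation; the rate matrix and the function names are hypothetical.

```python
import math
import random


def poisson_zero_prob(lam):
    """P(X = 0) for X ~ Poisson(lam)."""
    return math.exp(-lam)


def expected_zero_fraction(rates):
    """Expected share of zeros in a matrix of independent Poisson entries."""
    probs = [poisson_zero_prob(lam) for row in rates for lam in row]
    return sum(probs) / len(probs)


def sample_poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulated_zero_fraction(rates, n_reps=5000, seed=0):
    """Empirical share of zeros over repeated draws of the whole matrix."""
    rng = random.Random(seed)
    zeros = total = 0
    for _ in range(n_reps):
        for row in rates:
            for lam in row:
                zeros += sample_poisson(lam, rng) == 0
                total += 1
    return zeros / total


# Hypothetical per-entry rates: rows are cells, columns are genes.
rates = [[0.05, 0.1, 2.0],
         [0.02, 0.3, 1.5]]
analytic = expected_zero_fraction(rates)
empirical = simulated_zero_fraction(rates)
```

With rates this small for most entries, the analytic and simulated zero fractions agree closely and most of the matrix is zero, even though every entry follows a plain Poisson distribution.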
Affiliation(s)
- Yue Pan
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Justin T Landis
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Razia Moorad
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Di Wu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - J S Marron
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, USA
- Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, USA
| | - Dirk P Dittmer
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, USA.
- Department of Microbiology and Immunology, University of North Carolina at Chapel Hill, Chapel Hill, USA.
|
22
|
Boyeau P, Regier J, Gayoso A, Jordan MI, Lopez R, Yosef N. An empirical Bayes method for differential expression analysis of single cells with deep generative models. Proc Natl Acad Sci U S A 2023; 120:e2209124120. [PMID: 37192164 PMCID: PMC10214125 DOI: 10.1073/pnas.2209124120] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/26/2022] [Accepted: 01/23/2023] [Indexed: 05/18/2023] Open
Abstract
Detecting differentially expressed genes is important for characterizing subpopulations of cells. In scRNA-seq data, however, nuisance variation due to technical factors like sequencing depth and RNA capture efficiency obscures the underlying biological signal. Deep generative models have been extensively applied to scRNA-seq data, with a special focus on embedding cells into a low-dimensional latent space and correcting for batch effects. However, little attention has been paid to the problem of utilizing the uncertainty from the deep generative model for differential expression (DE). Furthermore, the existing approaches do not allow for controlling for effect size or the false discovery rate (FDR). Here, we present lvm-DE, a generic Bayesian approach for performing DE predictions from a fitted deep generative model, while controlling the FDR. We apply the lvm-DE framework to scVI and scSphere, two deep generative models. The resulting approaches outperform state-of-the-art methods at estimating the log fold change in gene expression levels as well as detecting differentially expressed genes between subpopulations of cells.
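One common way to turn per-gene posterior probabilities of differential expression into a gene list with a controlled FDR is to select the largest set whose posterior expected false discovery proportion stays at or below a target level. The sketch below illustrates that generic rule; whether lvm-DE applies exactly this procedure is an assumption, and the function name `select_de_genes` and the example probabilities are hypothetical.

```python
def select_de_genes(de_probs, alpha=0.05):
    """Return gene indices, most confident first, forming the largest set
    whose average posterior probability of being a false discovery is
    at most alpha."""
    order = sorted(range(len(de_probs)), key=lambda g: de_probs[g], reverse=True)
    error_sum = 0.0
    best_k = 0
    for k, g in enumerate(order, start=1):
        # 1 - P(DE) is the posterior probability that calling gene g is an error.
        error_sum += 1.0 - de_probs[g]
        if error_sum / k <= alpha:
            best_k = k
    return order[:best_k]


# Hypothetical posterior DE probabilities for five genes.
probs = [0.99, 0.95, 0.90, 0.60, 0.20]
selected = select_de_genes(probs, alpha=0.10)  # -> [0, 1, 2]
```

Because genes are visited in decreasing order of posterior probability, the running average error never decreases, so the selected set is always a prefix of the ranking.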
Affiliation(s)
- Pierre Boyeau
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
| | - Jeffrey Regier
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109
| | - Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, CA 94720
| | - Michael I. Jordan
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
- Center for Computational Biology, University of California, Berkeley, CA 94720
- Department of Statistics, University of California, Berkeley, CA 94720
| | - Romain Lopez
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
| | - Nir Yosef
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720
- Center for Computational Biology, University of California, Berkeley, CA 94720
- Department of Systems Immunology, Weizmann Institute of Science, Rehovot 76100, Israel
|
23
|
Crowell HL, Morillo Leonardo SX, Soneson C, Robinson MD. The shaky foundations of simulating single-cell RNA sequencing data. Genome Biol 2023; 24:62. [PMID: 36991470 PMCID: PMC10061781 DOI: 10.1186/s13059-023-02904-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/23/2021] [Accepted: 03/20/2023] [Indexed: 03/31/2023] Open
Abstract
BACKGROUND With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, both on their own and in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. RESULTS Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch and cluster level. Second, we investigated the effect of simulators on clustering and batch correction method comparisons, and, third, which quality control summaries can capture reference-simulation similarity and to what extent. CONCLUSIONS Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, that they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and that it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.
Affiliation(s)
- Helena L Crowell
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
| | | | - Charlotte Soneson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Current address: Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Mark D Robinson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland.
|
24
|
Missarova A, Dann E, Rosen L, Satija R, Marioni J. Sensitive cluster-free differential expression testing. bioRxiv 2023:2023.03.08.531744. [PMID: 36945506 PMCID: PMC10028920 DOI: 10.1101/2023.03.08.531744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2023]
Abstract
Comparing molecular features, including the identification of genes with differential expression (DE) between conditions, is a powerful approach for characterising disease-specific phenotypes. When testing for DE in single-cell RNA sequencing data, current pipelines first assign cells into discrete clusters (or cell types), followed by testing for differences within each cluster. Consequently, the sensitivity and specificity of DE testing are limited and ultimately dictated by the granularity of the cell type annotation, with discrete clustering being especially suboptimal for continuous trajectories. To overcome these limitations, we present miloDE - a cluster-free framework for differential expression testing. We build on the Milo approach, introduced for differential cell abundance testing, which leverages the graph representation of single-cell data to assign relatively homogenous, 'neighbouring' cells into overlapping neighbourhoods. We address key differences between differential abundance and expression testing at the level of neighbourhood assignment, statistical testing, and multiple testing correction. To illustrate the performance of miloDE we use both simulations and real data, in the latter case identifying a transient haemogenic endothelia-like state in chimeric mouse embryos lacking Tal1 as well as uncovering distinct transcriptional programs that characterise changes in macrophages in patients with Idiopathic Pulmonary Fibrosis. miloDE is available as an open-source R package at https://github.com/MarioniLab/miloDE.
Affiliation(s)
- Alsu Missarova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Emma Dann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Leah Rosen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Rahul Satija
- Center for Genomics and Systems Biology, NYU
- New York Genome Center
| | - John Marioni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genentech, South San Francisco, CA, USA
|
25
|
Yoshimatsu S, Nakajima M, Sonn I, Natsume R, Sakimura K, Nakatsukasa E, Sasaoka T, Nakamura M, Serizawa T, Sato T, Sasaki E, Deng H, Okano H. Attempts for deriving extended pluripotent stem cells from common marmoset embryonic stem cells. Genes Cells 2023; 28:156-169. [PMID: 36530170 DOI: 10.1111/gtc.13000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2021] [Revised: 12/13/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Extended pluripotent stem cells (EPSCs) derived from mice and humans showed an enhanced potential for chimeric formation. By exploiting transcriptomic approaches, we assessed the differences in gene expression profile between extended EPSCs derived from mice and humans, and those newly derived from the common marmoset (marmoset; Callithrix jacchus). Although the marmoset EPSC-like cells displayed a unique colony morphology distinct from murine and human EPSCs, they displayed a pluripotent state akin to embryonic stem cells (ESCs), as confirmed by gene expression and immunocytochemical analyses of pluripotency markers and three-germ-layer differentiation assay. Importantly, the marmoset EPSC-like cells showed interspecies chimeric contribution to mouse embryos, such as E6.5 blastocysts in vitro and E6.5 epiblasts in vivo in mouse development. Also, we discovered that the perturbation of gene expression of the marmoset EPSC-like cells from the original ESCs resembled that of human EPSCs. Taken together, our multiple analyses evaluated the efficacy of the method for the derivation of marmoset EPSCs.
Affiliation(s)
- Sho Yoshimatsu
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
- Laboratory for Marmoset Neural Architecture, RIKEN Center for Brain Science, Saitama, Japan
| | - Mayutaka Nakajima
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
| | - Iki Sonn
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
| | - Rie Natsume
- Department of Animal Model Development, Brain Research Institute, Niigata University, Niigata, Japan
| | - Kenji Sakimura
- Department of Animal Model Development, Brain Research Institute, Niigata University, Niigata, Japan
| | - Ena Nakatsukasa
- Department of Animal Model Development, Brain Research Institute, Niigata University, Niigata, Japan
| | - Toshikuni Sasaoka
- Department of Animal Model Development, Brain Research Institute, Niigata University, Niigata, Japan
| | - Mari Nakamura
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
| | - Takashi Serizawa
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
| | - Tsukika Sato
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
| | - Erika Sasaki
- Laboratory for Marmoset Neural Architecture, RIKEN Center for Brain Science, Saitama, Japan
- Department of Marmoset Biology and Medicine, Central Institute for Experimental Animals, Kanagawa, Japan
| | - Hongkui Deng
- Stem Cell Research Center, Peking University, Beijing, China
| | - Hideyuki Okano
- Department of Physiology, Keio University School of Medicine, Tokyo, Japan
- Laboratory for Marmoset Neural Architecture, RIKEN Center for Brain Science, Saitama, Japan
|
26
|
Deng W, Li B, Wang J, Jiang W, Yan X, Li N, Vukmirovic M, Kaminski N, Wang J, Zhao H. A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy. Brief Bioinform 2023; 24:bbac616. [PMID: 36631398 PMCID: PMC9851324 DOI: 10.1093/bib/bbac616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/14/2022] [Indexed: 01/13/2023] Open
Abstract
Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging due to high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolutions.
Affiliation(s)
- Wenxuan Deng
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
| | - Bolun Li
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
| | - Jiawei Wang
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
| | - Wei Jiang
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
| | - Xiting Yan
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Ningshan Li
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Milica Vukmirovic
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada
| | - Naftali Kaminski
- Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Jing Wang
- State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
|
27
|
Sun L, Wang G, Zhang Z. SimCH: simulation of single-cell RNA sequencing data by modeling cellular heterogeneity at gene expression level. Brief Bioinform 2023; 24:6961608. [PMID: 36575569 DOI: 10.1093/bib/bbac590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 11/08/2022] [Accepted: 12/02/2022] [Indexed: 12/29/2022] Open
Abstract
Single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) has been a powerful technology for transcriptome analysis. However, the systematic validation of diverse computational tools used in scRNA-seq analysis remains challenging. Here, we propose a novel simulation tool, termed Simulation of Cellular Heterogeneity (SimCH), for the flexible and comprehensive assessment of scRNA-seq computational methods. A Gaussian copula framework is used to retain the gene coexpression of experimental data, which has been shown to be associated with cellular heterogeneity. The synthetic count matrices generated by suitable SimCH modes closely match experimental data originating from either homogeneous or heterogeneous cell populations and from either unique molecular identifier (UMI)-based or non-UMI-based techniques. We demonstrate how SimCH can benchmark several types of computational methods, including cell clustering, discovery of differentially expressed genes, trajectory inference, batch correction and imputation. Moreover, we show how SimCH can be used to conduct power evaluation of cell clustering methods. Given these merits, we believe that SimCH can accelerate single-cell research.
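To illustrate the Gaussian copula idea mentioned in the abstract above, the sketch below draws correlated standard normals, maps them to uniforms with the normal CDF, and pushes each uniform through the inverse CDF of a count distribution, so that the marginals are counts while the dependence comes from the copula. Poisson marginals, the two-gene setup, and all names (`poisson_quantile`, `simulate_pair`, `rho`) are assumptions chosen for brevity; SimCH's actual marginal models and fitting procedure are not described in the abstract.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, lam):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    # The cap on k guards against u rounding to exactly 1.0.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def simulate_pair(n_cells, lam_a, lam_b, rho, seed=0):
    """Simulate counts for two genes whose latent normals have correlation rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    counts = []
    for _ in range(n_cells):
        z_a = rng.gauss(0.0, 1.0)
        # Build a second normal with correlation rho to the first.
        z_b = rho * z_a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u_a, u_b = std_normal.cdf(z_a), std_normal.cdf(z_b)
        counts.append((poisson_quantile(u_a, lam_a), poisson_quantile(u_b, lam_b)))
    return counts


# Hypothetical settings: 2000 cells, two coexpressed genes.
pairs = simulate_pair(n_cells=2000, lam_a=3.0, lam_b=5.0, rho=0.8)
```

The design choice worth noting is the separation of concerns: `rho` controls coexpression only, while `lam_a` and `lam_b` control each gene's marginal distribution, so either can be changed without refitting the other.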
Affiliation(s)
- Lei Sun
- School of Information Engineering, Yangzhou University, Yangzhou, P.R. China
- School of Artificial Intelligence, Yangzhou University, Yangzhou, P.R. China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing, P.R. China
| | - Gongming Wang
- School of Information Engineering, Yangzhou University, Yangzhou, P.R. China
- School of Artificial Intelligence, Yangzhou University, Yangzhou, P.R. China
- China Unicom Software Research Institute Jinan Branch, Jinan, P.R. China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, and China National Center for Bioinformation, Beijing, P.R. China
- School of Life Science, University of Chinese Academy of Sciences, Beijing, P.R. China
|
28
|
Chatterjee D, Deng WM. Standardization of Single-Cell RNA-Sequencing Analysis Workflow to Study Drosophila Ovary. Methods Mol Biol 2023; 2677:151-171. [PMID: 37464241 DOI: 10.1007/978-1-0716-3259-8_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2023]
Abstract
Developments in single-cell technology have considerably changed the way we study biology. Significant efforts have been made over the last few years to build comprehensive cell-type-specific transcriptomic atlases for a wide range of tissues in several model organisms in order to discover cell-type-specific markers and drivers of gene expression. One such tissue is the ovary of the fruit fly Drosophila melanogaster, which is a popular model system with wide-ranging applications in the study of both development and disease. Three independent studies have recently produced comprehensive maps of cell-type-specific gene expression that describe both the spatiotemporal regulation of oogenesis and the unique transcriptomic profiles of the different cell types that constitute the ovary. In this chapter, we outline the wet-lab protocol that was followed in our recent study for sample preparation and reanalyze the resultant dataset to discuss the benchmarks in data analysis, which are fundamental to comprehensive curation of the single-cell dataset representing the fly ovary.
Collapse
Affiliation(s)
- Deeptiman Chatterjee
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, Tulane Cancer Center, New Orleans, LA, USA.
- Current address: Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Wu-Min Deng
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, Tulane Cancer Center, New Orleans, LA, USA.
| |
|
29
|
Dharmaratne M, Kulkarni AS, Taherian Fard A, Mar JC. scShapes: a statistical framework for identifying distribution shapes in single-cell RNA-sequencing data. Gigascience 2022; 12:giac126. [PMID: 36691728 PMCID: PMC9871437 DOI: 10.1093/gigascience/giac126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2022] [Revised: 08/27/2022] [Accepted: 12/15/2022] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) methods have been advantageous for quantifying cell-to-cell variation by profiling the transcriptomes of individual cells. For scRNA-seq data, variability in gene expression reflects the degree of variation in gene expression from one cell to another. Analyses that focus on cell-cell variability therefore are useful for going beyond changes based on average expression and, instead, identifying genes with homogeneous expression versus those that vary widely from cell to cell. RESULTS We present a novel statistical framework, scShapes, for identifying differential distributions in single-cell RNA-sequencing data using generalized linear models. Most approaches for differential gene expression detect shifts in the mean value. However, as single-cell data are driven by overdispersion and dropouts, moving beyond means and using distributions that can handle excess zeros is critical. scShapes quantifies gene-specific cell-to-cell variability by testing for differences in the expression distribution while flexibly adjusting for covariates if required. We demonstrate that scShapes identifies subtle variations that are independent of altered mean expression and detects biologically relevant genes that were not discovered through standard approaches. CONCLUSIONS This analysis also draws attention to genes that switch distribution shapes from a unimodal distribution to a zero-inflated distribution and raises open questions about the plausible biological mechanisms that may give rise to this, such as transcriptional bursting. Overall, the results from scShapes help to expand our understanding of the role that gene expression plays in the transcriptional regulation of a specific perturbation or cellular phenotype. Our framework scShapes is incorporated into a Bioconductor R package (https://www.bioconductor.org/packages/release/bioc/html/scShapes.html).
Affiliation(s)
- Malindrie Dharmaratne
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Ameya S Kulkarni
- Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, NY 10461, USA
- Department of Medicine, Division of Endocrinology, Albert Einstein College of Medicine, Bronx, New York, NY 10461, USA
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia
| | - Jessica C Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia
| |
|
30
|
Shakola F, Palejev D, Ivanov I. A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022; 13:2362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022] Open
Abstract
The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.
Affiliation(s)
- Felitsiya Shakola
- GATE Institute, Sofia University, 125 Tsarigradsko Shosse, Bl. 2, 1113 Sofia, Bulgaria
| | - Dean Palejev
- Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G. Bonchev St., Bl. 8, 1113 Sofia, Bulgaria
| | - Ivan Ivanov
- Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843, USA
| |
|
31
|
Luo X, Qin F, Xiao F, Cai G. BISC: accurate inference of transcriptional bursting kinetics from single-cell transcriptomic data. Brief Bioinform 2022; 23:6793779. [PMID: 36326081 DOI: 10.1093/bib/bbac464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 11/06/2022] Open
Abstract
Gene expression in mammalian cells is inherently stochastic and mRNAs are synthesized in discrete bursts. Single-cell transcriptomics provides an unprecedented opportunity to explore the transcriptome-wide kinetics of transcriptional bursting. However, current analysis methods provide limited accuracy in bursting inference due to substantial noise inherent to single-cell transcriptomic data. In this study, we developed BISC, a Bayesian method for inferring bursting parameters from single cell transcriptomic data. Based on a beta-gamma-Poisson model, BISC modeled the mean-variance dependency to achieve accurate estimation of bursting parameters from noisy data. Evaluation based on both simulation and real intron sequential RNA fluorescence in situ hybridization data showed improved accuracy and reliability of BISC over existing methods, especially for genes with low expression values. Further application of BISC found bursting frequency but not bursting size was strongly associated with gene expression regulation. Moreover, our analysis provided new mechanistic insights into the functional role of enhancer and superenhancer by modulating both bursting frequency and size. BISC also formulated a downstream framework to identify differential bursting (in frequency and size separately) genes in samples under different conditions. Applying to multiple datasets (a mouse embryonic cell and fibroblast dataset, a human immune cell dataset and a human pancreatic cell dataset), BISC identified known cell-type signature genes that were missed by differential expression analysis, providing additional insights in understanding the cell-specific stochastic gene transcription. Applying to datasets of human lung and colon cancers, BISC successfully detected tumor signature genes based on alterations in bursting kinetics, which illustrates its value in understanding disease development regarding transcriptional bursting. Collectively, BISC provides a new tool for accurately inferring bursting kinetics and detecting differential bursting genes. This study also produced new insights in the role of transcriptional bursting in regulating gene expression, cell identity and tumor progression.
Affiliation(s)
- Xizhi Luo
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
| | - Fei Qin
- Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
| | - Feifei Xiao
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, USA
| | - Guoshuai Cai
- Department of Environmental Health Science, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
| |
|
32
|
Salavaty A, Shehni SA, Ramialison M, Currie PD. Systematic molecular profiling of acute leukemia cancer stem cells allows identification of druggable targets. Heliyon 2022; 8:e11093. [PMID: 36281397 PMCID: PMC9586918 DOI: 10.1016/j.heliyon.2022.e11093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 10/04/2022] [Accepted: 10/11/2022] [Indexed: 11/06/2022] Open
Abstract
Acute myeloid leukemia (AML) is one of the most prevalent and acute blood cancers with a poor prognosis and low overall survival rate, especially in the elderly. Although several new AML markers and drug targets have been recently identified, the rate of long-term cancer eradication has not improved significantly due to the presence and drug resistance of AML cancer stem cells (CSCs). Here we develop a novel computational pipeline to analyze the transcriptomic profiles of AML cancer (stem) cells and identify novel candidate AML CSC markers and drug targets. In our novel pipeline we apply a top-down meta-analysis strategy to integrate The Cancer Genome Atlas data with CSC datasets to infer cell stemness features. As a result, a set of genes termed the "AML key CSC genes" along with all the available drugs/compounds that could target them were identified. Overall, our novel computational pipeline could retrieve known cancer drugs (Carfilzomib) and predicted novel drugs such as Zonisamide, Amitriptyline, and their targets amongst the top ranked drugs and drug targets for targeting AML. Additionally, the pipeline applied in this study could be used for the identification of CSC-specific markers, drivers and their respective targeting drugs in other cancer types.
Affiliation(s)
- Adrian Salavaty
- Australian Regenerative Medicine Institute, Monash University, Clayton, VIC 3800, Australia
- Systems Biology Institute Australia, Monash University, Clayton, VIC 3800, Australia
| | - Sara Alaei Shehni
- Australian Regenerative Medicine Institute, Monash University, Clayton, VIC 3800, Australia
| | - Mirana Ramialison
- Australian Regenerative Medicine Institute, Monash University, Clayton, VIC 3800, Australia
- Systems Biology Institute Australia, Monash University, Clayton, VIC 3800, Australia
- Novo Nordisk Foundation Center for Stem Cell Medicine, Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, VIC, 3052, Australia
- Department of Pediatrics, The Royal Children's Hospital, University of Melbourne Parkville, VIC, 3052, Australia
| | - Peter D. Currie
- Australian Regenerative Medicine Institute, Monash University, Clayton, VIC 3800, Australia
- EMBL Australia, Monash University, Clayton, VIC 3800, Australia
| |
|
33
|
Okada D, Zheng C, Cheng JH. Mathematical model for the relationship between single-cell and bulk gene expression to clarify the interpretation of bulk gene expression data. Comput Struct Biotechnol J 2022; 20:4850-4859. [PMID: 36147671 PMCID: PMC9474327 DOI: 10.1016/j.csbj.2022.08.062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/26/2022] [Accepted: 08/26/2022] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Differential expression analysis is a standard approach in molecular biology. For example, genes whose expression levels differ between diseased and non-diseased samples are considered to be associated with that disease. On the other hand, differential variability analysis focuses on the differences of the variances of gene expression between sample groups. Although differential variability is also known to capture biological information, its interpretation remains unclear and controversial. Recent single-cell analyses have revealed that differences between sample groups can affect gene expression in a cellular subset-specific manner or by altering the proportion of a particular cellular subset. The aim of this study is to clarify the interpretation of mean and variance of bulk gene expression data. METHOD We developed a mathematical model in which the bulk gene expression value is proportional to the mean value of the single-cell gene expression profile. Based on this model, we performed theoretical, simulated and real single-cell RNA-seq data analyses. RESULT AND CONCLUSION We identified how differences in single-cell gene expression profiles affect the differences in the mean and the variance of bulk gene expression. It is shown that differential expression analysis of bulk expression data can overlook significant changes in gene expression at the single-cell level. Further, differential variability analysis captures complex features affected by different gene expression shifts for each subset, changes in the proportions of cellular subsets, and variation in single-cell distribution parameters among samples.
Affiliation(s)
- Daigo Okada
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto 6068507, Kyoto, Japan
| | - Cheng Zheng
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto 6068507, Kyoto, Japan
| | - Jian Hao Cheng
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, South Research Bldg. No.1(5F), 53 Shogoinkawahara-cho, Sakyo-ku, Kyoto 6068507, Kyoto, Japan
| |
|
34
|
Sardoo AM, Zhang S, Ferraro TN, Keck TM, Chen Y. Decoding brain memory formation by single-cell RNA sequencing. Brief Bioinform 2022; 23:6713514. [PMID: 36156112 PMCID: PMC9677489 DOI: 10.1093/bib/bbac412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/10/2022] [Accepted: 08/25/2022] [Indexed: 12/14/2022] Open
Abstract
To understand how distinct memories are formed and stored in the brain is an important and fundamental question in neuroscience and computational biology. A population of neurons, termed engram cells, represents the physiological manifestation of a specific memory trace and is characterized by dynamic changes in gene expression, which in turn alters the synaptic connectivity and excitability of these cells. Recent applications of single-cell RNA sequencing (scRNA-seq) and single-nucleus RNA sequencing (snRNA-seq) are promising approaches for delineating the dynamic expression profiles in these subsets of neurons, and thus understanding memory-specific genes, their combinatorial patterns and regulatory networks. The aim of this article is to review and discuss the experimental and computational procedures of sc/snRNA-seq, new studies of molecular mechanisms of memory aided by sc/snRNA-seq in human brain diseases and related mouse models, and computational challenges in understanding the regulatory mechanisms underlying long-term memory formation.
Affiliation(s)
- Atlas M Sardoo
- Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
| | - Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Thomas N Ferraro
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, NJ 08103, USA
| | - Thomas M Keck
- Department of Biological & Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA; Department of Chemistry & Biochemistry, Rowan University, Glassboro, NJ 08028, USA
| | - Yong Chen
- Corresponding author. Yong Chen, Department of Biological and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA. Tel.: +1 856 256 4500; E-mail:
| |
|
35
|
Junttila S, Smolander J, Elo LL. Benchmarking methods for detecting differential states between conditions from multi-subject single-cell RNA-seq data. Brief Bioinform 2022; 23:6649780. [PMID: 35880426 PMCID: PMC9487674 DOI: 10.1093/bib/bbac286] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 02/17/2022] [Revised: 06/07/2022] [Accepted: 06/23/2022] [Indexed: 12/13/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables researchers to quantify transcriptomes of thousands of cells simultaneously and study transcriptomic changes between cells. scRNA-seq datasets increasingly include multisubject, multicondition experiments to investigate cell-type-specific differential states (DS) between conditions. This can be performed by first identifying the cell types in all the subjects and then by performing a DS analysis between the conditions within each cell type. Naïve single-cell DS analysis methods that treat cells as statistically independent are subject to false positives in the presence of variation between biological replicates, an issue known as the pseudoreplicate bias. While several methods have already been introduced to carry out the statistical testing in multisubject scRNA-seq analysis, comparisons that include all these methods are currently lacking. Here, we performed a comprehensive comparison of 18 methods for the identification of DS changes between conditions from multisubject scRNA-seq data. Our results suggest that the pseudobulk methods performed generally best. Both pseudobulks and mixed models that model the subjects as a random effect were superior compared with the naïve single-cell methods that do not model the subjects in any way. While the naïve models achieved higher sensitivity than the pseudobulk methods and the mixed models, they were subject to a high number of false positives. In addition, accounting for subjects through latent variable modeling did not improve the performance of the naïve methods.
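The pseudobulk idea mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the benchmark itself: the function name `pseudobulk`, the tuple layout of `cells`, and the toy gene names are all hypothetical, and the sketch assumes every subject belongs to exactly one condition. It only shows the aggregation step, in which counts from all cells of one subject (within one cell type) are summed so that the downstream test sees one profile per subject rather than one per cell.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts into one profile per (subject, condition).

    `cells` is an iterable of (subject, condition, counts) tuples, where
    `counts` maps gene name -> integer count for a single cell.
    (Hypothetical data layout, chosen for illustration only.)
    """
    totals = defaultdict(lambda: defaultdict(int))
    for subject, condition, counts in cells:
        for gene, count in counts.items():
            totals[(subject, condition)][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


# Toy input: two subjects, two cells each, all from the same cell type.
cells = [
    ("s1", "ctrl", {"GENE_A": 2, "GENE_B": 0}),
    ("s1", "ctrl", {"GENE_A": 1, "GENE_B": 3}),
    ("s2", "stim", {"GENE_A": 5}),
    ("s2", "stim", {"GENE_A": 4, "GENE_B": 1}),
]
profiles = pseudobulk(cells)
```

After aggregation the effective sample size is the number of subjects, not the number of cells, which is why this approach avoids the pseudoreplicate bias the abstract describes. The statistical test applied to the aggregated profiles is a separate choice and is not shown here.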
Affiliation(s)
| | | | - Laura L Elo
- Corresponding author: Laura L. Elo, Turku Bioscience Centre, University of Turku and Åbo Akademi University, FI-20520 Turku, Finland. Tel.: +358504680795; E-mail:
| |
|
36
|
Jones A, Townes FW, Li D, Engelhardt BE. Contrastive latent variable modeling with application to case-control sequencing experiments. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Andrew Jones
- Department of Computer Science, Princeton University
| | | | - Didong Li
- Department of Computer Science, Princeton University
| | | |
|
37
|
Zhang S, Xie L, Cui Y, Carone BR, Chen Y. Detecting Fear-Memory-Related Genes from Neuronal scRNA-seq Data by Diverse Distributions and Bhattacharyya Distance. Biomolecules 2022; 12:biom12081130. [PMID: 36009024 PMCID: PMC9405875 DOI: 10.3390/biom12081130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 08/12/2022] [Accepted: 08/15/2022] [Indexed: 11/16/2022] Open
Abstract
The detection of differentially expressed genes (DEGs) is one of the most important computational challenges in the analysis of single-cell RNA sequencing (scRNA-seq) data. However, due to the high heterogeneity and dropout noise inherent in scRNA-seq data, challenges in detecting DEGs exist when using a single distribution of gene expression levels, leaving much room to improve the precision and robustness of current DEG detection methods. Here, we propose the use of a new method, DEGman, which utilizes several possible diverse distributions in combination with Bhattacharyya distance. DEGman can automatically select the best-fitting distributions of gene expression levels, and then detect DEGs by permutation testing of Bhattacharyya distances of the selected distributions from two cell groups. Compared with several popular DEG analysis tools on both large-scale simulation data and real scRNA-seq data, DEGman shows an overall improvement in the balance of sensitivity and precision. We applied DEGman to scRNA-seq data of TRAP; Ai14 mouse neurons to detect fear-memory-related genes that are significantly differentially expressed in neurons with and without fear memory. DEGman detected well-known fear-memory-related genes and many novel candidates. Interestingly, we found 25 DEGs in common in five neuron clusters that are functionally enriched for synaptic vesicles, indicating that the coupled dynamics of synaptic vesicles across neurons plays a critical role in remote memory formation. The proposed method leverages the advantage of the use of diverse distributions in DEG analysis, exhibiting better performance in analyzing composite scRNA-seq datasets in real applications.
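The combination of Bhattacharyya distance and permutation testing described in this abstract can be sketched as follows. This is an illustrative approximation, not DEGman's implementation: where DEGman selects a best-fitting parametric distribution for each group, the sketch substitutes simple equal-width histograms, and the function names, bin count, and number of permutations are assumptions made for the example.

```python
import math
import random


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions on the same bins."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    if bc <= 0.0:
        return float("inf")  # no overlapping support
    return max(0.0, -math.log(bc))  # clamp tiny negative values from rounding


def binned(values, lo, width, n_bins):
    """Normalised equal-width histogram; the top edge falls into the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def permutation_test(group_a, group_b, n_bins=10, n_perm=500, seed=0):
    """Return (observed distance, permutation p-value) for one gene."""
    pooled = list(group_a) + list(group_b)
    lo, hi = min(pooled), max(pooled)
    if lo == hi:  # every cell has the same value: nothing to test
        return 0.0, 1.0
    width = (hi - lo) / n_bins

    def dist(a, b):
        return bhattacharyya_distance(
            binned(a, lo, width, n_bins), binned(b, lo, width, n_bins)
        )

    observed = dist(group_a, group_b)
    rng = random.Random(seed)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break the link between cells and group labels
        if dist(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy expression values for one gene in two cell groups.
low_group = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0] * 3
high_group = [5, 6, 7, 6, 5, 8, 7, 6, 5, 6] * 3
observed, p_value = permutation_test(low_group, high_group)
```

Shuffling the pooled values simulates the null hypothesis that group labels carry no information about expression, so a small p-value indicates that the observed distance between groups is rarely matched by chance.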
Affiliation(s)
- Shaoqiang Zhang
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Linjuan Xie
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Yaxuan Cui
- Department of Computer Science, College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Benjamin R. Carone
- Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
| | - Yong Chen
- Department of Biology and Biomedical Sciences, Rowan University, Glassboro, NJ 08028, USA
- Correspondence: ; Tel.: +1-856-256-4500
| |
|
38
|
Mallick H, Chatterjee S, Chowdhury S, Chatterjee S, Rahnavard A, Hicks SC. Differential expression of single-cell RNA-seq data using Tweedie models. Stat Med 2022; 41:3492-3510. [PMID: 35656596 PMCID: PMC9288986 DOI: 10.1002/sim.9430] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/25/2021] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 12/13/2022]
Abstract
The performance of computational methods and software to identify differentially expressed features in single-cell RNA-sequencing (scRNA-seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA-seq expression features. To model the technological variability in cross-platform scRNA-seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA-seq expression profiles across experimental platforms induced by platform- and gene-specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero-inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero-inflated scRNA-seq data with excessive zero counts. Using both synthetic and published plate- and droplet-based scRNA-seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state-of-the-art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open-source software (R/Bioconductor package) is available at https://github.com/himelmallick/Tweedieverse.
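For orientation, the textbook Tweedie relations behind this abstract can be written out. The notation below (mean μ, dispersion φ, power parameter p, zero-inflation weight π) is the standard parameterisation and is assumed here rather than taken from the paper. For 1 < p < 2 the Tweedie distribution is a compound Poisson-gamma distribution with a point mass at zero, and a zero-inflated variant adds extra mass at zero through a mixture:

```latex
\begin{align}
  \operatorname{Var}(Y) &= \phi\,\mu^{p}, \qquad 1 < p < 2, \\
  P_{\mathrm{Tw}}(Y = 0) &= \exp\!\left(-\frac{\mu^{2-p}}{\phi\,(2-p)}\right), \\
  P_{\mathrm{ZI}}(Y = 0) &= \pi + (1-\pi)\,P_{\mathrm{Tw}}(Y = 0), \qquad 0 \le \pi \le 1 .
\end{align}
```

Setting π = 0 recovers the ordinary Tweedie model, so the zero-inflated form can only raise the probability of a zero count, consistent with the abstract's description of allowing the zero mass to exceed that of a traditional Tweedie distribution.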
Affiliation(s)
- Himel Mallick
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Suvo Chatterjee
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD 20892, USA
| | - Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Saptarshi Chatterjee
- Department of Statistics, Data and Analytics, Eli Lilly & Company, Indianapolis, IN 46225, USA
| | - Ali Rahnavard
- Computational Biology Institute, Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052, USA
| | - Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
| |
|
39
|
Das S, Rai A, Rai SN. Differential Expression Analysis of Single-Cell RNA-Seq Data: Current Statistical Approaches and Outstanding Challenges. Entropy 2022; 24:e24070995. [PMID: 35885218 PMCID: PMC9315519 DOI: 10.3390/e24070995] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/20/2022] [Revised: 06/25/2022] [Accepted: 07/09/2022] [Indexed: 01/11/2023]
Abstract
With the advent of single-cell RNA-sequencing (scRNA-seq), it is possible to measure the expression dynamics of genes at the single-cell level. Through scRNA-seq, a huge amount of expression data for several thousand(s) of genes over million(s) of cells are generated in a single experiment. Differential expression analysis is the primary downstream analysis of such data to identify gene markers for cell type detection and also provide inputs to other secondary analyses. Many statistical approaches for differential expression analysis have been reported in the literature. Therefore, we critically discuss the underlying statistical principles of the approaches and distinctly divide them into six major classes, i.e., generalized linear, generalized additive, Hurdle, mixture models, two-class parametric, and non-parametric approaches. We also succinctly discuss the limitations that are specific to each class of approaches, and how they are addressed by other subsequent classes of approach. A number of challenges are identified in this study that must be addressed to develop the next class of innovative approaches. Furthermore, we also emphasize the methodological challenges involved in differential expression analysis of scRNA-seq data that researchers must address to draw maximum benefit from this recent single-cell technology. This study will serve as a guide to genome researchers and experimental biologists to objectively select options for their analysis.
Affiliation(s)
- Samarendra Das
- ICAR-Directorate of Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- International Centre for Foot and Mouth Disease, Arugul, Bhubaneswar 752050, India
- Correspondence: or (S.D.); (S.N.R.)
| | - Anil Rai
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India;
| | - Shesh N. Rai
- School of Interdisciplinary and Graduate Studies, University of Louisville, Louisville, KY 40292, USA
- Biostatistics and Bioinformatics Facility, Brown Cancer Center, University of Louisville, Louisville, KY 40202, USA
- Biostatistics and Informatics Facility, Center for Integrative Environmental Health Sciences, University of Louisville, Louisville, KY 40202, USA
- Data Analysis and Sample Management Facility, The University of Louisville Super Fund Center, University of Louisville, Louisville, KY 40202, USA
- Hepatobiology and Toxicology Center, University of Louisville, Louisville, KY 40202, USA
- Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY 40202, USA
- Correspondence: or (S.D.); (S.N.R.)
| |
|
40
|
Ellis D, Wu D, Datta S. SAREV: A review on statistical analytics of single-cell RNA sequencing data. Wiley Interdisciplinary Reviews: Computational Statistics 2022; 14:e1558. [PMID: 36034329 PMCID: PMC9400796 DOI: 10.1002/wics.1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2020] [Accepted: 04/09/2021] [Indexed: 06/15/2023]
Abstract
Due to the development of next-generation RNA sequencing (NGS) technologies, there has been tremendous progress in research involving determining the role of genomics, transcriptomics and epigenomics in complex biological systems. However, scientists have realized that information obtained using earlier technology, frequently called 'bulk RNA-seq' data, provides information averaged across all the cells present in a tissue. Relatively newly developed single cell (scRNA-seq) technology allows us to provide transcriptomic information at a single-cell resolution. Nevertheless, these high-resolution data have their own complex natures and demand novel statistical data analysis methods to provide effective and highly accurate results on complex biological systems. In this review, we cover many such recently developed statistical methods for researchers wanting to pursue scRNA-seq statistical and computational research as well as scientific research about these existing methods and free software tools available for their generated data. This review is certainly not exhaustive due to page limitations. We have tried to cover the popular methods starting from quality control to the downstream analysis of finding differentially expressed genes and concluding with a brief description of network analysis.
Affiliation(s)
- Dorothy Ellis
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
| | - Dongyuan Wu
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
| | - Susmita Datta
- Department of Biostatistics, University of Florida, School of Public Health and Health Professions, Gainesville, FL
| |
|
41
|
Wei X, Dong J, Wang F. scPreGAN, a deep generative model for predicting the response of single-cell expression to perturbation. Bioinformatics 2022; 38:3377-3384. [PMID: 35639705 DOI: 10.1093/bioinformatics/btac357] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/14/2021] [Revised: 04/29/2022] [Accepted: 05/20/2022] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Rapid developments of single-cell RNA sequencing technologies allow study of responses to external perturbations at individual cell level. However, in many cases, it is hard to collect the perturbed cells, such as knowing the response of a cell type to the drug before actual medication to a patient. Prediction in silicon could alleviate the problem and save cost. Although several tools have been developed, their prediction accuracy leaves much room for improvement. RESULTS In this article, we propose scPreGAN (Single-Cell data Prediction base on GAN), a deep generative model for predicting the response of single-cell expression to perturbation. ScPreGAN integrates autoencoder and generative adversarial network, the former is to extract common information of the unperturbed data and the perturbed data, the latter is to predict the perturbed data. Experiments on three real datasets show that scPreGAN outperforms three state-of-the-art methods, which can capture the complicated distribution of cell expression and generate the prediction data with the same expression abundance as the real data. AVAILABILITY AND IMPLEMENTATION The implementation of scPreGAN is available via https://github.com/JaneJiayiDong/scPreGAN. To reproduce the results of this article, please visit https://github.com/JaneJiayiDong/scPreGAN-reproducibility. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiajie Wei
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China
- Jiayi Dong
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China
- Fei Wang
- Shanghai Key Lab of Intelligent Information Processing, Shanghai, China
- School of Computer Science and Technology, Fudan University, Shanghai, China
42
Shichino S, Ueha S, Hashimoto S, Ogawa T, Aoki H, Wu B, Chen CY, Kitabatake M, Ouji-Sageshima N, Sawabata N, Kawaguchi T, Okayama T, Sugihara E, Hontsu S, Ito T, Iwata Y, Wada T, Ikeo K, Sato TA, Matsushima K. TAS-Seq is a robust and sensitive amplification method for bead-based scRNA-seq. Commun Biol 2022; 5:602. [PMID: 35760847 PMCID: PMC9245575 DOI: 10.1038/s42003-022-03536-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 09/28/2021] [Accepted: 05/27/2022] [Indexed: 12/22/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) is valuable for analyzing cellular heterogeneity. Cell composition accuracy is critical for analyzing cell-cell interaction networks from scRNA-seq data. However, droplet- and plate-based scRNA-seq techniques have cell sampling bias that could affect the cell composition of scRNA-seq datasets. Here we developed terminator-assisted solid-phase cDNA amplification and sequencing (TAS-Seq) for scRNA-seq based on a terminator, terminal transferase, and nanowell/bead-based scRNA-seq platform. TAS-Seq showed high tolerance to variations in the terminal transferase reaction, which complicate the handling of existing terminal transferase-based scRNA-seq methods. In murine and human lung samples, TAS-Seq yielded scRNA-seq data that were highly correlated with flow-cytometric data, showing higher gene-detection sensitivity and more robust detection of important cell-cell interactions and expression of growth factors/interleukins in cell subsets than 10X Chromium v2 and Smart-seq2. Expanding TAS-Seq application will improve understanding and atlas construction of lung biology at the single-cell level.
Affiliation(s)
- Shigeyuki Shichino
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Satoshi Ueha
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Shinichi Hashimoto
- Department of Molecular Pathophysiology, Institute of Advanced Medicine, Wakayama Medical University, Wakayama, Japan
- Tatsuro Ogawa
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Hiroyasu Aoki
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Bin Wu
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Chang-Yu Chen
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
- Noriyoshi Sawabata
- Department of Thoracic and Cardio-Vascular Surgery, Nara Medical University, Nara, Japan
- Takeshi Kawaguchi
- Department of Thoracic and Cardio-Vascular Surgery, Nara Medical University, Nara, Japan
- Eiji Sugihara
- Research and Development Center for Precision Medicine, University of Tsukuba, Ibaraki, Japan
- Center for Joint Research Facilities Support, Research Promotion and Support Headquarters, Fujita Health University, Aichi, Japan
- Shigeto Hontsu
- Department of Respiratory Medicine, Nara Medical University, Nara, Japan
- Toshihiro Ito
- Department of Immunology, Nara Medical University, Nara, Japan
- Yasunori Iwata
- Division of Infection Control, Kanazawa University Hospital, Department of Nephrology and Laboratory Medicine, Kanazawa University, Ishikawa, Japan
- Takashi Wada
- Division of Infection Control, Kanazawa University Hospital, Department of Nephrology and Laboratory Medicine, Kanazawa University, Ishikawa, Japan
- Kazuho Ikeo
- National Institute of Genetics, Shizuoka, Japan
- Taka-Aki Sato
- Research and Development Center for Precision Medicine, University of Tsukuba, Ibaraki, Japan
- Kouji Matsushima
- Division of Molecular Regulation of Inflammatory and Immune Diseases, Research Institute of Biomedical Sciences, Tokyo University of Science, Chiba, Japan
43
Wang R, Lin DY, Jiang Y. EPIC: Inferring relevant cell types for complex traits by integrating genome-wide association studies and single-cell RNA sequencing. PLoS Genet 2022; 18:e1010251. [PMID: 35709291 PMCID: PMC9242467 DOI: 10.1371/journal.pgen.1010251] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/19/2021] [Revised: 06/29/2022] [Accepted: 05/12/2022] [Indexed: 11/18/2022] Open
Abstract
More than a decade of genome-wide association studies (GWASs) have identified genetic risk variants that are significantly associated with complex traits. Emerging evidence suggests that the function of trait-associated variants likely acts in a tissue- or cell-type-specific fashion. Yet, it remains challenging to prioritize trait-relevant tissues or cell types to elucidate disease etiology. Here, we present EPIC (cEll tyPe enrIChment), a statistical framework that relates large-scale GWAS summary statistics to cell-type-specific gene expression measurements from single-cell RNA sequencing (scRNA-seq). We derive powerful gene-level test statistics for common and rare variants, separately and jointly, and adopt generalized least squares to prioritize trait-relevant cell types while accounting for the correlation structures both within and between genes. Using enrichment of loci associated with four lipid traits in the liver and enrichment of loci associated with three neurological disorders in the brain as ground truths, we show that EPIC outperforms existing methods. We apply our framework to multiple scRNA-seq datasets from different platforms and identify cell types underlying type 2 diabetes and schizophrenia. The enrichment is replicated using independent GWAS and scRNA-seq datasets and further validated using PubMed search and existing bulk case-control testing results.
Affiliation(s)
- Rujin Wang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Dan-Yu Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Yuchao Jiang
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Department of Genetics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, United States of America
44
Zou J, Deng F, Wang M, Zhang Z, Liu Z, Zhang X, Hua R, Chen K, Zou X, Hao J. scCODE: an R package for data-specific differentially expressed gene detection on single-cell RNA-sequencing data. Brief Bioinform 2022; 23:6590434. [PMID: 35598331 DOI: 10.1093/bib/bbac180] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/26/2022] [Revised: 04/06/2022] [Accepted: 04/22/2022] [Indexed: 12/13/2022] Open
Abstract
Differential expression (DE) gene detection in single-cell ribonucleic acid (RNA)-sequencing (scRNA-seq) data is a key step to understand the biological question investigated. Filtering genes is suggested to improve the performance of DE methods, but the influence of filtering genes has not been demonstrated. Furthermore, the optimal methods for different scRNA-seq datasets are divergent, and different datasets should benefit from data-specific DE gene detection strategies. However, existing tools did not take gene filtering into consideration. There is a lack of metrics for evaluating the optimal method on experimental datasets. Based on two new metrics, we propose single-cell Consensus Optimization of Differentially Expressed gene detection, an R package to automatically optimize DE gene detection for each experimental scRNA-seq dataset.
Affiliation(s)
- Jiawei Zou
- School of Life Sciences and Biotechnology, Shanghai Centre for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
- Fulan Deng
- School of Materials Science and Engineering, Shanghai Institute of Technology, Shanghai 201418, China
- Miaochen Wang
- Department of Oral and Maxillofacial-Head & Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology
- Zhen Zhang
- Department of Oral and Maxillofacial-Head & Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology
- Zheqi Liu
- Department of Oral and Maxillofacial-Head & Neck Oncology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine; College of Stomatology, Shanghai Jiao Tong University; National Center for Stomatology; National Clinical Research Center for Oral Diseases; Shanghai Key Laboratory of Stomatology
- Xiaobin Zhang
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Department of Cardiovascular Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Rong Hua
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
- Ke Chen
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai, 201602, China
- Xin Zou
- Jinshan Hospital Center for Tumor Diagnosis & Therapy, Jinshan Hospital, Fudan University, Shanghai, 201508, China
- Jie Hao
- Institute of Clinical Science, Zhongshan Hospital, Fudan University, Shanghai, China
45
Zhang M, Guo FR. BSDE: barycenter single-cell differential expression for case-control studies. Bioinformatics 2022; 38:2765-2772. [PMID: 35561165 PMCID: PMC9113363 DOI: 10.1093/bioinformatics/btac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Revised: 03/14/2022] [Accepted: 03/23/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Single-cell sequencing brings about a revolutionarily high resolution for finding differentially expressed genes (DEGs) by disentangling highly heterogeneous cell tissues. Yet, such analysis has so far mostly focused on comparing different cell types from the same individual. As single-cell sequencing becomes cheaper and easier to use, an increasing number of datasets from case-control studies are becoming available, which calls for new methods for identifying differential expression between case and control individuals. RESULTS: To bridge this gap, we propose barycenter single-cell differential expression (BSDE), a nonparametric method for finding DEGs in case-control studies. By using optimal transportation to aggregate distributions and compute their distances, our method overcomes the restrictive parametric assumptions imposed by standard mixed-effect-modeling approaches. Through simulations, we show that BSDE can accurately detect a variety of differential expressions while maintaining the type-I error at a prescribed level. Further, 1345 and 1568 cell type-specific DEGs are identified by BSDE from datasets on pulmonary fibrosis and multiple sclerosis, among which the top findings are supported by previous results from the literature. AVAILABILITY AND IMPLEMENTATION: R package BSDE is freely available from doi.org/10.5281/zenodo.6332254. For real data analysis with the R package, see doi.org/10.5281/zenodo.6332566. These can also be accessed through GitHub at github.com/mqzhanglab/BSDE and github.com/mqzhanglab/BSDE_pipeline. The two single-cell sequencing datasets can be downloaded with the UCSC cell browser from cells.ucsc.edu/?ds=ms and cells.ucsc.edu/?ds=lung-pf-control. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mengqi Zhang
- Department of Surgery, Perelman Medical School, University of Pennsylvania, Philadelphia, PA 19104, USA
46
Nault R, Saha S, Bhattacharya S, Dodson J, Sinha S, Maiti T, Zacharewski T. Benchmarking of a Bayesian single cell RNAseq differential gene expression test for dose-response study designs. Nucleic Acids Res 2022; 50:e48. [PMID: 35061903 PMCID: PMC9071439 DOI: 10.1093/nar/gkac019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/27/2021] [Revised: 12/15/2021] [Accepted: 01/07/2022] [Indexed: 12/04/2022] Open
Abstract
The application of single-cell RNA sequencing (scRNAseq) for the evaluation of chemicals, drugs, and food contaminants presents the opportunity to consider cellular heterogeneity in pharmacological and toxicological responses. Current differential gene expression analysis (DGEA) methods focus primarily on two-group comparisons, not the multi-group dose-response study designs used in safety assessments. To benchmark DGEA methods for dose-response scRNAseq experiments, we propose a multiplicity-corrected Bayesian testing approach and compare it against 8 other methods, including two frequentist fit-for-purpose tests, using simulated and experimental data. Our Bayesian test method outperformed all other tests for a broad range of accuracy metrics, including control of false positive error rates. Most notably, the fit-for-purpose and standard multiple-group DGEA methods were superior to the two-group scRNAseq methods for dose-response study designs. Collectively, our benchmarking of DGEA methods demonstrates the importance of considering study design when determining the most appropriate test methods.
Affiliation(s)
- Rance Nault
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, USA
- Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, USA
- Satabdi Saha
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Sudin Bhattacharya
- Biomedical Engineering Department, Pharmacology & Toxicology, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Jack Dodson
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, USA
- Samiran Sinha
- Department of Statistics, Texas A&M University, College Station, TX 77843, USA
- Tapabrata Maiti
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Tim Zacharewski
- Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, MI, USA
- Institute for Integrative Toxicology, Michigan State University, East Lansing, MI 48824, USA
47
Zhu B, Li H, Zhang L, Chandra SS, Zhao H. A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data. Brief Bioinform 2022; 23:6581434. [PMID: 35514182 PMCID: PMC9487630 DOI: 10.1093/bib/bbac166] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 04/02/2022] [Accepted: 04/13/2022] [Indexed: 11/13/2022] Open
Abstract
The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at the single-cell resolution. In particular, these techniques facilitate the identifications of genes showing cell-type-specific differential expressions (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and utilizes cell-type-specific pseudobulk count to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods to detect DE genes with an appropriate control of false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a Parkinson's disease human data set with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool to identify cell-type-specific DE genes across conditions from scRNA-seq data.
Affiliation(s)
- Biqing Zhu
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Hongyu Li
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
- Le Zhang
- Department of Neurology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Sreeganga S Chandra
- Department of Neurology, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Department of Neuroscience, School of Medicine, Yale University, New Haven, CT, 06511, USA
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA
- Corresponding author: Hongyu Zhao, 300 George Street, Ste 503, New Haven, CT 06511
48
Abondio P, De Intinis C, da Silva Gonçalves Vianez Júnior JL, Pace L. Single cell multiomic approaches to disentangle T cell heterogeneity. Immunol Lett 2022; 246:37-51. [DOI: 10.1016/j.imlet.2022.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2021] [Revised: 04/16/2022] [Accepted: 04/26/2022] [Indexed: 11/29/2022]
49
Kim C, Lee H, Jeong J, Jung K, Han B. MarcoPolo: a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering. Nucleic Acids Res 2022; 50:e71. [PMID: 35420135 PMCID: PMC9262626 DOI: 10.1093/nar/gkac216] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/02/2021] [Revised: 03/16/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
The standard analysis pipeline for single-cell RNA-seq data consists of sequential steps initiated by clustering the cells. An innate limitation of this pipeline is that an imperfect clustering result can irreversibly affect the succeeding steps. For example, there can be cell types not well distinguished by clustering because they largely share the global structure, such as the anterior primitive streak and mid primitive streak cells. If one searches for differentially expressed genes (DEGs) solely based on clustering, marker genes for distinguishing these types will be missed. Moreover, clustering depends on many parameters and is often subject to manual decisions. To overcome these limitations, we propose MarcoPolo, a method that identifies informative DEGs independently of prior clustering. MarcoPolo sorts out genes by evaluating whether the distributions are bimodal, whether similar expression patterns are observed in other genes, and whether the expressing cells are proximal in a low-dimensional space. Using real datasets with FACS-purified cell labels, we demonstrate that MarcoPolo recovers marker genes better than competing methods. Notably, MarcoPolo finds key genes that can distinguish cell types that are not distinguishable by standard clustering. MarcoPolo is built into a convenient software package that provides analysis results in an HTML file.
Affiliation(s)
- Chanwoo Kim
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, Republic of Korea
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Hanbin Lee
- Department of Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea
- Juhee Jeong
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Keehoon Jung
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Department of Anatomy and Cell Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
- Institute of Allergy and Clinical Immunology, Seoul National University Medical Research Center, Seoul, Republic of Korea
- Buhm Han
- Department of Biomedical Sciences, BK21 Plus Biomedical Science Project, Seoul National University College of Medicine, Seoul, Republic of Korea
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul, Republic of Korea
50
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA