1
Goss K, Horwitz EM. Single-cell multiomics to advance cell therapy. Cytotherapy 2025; 27:137-145. [PMID: 39530970 DOI: 10.1016/j.jcyt.2024.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Revised: 10/21/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
Single-cell RNA-sequencing (scRNAseq) was first introduced in 2009 and has evolved with many technological advancements over the last decade. Not only are there several scRNAseq platforms differing in many aspects, but there are also a large number of computational pipelines available for downstream analyses which are being developed at an exponential rate. Such computational data appear in many scientific publications in virtually every field of study; thus, investigators should be able to understand and interpret data in this rapidly evolving field. Here, we discuss key differences in scRNAseq platforms, crucial steps in scRNAseq experiments, standard downstream analyses and introduce newly developed multimodal approaches. We then discuss how single-cell omics has been applied to advance the field of cell therapy.
Affiliation(s)
- Kyndal Goss
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA
- Edwin M Horwitz
- Marcus Center for Advanced Cellular Therapy, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Aflac Cancer & Blood Disorders Center, Children's Healthcare of Atlanta, Atlanta, Georgia, USA; Department of Pediatrics, Emory University School of Medicine, Atlanta, Georgia, USA; Graduate Division of Biology and Biomedical Sciences, Emory University Laney Graduate School, Atlanta, Georgia, USA.
2
Calarco JA, Taylor SR, Miller DM. Detecting gene expression in Caenorhabditis elegans. Genetics 2025; 229:1-108. [PMID: 39693264 DOI: 10.1093/genetics/iyae167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2024] [Accepted: 09/30/2024] [Indexed: 12/20/2024] Open
Abstract
Reliable methods for detecting and analyzing gene expression are necessary tools for understanding development and investigating biological responses to genetic and environmental perturbation. With its fully sequenced genome, invariant cell lineage, transparent body, wiring diagram, detailed anatomy, and wide array of genetic tools, Caenorhabditis elegans is an exceptionally useful model organism for linking gene expression to cellular phenotypes. The development of new techniques in recent years has greatly expanded our ability to detect gene expression at high resolution. Here, we provide an overview of gene expression methods for C. elegans, including techniques for detecting transcripts and proteins in situ, bulk RNA sequencing of whole worms and specific tissues and cells, single-cell RNA sequencing, and high-throughput proteomics. We discuss important considerations for choosing among these techniques and provide an overview of publicly available online resources for gene expression data.
Affiliation(s)
- John A Calarco
- Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada, M5S 3G5
- Seth R Taylor
- Department of Cell Biology and Physiology, Brigham Young University, Provo, UT 84602, USA
- David M Miller
- Department of Cell and Developmental Biology, Vanderbilt University, Nashville, TN 37240, USA
- Neuroscience Program, Vanderbilt University, Nashville, TN 37240, USA
3
Livne D, Efroni S. Pathway metrics accurately stratify T cells to their cells states. BioData Min 2024; 17:60. [PMID: 39716187 DOI: 10.1186/s13040-024-00416-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Accepted: 12/10/2024] [Indexed: 12/25/2024] Open
Abstract
Pathway analysis is a powerful approach for elucidating insights from gene expression data and associating such changes with cellular phenotypes. The overarching objective of pathway research is to identify critical molecular drivers within a cellular context and uncover novel signaling networks from groups of relevant biomolecules. In this work, we present PathSingle, a Python-based pathway analysis tool tailored for single-cell data analysis. PathSingle employs a unique graph-based algorithm to enable the classification of diverse cellular states, such as T cell subtypes. Designed to be open-source, extensible, and computationally efficient, PathSingle is available at https://github.com/zurkin1/PathSingle under the MIT license. This tool provides researchers with a versatile framework for uncovering biologically meaningful insights from high-dimensional single-cell transcriptomics data, facilitating a deeper understanding of cellular regulation and function.
Affiliation(s)
- Dani Livne
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel.
- Sol Efroni
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
4
Liu X, Wang H, Gao J. scIALM: A method for sparse scRNA-seq expression matrix imputation using the Inexact Augmented Lagrange Multiplier with low error. Comput Struct Biotechnol J 2024; 23:549-558. [PMID: 38274995 PMCID: PMC10809077 DOI: 10.1016/j.csbj.2023.12.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/27/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a high-throughput sequencing technology that quantifies gene expression profiles of specific cell populations at the single-cell level, providing a foundation for studying cellular heterogeneity and patient pathological characteristics. It is effective for developmental, fertility, and disease studies. However, the cell-gene expression matrix of single-cell sequencing data is often sparse and contains numerous zero values. Some of the zero values derive from noise, where dropout noise has a large impact on downstream analysis. In this paper, we propose a method named scIALM for imputation recovery of sparse single-cell RNA data expression matrices, which employs the Inexact Augmented Lagrange Multiplier method to use sparse but clean (accurate) data to recover unknown entries in the matrix. We perform experimental analysis on four datasets, calling the expression matrix after Quality Control (QC) as the original matrix, and comparing the performance of scIALM with six other methods using mean squared error (MSE), mean absolute error (MAE), Pearson correlation coefficient (PCC), and cosine similarity (CS). Our results demonstrate that scIALM accurately recovers the original data of the matrix with an error of 10e-4, and the mean value of the four metrics reaches 4.5072 (MSE), 0.765 (MAE), 0.8701 (PCC), 0.8896 (CS). In addition, at 10%-50% random masking noise, scIALM is the least sensitive to the masking ratio. For downstream analysis, this study uses adjusted rand index (ARI) and normalized mutual information (NMI) to evaluate the clustering effect, and the results are improved on three datasets containing real cluster labels.
Affiliation(s)
- Xiaohong Liu
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Han Wang
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
- Jingyang Gao
- College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, 100029, China
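The four error metrics named in the scIALM abstract (MSE, MAE, PCC, CS) can be illustrated with a minimal sketch. It assumes, as one plausible reading of the masking experiment, that metrics are computed only on entries that were hidden before imputation. The function names and the toy matrices are hypothetical and are not taken from the scIALM code.

```python
import math

def masked_entries(original, imputed, mask):
    """Collect (truth, prediction) values at positions where mask is True."""
    truth, pred = [], []
    for row_o, row_i, row_m in zip(original, imputed, mask):
        for o, i, m in zip(row_o, row_i, row_m):
            if m:
                truth.append(o)
                pred.append(i)
    return truth, pred

def mse(truth, pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)

def mae(truth, pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def pearson(truth, pred):
    """Pearson correlation coefficient (undefined if either vector is constant)."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in truth))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    return cov / (sd_t * sd_p)

def cosine(truth, pred):
    """Cosine similarity (undefined if either vector is all zeros)."""
    dot = sum(t * p for t, p in zip(truth, pred))
    norm_t = math.sqrt(sum(t * t for t in truth))
    norm_p = math.sqrt(sum(p * p for p in pred))
    return dot / (norm_t * norm_p)

# Toy data (hypothetical): a 2 x 3 expression matrix with three masked entries.
original = [[1.0, 0.0, 3.0], [2.0, 4.0, 0.0]]
imputed = [[1.5, 0.0, 2.5], [2.0, 4.0, 0.0]]
mask = [[True, False, True], [False, True, False]]

truth, pred = masked_entries(original, imputed, mask)
print(mse(truth, pred), mae(truth, pred), pearson(truth, pred), cosine(truth, pred))
```

Restricting the comparison to masked positions keeps untouched entries, which any method reproduces trivially, from inflating the scores.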
5
Alani M, Altarturih H, Pars S, Al-mhanawi B, Wolvetang EJ, Shaker MR. A Roadmap for Selecting and Utilizing Optimal Features in scRNA Sequencing Data Analysis for Stem Cell Research: A Comprehensive Review. Int J Stem Cells 2024; 17:347-362. [PMID: 38531607 PMCID: PMC11612217 DOI: 10.15283/ijsc23170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 02/26/2024] [Accepted: 02/27/2024] [Indexed: 03/28/2024] Open
Abstract
Stem cells and the cells they produce are unique because they vary from one cell to another. Traditional methods of studying cells often overlook these differences. However, the development of new technologies for studying individual cells has greatly changed biological research in recent years. Among these innovations, single-cell RNA sequencing (scRNA-seq) stands out. This technique allows scientists to examine the activity of genes in each cell, across thousands or even millions of cells. This makes it possible to understand the diversity of cells, identify new types of cells, and see how cells differ across different tissues, individuals, species, times, and conditions. This paper discusses the importance of scRNA-seq and the computational tools and software that are essential for analyzing the vast amounts of data generated by scRNA-seq studies. Our goal is to provide practical advice for bioinformaticians and biologists who are using scRNA-seq to study stem cells. We offer an overview of the scRNA-seq field, including the tools available, how they can be used, and how to present the results of these studies effectively. Our findings include a detailed overview and classification of tools used in scRNA-seq analysis, based on a review of 2,733 scientific publications. This review is complemented by information from the scRNA-tools database, which lists over 1,400 tools for analyzing scRNA-seq data. This database is an invaluable resource for researchers, offering a wide range of options for analyzing their scRNA-seq data.
Affiliation(s)
- Maath Alani
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
- Hamza Altarturih
- Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
- Selin Pars
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
- Bahaa Al-mhanawi
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
- Ernst J. Wolvetang
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
- Mohammed R. Shaker
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Australia
6
Petrany A, Chen R, Zhang S, Chen Y. Theoretical framework for the difference of two negative binomial distributions and its application in comparative analysis of sequencing data. Genome Res 2024; 34:1636-1650. [PMID: 39406498 PMCID: PMC11529838 DOI: 10.1101/gr.278843.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 09/10/2024] [Indexed: 11/01/2024]
Abstract
High-throughput sequencing (HTS) technologies have been instrumental in investigating biological questions at the bulk and single-cell levels. Comparative analysis of two HTS data sets often relies on testing the statistical significance for the difference of two negative binomial distributions (DOTNB). Although negative binomial distributions are well studied, the theoretical results for DOTNB remain largely unexplored. Here, we derive basic analytical results for DOTNB and examine its asymptotic properties. As a state-of-the-art application of DOTNB, we introduce DEGage, a computational method for detecting differentially expressed genes (DEGs) in scRNA-seq data. DEGage calculates the mean of the sample-wise differences of gene expression levels as the test statistic and determines significant differential expression by computing the P-value with DOTNB. Extensive validation using simulated and real scRNA-seq data sets demonstrates that DEGage outperforms five popular DEG analysis tools: DEGseq2, DEsingle, edgeR, Monocle3, and scDD. DEGage is robust against high dropout levels and exhibits superior sensitivity when applied to balanced and imbalanced data sets, even with small sample sizes. We utilize DEGage to analyze prostate cancer scRNA-seq data sets and identify marker genes for 17 cell types. Furthermore, we apply DEGage to scRNA-seq data sets of mouse neurons with and without fear memory and reveal eight potential memory-related genes overlooked in previous analyses. The theoretical results and supporting software for DOTNB can be widely applied to comparative analyses of dispersed count data in HTS and broad research questions.
Affiliation(s)
- Alicia Petrany
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, New Jersey 08028, USA
- Ruoyu Chen
- Moorestown High School, Moorestown, New Jersey 08057, USA
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, Glassboro, New Jersey 08028, USA;
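To make the idea of a difference of two negative binomial distributions concrete, the sketch below computes its probability mass function by direct, truncated convolution and derives a two-sided p-value under a shared null. It is a numerical illustration only. It is not the analytical DOTNB result or the DEGage test statistic (which the abstract describes as a mean of sample-wise differences), and the mean/size parameterization, the truncation bound `kmax`, and all function names are assumptions.

```python
import math

def nb_pmf(k, mu, r):
    """P(X = k) for a negative binomial with mean mu and size r (variance mu + mu^2 / r)."""
    if k < 0:
        return 0.0
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def diff_nb_pmf(d, mu1, r1, mu2, r2, kmax=300):
    """P(X - Y = d) for independent X ~ NB(mu1, r1) and Y ~ NB(mu2, r2).

    The infinite sum over y is truncated at kmax (an assumption that is
    adequate only when both tails are negligible beyond kmax).
    """
    start = max(0, -d)  # need y + d >= 0 for a non-zero term
    return sum(nb_pmf(y + d, mu1, r1) * nb_pmf(y, mu2, r2) for y in range(start, kmax))

def two_sided_p(d_obs, mu, r, kmax=300):
    """P(|X - Y| >= |d_obs|) when X and Y share the same NB(mu, r) null."""
    bound = abs(d_obs)
    inside = sum(diff_nb_pmf(d, mu, r, mu, r, kmax) for d in range(-bound + 1, bound))
    return min(1.0, max(0.0, 1.0 - inside))

# Hypothetical example: both groups follow NB(mean=5, size=2) under the null.
print(two_sided_p(3, mu=5.0, r=2.0))
print(two_sided_p(30, mu=5.0, r=2.0))
```

Brute-force convolution of this kind is mainly useful as a check on closed-form or asymptotic results, since its cost grows with the truncation bound.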
7
Dollinger E, Silkwood K, Atwood S, Nie Q, Lander AD. Statistically principled feature selection for single cell transcriptomics. bioRxiv 2024:2024.10.11.617709. [PMID: 39463971 PMCID: PMC11507810 DOI: 10.1101/2024.10.11.617709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
The high dimensionality of data in single cell transcriptomics (scRNAseq) requires investigators to choose subsets of genes (feature selection) for downstream analysis (e.g., unsupervised cell clustering). The evaluation of different approaches to feature selection is hampered by the fact that, as we show here, the performance of feature selection methods varies greatly with the task being performed. For routine cell type identification, even randomly chosen features can perform well, but for cell type differences that are subtle, both number of features and selection strategy can matter strongly. Here we present a simple feature selection method grounded in an analytical model that, without resorting to arbitrary thresholds or user-defined parameters, allows for interpretable delineation of both how many and which features to choose, facilitating identification of biologically meaningful rare cell types. We compare this method to default methods in scanpy and Seurat, as well as SCTransform, showing how greater accuracy can often be achieved with surprisingly few, well-chosen features.
Affiliation(s)
- Emmanuel Dollinger
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697
- Kai Silkwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697
- Scott Atwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697
- Qing Nie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697
- Department of Mathematics, University of California, Irvine, Irvine, CA 92697
- Arthur D. Lander
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA 92697
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA 92697
8
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. BMC Bioinformatics 2024; 25:305. [PMID: 39294560 PMCID: PMC11411778 DOI: 10.1186/s12859-024-05926-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Accepted: 09/09/2024] [Indexed: 09/20/2024] Open
Abstract
BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data (looking for rare cell types, subtleties of cell states, and details of gene regulatory networks), there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data in which ground truth about biological variation is unknown (i.e., usually). RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Emmanuel Dollinger
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Joshua Gervin
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Scott Atwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Qing Nie
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA
- Department of Mathematics, University of California, Irvine, Irvine, CA, USA
- Arthur D Lander
- Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, USA.
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, USA.
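The premise in the abstract above, that raw counts can be compared against a pure sampling-noise (Poisson) expectation without normalizing them first, can be sketched with a simple dispersion index. The code below is a simplified illustration and is not the BigSur statistic. It assumes each gene's expected count in a cell is proportional to that cell's total count, and the function name and toy matrix are hypothetical.

```python
def poisson_dispersion(counts):
    """Per-gene dispersion index on raw counts.

    counts: list of genes, each a list of raw counts per cell.
    For gene g and cell c the expected count under the depth-proportional
    null is mu = gene_total * cell_total / grand_total. The index is the
    mean squared Pearson residual, sum((x - mu)^2 / mu) / (n_cells - 1),
    which is close to 1 for pure Poisson sampling noise and larger when
    a gene varies more than sampling alone would explain.
    """
    n_cells = len(counts[0])
    cell_totals = [sum(gene[c] for gene in counts) for c in range(n_cells)]
    grand_total = sum(cell_totals)
    result = []
    for gene in counts:
        gene_total = sum(gene)
        if gene_total == 0:
            result.append(float("nan"))  # no information for an undetected gene
            continue
        acc = 0.0
        for c in range(n_cells):
            mu = gene_total * cell_totals[c] / grand_total
            if mu > 0:  # skip empty cells, where the expectation is zero
                acc += (gene[c] - mu) ** 2 / mu
        result.append(acc / (n_cells - 1))
    return result

# Toy matrix (hypothetical): 3 genes x 4 cells of raw counts.
toy = [
    [2, 2, 2, 2],  # evenly spread
    [0, 0, 0, 8],  # concentrated in one cell
    [3, 3, 3, 0],
]
print(poisson_dispersion(toy))
```

Because the expectation already accounts for each cell's depth, no separate normalization step is applied to the counts themselves.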
9
Jin W, Pei J, Roy JR, Jayaraman S, Ahalliya RM, Kanniappan GV, Mironescu M, Palanisamy CP. Comprehensive review on single-cell RNA sequencing: A new frontier in Alzheimer's disease research. Ageing Res Rev 2024; 100:102454. [PMID: 39142391 DOI: 10.1016/j.arr.2024.102454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 08/07/2024] [Accepted: 08/09/2024] [Indexed: 08/16/2024]
Abstract
Alzheimer's disease (AD) is a multifaceted neurodegenerative condition marked by gradual cognitive deterioration and the loss of neurons. While conventional bulk RNA sequencing techniques have shed light on AD pathology, they frequently obscure the cellular diversity within brain tissues. The advent of single-cell RNA sequencing (scRNA-seq) has transformed our capability to analyze the cellular composition of AD, allowing for the detection of unique cell populations, rare cell types, and gene expression alterations at an individual cell level. This review examines the use of scRNA-seq in AD research, focusing on its contributions to understanding cellular diversity, disease progression, and potential therapeutic targets. We discuss key technological innovations, data analysis techniques, and challenges associated with scRNA-seq in studying AD. Furthermore, we highlight recent studies that have utilized scRNA-seq to identify novel biomarkers, uncover disease-associated pathways, and elucidate the role of non-neuronal cells, such as microglia and astrocytes, in AD pathogenesis. By providing a comprehensive overview of advancements in scRNA-seq for unraveling cellular heterogeneity in AD, this review highlights the transformative impact of scRNA-seq on our comprehension of disease mechanisms and the creation of targeted treatments.
Affiliation(s)
- Wengang Jin
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, 2011 QinLing-Bashan Mountains Bioresources Comprehensive Development C. I. C, Shaanxi Province Key Laboratory of Bio-Resources, College of Bioscience and Bioengineering, Shaanxi University of Technology, Hanzhong 723001, China
- JinJin Pei
- Qinba State Key Laboratory of Biological Resources and Ecological Environment, 2011 QinLing-Bashan Mountains Bioresources Comprehensive Development C. I. C, Shaanxi Province Key Laboratory of Bio-Resources, College of Bioscience and Bioengineering, Shaanxi University of Technology, Hanzhong 723001, China
- Jeane Rebecca Roy
- Department of Anatomy, Bhaarath Medical College and hospital, Bharath Institute of Higher Education and Research (BIHER), Chennai, Tamil Nadu 600073, India
- Selvaraj Jayaraman
- Centre of Molecular Medicine and Diagnostics (COMManD), Department of Biochemistry, Saveetha Dental College & Hospital, Saveetha Institute of Medical & Technical Sciences, Saveetha University, Chennai 600077, India
- Rathi Muthaiyan Ahalliya
- Department of Biochemistry and Cancer Research Centre, FASCM, Karpagam Academy of Higher Education, Coimbatore, Tamil Nadu 641021, India
- Gopalakrishnan Velliyur Kanniappan
- Center for Global Health Research, Saveetha Medical College & Hospital, Saveetha Institute of Medical and Technical Sciences (SIMATS), Thandalam, Chennai, Tamil Nadu 602105, India.
- Monica Mironescu
- Faculty of Agricultural Sciences Food Industry and Environmental Protection, Lucian Blaga University of Sibiu, Bv. Victoriei 10, Sibiu 550024, Romania.
- Chella Perumal Palanisamy
- Department of Chemical Technology, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand.
10
Biswas B, Kumar N, Sugimoto M, Hoque MA. scHD4E: Novel ensemble learning-based differential expression analysis method for single-cell RNA-sequencing data. Comput Biol Med 2024; 178:108769. [PMID: 38897145 DOI: 10.1016/j.compbiomed.2024.108769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/14/2024] [Accepted: 06/15/2024] [Indexed: 06/21/2024]
Abstract
Differential expression (DE) analysis between cell types for scRNA-seq data by capturing its complicated features is crucial. Recently, different methods have been developed for scRNA-seq data analysis based on different modeling frameworks, assumptions, strategies and test statistics that consider various data features. scDEA is a recently developed ensemble learning-based DE analysis method that uses Lancaster's combination to combine the p-values generated by 12 individual DE analysis methods, producing more accurate and stable results than the individual methods. The objective of our study is to propose a new ensemble learning-based DE analysis method, scHD4E, using only the 4 top-performing individual methods. The four top-performing methods were selected through an evaluation process using six real scRNA-seq data sets. We conducted comprehensive experiments on five experimental data sets to evaluate our proposed method based on sample size effects, batch effects, type I error control, gene ontology enrichment analysis, runtime, identified matched DE genes, and semantic similarity measurement between methods. We also perform similar analyses (except the last 3 terms) and compute performance measures such as accuracy, F1 score, and Matthews correlation coefficient for a simulated data set. The results show that scHD4E performs better than all the individual methods and scDEA in all of the above respects. We expect that scHD4E will serve modern data scientists in detecting DEGs in scRNA-seq data analysis. To implement our proposed method, an R package, scHD4E, and its Shiny application have been developed and are available on GitHub at the following links: https://github.com/bbiswas1989/scHD4E and https://github.com/bbiswas1989/scHD4E-Shiny.
Affiliation(s)
- Biplab Biswas
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh; Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Nishith Kumar
- Department of Statistics, Faculty of Science, Bangabandhu Sheikh Mujibur Rahman Science & Technology University, Gopalganj, 8100, Bangladesh.
- Masahiro Sugimoto
- Institute for Advanced Biosciences, Keio University 246-2 Mizukami, Kakuganji, Tsuruoka, Yamagata, 997-0052, Japan.
- Md Aminul Hoque
- Department of Statistics, Faculty of Science, University of Rajshahi, Rajshahi, 6205, Bangladesh.
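Lancaster's combination, which the abstract above names as the way p-values from several DE methods are merged, generalizes Fisher's method by giving each p-value a weight that acts as chi-square degrees of freedom. The sketch below is a minimal stdlib-only version that assumes even integer weights, so that the chi-square survival function has a closed form. The weights, function names, and example p-values are hypothetical and do not reflect how scDEA or scHD4E choose them.

```python
import math

def chi2_sf_even(x, df):
    """Survival function of a chi-square variable with even df (closed form)."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def chi2_isf_even(p, df):
    """Inverse survival function for even df, found by bisection (requires 0 < p <= 1)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > p:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def lancaster(pvalues, weights):
    """Combine p-values: T = sum of chi-square quantiles, T ~ chi-square(sum of weights)."""
    if any(w <= 0 or w % 2 for w in weights):
        raise ValueError("this sketch only supports positive even integer weights")
    stat = sum(chi2_isf_even(p, w) for p, w in zip(pvalues, weights))
    return chi2_sf_even(stat, sum(weights))

# Hypothetical p-values for one gene from four DE methods, with equal weights.
print(lancaster([0.04, 0.20, 0.01, 0.08], [2, 2, 2, 2]))
```

With every weight equal to 2, this reduces to Fisher's method; unequal weights let more trusted methods contribute more to the combined statistic.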
11
Wu CH, Zhou X, Chen M. The curses of performing differential expression analysis using single-cell data. bioRxiv 2024:2024.05.28.596315. [PMID: 38853843 PMCID: PMC11160624 DOI: 10.1101/2024.05.28.596315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Differential expression analysis is pivotal in single-cell transcriptomics for unraveling cell-type-specific responses to stimuli. While numerous methods are available to identify differentially expressed genes in single-cell data, recent evaluations of both single-cell-specific methods and methods adapted from bulk studies have revealed significant shortcomings in performance. In this paper, we dissect the four major challenges in single-cell DE analysis: normalization, excessive zeros, donor effects, and cumulative biases. These "curses" underscore the limitations and conceptual pitfalls in existing workflows. In response, we introduce a novel paradigm addressing several of these issues.
12
Ozier-Lafontaine A, Fourneaux C, Durif G, Arsenteva P, Vallot C, Gandrillon O, Gonin-Giraud S, Michel B, Picard F. Kernel-based testing for single-cell differential analysis. Genome Biol 2024; 25:114. [PMID: 38702740 PMCID: PMC11069218 DOI: 10.1186/s13059-024-03255-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 04/22/2024] [Indexed: 05/06/2024] Open
Abstract
Single-cell technologies offer insights into molecular feature distributions, but comparing them poses challenges. We propose a kernel-testing framework for non-linear cell-wise distribution comparison, analyzing gene expression and epigenomic modifications. Our method allows feature-wise and global transcriptome/epigenome comparisons, revealing cell population heterogeneities. Using a classifier based on embedding variability, we identify transitions in cell states, overcoming limitations of traditional single-cell analysis. Applied to single-cell ChIP-Seq data, our approach identifies untreated breast cancer cells with an epigenomic profile resembling persister cells. This demonstrates the effectiveness of kernel testing in uncovering subtle population variations that might be missed by other methods.
Affiliation(s)
- A Ozier-Lafontaine
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France.
- C Fourneaux
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- G Durif
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- P Arsenteva
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France
- C Vallot
- CNRS UMR3244, Institut Curie, PSL University, Paris, France
- Translational Research Department, Institut Curie, PSL University, Paris, France
- O Gandrillon
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- S Gonin-Giraud
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France
- B Michel
- Nantes Université, Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, F-44000, Nantes, France.
- F Picard
- Laboratory of Biology and Modelling of the Cell, Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS, UMR5239, Université Claude Bernard Lyon 1, Lyon, France.
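As a generic illustration of kernel-based two-sample testing on a single feature, the sketch below computes a squared maximum mean discrepancy (MMD) with a Gaussian kernel and calibrates it by permutation. MMD is used here as a simple stand-in. It is not necessarily the statistic used in the paper above, and the bandwidth, permutation count, seed, function names, and toy expression values are all assumptions.

```python
import math
import random

def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalar values."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in x for b in x) / (len(x) ** 2)
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in y for b in y) / (len(y) ** 2)
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in x for b in y) / (len(x) * len(y))
    return kxx + kyy - 2.0 * kxy

def permutation_pvalue(x, y, n_perm=200, bandwidth=1.0, seed=0):
    """Permutation p-value: how often shuffled group labels give an MMD at least as large."""
    rng = random.Random(seed)
    observed = mmd2(x, y, bandwidth)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mmd2(pooled[:len(x)], pooled[len(x):], bandwidth) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps the p-value above zero

# Hypothetical log-expression values of one gene in two cell populations.
group_a = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.3]
group_b = [3.0, 3.1, 2.9, 3.2, 3.0, 2.8, 3.1, 3.0]
print(mmd2(group_a, group_b), permutation_pvalue(group_a, group_b))
```

Unlike a comparison of means, a kernel statistic of this kind responds to any difference between the two distributions, including changes in spread or modality, which is the motivation the abstract gives for kernel testing.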
13
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. bioRxiv 2023:2023.03.14.532643. [PMID: 36993765 PMCID: PMC10055147 DOI: 10.1101/2023.03.14.532643] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/20/2023]
Abstract
BACKGROUND Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data-looking for rare cell types, subtleties of cell states, and details of gene regulatory networks-there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually). RESULTS We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization-a step that skews distributions, particularly for sparse data-and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships. CONCLUSIONS New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Emmanuel Dollinger
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
  - Department of Mathematics, University of California, Irvine, Irvine CA
- Josh Gervin
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Scott Atwood
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Qing Nie
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
  - Department of Mathematics, University of California, Irvine, Irvine CA
- Arthur D. Lander
  - Center for Complex Biological Systems, University of California, Irvine, Irvine CA
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
14
Shakola F, Palejev D, Ivanov I. A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022; 13:2362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022]
Abstract
Methods for generating synthetic bulk and single-cell RNA-seq data are growing in number and have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies, and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. Because there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of the RNA-seq data simulation algorithm and software best suited to their specific scientific questions.
Affiliation(s)
- Felitsiya Shakola
  - GATE Institute, Sofia University, 125 Tsarigradsko Shosse, Bl. 2, 1113 Sofia, Bulgaria
- Dean Palejev
  - Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Acad. G. Bonchev St., Bl. 8, 1113 Sofia, Bulgaria
- Ivan Ivanov
  - Department of Veterinary Physiology and Pharmacology, Texas A&M University, College Station, TX 77843, USA
15
Sen Puliparambil B, Tomal JH, Yan Y. A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data. Biology 2022; 11:1495. [PMID: 36290397 PMCID: PMC9598401 DOI: 10.3390/biology11101495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/21/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
With the emergence of single-cell RNA sequencing (scRNA-seq) technology, scientists are able to examine gene expression at single-cell resolution. Analysis of scRNA-seq data has its own challenges, which stem from its high dimensionality. Machine learning offers the potential for gene (feature) selection from high-dimensional scRNA-seq data. Even though multiple machine learning methods, such as penalized regression, appear suitable for feature selection, there has been no rigorous comparison of their performance across data sets, each of which poses its own challenges. Therefore, in this paper, we analyzed and compared multiple penalized regression methods for scRNA-seq data. For the scRNA-seq data sets we analyzed, the results show that sparse group lasso (SGL) outperforms the other six methods (ridge, lasso, elastic net, drop lasso, group lasso, and big lasso) on two metrics: area under the receiver operating characteristic curve (AUC) and computation time. Building on these findings, we proposed a new algorithm for feature selection using penalized regression methods. The proposed algorithm works by selecting a small subset of genes and applying SGL to select the differentially expressed genes in scRNA-seq data. By using hierarchical clustering to group genes, the proposed method bypasses the need for domain-specific knowledge to define gene groups. In addition, the proposed algorithm provided consistently better AUC for the data sets used.
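How an L1 penalty performs feature selection can be shown with a minimal sketch. It implements the plain lasso on a linear model by coordinate descent with soft thresholding, not the sparse group lasso or the authors' algorithm, and it uses toy Gaussian features rather than scRNA-seq counts. The function names and the objective scaling, (1/(2n))·||y − Xb||² + lam·||b||₁, are assumptions made for illustration.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero feature carries no information
            # Covariance of feature j with the residual that excludes its own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_b = soft_threshold(rho, lam) / col_sq[j]
            delta = new_b - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_b
    return beta


def make_toy_data(rng, n=200, p=8):
    """Toy regression in which only features 0 and 3 influence the response."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(random.Random(1))
    beta = lasso_coordinate_descent(X, y, lam=0.5)
    print("selected features:", [j for j, b in enumerate(beta) if b != 0.0])
```

Features whose coefficients are driven exactly to zero are dropped, which is the selection behavior that penalties with an L1 component share; group-structured penalties such as SGL additionally act on predefined sets of genes.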
Affiliation(s)
- Bhavithry Sen Puliparambil (corresponding author)
  - Master of Science in Data Science Program, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
- Jabed H. Tomal
  - Department of Mathematics and Statistics, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
- Yan Yan
  - Department of Computing Science, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada