1
Wang S, Li H, Zhang K, Wu H, Pang S, Wu W, Ye L, Su J, Zhang Y. scSID: A lightweight algorithm for identifying rare cell types by capturing differential expression from single-cell sequencing data. Comput Struct Biotechnol J 2024; 23:589-600. [PMID: 38274993 PMCID: PMC10809081 DOI: 10.1016/j.csbj.2023.12.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 12/27/2023] [Accepted: 12/27/2023] [Indexed: 01/27/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently an important technology for identifying cell types and studying diseases at the genetic level. Identifying rare cell types is biologically important as one of the downstream data analyses of single-cell RNA sequencing. Although rare cell identification methods have been developed, most of these suffer from insufficient mining of intercellular similarities, low scalability, and being time-consuming. In this paper, we propose a single-cell similarity division algorithm (scSID) for identifying rare cells. It takes cell-to-cell similarity into consideration by analyzing both inter-cluster and intra-cluster similarities, and discovers rare cell types based on the similarity differences. We show that scSID outperforms other existing methods by benchmarking it on different experimental datasets. Application of scSID to multiple datasets, including 68K PBMC and intestine, highlights its exceptional scalability and remarkable ability to identify rare cell populations.
Affiliation(s)
- Shudong Wang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hengxiao Li
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Kuijie Zhang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Hao Wu
- College of Information Engineering, Northwest A&F University, 712100, Yangling, China
- School of Software, Shandong University, 250100, Jinan, China
- Shanchen Pang
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Wenhao Wu
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266580, China
- Lan Ye
- Cancer Center, the Second Hospital of Shandong University, Jinan, 250033, China
- Jionglong Su
- School of AI and Advanced Computing, XJTLU Entrepreneur College (Taicang), Xi'an Jiaotong-Liverpool University, Suzhou, 215123, Jiangsu, China
- Yulin Zhang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, 266590, China
2
Gao Y, Dong K, Gao Y, Jin X, Yang J, Yan G, Liu Q. Unified cross-modality integration and analysis of T cell receptors and T cell transcriptomes by low-resource-aware representation learning. Cell Genomics 2024; 4:100553. [PMID: 38688285 PMCID: PMC11099349 DOI: 10.1016/j.xgen.2024.100553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/09/2024] [Accepted: 04/06/2024] [Indexed: 05/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) and T cell receptor sequencing (TCR-seq) are pivotal for investigating T cell heterogeneity. Integrating these modalities, which is expected to uncover profound insights in immunology that might otherwise go unnoticed with a single modality, faces computational challenges due to the low-resource characteristics of the multimodal data. Herein, we present UniTCR, a novel low-resource-aware multimodal representation learning framework designed for unified cross-modality integration, enabling comprehensive T cell analysis. By designing a dual-modality contrastive learning module and a single-modality preservation module to effectively embed each modality into a common latent space, UniTCR demonstrates versatility in connecting TCR sequences with T cell transcriptomes across various tasks, including single-modality analysis, modality gap analysis, epitope-TCR binding prediction, and TCR profile cross-modality generation, in a low-resource-aware way. Extensive evaluations conducted on multiple scRNA-seq/TCR-seq paired datasets showed the superior performance of UniTCR, demonstrating its ability to explore the complexity of the immune system.
Affiliation(s)
- Yicheng Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Kejing Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Yuli Gao
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xuan Jin
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Jingya Yang
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
- Gang Yan
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Tongji Hospital, School of Medicine, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; State Key Laboratory of Cardiology and Medical Innovation Center, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China; Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou 311121, China.
3
Wang X, Duan M, Li J, Ma A, Xin G, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. Nat Commun 2024; 15:338. [PMID: 38184630 PMCID: PMC10771517 DOI: 10.1038/s41467-023-44570-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Accepted: 12/14/2023] [Indexed: 01/08/2024] Open
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduce MarsGT: Multi-omics Analysis for Rare population inference using a Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperforms existing tools in identifying rare cells across 550 simulated and four real human datasets. In mouse retina data, it reveals unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detects an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identifies a rare MAIT-like population impacted by a high IFN-I response and reveals the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Gang Xin
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China.
- Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH, 43210, USA.
4
Møller AF, Madsen JGS. JOINTLY: interpretable joint clustering of single-cell transcriptomes. Nat Commun 2023; 14:8473. [PMID: 38123569 PMCID: PMC10733431 DOI: 10.1038/s41467-023-44279-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
Single-cell and single-nucleus RNA-sequencing (sxRNA-seq) is increasingly being used to characterise the transcriptomic state of cell types at homeostasis, during development and in disease. However, this is a challenging task, as biological effects can be masked by technical variation. Here, we present JOINTLY, an algorithm enabling joint clustering of sxRNA-seq datasets across batches. JOINTLY performs on par with or better than state-of-the-art batch integration methods in clustering tasks and outperforms other intrinsically interpretable methods. We demonstrate that JOINTLY is robust against over-correction while retaining subtle cell state differences between biological conditions and highlight how the interpretation of JOINTLY can be used to annotate cell types and identify active signalling programs across cell types and pseudo-time. Finally, we use JOINTLY to construct a reference atlas of white adipose tissue (WATLAS), an expandable and comprehensive community resource, in which we describe four adipocyte subpopulations and map compositional changes in obesity and between depots.
Affiliation(s)
- Andreas Fønss Møller
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Sino-Danish College (SDC), University of Chinese Academy of Sciences, Beijing, China
- Jesper Grud Skat Madsen
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
- Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
- Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M, 5230, Denmark.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
5
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023] Open
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data for supervised learning to annotate cell types in new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partially due to difficulties in dealing with technical differences between datasets, as well as not considering the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Wei-Lin Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
6
Märtens K, Bortolomeazzi M, Montorsi L, Spencer J, Ciccarelli F, Yau C. Rarity: discovering rare cell populations from single-cell imaging data. Bioinformatics 2023; 39:btad750. [PMID: 38092048 PMCID: PMC10751233 DOI: 10.1093/bioinformatics/btad750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 11/24/2023] [Accepted: 12/11/2023] [Indexed: 12/28/2023] Open
Abstract
MOTIVATION: Cell type identification plays an important role in the analysis and interpretation of single-cell data and can be carried out via supervised or unsupervised clustering approaches. Supervised methods are best suited where we can list all cell types and their respective marker genes a priori, while unsupervised clustering algorithms look for groups of cells with similar expression properties. This property permits the identification of both known and unknown cell populations, making unsupervised methods suitable for discovery. Success is dependent on the relative strength of the expression signature of each group as well as the number of cells. Rare cell types therefore present a particular challenge that is magnified when they are defined by differentially expressing a small number of genes.
RESULTS: Typical unsupervised approaches fail to identify such rare subpopulations, and these cells tend to be absorbed into more prevalent cell types. In order to balance these competing demands, we have developed a novel statistical framework for unsupervised clustering, named Rarity, that enables the discovery process for rare cell types to be more robust, consistent, and interpretable. We achieve this by devising a novel clustering method based on a Bayesian latent variable model in which we assign cells to inferred latent binary on/off expression profiles. This lets us achieve increased sensitivity to rare cell populations while also allowing us to control and interpret potential false positive discoveries. We systematically study the challenges associated with rare cell type identification and demonstrate the utility of Rarity on various IMC datasets.
AVAILABILITY AND IMPLEMENTATION: Implementation of Rarity together with examples is available from the GitHub repository (https://github.com/kasparmartens/rarity).
Affiliation(s)
- Kaspar Märtens
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Michele Bortolomeazzi
- Francis Crick Institute, London NW1 1AT, United Kingdom
- King’s College London, London WC2R 2LS, United Kingdom
- Lucia Montorsi
- Francis Crick Institute, London NW1 1AT, United Kingdom
- King’s College London, London WC2R 2LS, United Kingdom
- Jo Spencer
- King’s College London, London WC2R 2LS, United Kingdom
- Francesca Ciccarelli
- Francis Crick Institute, London NW1 1AT, United Kingdom
- Bart’s Cancer Institute - Centre for Cancer Genomics & Computational Biology, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, United Kingdom
- Christopher Yau
- The Alan Turing Institute, London NW1 2DB, United Kingdom
- Nuffield Department for Women’s & Reproductive Health, University of Oxford, Women’s Centre (Level 3), John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
7
Silkwood K, Dollinger E, Gervin J, Atwood S, Nie Q, Lander AD. Leveraging gene correlations in single cell transcriptomic data. bioRxiv 2023:2023.03.14.532643. [PMID: 36993765 PMCID: PMC10055147 DOI: 10.1101/2023.03.14.532643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/20/2023]
Abstract
BACKGROUND: Many approaches have been developed to overcome technical noise in single cell RNA-sequencing (scRNAseq). As researchers dig deeper into data (looking for rare cell types, subtleties of cell states, and details of gene regulatory networks), there is a growing need for algorithms with controllable accuracy and fewer ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., usually).
RESULTS: We approach this problem analytically, assuming that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We analyze scRNAseq data without normalization (a step that skews distributions, particularly for sparse data) and calculate p-values associated with key statistics. We develop an improved method for selecting features for cell clustering and identifying gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify thousands of correlations that, when clustered without supervision into gene communities, align with known cellular components and biological processes, and highlight potentially novel cell biological relationships.
CONCLUSIONS: New insights into functionally relevant gene regulatory networks can be obtained using a statistically grounded approach to the identification of gene-gene correlations.
Affiliation(s)
- Kai Silkwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Emmanuel Dollinger
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Department of Mathematics, University of California, Irvine, Irvine CA
- Josh Gervin
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Scott Atwood
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Qing Nie
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
- Department of Mathematics, University of California, Irvine, Irvine CA
- Arthur D. Lander
- Center for Complex Biological Systems, University of California, Irvine, Irvine CA
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine CA
8
Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023; 24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving considerable scope for improving clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, a negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In a comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells.
Affiliation(s)
- Tianyuan Lei
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Ruoyu Chen
- Moorestown High School, Moorestown, NJ 08057, USA
- Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, NJ 08028, USA
9
DeMeo B, Berger B. SCA: recovering single-cell heterogeneity through information-based dimensionality reduction. Genome Biol 2023; 24:195. [PMID: 37626411 PMCID: PMC10464206 DOI: 10.1186/s13059-023-02998-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 06/28/2023] [Indexed: 08/27/2023] Open
Abstract
Dimensionality reduction summarizes the complex transcriptomic landscape of single-cell datasets for downstream analyses. Current approaches favor large cellular populations defined by many genes, at the expense of smaller and more subtly defined populations. Here, we present surprisal component analysis (SCA), a technique that newly leverages the information-theoretic notion of surprisal for dimensionality reduction to promote more meaningful signal extraction. For example, SCA uncovers clinically important cytotoxic T-cell subpopulations that are indistinguishable using existing pipelines. We also demonstrate that SCA substantially improves downstream imputation. SCA's efficient information-theoretic paradigm has broad applications to the study of complex biological tissues in health and disease.
Affiliation(s)
- Benjamin DeMeo
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA
- Department of Biomedical Informatics, Harvard University, Cambridge, 02138, MA, USA
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, 02139, MA, USA.
- Department of Mathematics, MIT, Cambridge, 02139, MA, USA.
10
Wang X, Duan M, Li J, Ma A, Xu D, Li Z, Liu B, Ma Q. MarsGT: Multi-omics analysis for rare population inference using single-cell graph transformer. bioRxiv 2023:2023.08.15.553454. [PMID: 37645917 PMCID: PMC10462017 DOI: 10.1101/2023.08.15.553454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Rare cell populations are key in neoplastic progression and therapeutic response, offering potential intervention targets. However, their computational identification and analysis often lag behind major cell types. To fill this gap, we introduced MarsGT: Multi-omics Analysis for Rare population inference using Single-cell Graph Transformer. It identifies rare cell populations using a probability-based heterogeneous graph transformer on single-cell multi-omics data. MarsGT outperformed existing tools in identifying rare cells across 400 simulated and four real human datasets. In mouse retina data, it revealed unique subpopulations of rare bipolar cells and a Müller glia cell subpopulation. In human lymph node data, MarsGT detected an intermediate B cell population potentially acting as lymphoma precursors. In human melanoma data, it identified a rare MAIT-like population impacted by a high IFN-I response and revealed the mechanism of immunotherapy. Hence, MarsGT offers biological insights and suggests potential strategies for early detection and therapeutic intervention of disease.
Affiliation(s)
- Xiaoying Wang
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Maoteng Duan
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Jingxian Li
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Anjun Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Dong Xu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Zihai Li
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, Shandong, 250100, China
- Qin Ma
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA
- Pelotonia Institute for Immuno-Oncology, The James Comprehensive Cancer Center, The Ohio State University, Columbus, OH 43210, USA
11
Leary JR, Xu Y, Morrison AB, Jin C, Shen EC, Kuhlers PC, Su Y, Rashid NU, Yeh JJ, Peng XL. Sub-Cluster Identification through Semi-Supervised Optimization of Rare-Cell Silhouettes (SCISSORS) in single-cell RNA-sequencing. Bioinformatics 2023; 39:btad449. [PMID: 37498558 PMCID: PMC10412410 DOI: 10.1093/bioinformatics/btad449] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 02/02/2022] [Revised: 03/30/2023] [Accepted: 07/25/2023] [Indexed: 07/28/2023] Open
Abstract
MOTIVATION: Single-cell RNA-sequencing (scRNA-seq) has enabled the molecular profiling of thousands to millions of cells simultaneously in biologically heterogeneous samples. Currently, the common practice in scRNA-seq is to determine cell type labels through unsupervised clustering and the examination of cluster-specific genes. However, even small differences in analysis and parameter choice can greatly alter clustering results and thus strongly influence which cell types are identified. Existing methods largely focus on determining the optimal number of robust clusters, which can be problematic for identifying cells of extremely low abundance due to their subtle contributions toward overall patterns of gene expression.
RESULTS: Here, we present a carefully designed framework, SCISSORS, which accurately profiles subclusters within broad cluster(s) for the identification of rare cell types in scRNA-seq data. SCISSORS employs silhouette scoring to estimate the heterogeneity of clusters and reveals rare cells in heterogeneous clusters by a multi-step semi-supervised reclustering process. Additionally, SCISSORS provides a method for the identification of marker genes of high specificity to the cell type. SCISSORS is wrapped around the popular Seurat R package and can be easily integrated into existing Seurat pipelines.
AVAILABILITY AND IMPLEMENTATION: SCISSORS, including source code and vignettes, is freely available at https://github.com/jr-leary7/SCISSORS.
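The silhouette scoring mentioned in this abstract can be illustrated with a minimal sketch. This is not the SCISSORS implementation (which is an R package built around Seurat); it only shows the standard silhouette definition, s(i) = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to any other cluster. The function names `silhouette_scores` and `mean_silhouette_by_cluster` and the toy coordinates are assumptions made for illustration.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette score of every point (requires >= 2 clusters)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette_by_cluster(points, labels):
    """Average the per-point scores within each cluster."""
    grouped = {}
    for score, label in zip(silhouette_scores(points, labels), labels):
        grouped.setdefault(label, []).append(score)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Toy data (hypothetical): cluster 0 contains a stray point at (5, 5),
# so it is less cohesive than cluster 1.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
toy_labels = [0, 0, 1, 1, 0]
per_cluster = mean_silhouette_by_cluster(toy_points, toy_labels)
```

In this toy example the cluster containing the stray point receives the lower mean silhouette, which is the kind of signal a reclustering procedure could use to decide which broad cluster to examine further.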
Affiliation(s)
- Jack R Leary
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Biostatistics, University of Florida, Gainesville, FL 32603, United States
| | - Yi Xu
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Ashley B Morrison
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Chong Jin
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Emily C Shen
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Peyton C Kuhlers
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Ye Su
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Naim U Rashid
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Jen Jen Yeh
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Surgery, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
| | - Xianlu Laura Peng
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States
|
12
|
Tosoni G, Ayyildiz D, Bryois J, Macnair W, Fitzsimons CP, Lucassen PJ, Salta E. Mapping human adult hippocampal neurogenesis with single-cell transcriptomics: Reconciling controversy or fueling the debate? Neuron 2023; 111:1714-1731.e3. [PMID: 37015226 DOI: 10.1016/j.neuron.2023.03.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 10/11/2022] [Revised: 02/06/2023] [Accepted: 03/08/2023] [Indexed: 04/05/2023]
Abstract
The notion of exploiting the regenerative potential of the human brain in physiological aging or neurological diseases represents a particularly attractive alternative to conventional strategies for enhancing or restoring brain function. However, a major first question to address is whether the human brain does possess the ability to regenerate. The existence of human adult hippocampal neurogenesis (AHN) has been at the center of a fierce scientific debate for many years. The advent of single-cell transcriptomic technologies was initially viewed as a panacea to resolving this controversy. However, recent single-cell RNA sequencing studies in the human hippocampus yielded conflicting results. Here, we critically discuss and re-analyze previously published AHN-related single-cell transcriptomic datasets. We argue that, although promising, the single-cell transcriptomic profiling of AHN in the human brain can be confounded by methodological, conceptual, and biological factors that need to be consistently addressed across studies and openly discussed within the scientific community.
Affiliation(s)
- Giorgia Tosoni
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
| | - Dilara Ayyildiz
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands
| | - Julien Bryois
- Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
| | - Will Macnair
- Roche Pharma Research and Early Development, Neuroscience and Rare Diseases, Roche Innovation Center, CH-4070, Basel, Switzerland
| | - Carlos P Fitzsimons
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands
| | - Paul J Lucassen
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, 1098 XH, Amsterdam, the Netherlands; Center for Urban Mental Health, University of Amsterdam, 1098 SM, Amsterdam, the Netherlands
| | - Evgenia Salta
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, 1105 BA, Amsterdam, the Netherlands.
|
13
|
Lubatti G, Stock M, Iturbide A, Ruiz Tejada Segura ML, Riepl M, Tyser RCV, Danese A, Colomé-Tatché M, Theis FJ, Srinivas S, Torres-Padilla ME, Scialdone A. CIARA: a cluster-independent algorithm for identifying markers of rare cell types from single-cell sequencing data. Development 2023; 150:dev201264. [PMID: 37294170 DOI: 10.1242/dev.201264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/25/2023] [Indexed: 05/18/2023]
Abstract
A powerful feature of single-cell genomics is the possibility of identifying cell types from their molecular profiles. In particular, identifying novel rare cell types and their marker genes is a key potential of single-cell RNA sequencing. Standard clustering approaches perform well in identifying relatively abundant cell types, but tend to miss rarer cell types. Here, we have developed CIARA (Cluster Independent Algorithm for the identification of markers of RAre cell types), a cluster-independent computational tool designed to select genes that are likely to be markers of rare cell types. Genes selected by CIARA are subsequently integrated with common clustering algorithms to single out groups of rare cell types. CIARA outperforms existing methods for rare cell type detection, and we use it to find previously uncharacterized rare populations of cells in a human gastrula and among mouse embryonic stem cells treated with retinoic acid. Moreover, CIARA can be applied more generally to any type of single-cell omic data, thus allowing the identification of rare cells across multiple data modalities. We provide implementations of CIARA in user-friendly packages available in R and Python.
Affiliation(s)
- Gabriele Lubatti
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Marco Stock
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, D-85354 Freising, Germany
| | - Ane Iturbide
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
| | - Mayra L Ruiz Tejada Segura
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Melina Riepl
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
| | - Richard C V Tyser
- Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Cambridge CB2 0AW, UK
| | - Anna Danese
- Biomedical Center Munich (BMC), Physiological Genomics, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
| | - Maria Colomé-Tatché
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Biomedical Center (BMC), Physiological Chemistry, Faculty of Medicine, Ludwig Maximilians University, D-82152 Munich, Germany
| | - Fabian J Theis
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
- Department of Mathematics, Technical University of Munich, D-85748 Munich, Germany
| | - Shankar Srinivas
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, UK
| | - Maria-Elena Torres-Padilla
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Faculty of Biology, Ludwig-Maximilians University, D-82152 Munich, Germany
| | - Antonio Scialdone
- Institute of Epigenetics and Stem Cells, Helmholtz Munich, D-81377 Munich, Germany
- Institute of Functional Epigenetics, Helmholtz Munich, D-85764 Neuherberg, Germany
- Institute of Computational Biology, Helmholtz Munich, D-85764 Neuherberg, Germany
|
14
|
Luo J, Wu X, Cheng Y, Chen G, Wang J, Song X. Expression quantitative trait locus studies in the era of single-cell omics. Front Genet 2023; 14:1182579. [PMID: 37284065 PMCID: PMC10239882 DOI: 10.3389/fgene.2023.1182579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2023] [Accepted: 04/26/2023] [Indexed: 06/08/2023] Open
Abstract
Genome-wide association studies have revealed that the regulation of gene expression bridges genetic variants and complex phenotypes. Profiling of the bulk transcriptome coupled with linkage analysis (expression quantitative trait locus (eQTL) mapping) has advanced our understanding of the relationship between genetic variants and gene regulation in the context of complex phenotypes. However, bulk transcriptomics has inherent limitations, as the regulation of gene expression tends to be cell-type-specific. The advent of single-cell RNA-seq technology now enables the identification of cell-type-specific regulation of gene expression through single-cell eQTL (sc-eQTL) mapping. In this review, we first provide an overview of sc-eQTL studies, including data processing and the sc-eQTL mapping procedure. We then discuss the benefits and limitations of sc-eQTL analyses. Finally, we present an overview of the current and future applications of sc-eQTL discoveries.
Affiliation(s)
- Jie Luo
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xinyi Wu
- Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Yuan Cheng
- Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Guang Chen
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Jian Wang
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
| | - Xijiao Song
- State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
|
15
|
Cheng Y, Fan X, Zhang J, Li Y. A scalable sparse neural network framework for rare cell type annotation of single-cell transcriptome data. Commun Biol 2023; 6:545. [PMID: 37210444 DOI: 10.1038/s42003-023-04928-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/11/2023] [Indexed: 05/22/2023] Open
Abstract
Automatic cell type annotation methods are increasingly used in single-cell RNA sequencing (scRNA-seq) analysis because they are fast and precise. However, current methods often fail to account for the imbalance of scRNA-seq datasets and ignore information from smaller populations, leading to significant errors in biological analysis. Here, we introduce scBalance, an integrated sparse neural network framework that incorporates adaptive weight sampling and dropout techniques for auto-annotation tasks. Using 20 scRNA-seq datasets with varying scales and degrees of imbalance, we demonstrate that scBalance outperforms current methods in both intra- and inter-dataset annotation tasks. Additionally, scBalance displays impressive scalability in identifying rare cell types in million-level datasets, as shown in the bronchoalveolar cell landscape. scBalance is also significantly faster than commonly used tools and comes in a user-friendly format, making it a superior tool for scRNA-seq analysis on the Python-based platform.
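The class-imbalance problem described in this abstract can be made concrete with a small sketch of inverse-frequency sampling. This is an illustration under stated assumptions, not scBalance's code: the abstract only says the method uses "adaptive weight sampling", so the weighting scheme below (each cell weighted by one over the size of its class) and the helper names `inverse_frequency_weights` and `balanced_batch` are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each cell by 1 / (size of its class), so classes sum to 1 each."""
    counts = Counter(labels)
    return [1.0 / counts[lab] for lab in labels]


def balanced_batch(labels, batch_size, seed=0):
    """Draw cell indices (with replacement) so classes are roughly balanced."""
    rng = random.Random(seed)  # seeded generator keeps the sketch reproducible
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=batch_size)


# Toy dataset: 950 cells of a common type and 50 cells of a rare type.
labels = ["common"] * 950 + ["rare"] * 50
batch = balanced_batch(labels, batch_size=2000)
rare_fraction = sum(labels[i] == "rare" for i in batch) / len(batch)
```

With uniform sampling the rare type would make up about 5% of each batch; with these weights it makes up roughly half, which is the effect a training loop for a classifier would rely on.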
Affiliation(s)
- Yuqi Cheng
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA
| | - Xingyu Fan
- School of Information and Software Engineering, University of Electronic Science and Technology of China, 610054, Chengdu, China
| | - Jianing Zhang
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China
| | - Yu Li
- Department of Computer Science and Engineering (CSE), The Chinese University of Hong Kong (CUHK), Hong Kong SAR, China.
- The CUHK Shenzhen Research Institute, Hi-Tech Park, Nanshan, 518057, Shenzhen, China.
|
16
|
Advances in Mass Spectrometry-Based Single Cell Analysis. BIOLOGY 2023; 12:biology12030395. [PMID: 36979087 PMCID: PMC10045136 DOI: 10.3390/biology12030395] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/30/2022] [Revised: 02/27/2023] [Accepted: 03/01/2023] [Indexed: 03/06/2023]
Abstract
Technological developments and improvements in single-cell isolation and analytical platforms allow for advanced molecular profiling at the single-cell level, which reveals cell-to-cell variation within the mixture of cells in complex biological or clinical systems. This helps to understand the cellular heterogeneity of normal or diseased tissues and organs. However, most studies have focused on the analysis of nucleic acids (e.g., DNA and RNA), and mass spectrometry (MS)-based analysis of proteins and metabolites in single cells lagged until recently. Undoubtedly, MS-based single-cell analysis will provide a deeper insight into cellular mechanisms related to health and disease. This review summarizes recent advances in MS-based single-cell analysis methods and their applications in biology and medicine.
|
17
|
Salta E, Lazarov O, Fitzsimons CP, Tanzi R, Lucassen PJ, Choi SH. Adult hippocampal neurogenesis in Alzheimer's disease: A roadmap to clinical relevance. Cell Stem Cell 2023; 30:120-136. [PMID: 36736288 PMCID: PMC10082636 DOI: 10.1016/j.stem.2023.01.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 11/27/2022] [Revised: 01/09/2023] [Accepted: 01/10/2023] [Indexed: 02/05/2023]
Abstract
Adult hippocampal neurogenesis (AHN) drops sharply during early stages of Alzheimer's disease (AD), via unknown mechanisms, and correlates with cognitive status in AD patients. Understanding AHN regulation in AD could provide a framework for innovative pharmacological interventions. We here combine molecular, behavioral, and clinical data and critically discuss the multicellular complexity of the AHN niche in relation to AD pathophysiology. We further present a roadmap toward a better understanding of the role of AHN in AD by probing the promises and caveats of the latest technological advancements in the field and addressing the conceptual and methodological challenges ahead.
Affiliation(s)
- Evgenia Salta
- Laboratory of Neurogenesis and Neurodegeneration, Netherlands Institute for Neuroscience, Meibergdreef 47, 1105 BA, Amsterdam, The Netherlands
| | - Orly Lazarov
- Department of Anatomy and Cell Biology, College of Medicine, The University of Illinois at Chicago, 808 S Wood St., Chicago, IL 60612, USA
| | - Carlos P Fitzsimons
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands
| | - Rudolph Tanzi
- Genetics and Aging Research Unit, MassGeneral Institute for Neurodegenerative Disease, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, McCance Center for Brain Health, 114 16th Street, Boston, MA 02129, USA.
| | - Paul J Lucassen
- Brain Plasticity group, Swammerdam Institute for Life Sciences, Faculty of Science, University of Amsterdam, Science Park 904, 1098 XH, Amsterdam, The Netherlands; Center for Urban Mental Health, University of Amsterdam, Kruislaan 404, 1098 SM, Amsterdam, The Netherlands.
| | - Se Hoon Choi
- Genetics and Aging Research Unit, MassGeneral Institute for Neurodegenerative Disease, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, McCance Center for Brain Health, 114 16th Street, Boston, MA 02129, USA.
|
18
|
Watson ER, Mora A, Taherian Fard A, Mar JC. How does the structure of data impact cell-cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data. Brief Bioinform 2022; 23:6712300. [PMID: 36151725 PMCID: PMC9677483 DOI: 10.1093/bib/bbac387] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify the highest-performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
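The point that the choice of proximity metric changes which cells count as similar can be shown with a short Python sketch. The three metrics below are standard definitions, but the helper `nearest_cell`, the cell names, and the four-gene toy profiles are assumptions made for illustration and are not taken from the benchmarked framework.

```python
import math


def euclidean(u, v):
    return math.dist(u, v)


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        raise ValueError("cosine distance is undefined for an all-zero profile")
    return 1.0 - dot / norm


METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
}


def nearest_cell(query, cells, metric):
    """Return the name of the cell closest to `query` under the chosen metric."""
    dist = METRICS[metric]
    return min(cells, key=lambda name: dist(query, cells[name]))


# Toy expression profiles over four genes (hypothetical counts).
query = [2, 0, 4, 0]
cells = {
    "deep": [20, 0, 40, 0],    # same relative profile, ten times the counts
    "shifted": [2, 1, 3, 1],   # similar magnitude, different gene proportions
}
neighbours = {m: nearest_cell(query, cells, m) for m in METRICS}
```

Here the magnitude-sensitive metrics (Euclidean, Manhattan) pick "shifted" as the nearest neighbour, while cosine distance picks "deep", because it compares only the direction of the expression profile and ignores differences in total counts.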
Affiliation(s)
- Ebony Rose Watson
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Ariane Mora
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia (corresponding author; Tel.: +61 7 3346 3894)
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia (corresponding author; Tel.: +614 90 733 703)
|
19
|
Dhandapani R, Neri M, Bernhard M, Brzak I, Schweizer T, Rudin S, Joller S, Berth R, Kernen J, Neuhaus A, Waldt A, Cuttat R, Naumann U, Keller CG, Roma G, Feuerbach D, Shimshek DR, Neumann U, Gasparini F, Galimberti I. Sustained Trem2 stabilization accelerates microglia heterogeneity and Aβ pathology in a mouse model of Alzheimer's disease. Cell Rep 2022; 39:110883. [PMID: 35649351 DOI: 10.1016/j.celrep.2022.110883] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/01/2021] [Revised: 01/08/2022] [Accepted: 05/06/2022] [Indexed: 11/03/2022] Open
Abstract
TREM2 is a transmembrane protein expressed exclusively in microglia in the brain that regulates inflammatory responses to pathological conditions. Proteolytic cleavage of membrane TREM2 affects microglial function and is associated with Alzheimer's disease, but the consequence of reduced TREM2 proteolytic cleavage has not been determined. Here, we generate a transgenic mouse model of reduced Trem2 shedding (Trem2-Ile-Pro-Asp [IPD]) through amino-acid substitution of an ADAM-protease recognition site. We show that Trem2-IPD mice display increased Trem2 cell-surface-receptor load, survival, and function in myeloid cells. Using single-cell transcriptomic profiling of mouse cortex, we show that sustained Trem2 stabilization induces a shift of fate in microglial maturation and accelerates microglial responses to Aβ pathology in a mouse model of Alzheimer's disease. Our data indicate that reduction of Trem2 proteolytic cleavage aggravates neuroinflammation during the course of Alzheimer's disease pathology, suggesting that TREM2 shedding is a critical regulator of microglial activity in pathological states.
Affiliation(s)
- Rahul Dhandapani
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Marilisa Neri
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Mario Bernhard
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Irena Brzak
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Tatjana Schweizer
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Stefan Rudin
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Stefanie Joller
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Ramon Berth
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Jasmin Kernen
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Anna Neuhaus
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Annick Waldt
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Rachel Cuttat
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Ulrike Naumann
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Caroline Gubser Keller
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Guglielmo Roma
- Chemical Biology and Therapeutics, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Dominik Feuerbach
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Derya R Shimshek
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Ulf Neumann
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Fabrizio Gasparini
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland
| | - Ivan Galimberti
- Department of Neuroscience, Novartis Institutes for Biomedical Research, 4056 Basel, Switzerland.
|
20
|
Hou R, Huang Y. Genomic sequences and RNA binding proteins predict RNA splicing efficiency in various single-cell contexts. Bioinformatics 2022; 38:3231-3237. [PMID: 35552604 DOI: 10.1093/bioinformatics/btac321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 05/03/2022] [Accepted: 05/09/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION RNA splicing efficiency is of high interest both for understanding the regulatory machinery of gene expression and for estimating RNA velocity in single cells. However, its genomic regulation and stochasticity across contexts remain poorly understood. RESULTS Here, by leveraging a recent RNA velocity tool, we estimated the relative splicing efficiency across a variety of single-cell RNA-Seq data sets. We further extracted large sets of genomic features and 120 RNA binding protein features and found that they are highly predictive of relative RNA splicing efficiency across multiple tissues and organs in human and mouse. This predictive power promises to reveal the complexity of RNA processing and to enhance the analysis of single-cell transcription activities. AVAILABILITY AND IMPLEMENTATION In order to ensure reproducibility, all preprocessed data sets and scripts used for the prediction and figure generation are publicly available at https://doi.org/10.5281/zenodo.6513669. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruiyan Hou
- School of Biomedical Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Yuanhua Huang
- School of Biomedical Sciences, University of Hong Kong, Hong Kong SAR, China
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR, China
|
21
|
Lall S, Ray S, Bandyopadhyay S. A copula based topology preserving graph convolution network for clustering of single-cell RNA-seq data. PLoS Comput Biol 2022; 18:e1009600. [PMID: 35271564 PMCID: PMC8979455 DOI: 10.1371/journal.pcbi.1009600] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/30/2021] [Revised: 04/04/2022] [Accepted: 01/27/2022] [Indexed: 11/18/2022] Open
Abstract
Annotation of cells in single-cell clustering requires a homogeneous grouping of cell populations. Various issues in single-cell sequencing affect the homogeneous grouping (clustering) of cells, such as the small amount of starting RNA, limited per-cell sequenced reads, cell-to-cell variability due to cell cycle, cellular morphology, and variable reagent concentrations. Moreover, single-cell data are susceptible to technical noise, which affects the quality of genes (or features) selected/extracted prior to clustering. Here we introduce sc-CGconv (copula based graph convolution network for single-cell clustering), a stepwise, robust, unsupervised feature extraction and clustering approach that formulates and aggregates cell-cell relationships using copula correlation (Ccor), followed by a graph convolution network based clustering approach. sc-CGconv formulates a cell-cell graph using Ccor, which is learned by a graph-based artificial intelligence model, the graph convolution network. The learned representation (low dimensional embedding) is utilized for cell clustering. sc-CGconv features the following advantages: (a) it works with substantially smaller sample sizes to identify homogeneous clusters; (b) it can model the expression co-variability of a large number of genes, thereby outperforming state-of-the-art gene selection/extraction methods for clustering; (c) it preserves the cell-to-cell variability within the selected gene set by constructing a cell-cell graph through the copula correlation measure; and (d) it provides a topology-preserving embedding of cells in low dimensional space.
One of the important aspects of single-cell downstream analysis is to classify cells into subpopulations. This immediately leads to the clustering of cells into homogeneous groups, which faces many issues due to (i) the small amount of starting RNA, (ii) cell-to-cell variability, (iii) technical noise incorporated within the single-cell sequencing technology, and (iv) the unavailability of discriminating selected/extracted genes (features) in the preprocessing step of downstream analysis. We propose sc-CGconv, a stepwise feature extraction and clustering framework, which leverages the advantages of copulas and graph convolution networks in the single-cell analysis domain. sc-CGconv outperforms state-of-the-art feature selection/extraction methods in the preprocessing steps, performs well with small-sample-size data, preserves the cell-to-cell variability within the extracted features, and provides a topology-preserving embedding of cells in low dimensional space. sc-CGconv therefore successfully addresses the above-mentioned key challenges.
Affiliation(s)
- Snehalika Lall
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
| | - Sumanta Ray
- Department of Computer Science and Engineering, Aliah University, Kolkata, India
- Health Analytics Network, Pittsburgh, Pennsylvania, United States of America
- * Corresponding authors: SR, SB
|
22
|
Wang CY, Gao YL, Liu JX, Kong XZ, Zheng CH. Single-Cell RNA Sequencing Data Clustering by Low-Rank Subspace Ensemble Framework. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1154-1164. [PMID: 33026977 DOI: 10.1109/tcbb.2020.3029187] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology reveals the gene expression status and gene structure of individual cells, reflecting the heterogeneity and diversity of cells. Traditional methods of scRNA-seq data analysis treat the data as lying in a single subspace and thereby hide structural information contained in other subspaces. In this paper, we propose a low-rank subspace ensemble clustering framework (LRSEC) to analyze scRNA-seq data. Assuming that the scRNA-seq data exist in multiple subspaces, the low-rank model is used to find the lowest-rank representation of the data in the subspace. It is worth noting that the penalty factor of the low-rank kernel function is uncertain, and different penalty factors correspond to different low-rank structures. Moreover, a single clustering model struggles to find the cellular structure of all datasets. To strengthen the correlation between model solutions, we construct a new ensemble clustering framework, LRSEC, using the low-rank model as the base learner. The LRSEC framework captures the global structure of data through low-rank subspaces and achieves better clustering performance than a single clustering model. We validate the performance of the LRSEC framework on seven small datasets and one large dataset and obtain satisfactory results.
|
23
|
Chen Y, Zhang Y, Li JYH, Ouyang Z. LISA2: Learning Complex Single-Cell Trajectory and Expression Trends. Front Genet 2021; 12:681206. [PMID: 34512717 PMCID: PMC8428276 DOI: 10.3389/fgene.2021.681206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 06/01/2021] [Indexed: 12/20/2022] Open
Abstract
Single-cell transcriptional and epigenomics profiles have been applied in a variety of tissues and diseases for discovering new cell types, differentiation trajectories, and gene regulatory networks. Many methods such as Monocle 2/3, URD, and STREAM have been developed for tree-based trajectory building. Here, we propose a fast and flexible trajectory learning method, LISA2, for single-cell data analysis. This new method has two distinctive features: (1) LISA2 utilizes specified leaves and root to reduce the complexity of building the developmental trajectory, especially for special cases such as rare cell populations and adjacent terminal cell states; and (2) LISA2 is applicable to both transcriptomics and epigenomics data. LISA2 visualizes complex trajectories using 3D Landmark ISOmetric feature MAPping (L-ISOMAP). We apply LISA2 to simulated and real datasets in cerebellum, diencephalon, and hematopoietic stem cells, including both single-cell transcriptomics data and single-cell assay for transposase-accessible chromatin data. LISA2 is efficient in estimating single-cell trajectories and expression trends for different kinds of molecular states of cells.
Affiliation(s)
- Yang Chen
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
| | - Yuping Zhang
- Department of Statistics, University of Connecticut, Storrs, CT, United States
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
| | - James Y. H. Li
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut, Farmington, CT, United States
| | - Zhengqing Ouyang
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
| |
|
24
|
Fa B, Wei T, Zhou Y, Johnston L, Yuan X, Ma Y, Zhang Y, Yu Z. GapClust is a light-weight approach distinguishing rare cells from voluminous single cell expression profiles. Nat Commun 2021; 12:4197. [PMID: 34234139 PMCID: PMC8263561 DOI: 10.1038/s41467-021-24489-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/22/2020] [Accepted: 06/02/2021] [Indexed: 01/07/2023] Open
Abstract
Single cell RNA sequencing (scRNA-seq) is a powerful tool for detailing the cellular landscape within complex tissues. Large-scale single cell transcriptomics provides both opportunities and challenges for identifying rare cells that play crucial roles in development and disease. Here, we develop GapClust, a light-weight algorithm to detect rare cell types from ultra-large scRNA-seq datasets with state-of-the-art speed and memory efficiency. Benchmarking on diverse experimental datasets demonstrates the superior performance of GapClust compared to other recently proposed methods. When applied to intestine and 68k PBMC datasets, GapClust identifies tuft cells and a previously unrecognised subtype of monocyte, respectively.
Affiliation(s)
- Botao Fa
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Ting Wei
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Yuan Zhou
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
| | - Luke Johnston
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
| | - Xin Yuan
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Yanran Ma
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Yue Zhang
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Zhangsheng Yu
- Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- SJTU-Yale Joint Centre for Biostatistics, Shanghai Jiao Tong University, Shanghai, China
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Clinical Research Institute, Shanghai Jiao Tong University School of Medicine, Shanghai, China
| |
|
25
|
Marín-Sedeño E, de Morentin XM, Pérez-Pomares JM, Gómez-Cabrero D, Ruiz-Villalba A. Understanding the Adult Mammalian Heart at Single-Cell RNA-Seq Resolution. Front Cell Dev Biol 2021; 9:645276. [PMID: 34055776 PMCID: PMC8149764 DOI: 10.3389/fcell.2021.645276] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/22/2020] [Accepted: 04/09/2021] [Indexed: 12/24/2022] Open
Abstract
During the last decade, extensive efforts have been made to comprehend cardiac cell genetic and functional diversity. Such knowledge allows for the definition of the cardiac cellular interactome as a reasonable strategy to increase our understanding of the normal and pathologic heart. Previous experimental approaches, including cell lineage tracing, flow cytometry, and bulk RNA-Seq, have often tackled the analysis of cardiac cell diversity on the assumption that cell types can be identified by the expression of a single gene. More recently, however, the emergence of single-cell RNA-Seq technology has led us to explore the diversity of individual cells, enabling the cardiovascular research community to redefine cardiac cell subpopulations and identify relevant ones, and even novel cell types, through their cell-specific transcriptomic signatures in an unbiased manner. These findings are changing our understanding of cell composition and, in consequence, the identification of potential therapeutic targets for different cardiac diseases. In this review, we provide an overview of the continuously changing cardiac cellular landscape, traveling from the pre-single-cell RNA-Seq times to the single-cell RNA-Seq revolution, and discuss the utilities and limitations of this technology.
Affiliation(s)
- Ernesto Marín-Sedeño
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
| | - Xabier Martínez de Morentin
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra, Instituto de Investigación Sanitaria de Navarra (IdiSNA), Universidad Pública de Navarra, Pamplona, Spain
| | - Jose M. Pérez-Pomares
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
| | - David Gómez-Cabrero
- Translational Bioinformatics Unit, Navarrabiomed, Complejo Hospitalario de Navarra, Instituto de Investigación Sanitaria de Navarra (IdiSNA), Universidad Pública de Navarra, Pamplona, Spain
- Centre of Host-Microbiome Interactions, Faculty of Dentistry, Oral & Craniofacial Sciences, King’s College London, London, United Kingdom
- Biological and Environmental Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Adrián Ruiz-Villalba
- Department of Animal Biology, Faculty of Sciences, Instituto Malagueño de Biomedicina, University of Málaga, Málaga, Spain
- BIONAND, Centro Andaluz de Nanomedicina y Biotecnología, Junta de Andalucía, Universidad de Málaga, Málaga, Spain
| |
|
26
|
Zhang C, Gao L, Wang B, Gao Y. Improving Single-Cell RNA-seq Clustering by Integrating Pathways. Brief Bioinform 2021; 22:6262246. [PMID: 33940590 DOI: 10.1093/bib/bbab147] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 03/21/2021] [Accepted: 03/26/2021] [Indexed: 01/03/2023] Open
Abstract
Single-cell clustering is an important part of analyzing single-cell RNA-sequencing data. However, the accuracy and robustness of existing methods are degraded by noise. One promising approach for addressing this challenge is integrating pathway information, which can alleviate noise and improve performance. In this work, we studied how integrating pathways affects the accuracy and robustness of existing single-cell clustering methods. We collected 10 state-of-the-art single-cell clustering methods, 26 scRNA-seq datasets and four pathway databases, combined the AUCell method and similarity network fusion to integrate pathway data and scRNA-seq data, and introduced three accuracy indicators, three noise generation strategies and robustness indicators. Experiments on this framework showed that integrating pathways can significantly improve the accuracy and robustness of most single-cell clustering methods.
Affiliation(s)
- Chenxing Zhang
- Computer Science and Technology at Xidian University, Xi'an 710071, China
| | - Lin Gao
- School of Computer Science and Technology at Xidian University, Xi'an 710071, China
| | - Bingbo Wang
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China
| | - Yong Gao
- Computer Science at the University of British Columbia Okanagan (UBC Okanagan), Canada
| |
|
27
|
Liang S, Mohanty V, Dou J, Miao Q, Huang Y, Müftüoğlu M, Ding L, Peng W, Chen K. Single-cell manifold-preserving feature selection for detecting rare cell populations. Nat Comput Sci 2021; 1:374-384. [PMID: 36969355 PMCID: PMC10035340 DOI: 10.1038/s43588-021-00070-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/09/2020] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
A key challenge in studying organisms and diseases is to detect rare molecular programs and rare cell populations (RCPs) that drive development, differentiation, and transformation. Molecular features such as genes and proteins defining RCPs are often unknown and difficult to detect from unenriched single-cell data, using conventional dimensionality reduction and clustering-based approaches. Here, we propose an unsupervised approach, SCMER (Single-Cell Manifold presERving feature selection), which selects a compact set of molecular features with definitive meanings that preserve the manifold of the data. We applied SCMER in the context of hematopoiesis, lymphogenesis, tumorigenesis, and drug resistance and response. We found that SCMER can identify non-redundant features that sensitively delineate both common cell lineages and rare cellular states. SCMER can be used for discovering molecular features in a high dimensional dataset, designing targeted, cost-effective assays for clinical applications, and facilitating multi-modality integration.
Affiliation(s)
- Shaoheng Liang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Department of Computer Science, Rice University, Houston, Texas, 77005, USA
| | - Vakul Mohanty
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
| | - Jinzhuang Dou
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
| | - Qi Miao
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, 77030, USA
| | - Yuefan Huang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston (UTHealth), Houston, Texas, 77030, USA
| | - Muharrem Müftüoğlu
- Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
| | - Li Ding
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, 63108
| | - Weiyi Peng
- Department of Biology and Biochemistry, University of Houston, Houston, Texas, 77024
| | - Ken Chen
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas, 77030, USA
| |
|
28
|
Xie K, Huang Y, Zeng F, Liu Z, Chen T. scAIDE: clustering of large-scale single-cell RNA-seq data reveals putative and rare cell types. NAR Genom Bioinform 2020; 2:lqaa082. [PMID: 33575628 PMCID: PMC7671411 DOI: 10.1093/nargab/lqaa082] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/09/2020] [Revised: 08/20/2020] [Accepted: 09/18/2020] [Indexed: 02/07/2023] Open
Abstract
Recent advancements in both single-cell RNA-sequencing technology and computational resources facilitate the study of cell types in global populations. Up to millions of cells can now be sequenced in one experiment; thus, accurate and efficient computational methods are needed for clustering and for the post-analysis step of assigning putative and rare cell types. Here, we present a novel unsupervised deep learning clustering framework that is robust and highly scalable. To overcome the high level of noise, scAIDE first incorporates an autoencoder-imputation network with a distance-preserved embedding network (AIDE) to learn a good representation of data, and then applies a random projection hashing based k-means algorithm to accommodate the detection of rare cell types. We analyzed a 1.3 million neural cell dataset within 30 min, obtaining 64 clusters which were mapped to 19 putative cell types. In particular, we further identified three different neural stem cell developmental trajectories in these clusters. We also classified two subpopulations of malignant cells in a small glioblastoma dataset using scAIDE. We anticipate that scAIDE will provide a more in-depth understanding of cell development and diseases.
Affiliation(s)
- Kaikun Xie
- Institute for Artificial Intelligence, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Tsinghua-Fuzhou Institute of Digital Technology, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
| | - Yu Huang
- Institute for Artificial Intelligence, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Tsinghua-Fuzhou Institute of Digital Technology, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
| | - Feng Zeng
- Department of Automation, Xiamen University, Xiamen 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China
| | - Zehua Liu
- Center for Computational and Integrative Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Department of Molecular Biology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Ting Chen
- Institute for Artificial Intelligence, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Tsinghua-Fuzhou Institute of Digital Technology, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
| |
|
29
|
Zhang AW, Campbell KR. Computational modelling in single-cell cancer genomics: methods and future directions. Phys Biol 2020; 17:061001. [DOI: 10.1088/1478-3975/abacfe] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
|
30
|
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 11/09/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Joshua Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| | - Sarah K. Nyquist
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Alex K. Shalek
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bryan D. Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
| |
|
31
|
Sun S, Zhu J, Ma Y, Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 2019; 20:269. [PMID: 31823809 PMCID: PMC6902413 DOI: 10.1186/s13059-019-1898-6] [Citation(s) in RCA: 103] [Impact Index Per Article: 20.6] [Received: 05/16/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Dimensionality reduction is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNA-seq) data analysis. Proper dimensionality reduction can allow for effective noise removal and facilitate many downstream analyses that include cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of dimensionality reduction in scRNA-seq analysis and the vast number of dimensionality reduction methods developed for scRNA-seq studies, few comprehensive comparison studies have been performed to evaluate the effectiveness of different dimensionality reduction methods in scRNA-seq.
RESULTS: We aim to fill this critical knowledge gap by providing a comparative evaluation of a variety of commonly used dimensionality reduction methods for scRNA-seq studies. Specifically, we compare 18 different dimensionality reduction methods on 30 publicly available scRNA-seq datasets that cover a range of sequencing techniques and sample sizes. We evaluate the performance of different dimensionality reduction methods for neighborhood preserving in terms of their ability to recover features of the original expression matrix, and for cell clustering and lineage reconstruction in terms of their accuracy and robustness. We also evaluate the computational scalability of different dimensionality reduction methods by recording their computational cost.
CONCLUSIONS: Based on the comprehensive evaluation results, we provide important guidelines for choosing dimensionality reduction methods for scRNA-seq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html.
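The "neighborhood preserving" criterion in this abstract can be illustrated with a small sketch. The abstract does not define the metric, so what follows is an assumption: it scores a reduced representation by the average Jaccard overlap between each cell's k nearest neighbors in the original space and in the reduced space. The function names (`knn_indices`, `neighborhood_jaccard`), the synthetic matrix, and the use of SVD-based PCA as the reduction are all hypothetical choices for illustration, not the authors' scripts.

```python
import numpy as np

def knn_indices(X, k):
    """For each row of X, return the indices of its k nearest rows
    (squared Euclidean distance), excluding the row itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)          # a cell is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def neighborhood_jaccard(X_high, X_low, k=10):
    """Mean Jaccard index between k-NN sets computed in the original
    space (X_high) and in the reduced space (X_low). 1.0 means every
    neighborhood is preserved exactly; values near 0 mean little overlap."""
    nn_high = knn_indices(X_high, k)
    nn_low = knn_indices(X_low, k)
    scores = []
    for a, b in zip(nn_high, nn_low):
        a, b = set(a.tolist()), set(b.tolist())
        scores.append(len(a & b) / len(a | b))
    return float(np.mean(scores))

# Hypothetical usage: synthetic cells x genes matrix reduced by PCA via SVD.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # stand-in for an expression matrix
Xc = X - X.mean(axis=0)                   # center genes before PCA
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_pca = Xc @ Vt[:5].T                     # keep the first 5 components
print(neighborhood_jaccard(X, X_pca, k=10))
```

The pairwise-distance matrix makes this quadratic in the number of cells, so it is only suitable for small examples; a larger dataset would need an approximate nearest-neighbor search instead.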
Affiliation(s)
- Shiquan Sun
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, People's Republic of China
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Jiaqiang Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Ying Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| |
|