1
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represents a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
2
Li W, Mirone J, Prasad A, Miolane N, Legrand C, Dao Duc K. Orthogonal outlier detection and dimension estimation for improved MDS embedding of biological datasets. FRONTIERS IN BIOINFORMATICS 2023; 3:1211819. [PMID: 37637212 PMCID: PMC10448701 DOI: 10.3389/fbinf.2023.1211819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 07/26/2023] [Indexed: 08/29/2023]
Abstract
Conventional dimensionality reduction methods like Multidimensional Scaling (MDS) are sensitive to the presence of orthogonal outliers, leading to significant defects in the embedding. We introduce a robust MDS method, called DeCOr-MDS (Detection and Correction of Orthogonal outliers using MDS), based on the geometry and statistics of simplices formed by data points, that detects orthogonal outliers and subsequently reduces dimensionality. We validate our method using synthetic datasets, and further show how it can be applied to a variety of large real biological datasets, including cancer image cell data, human microbiome project data and single cell RNA sequencing data, to address the task of data cleaning and visualization.
Affiliation(s)
- Wanxin Li
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- Jules Mirone
- Department of Mathematics, University of British Columbia, Vancouver, BC, Canada
- Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau, France
- Ashok Prasad
- Department of Chemical and Biological Engineering, School of Biomedical Engineering, Colorado State University, Fort Collins, CO, United States
- Nina Miolane
- Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA, United States
- Carine Legrand
- Université Paris Cité, Génomes, biologie cellulaire et thérapeutique U944, INSERM, CNRS, Paris, France
- Khanh Dao Duc
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- Department of Mathematics, University of British Columbia, Vancouver, BC, Canada
3
Nassiri I, Fairfax B, Lee A, Wu Y, Buck D, Piazza P. scQCEA: a framework for annotation and quality control report of single-cell RNA-sequencing data. BMC Genomics 2023; 24:381. [PMID: 37415108 DOI: 10.1186/s12864-023-09447-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 06/13/2023] [Indexed: 07/08/2023]
Abstract
BACKGROUND Systematic description of library quality and sequencing performance of single-cell RNA sequencing (scRNA-seq) data is imperative for subsequent downstream modules, including re-pooling libraries. While several packages have been developed to visualise quality control (QC) metrics for scRNA-seq data, they do not include expression-based QC to discriminate between true variation and background noise. RESULTS We present scQCEA (acronym of the single-cell RNA sequencing Quality Control and Enrichment Analysis), an R package to generate reports of process optimisation metrics for comparing sets of samples and visual evaluation of quality scores. scQCEA can import data from 10X or other single-cell platforms and includes functions for generating an interactive report of QC metrics for multi-omics data. In addition, scQCEA provides automated cell type annotation on scRNA-seq data using differential gene expression patterns for expression-based quality control. We provide a repository of reference gene sets, including 2348 marker genes, which are exclusively expressed in 95 human and mouse cell types. Using scRNA-seq data from 56 gene expressions and V(D)J T cell replicates, we show how scQCEA can be applied for the visual evaluation of quality scores for sets of samples. In addition, we use the summary of QC measures from 342 human and mouse shallow-sequenced gene expression profiles to specify optimal sequencing requirements to run a cell-type enrichment analysis function. CONCLUSIONS The open-source R tool will allow examining biases and outliers over biological and technical measures, and objective selection of optimal cluster numbers before downstream analysis. scQCEA is available at https://isarnassiri.github.io/scQCEA/ as an R package. Full documentation, including an example, is provided on the package website.
Affiliation(s)
- Isar Nassiri
- Oxford Genomics Centre, Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
- Benjamin Fairfax
- MRC-Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
- Department of Oncology, University of Oxford & Oxford Cancer Centre, Churchill Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Angela Lee
- Oxford Genomics Centre, Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Yanxia Wu
- Oxford Genomics Centre, Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- David Buck
- Oxford Genomics Centre, Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
- Paolo Piazza
- Oxford Genomics Centre, Nuffield Department of Medicine, Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK.
4
Cell Type Annotation Model Selection: General-Purpose vs. Pattern-Aware Feature Gene Selection in Single-Cell RNA-Seq Data. Genes (Basel) 2023; 14:genes14030596. [PMID: 36980868 PMCID: PMC10048047 DOI: 10.3390/genes14030596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 02/22/2023] [Accepted: 02/24/2023] [Indexed: 03/03/2023]
Abstract
With the advances in high-throughput sequencing technology, a growing body of research has examined heterogeneity among cells. Differences in the functionality of individual cells are determined from differences in their gene expression profiles. Although observations indicate that clustering methods perform well, manual annotation of cell clusters remains a challenge that has yet to be addressed in a scalable and fast manner. On the other hand, because labelled datasets are scarce, only a few supervised techniques have been used in cell type identification, and they obtained more robust results than clustering methods. A recent study showed that a complementary feature selection step helped the support vector machine (SVM) outperform other classifiers in different scenarios. In this article, we compare and evaluate the performance of two state-of-the-art supervised methods, XGBoost and SVM, with information gain as a feature selection method. The results of the experiments on three standard scRNA-seq datasets indicate that XGBoost automatically annotates cell types in a simpler and more scalable framework. Additionally, it sheds light on the potential use of boosting tree approaches combined with deep neural networks to capture underlying information of single-cell RNA-Seq data more effectively. It can be used to identify marker genes and in other applications in biological studies.
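To illustrate the feature-selection step named in this abstract, here is a minimal sketch of ranking genes by information gain against cell-type labels. It assumes each gene is binarized as detected or not detected; that binarization, the toy counts, and the function names `entropy`, `information_gain`, and `rank_genes` are assumptions made for illustration and are not taken from the article.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(expressed, labels):
    """Information gain of a binary feature with respect to the labels.

    `expressed` holds one boolean per cell (gene detected or not);
    this binarization is an illustrative assumption.
    """
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [lab for e, lab in zip(expressed, labels) if e == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain


def rank_genes(counts, labels):
    """Return gene names sorted from most to least informative."""
    scores = {
        gene: information_gain([c > 0 for c in values], labels)
        for gene, values in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: four cells, two hypothetical cell types, three genes.
labels = ["T", "T", "B", "B"]
counts = {
    "CD3E": [5, 3, 0, 0],   # detected only in T cells
    "ACTB": [9, 8, 7, 9],   # detected everywhere, so uninformative
    "MS4A1": [0, 0, 4, 0],  # detected in one B cell
}
top_genes = rank_genes(counts, labels)
```

In a workflow of the kind the abstract describes, the top-ranked genes would then be used as input features for a supervised classifier such as XGBoost or an SVM.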
5
Xu X, Li X. Structure-preserved dimension reduction using joint triplets sampling for multi-batch integration of single-cell transcriptomic data. Brief Bioinform 2023; 24:6982727. [PMID: 36627114 DOI: 10.1093/bib/bbac608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 11/18/2022] [Accepted: 12/09/2022] [Indexed: 01/12/2023]
Abstract
Dimension reduction (DR) plays an important role in single-cell RNA sequencing (scRNA-seq), such as data interpretation, visualization and other downstream analysis. A desired DR method should be applicable to various application scenarios, including identifying cell types, preserving the inherent structure of data and handling batch effects. However, most of the existing DR methods fail to accommodate these requirements simultaneously, especially the removal of batch effects. In this paper, we develop a novel structure-preserved dimension reduction (SPDR) method using intra- and inter-batch triplets sampling. The constructed triplets jointly consider each anchor's mutual nearest neighbors from inter-batch, k-nearest neighbors from intra-batch and randomly selected cells from the whole data, which capture higher order structure information and meanwhile account for batch information of the data. Then we minimize a robust loss function for the chosen triplets to obtain a structure-preserved and batch-corrected low-dimensional representation. Comprehensive evaluations show that SPDR outperforms other competing DR methods, such as INSCT, IVIS, Trimap, Scanorama, scVI and UMAP, in removing batch effects, preserving biological variation, facilitating visualization and improving clustering accuracy. Besides, the two-dimensional (2D) embedding of SPDR presents a clear and authentic expression pattern, and can guide researchers to determine how many cell types should be identified. Furthermore, SPDR is robust to complex data characteristics (such as down-sampling, duplicates and outliers) and varying hyperparameter settings. We believe that SPDR will be a valuable tool for characterizing complex cellular heterogeneity.
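The triplet construction described in this abstract can be sketched roughly as follows. This is not the SPDR implementation: it assumes two batches, Euclidean distance, and a plain hinge (margin) loss as a stand-in for the unspecified "robust loss"; the function names `build_triplets` and `triplet_loss` and the toy coordinates are hypothetical.

```python
import math
import random


def knn(i, candidates, X, k):
    """Indices of the k candidates closest to cell i (Euclidean distance)."""
    return sorted(candidates, key=lambda j: math.dist(X[i], X[j]))[:k]


def build_triplets(X, batch, k=1, seed=0):
    """Build (anchor, positive, negative) index triplets.

    Positives are intra-batch k-nearest neighbours plus inter-batch
    mutual nearest neighbours; negatives are drawn at random from the
    remaining cells. Assumes two batches (illustrative simplification).
    """
    rng = random.Random(seed)
    n = len(X)
    triplets = []
    for a in range(n):
        same = [j for j in range(n) if j != a and batch[j] == batch[a]]
        other = [j for j in range(n) if batch[j] != batch[a]]
        positives = set(knn(a, same, X, k))
        for j in knn(a, other, X, k):
            # Keep j only if the anchor is also among j's nearest
            # neighbours in the anchor's batch (mutual nearest neighbours).
            back = [m for m in range(n) if batch[m] == batch[a]]
            if a in knn(j, back, X, k):
                positives.add(j)
        negatives = [j for j in range(n) if j != a and j not in positives]
        for p in sorted(positives):
            if negatives:
                triplets.append((a, p, rng.choice(negatives)))
    return triplets


def triplet_loss(Y, triplets, margin=1.0):
    """Mean hinge loss over triplets in the embedding Y (stand-in loss)."""
    total = 0.0
    for a, p, neg in triplets:
        total += max(0.0, math.dist(Y[a], Y[p]) - math.dist(Y[a], Y[neg]) + margin)
    return total / len(triplets)


# Toy data: two well-separated groups of cells measured in two batches.
X = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1), (5.0, 5.1)]
batch = [0, 0, 0, 0, 1, 1]
triplets = build_triplets(X, batch, k=1)
loss = triplet_loss(X, triplets)
```

In practice the loss would be minimized over a learned low-dimensional embedding rather than evaluated on the input coordinates as it is here.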
Affiliation(s)
- Xinyi Xu
- School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, 100081, China
- Xiangjie Li
- Changping Laboratory, Beijing, 102206, China
6
Xu Y, Chen Y, Jiang W, Yin X, Chen D, Chi Y, Wang Y, Zhang J, Zhang Q, Han Y. Identification of fatty acid metabolism-related molecular subtype biomarkers and their correlation with immune checkpoints in cutaneous melanoma. Front Immunol 2022; 13:967277. [PMID: 36466837 PMCID: PMC9716430 DOI: 10.3389/fimmu.2022.967277] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/12/2022] [Accepted: 11/04/2022] [Indexed: 10/06/2023]
Abstract
PURPOSE Fatty acid metabolism (FAM) affects the immune phenotype in a metabolically dynamic tumor microenvironment (TME), but the use of FAM-related genes (FAMGs) to predict the prognosis and immunotherapy response of cutaneous melanoma (CM) patients has not been investigated. In this study, we aimed to construct FAM molecular subtypes and identify key prognostic biomarkers in CM. METHODS We used a CM dataset in The Cancer Genome Atlas (TCGA) to construct FAM molecular subtypes. We performed Kaplan-Meier (K-M) analysis, gene set enrichment analysis (GSEA), and TME analysis to assess differences in the prognosis and immune phenotype between subtypes. We used weighted gene co-expression network analysis (WGCNA) to identify key biomarkers that regulate tumor metabolism and immunity between the subtypes. We compared overall survival (OS), progression-free survival (PFS), and disease-specific survival (DSS) between CM patients with high or low biomarker expression. We applied univariable and multivariable Cox analyses to verify the independent prognostic value of the FAM biomarkers. We used GSEA and TME analysis to investigate the immune-related regulation mechanism of the FAM subtype biomarker. We evaluated the immune checkpoint inhibition (ICI) response and chemotherapy sensitivity between CM patients with high or low biomarker expression. We performed real-time fluorescent quantitative PCR (qRT-PCR) and semi-quantitative analysis of the immunohistochemical (IHC) data from the Human Protein Atlas to evaluate the mRNA and protein expression levels of the FAM biomarkers in CM. RESULTS We identified 2 FAM molecular subtypes (cluster 1 and cluster 2). K-M analysis showed that cluster 2 had better OS and PFS than cluster 1 did. GSEA showed that, compared with cluster 1, cluster 2 had significantly upregulated immune response pathways. 
The TME analysis indicated that immune cell subpopulations and immune functions were highly enriched in cluster 2 as compared with cluster 1. WGCNA identified 6 hub genes (ACSL5, ALOX5AP, CD1D, CD74, IL4I1, and TBXAS1) as FAM biomarkers. CM patients with high expression levels of the six biomarkers had better OS, PFS, and DSS than those with low expression levels of the biomarkers. The Cox regression analyses verified that the 6 FAM biomarkers can be independent prognostic factors for CM patients. The single-gene GSEA showed that the high expression levels of the 6 genes were mainly enriched in T-cell antigen presentation, the PD-1 signaling pathway, and tumor escape. The TME analysis confirmed that the FAM subtype biomarkers were not only related to immune infiltration but also highly correlated with immune checkpoints such as PD-1, PD-L1, and CTLA-4. TIDE scores confirmed that patients with high expression levels of the 6 biomarkers had worse immunotherapy responses. The 6 genes conveyed significant sensitivity to some chemotherapy drugs. qRT-PCR and IHC analyses verified the expression levels of the 6 biomarkers in CM cells. CONCLUSION Our FAM subtypes verify that different FAM reprogramming affects the function and phenotype of infiltrating immune cells in the CM TME. The FAM molecular subtype biomarkers can be independent predictors of prognosis and immunotherapy response in CM patients.
Affiliation(s)
- Yujian Xu
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Youbai Chen
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Weiqian Jiang
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Xiangye Yin
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Dongsheng Chen
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yuan Chi
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Yuting Wang
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Julei Zhang
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
- Qixu Zhang
- Department of Plastic Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Yan Han
- Department of Plastic and Reconstructive Surgery, The First Medical Center of Chinese PLA General Hospital, Beijing, China
7
Mahalanabis A, Turinsky A, Husic M, Christensen E, Luo P, Naidas A, Brudno M, Pugh T, Ramani A, Shooshtari P. Evaluation of Single-cell RNA-seq Clustering Algorithms on Cancer Tumor Datasets. Comput Struct Biotechnol J 2022; 20:6375-6387. [DOI: 10.1016/j.csbj.2022.10.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 10/19/2022] [Accepted: 10/20/2022] [Indexed: 11/03/2022]
8
Wang Z, Yang S, Koga Y, Corbett SE, Shea C, Johnson W, Yajima M, Campbell JD. Celda: a Bayesian model to perform co-clustering of genes into modules and cells into subpopulations using single-cell RNA-seq data. NAR Genom Bioinform 2022; 4:lqac066. [PMID: 36110899 PMCID: PMC9469931 DOI: 10.1093/nargab/lqac066] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/19/2021] [Revised: 08/09/2022] [Accepted: 08/25/2022] [Indexed: 11/26/2022]
Abstract
Single-cell RNA-seq (scRNA-seq) has emerged as a powerful technique to quantify gene expression in individual cells and to elucidate the molecular and cellular building blocks of complex tissues. We developed a novel Bayesian hierarchical model called Cellular Latent Dirichlet Allocation (Celda) to perform co-clustering of genes into transcriptional modules and cells into subpopulations. Celda can quantify the probabilistic contribution of each gene to each module, each module to each cell population and each cell population to each sample. In a peripheral blood mononuclear cell dataset, Celda identified a subpopulation of proliferating T cells and a plasma cell which were missed by two other common single-cell workflows. Celda also identified transcriptional modules that could be used to characterize unique and shared biological programs across cell types. Finally, Celda outperformed other approaches for clustering genes into modules on simulated data. Celda presents a novel method for characterizing transcriptional programs and cellular heterogeneity in scRNA-seq data.
Affiliation(s)
- Zhe Wang
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Shiyi Yang
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Yusuke Koga
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Sean E Corbett
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Conor V Shea
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- W Evan Johnson
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
- Masanao Yajima
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA
- Joshua D Campbell
- Bioinformatics Program, Boston University, Boston, MA, USA
- Division of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA, USA
9
scWizard: a web-based automated tool for classifying and annotating single cells and downstream analysis of single-cell RNA-seq data in cancers. Comput Struct Biotechnol J 2022; 20:4902-4909. [PMID: 36147672 PMCID: PMC9474308 DOI: 10.1016/j.csbj.2022.08.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 07/27/2022] [Accepted: 08/12/2022] [Indexed: 11/22/2022]
Abstract
scWizard provides a comprehensive analysis pipeline for integration strategies of cancer scRNA-seq data. scWizard enables classification of 47 cell subtypes within the TME based on a hierarchical model built with a deep neural network. scWizard gives a higher accuracy for annotating cell subtypes within the TME compared with five methods. scWizard is a point-and-click tool that helps researchers without proficient programming skills.
The growing number of single-cell RNA-seq (scRNA-Seq) datasets allows the characterization of cell types across various cancer types. However, there is still a lack of effective tools to integrate the various analyses of single cells, especially for fine annotation of cell subtypes within the tumor microenvironment (TME). We developed scWizard, a point-and-click tool that packages an automated process including our cell annotation method based on deep neural network learning and 11 downstream analysis methods. scWizard used 113,976 cells across 13 cancer types as a built-in reference dataset for training the hierarchical model, enabling it to automatically classify and annotate 7 major cell types and 47 cell subtypes in the TME. scWizard provides a built-in pre-training set for users' flexible choice, and gives a higher accuracy for annotating subtypes of tumor-derived T-lymphocytes/natural killer cells (T/NK) and myeloid cells from different cancer types compared with five existing methods. scWizard has good robustness in three independent cancer datasets, with an accuracy of 0.98 in annotating major cell types, 0.85 in annotating myeloid cell subtypes and 0.79 in annotating T/NK cell subtypes, indicating the wide applicability of scWizard in different cell types of cancers. Finally, the automatic analysis and visualization functions of scWizard are presented using the intrahepatic cholangiocarcinoma (ICC) scRNA-Seq dataset as a case. scWizard focuses on decoding the TME and covers various analysis flows for cancer scRNA-Seq studies, and provides an easy-to-use tool and a user-friendly interface for researchers, to further accelerate biological discovery in cancer research.
10
Daniszewski M, Senabouth A, Liang HH, Han X, Lidgerwood GE, Hernández D, Sivakumaran P, Clarke JE, Lim SY, Lees JG, Rooney L, Gulluyan L, Souzeau E, Graham SL, Chan CL, Nguyen U, Farbehi N, Gnanasambandapillai V, McCloy RA, Clarke L, Kearns LS, Mackey DA, Craig JE, MacGregor S, Powell JE, Pébay A, Hewitt AW. Retinal ganglion cell-specific genetic regulation in primary open-angle glaucoma. CELL GENOMICS 2022; 2:100142. [PMID: 36778138 PMCID: PMC9903700 DOI: 10.1016/j.xgen.2022.100142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2020] [Revised: 03/08/2021] [Accepted: 05/11/2022] [Indexed: 10/18/2022]
Abstract
To assess the transcriptomic profile of disease-specific cell populations, fibroblasts from patients with primary open-angle glaucoma (POAG) were reprogrammed into induced pluripotent stem cells (iPSCs) before being differentiated into retinal organoids and compared with those from healthy individuals. We performed single-cell RNA sequencing of a total of 247,520 cells and identified cluster-specific molecular signatures. Comparing the gene expression profile between cases and controls, we identified novel genetic associations for this blinding disease. Expression quantitative trait mapping identified a total of 4,443 significant loci across all cell types, 312 of which are specific to the retinal ganglion cell subpopulations, which ultimately degenerate in POAG. Transcriptome-wide association analysis identified genes at loci previously associated with POAG, and analysis, conditional on disease status, implicated 97 statistically significant retinal ganglion cell-specific expression quantitative trait loci. This work highlights the power of large-scale iPSC studies to uncover context-specific profiles for a genetically complex disease.
Affiliation(s)
- Maciej Daniszewski
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Anne Senabouth
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Helena H. Liang
- Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Xikun Han
- QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Grace E. Lidgerwood
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Damián Hernández
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Priyadharshini Sivakumaran
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Jordan E. Clarke
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Shiang Y. Lim
- Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,O’Brien Institute Department of St Vincent’s Institute of Medical Research, Melbourne, Fitzroy, VIC 3065, Australia
| | - Jarmon G. Lees
- O’Brien Institute Department of St Vincent’s Institute of Medical Research, Melbourne, Fitzroy, VIC 3065, Australia,Department of Medicine, St Vincent’s Hospital, The University of Melbourne, Parkville, VIC 3010, Australia
| | - Louise Rooney
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Lerna Gulluyan
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Emmanuelle Souzeau
- Department of Ophthalmology, Flinders University, Flinders Medical Centre, Bedford Park, SA 5042, Australia
| | - Stuart L. Graham
- Faculty of Medicine and Health Sciences, Macquarie University, Macquarie Park, NSW 2109, Australia
| | - Chia-Ling Chan
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Uyen Nguyen
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Nona Farbehi
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Vikkitharan Gnanasambandapillai
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Rachael A. McCloy
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia
| | - Linda Clarke
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - Lisa S. Kearns
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia
| | - David A. Mackey
- Lions Eye Institute, Centre for Vision Sciences, University of Western Australia, Crawley, WA 6009, Australia,School of Medicine, Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7005, Australia
| | - Jamie E. Craig
- Department of Ophthalmology, Flinders University, Flinders Medical Centre, Bedford Park, SA 5042, Australia
| | - Stuart MacGregor
- QIMR Berghofer Medical Research Institute, Brisbane, QLD 4006, Australia
| | - Joseph E. Powell
- Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, The Kinghorn Cancer Centre, Darlinghurst, NSW 2010, Australia,UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW 2052, Australia,Corresponding author
| | - Alice Pébay
- Department of Anatomy and Physiology, The University of Melbourne, Parkville, VIC 3010, Australia,Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia,Corresponding author
| | - Alex W. Hewitt
- Department of Surgery, The University of Melbourne, Parkville, VIC 3010, Australia,Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC 3002, Australia,School of Medicine, Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS 7005, Australia,Corresponding author
| |
|
11
|
M Ascensión A, Ibáñez-Solé O, Inza I, Izeta A, Araúzo-Bravo MJ. Triku: a feature selection method based on nearest neighbors for single-cell data. Gigascience 2022; 11:6547682. [PMID: 35277963 PMCID: PMC8917514 DOI: 10.1093/gigascience/giac017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/06/2021] [Revised: 09/24/2021] [Indexed: 01/03/2023] Open
Abstract
Background: Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these feature selection methods biases the genes selected towards highly expressed genes, instead of the genes defining the cell populations of the dataset. Results: Triku is a feature selection method that favors genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the k-nearest neighbor graph. The expression of these genes is higher than the expected expression if the k-cells were chosen at random. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on adjusted Rand index, normalized mutual information, supervised classification, and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes. Conclusion: Triku is developed in Python 3 and is available at https://github.com/alexmascension/triku.
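The core idea in this abstract (a gene is informative when its expression, summed over each cell's k nearest neighbors, departs from what k randomly chosen cells would give) can be illustrated with a minimal sketch. This is not triku's implementation and does not use its API: the toy one-dimensional cell positions, the seeded random null, the 1D Wasserstein distance used as the comparison, and every function name below are assumptions made for illustration only.

```python
import random


def knn_neighborhoods(positions, k):
    """For each cell, return the indices of its k nearest cells (itself included)."""
    order = range(len(positions))
    return [
        sorted(order, key=lambda j: abs(positions[i] - positions[j]))[:k]
        for i in order
    ]


def wasserstein_1d(xs, ys):
    """1D Wasserstein distance between two equal-sized samples."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def locality_score(expression, neighborhoods, k, seed=0):
    """Compare kNN-summed expression with sums over k randomly chosen cells."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_cells = len(expression)
    observed = [sum(expression[j] for j in nb) for nb in neighborhoods]
    null = [
        sum(expression[j] for j in rng.sample(range(n_cells), k))
        for _ in range(n_cells)
    ]
    return wasserstein_1d(observed, null)


# Toy data (hypothetical): two well-separated groups of 10 cells each.
positions = list(range(10)) + list(range(100, 110))
k = 5
neighborhoods = knn_neighborhoods(positions, k)

genes = {
    # Expressed only in the first group: neighbors share the same state.
    "cluster_gene": [5] * 10 + [0] * 10,
    # Same total expression, but scattered across both groups.
    "scattered_gene": [5 if i % 2 == 0 else 0 for i in range(20)],
}
scores = {name: locality_score(expr, neighborhoods, k) for name, expr in genes.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Both toy genes have the same mean expression, so a descriptor based on abundance alone could not separate them; only the neighborhood structure distinguishes the gene restricted to one group.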
Affiliation(s)
- Alex M Ascensión
- Biodonostia Health Research Institute, Computational Biology and Systems Biomedicine Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
- Biodonostia Health Research Institute, Tissue Engineering Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
| | - Olga Ibáñez-Solé
- Biodonostia Health Research Institute, Computational Biology and Systems Biomedicine Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
- Biodonostia Health Research Institute, Tissue Engineering Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
| | - Iñaki Inza
- Intelligent Systems Group, Computer Science Faculty, University of the Basque Country, Donostia-San Sebastian, 20018, Spain
| | - Ander Izeta
- Biodonostia Health Research Institute, Tissue Engineering Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
| | - Marcos J Araúzo-Bravo
- Biodonostia Health Research Institute, Computational Biology and Systems Biomedicine Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain
- Max Planck Institute for Molecular Biomedicine, Roentgenstr. 20, 48149 Muenster, Germany
- IKERBASQUE, Basque Foundation for Science, Euskadi plaza 5, Bilbao, 48009, Spain
- Department of Cell Biology and Histology, Faculty of Medicine and Nursing, University of Basque Country (UPV/EHU), 48940 Leioa, Spain
| |
|
12
|
Reed ER, Monti S. Multi-resolution characterization of molecular taxonomies in bulk and single-cell transcriptomics data. Nucleic Acids Res 2021; 49:e98. [PMID: 34226941 PMCID: PMC8464061 DOI: 10.1093/nar/gkab552] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/17/2020] [Revised: 06/07/2021] [Accepted: 06/18/2021] [Indexed: 12/21/2022] Open
Abstract
As high-throughput genomics assays become more efficient and cost effective, their utilization has become standard in large-scale biomedical projects. These studies are often explorative, in that relationships between samples are not explicitly defined a priori, but rather emerge from data-driven discovery and annotation of molecular subtypes, thereby informing hypotheses and independent evaluation. Here, we present K2Taxonomer, a novel unsupervised recursive partitioning algorithm and associated R package that utilize ensemble learning to identify robust subgroups in a 'taxonomy-like' structure. K2Taxonomer was devised to accommodate different data paradigms, and is suitable for the analysis of both bulk and single-cell transcriptomics, and other '-omics', data. For each of these data types, we demonstrate the power of K2Taxonomer to discover known relationships in both simulated and human tissue data. We conclude with a practical application on breast cancer tumor infiltrating lymphocyte (TIL) single-cell profiles, in which we identified co-expression of translational machinery genes as a dominant transcriptional program shared by T cell subtypes, associated with better prognosis in breast cancer tissue bulk expression data.
Affiliation(s)
- Eric R Reed
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
| | - Stefano Monti
- Section of Computational Biomedicine, Boston University School of Medicine, Boston, MA 02118, USA
- Bioinformatics Program, College of Engineering, Boston University, Boston, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, Boston, MA 02118, USA
| |
|
13
|
Lewis SM, Asselin-Labat ML, Nguyen Q, Berthelet J, Tan X, Wimmer VC, Merino D, Rogers KL, Naik SH. Spatial omics and multiplexed imaging to explore cancer biology. Nat Methods 2021; 18:997-1012. [PMID: 34341583 DOI: 10.1038/s41592-021-01203-6] [Citation(s) in RCA: 229] [Impact Index Per Article: 76.3] [Received: 02/10/2021] [Accepted: 06/04/2021] [Indexed: 01/19/2023]
Abstract
Understanding intratumoral heterogeneity (the molecular variation among cells within a tumor) promises to address outstanding questions in cancer biology and improve the diagnosis and treatment of specific cancer subtypes. Single-cell analyses, especially RNA sequencing and other genomics modalities, have been transformative in revealing novel biomarkers and molecular regulators associated with tumor growth, metastasis and drug resistance. However, these approaches fail to provide a complete picture of tumor biology, as information on cellular location within the tumor microenvironment is lost. New technologies leveraging multiplexed fluorescence, DNA, RNA and isotope labeling enable the detection of tens to thousands of cancer subclones or molecular biomarkers within their native spatial context. The expeditious growth in these techniques, along with methods for multiomics data integration, promises to yield a more comprehensive understanding of cell-to-cell variation within and between individual tumors. Here we provide the current state and future perspectives on the spatial technologies expected to drive the next generation of research and diagnostic and therapeutic strategies for cancer.
Affiliation(s)
- Sabrina M Lewis
- Advanced Technology and Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Marie-Liesse Asselin-Labat
- Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia.,Personalised Oncology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
| | - Quan Nguyen
- Division of Genetics and Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - Jean Berthelet
- Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria, Australia.,School of Cancer Medicine, La Trobe University, Bundoora, Victoria, Australia
| | - Xiao Tan
- Division of Genetics and Genomics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia
| | - Verena C Wimmer
- Advanced Technology and Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Delphine Merino
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia.,Olivia Newton-John Cancer Research Institute, Heidelberg, Victoria, Australia.,School of Cancer Medicine, La Trobe University, Bundoora, Victoria, Australia
| | - Kelly L Rogers
- Advanced Technology and Biology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Shalin H Naik
- Immunology Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia.,Department of Medical Biology, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, Victoria, Australia
| |
|
14
|
Thompson M, Matsumoto M, Ma T, Senabouth A, Palpant NJ, Powell JE, Nguyen Q. scGPS: Determining Cell States and Global Fate Potential of Subpopulations. Front Genet 2021; 12:666771. [PMID: 34349778 PMCID: PMC8326972 DOI: 10.3389/fgene.2021.666771] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2021] [Accepted: 06/04/2021] [Indexed: 12/20/2022] Open
Abstract
Finding cell states and their transcriptional relatedness is a main outcome from analysing single-cell data. In developmental biology, determining whether cells are related in a differentiation lineage remains a major challenge. A seamless analysis pipeline from cell clustering to estimating the probability of transitions between cell clusters is lacking. Here, we present Single Cell Global fate Potential of Subpopulations (scGPS) to characterise transcriptional relationships between cell states. scGPS decomposes mixed cell populations in one or more samples into clusters (SCORE algorithm) and estimates the pairwise transitioning potential (scGPS algorithm) of any pair of clusters. SCORE allows for the assessment and selection of stable clustering results, a major challenge in clustering analysis. scGPS implements a novel approach, with machine learning classification, to flexibly construct trajectory connections between clusters. scGPS also has a feature selection functionality by network and modelling approaches to find biological processes and driver genes that connect cell populations. We applied scGPS in diverse developmental contexts and show superior results compared to a range of clustering and trajectory analysis methods. scGPS is able to identify the dynamics of cellular plasticity in a user-friendly workflow that is fast and memory efficient. scGPS is implemented in R with optimised functions using C++ and is publicly available in Bioconductor.
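The notion of a classification-based transition potential between two clusters can be sketched as follows. This is an illustrative stand-in, not the scGPS algorithm or its R interface: a nearest-centroid classifier replaces whatever model scGPS actually trains, and the function names and toy two-gene expression values are hypothetical. The sketch trains on a source cluster versus background cells, then reports the fraction of target-cluster cells that the classifier labels as source-like.

```python
def centroid(cells):
    """Mean expression vector of a list of cells (each cell is a list of gene values)."""
    n_cells = len(cells)
    n_genes = len(cells[0])
    return [sum(cell[g] for cell in cells) / n_cells for g in range(n_genes)]


def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def transition_score(source_cells, background_cells, target_cells):
    """Fraction of target cells classified as source-like by a nearest-centroid rule."""
    c_source = centroid(source_cells)
    c_background = centroid(background_cells)
    hits = sum(
        1
        for cell in target_cells
        if squared_distance(cell, c_source) < squared_distance(cell, c_background)
    )
    return hits / len(target_cells)


# Hypothetical two-gene expression profiles.
source = [[10, 0], [9, 1], [11, 0]]
background = [[0, 10], [1, 9], [0, 11]]
related_target = [[8, 2], [9, 2], [2, 9], [7, 1]]   # mostly resembles the source
unrelated_target = [[0, 10], [1, 11]]               # resembles the background

print(transition_score(source, background, related_target))
print(transition_score(source, background, unrelated_target))
```

A score near 1 would suggest that most target cells share the source cluster's expression signature, and a score near 0 that they do not. A real analysis would use a regularised model with held-out evaluation in place of centroids.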
Affiliation(s)
- Michael Thompson
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| | - Maika Matsumoto
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| | - Tianqi Ma
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| | - Anne Senabouth
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia
| | - Nathan J Palpant
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| | - Joseph E Powell
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, Australia.,UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, Australia
| | - Quan Nguyen
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
| |
|
15
|
Chen Y, Song J, Ruan Q, Zeng X, Wu L, Cai L, Wang X, Yang C. Single-Cell Sequencing Methodologies: From Transcriptome to Multi-Dimensional Measurement. SMALL METHODS 2021; 5:e2100111. [PMID: 34927917 DOI: 10.1002/smtd.202100111] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 01/29/2021] [Revised: 03/26/2021] [Indexed: 06/14/2023]
Abstract
Cells are the basic building blocks of biological systems, with inherent unique molecular features and development trajectories. The study of single cells facilitates in-depth understanding of cellular diversity, disease processes, and organization of multicellular organisms. Single-cell RNA sequencing (scRNA-seq) technologies have become essential tools for the interrogation of gene expression patterns and the dynamics of single cells, allowing cellular heterogeneity to be dissected at unprecedented resolution. Nevertheless, measuring at only the transcriptome level, or in 1D, is incomplete; cellular heterogeneity is reflected in multiple dimensions, including the genome, epigenome, transcriptome, spatial, and even temporal dimensions. Hence, integrative single-cell analysis is highly desired. In addition, the way sequencing data are interpreted with bioinformatic tools also plays a critical role in revealing differential gene expression. Here, a comprehensive review is provided that summarizes the cutting-edge single-cell transcriptome sequencing methodologies, including scRNA-seq, spatial and temporal transcriptome profiling, multi-omics sequencing and computational methods developed for scRNA-seq data analysis. Finally, the challenges and perspectives of this field are discussed.
Affiliation(s)
- Yingwen Chen
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Jia Song
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Qingyu Ruan
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xi Zeng
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Lingling Wu
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| | - Linfeng Cai
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Xuanqun Wang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
| | - Chaoyong Yang
- The MOE Key Laboratory of Spectrochemical Analysis and Instrumentation, The Key Laboratory of Chemical Biology of Fujian Province, State Key Laboratory of Physical Chemistry of Solid Surfaces, Collaborative Innovation Center of Chemistry for Energy Materials, Department of Chemical Biology, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China
- Institute of Molecular Medicine, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200127, China
| |
|
16
|
Qin C, Pan Y, Li Y, Li Y, Long W, Liu Q. Novel Molecular Hallmarks of Group 3 Medulloblastoma by Single-Cell Transcriptomics. Front Oncol 2021; 11:622430. [PMID: 33816256 PMCID: PMC8013995 DOI: 10.3389/fonc.2021.622430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/02/2020] [Accepted: 03/01/2021] [Indexed: 12/21/2022] Open
Abstract
Medulloblastoma (MB) is highly heterogeneous and one of the most malignant pediatric brain tumors, comprising four subgroups: Sonic Hedgehog, Wingless, Group 3, and Group 4. Group 3 MB has the worst prognosis of all MBs. However, the molecular and cellular mechanisms driving the maintenance of malignancy are poorly understood. Here, we employed high-throughput single-cell and bulk RNA sequencing to identify novel molecular features of Group 3 MB, and found that a specific cell cluster displayed a highly malignant phenotype. Then, we identified the glutamate receptor metabotropic 8 (GRM8) and AP-1 complex subunit sigma-2 (AP1S2) genes as two critical markers of Group 3 MB, corresponding to its poor prognosis. Information on 33 clinical cases was further utilized for validation. Meanwhile, a global map of the molecular cascade downstream of the MYC oncogene in Group 3 MB was also delineated using single-cell RNA sequencing. Our data yield new insights into Group 3 MB molecular characteristics and provide novel therapeutic targets for this relentless disease.
Affiliation(s)
- Chaoying Qin
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| | - Yimin Pan
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| | - Yuzhe Li
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| | - Yue Li
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| | - Wenyong Long
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| | - Qing Liu
- Department of Neurosurgery in Xiangya Hospital, Central South University, Changsha, China
| |
|
17
|
Yu X, Abbas-Aghababazadeh F, Chen YA, Fridley BL. Statistical and Bioinformatics Analysis of Data from Bulk and Single-Cell RNA Sequencing Experiments. Methods Mol Biol 2021; 2194:143-175. [PMID: 32926366 PMCID: PMC7771369 DOI: 10.1007/978-1-0716-0849-4_9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
High-throughput sequencing (HTS) has revolutionized researchers' ability to study the human transcriptome, particularly as it relates to cancer. Recently, HTS technology has advanced to the point where now one is able to sequence individual cells (i.e., "single-cell sequencing"). Prior to single-cell sequencing technology, HTS would be completed on RNA extracted from a tissue sample consisting of multiple cell types (i.e., "bulk sequencing"). In this chapter, we review the various bioinformatics and statistical methods used in the processing, quality control, and analysis of bulk and single-cell RNA sequencing methods. Additionally, we discuss how these methods are also being used to study tumor heterogeneity.
Affiliation(s)
- Xiaoqing Yu
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Farnoosh Abbas-Aghababazadeh
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Y Ann Chen
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
| | - Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
| |
|
18
|
Shim WJ, Sinniah E, Xu J, Vitrinel B, Alexanian M, Andreoletti G, Shen S, Sun Y, Balderson B, Boix C, Peng G, Jing N, Wang Y, Kellis M, Tam PPL, Smith A, Piper M, Christiaen L, Nguyen Q, Bodén M, Palpant NJ. Conserved Epigenetic Regulatory Logic Infers Genes Governing Cell Identity. Cell Syst 2020; 11:625-639.e13. [PMID: 33278344 PMCID: PMC7781436 DOI: 10.1016/j.cels.2020.11.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/16/2020] [Revised: 08/31/2020] [Accepted: 11/09/2020] [Indexed: 01/06/2023]
Abstract
Determining genes that orchestrate cell differentiation in development and disease remains a fundamental goal of cell biology. This study establishes a genome-wide metric based on the gene-repressive trimethylation of histone H3 at lysine 27 (H3K27me3) across hundreds of diverse cell types to identify genetic regulators of cell differentiation. We introduce a computational method, TRIAGE, which uses discordance between gene-repressive tendency and expression to identify genetic drivers of cell identity. We apply TRIAGE to millions of genome-wide single-cell transcriptomes, diverse omics platforms, and eukaryotic cells and tissue types. Using a wide range of data, we validate the performance of TRIAGE in identifying cell-type-specific regulatory factors across diverse species including human, mouse, boar, bird, fish, and tunicate. Using CRISPR gene editing, we use TRIAGE to experimentally validate RNF220 as a regulator of Ciona cardiopharyngeal development and SIX3 as required for differentiation of endoderm in human pluripotent stem cells. A record of this paper’s transparent peer review process is included in the Supplemental Information. Perturbing genes that control cell decisions has major implications in development or disease. However, identifying key regulatory genes from the thousands expressed in a cell is challenging. TRIAGE is a computational method that distills patterns of epigenetic repression across diverse cell types to infer regulatory genes using input gene expression data from any cell type. Demonstrating its utility, we combine single-cell RNA-seq and TRIAGE to identify and experimentally confirm novel regulators of heart development in evolutionarily distant species.
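The discordance idea in this abstract (a gene that tends to be repressed across cell types but is nonetheless expressed in the cell type at hand is a candidate regulator) can be illustrated with a small sketch. The scoring rule below, log-transformed expression multiplied by a per-gene repressive tendency weight, is an assumption chosen for illustration, and the gene labels and numbers are hypothetical. Neither is taken from the abstract or from the TRIAGE software.

```python
import math


def discordance_scores(expression, repressive_tendency):
    """Weight log-transformed expression by each gene's repressive tendency.

    expression: gene -> expression value in the cell type of interest.
    repressive_tendency: gene -> weight in [0, 1]; higher means the gene is
    more often covered by repressive marks across other cell types.
    Genes without a weight score zero.
    """
    return {
        gene: math.log(value + 1.0) * repressive_tendency.get(gene, 0.0)
        for gene, value in expression.items()
    }


# Hypothetical inputs for three placeholder genes.
expression = {"housekeeping": 1000.0, "regulator": 50.0, "silent": 0.0}
repressive_tendency = {"housekeeping": 0.01, "regulator": 0.9, "silent": 0.95}

scores = discordance_scores(expression, repressive_tendency)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Under this rule the highly expressed but rarely repressed gene drops below the moderately expressed, frequently repressed one, and a gene that is not expressed scores zero whatever its weight. That reordering is the behaviour the abstract attributes to the discordance measure.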
Affiliation(s)
- Woo Jun Shim
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Enakshi Sinniah
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Jun Xu
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Burcu Vitrinel
- Center for Developmental Genetics, Department of Biology, New York University, New York, NY, USA
| | - Michael Alexanian
- Gladstone Institute of Cardiovascular Disease, San Francisco, CA, USA
| | - Gaia Andreoletti
- Institute for Computational Health Sciences, University of California, San Francisco, CA 94158, USA
| | - Sophie Shen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Yuliangzi Sun
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Brad Balderson
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Carles Boix
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Guangdun Peng
- CAS Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences and Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou, China; State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Naihe Jing
- CAS Key Laboratory of Regenerative Biology, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, University of Chinese Academy of Sciences and Bioland Laboratory (Guangzhou Regenerative Medicine and Health Guangdong Laboratory), Guangzhou, China; State Key Laboratory of Cell Biology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Shanghai, China
| | - Yuliang Wang
- Paul G. Allen School of Computer Science and Engineering and Institute for Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA, USA
| | | | - Patrick P L Tam
- The University of Sydney, Children's Medical Research Institute, and School of Medical Sciences, Faculty of Medicine and Health, Westmead, NSW 2145, Australia
| | - Aaron Smith
- Institute of Health and Biomedical Innovation, School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia; Translational Research Institute, Woolloongabba, Brisbane, Australia
| | - Michael Piper
- School of Biomedical Sciences, The University of Queensland, Brisbane, Australia; Queensland Brain Institute, The University of Queensland, Brisbane, Australia
| | - Lionel Christiaen
- Center for Developmental Genetics, Department of Biology, New York University, New York, NY, USA
| | - Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
| | - Nathan J Palpant
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| |
|
19
|
Kott KA, Vernon ST, Hansen T, de Dreu M, Das SK, Powell J, Fazekas de St Groth B, Di Bartolo BA, McGuire HM, Figtree GA. Single-Cell Immune Profiling in Coronary Artery Disease: The Role of State-of-the-Art Immunophenotyping With Mass Cytometry in the Diagnosis of Atherosclerosis. J Am Heart Assoc 2020; 9:e017759. [PMID: 33251927 PMCID: PMC7955359 DOI: 10.1161/jaha.120.017759] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/21/2022]
Abstract
Coronary artery disease remains the leading cause of death globally and is a major burden to every health system in the world. There have been significant improvements in risk modification, treatments, and mortality; however, our ability to detect asymptomatic disease for early intervention remains limited. Recent discoveries regarding the inflammatory nature of atherosclerosis have prompted investigation into new methods of diagnosis and treatment of coronary artery disease. This article reviews some of the highlights of the important developments in cardioimmunology and summarizes the clinical evidence linking the immune system and atherosclerosis. It provides an overview of the major serological biomarkers that have been associated with atherosclerosis, noting the limitations of these markers attributable to low specificity, and then contrasts these serological markers with the circulating immune cell subtypes that have been found to be altered in coronary artery disease. This review then outlines the technique of mass cytometry and its ability to provide high-dimensional single-cell data and explores how this high-resolution quantification of specific immune cell subpopulations may assist in the diagnosis of early atherosclerosis in combination with other complementary techniques such as single-cell RNA sequencing. We propose that this improved specificity has the potential to transform the detection of coronary artery disease in its early phases, facilitating targeted preventative approaches in the precision medicine era.
Affiliation(s)
- Katharine A Kott
- Cardiothoracic and Vascular Health Kolling Institute of Medical Research Sydney Australia; Department of Cardiology Royal North Shore Hospital Northern Sydney Local Health District Sydney Australia; School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia
| | - Stephen T Vernon
- Cardiothoracic and Vascular Health Kolling Institute of Medical Research Sydney Australia; Department of Cardiology Royal North Shore Hospital Northern Sydney Local Health District Sydney Australia; School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia
| | - Thomas Hansen
- Cardiothoracic and Vascular Health Kolling Institute of Medical Research Sydney Australia; School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia
| | - Macha de Dreu
- School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia; Ramaciotti Facility for Human Systems Biology Charles Perkins Centre University of Sydney Sydney Australia
| | - Souvik K Das
- Department of Cardiology Royal North Shore Hospital Northern Sydney Local Health District Sydney Australia
| | - Joseph Powell
- Garvan-Weizmann Centre for Cellular Genomics Garvan Institute Sydney Australia; UNSW Cellular Genomics Futures Institute University of New South Wales Sydney Australia
| | - Barbara Fazekas de St Groth
- School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia; Ramaciotti Facility for Human Systems Biology Charles Perkins Centre University of Sydney Sydney Australia; Charles Perkins Centre University of Sydney Sydney Australia
| | - Belinda A Di Bartolo
- Cardiothoracic and Vascular Health Kolling Institute of Medical Research Sydney Australia
| | - Helen M McGuire
- School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia; Ramaciotti Facility for Human Systems Biology Charles Perkins Centre University of Sydney Sydney Australia; Charles Perkins Centre University of Sydney Sydney Australia
| | - Gemma A Figtree
- Cardiothoracic and Vascular Health Kolling Institute of Medical Research Sydney Australia; Department of Cardiology Royal North Shore Hospital Northern Sydney Local Health District Sydney Australia; School of Medical Sciences Faculty of Medicine and Health University of Sydney Sydney Australia; Charles Perkins Centre University of Sydney Sydney Australia
| |
|
20
|
Hsu LL, Culhane AC. Impact of Data Preprocessing on Integrative Matrix Factorization of Single Cell Data. Front Oncol 2020; 10:973. [PMID: 32656082 PMCID: PMC7324639 DOI: 10.3389/fonc.2020.00973] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/20/2020] [Accepted: 05/18/2020] [Indexed: 01/04/2023] Open
Abstract
Integrative, single-cell analyses may provide unprecedented insights into cellular and spatial diversity of the tumor microenvironment. The sparsity, noise, and high dimensionality of these data present unique challenges. Whilst approaches for integrating single-cell data are emerging and are far from being standardized, most data integration, cell clustering, cell trajectory, and analysis pipelines employ a dimension reduction step, frequently principal component analysis (PCA), a matrix factorization method that is relatively fast, and can easily scale to large datasets when used with sparse-matrix representations. In this review, we provide a guide to PCA and related methods. We describe the relationship between PCA and singular value decomposition, the difference between PCA of a correlation and covariance matrix, the impact of scaling, log-transforming, and standardization, and how to recognize a horseshoe or arch effect in a PCA. We describe canonical correlation analysis (CCA), a popular matrix factorization approach for the integration of single-cell data from different platforms or studies. We discuss alternatives to CCA and why additional preprocessing or weighting datasets within the joint decomposition should be considered.
Affiliation(s)
- Lauren L Hsu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Division of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, United States
| | - Aedin C Culhane
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States; Division of Biostatistics and Computational Biology, Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, United States
| |
|
21
|
Tsuyuzaki K, Sato H, Sato K, Nikaido I. Benchmarking principal component analysis for large-scale single-cell RNA-sequencing. Genome Biol 2020; 21:9. [PMID: 31955711 PMCID: PMC6970290 DOI: 10.1186/s13059-019-1900-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 05/18/2019] [Accepted: 11/26/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, computation time is long and consumes large amounts of memory. RESULTS In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms. CONCLUSION We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
Affiliation(s)
- Koki Tsuyuzaki
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Japan Science and Technology Agency, PRESTO, 5-3, Yonbancho, Chiyoda-ku, Tokyo, 102-8666 Japan
| | - Hiroyuki Sato
- Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501 Japan
| | - Kenta Sato
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, 113-8657 Japan
| | - Itoshi Nikaido
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Bioinformatics Course, Master’s/Doctoral Program in Life Science Innovation (T-LSI), School of Integrative and Global Majors (SIGMA), University of Tsukuba, 1-1-1, Tennodai, Tsukuba, Ibaraki, 305-8577 Japan
| |
|
22
|
Krzak M, Raykov Y, Boukouvalas A, Cutillo L, Angelini C. Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods. Front Genet 2019; 10:1253. [PMID: 31921297 PMCID: PMC6918801 DOI: 10.3389/fgene.2019.01253] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/19/2019] [Accepted: 11/13/2019] [Indexed: 01/04/2023] Open
Abstract
Single-cell RNA-seq (scRNAseq) is a powerful tool to study heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, permitting the method to be used in different modes on the same datasets, depending on the user choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and the impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and describe the ranges of possibilities that are offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in terms of dimensionality, number of cell populations or levels of noise. Remarkably, the results presented here show that great variability in the performance of the models is strongly attributed to the choice of the user-specific parameter settings. We describe several tendencies in the performance attributed to their modes of usage and different types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to the identification of the number of clusters.
Affiliation(s)
- Monika Krzak
- Institute for Applied Mathematics “Mauro Picone”, Naples, Italy
| | - Yordan Raykov
- Department of Mathematics, Aston University, Birmingham, United Kingdom
| | | | - Luisa Cutillo
- School of Mathematics, University of Leeds, Leeds, United Kingdom
| | | |
|
23
|
Sun S, Zhu J, Ma Y, Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 2019; 20:269. [PMID: 31823809 PMCID: PMC6902413 DOI: 10.1186/s13059-019-1898-6] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 05/16/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Dimensionality reduction is an indispensable analytic component for many areas of single-cell RNA sequencing (scRNA-seq) data analysis. Proper dimensionality reduction can allow for effective noise removal and facilitate many downstream analyses that include cell clustering and lineage reconstruction. Unfortunately, despite the critical importance of dimensionality reduction in scRNA-seq analysis and the vast number of dimensionality reduction methods developed for scRNA-seq studies, few comprehensive comparison studies have been performed to evaluate the effectiveness of different dimensionality reduction methods in scRNA-seq. RESULTS We aim to fill this critical knowledge gap by providing a comparative evaluation of a variety of commonly used dimensionality reduction methods for scRNA-seq studies. Specifically, we compare 18 different dimensionality reduction methods on 30 publicly available scRNA-seq datasets that cover a range of sequencing techniques and sample sizes. We evaluate the performance of different dimensionality reduction methods for neighborhood preserving in terms of their ability to recover features of the original expression matrix, and for cell clustering and lineage reconstruction in terms of their accuracy and robustness. We also evaluate the computational scalability of different dimensionality reduction methods by recording their computational cost. CONCLUSIONS Based on the comprehensive evaluation results, we provide important guidelines for choosing dimensionality reduction methods for scRNA-seq data analysis. We also provide all analysis scripts used in the present study at www.xzlab.org/reproduce.html.
Affiliation(s)
- Shiquan Sun
- School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi, 710072, People's Republic of China
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Jiaqiang Zhu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Ying Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, 48109, USA
| |
|
24
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 DOI: 10.12688/f1000research.15666.1] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Accepted: 07/20/2018] [Indexed: 12/21/2022] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|
25
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 PMCID: PMC6134335 DOI: 10.12688/f1000research.15666.3] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Accepted: 11/04/2020] [Indexed: 02/05/2023] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|
26
|
Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018; 7:1141. [PMID: 30271584 DOI: 10.12688/f1000research.15666.2] [Citation(s) in RCA: 130] [Impact Index Per Article: 21.7] [Accepted: 08/31/2018] [Indexed: 12/31/2022] Open
Abstract
Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub (https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor (https://bioconductor.org/packages/DuoClustering2018).
Affiliation(s)
- Angelo Duò
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| | - Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland
| |
|