Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018;7:1141. [PMID: 30271584 PMCID: PMC6134335 DOI: 10.12688/f1000research.15666.3] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Accepted: 11/04/2020] [Indexed: 02/05/2023] Open

Number

Cited by Other Article(s)

101

Fu R, Gillen AE, Sheridan RM, Tian C, Daya M, Hao Y, Hesselberth JR, Riemondy KA. clustifyr: an R package for automated single-cell RNA sequencing cluster classification. F1000Res 2020;9:223. [PMID: 32765839 PMCID: PMC7383722 DOI: 10.12688/f1000research.22969.2] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Accepted: 07/08/2020] [Indexed: 01/02/2023] Open

102

Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification. Genes (Basel) 2020;11:genes11040377. [PMID: 32244427 PMCID: PMC7230610 DOI: 10.3390/genes11040377] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/07/2020] [Revised: 03/24/2020] [Accepted: 03/24/2020] [Indexed: 12/14/2022] Open

103

Huh R, Yang Y, Jiang Y, Shen Y, Li Y. SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble. Nucleic Acids Res 2020;48:86-95. [PMID: 31777938 PMCID: PMC6943136 DOI: 10.1093/nar/gkz959] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 05/08/2019] [Revised: 10/03/2019] [Accepted: 10/10/2019] [Indexed: 12/19/2022] Open

104

Loss of the branched-chain amino acid transporter CD98hc alters the development of colonic macrophages in mice. Commun Biol 2020;3:130. [PMID: 32188932 PMCID: PMC7080761 DOI: 10.1038/s42003-020-0842-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 03/22/2019] [Accepted: 02/21/2020] [Indexed: 12/14/2022] Open

105

Casey MJ, Stumpf PS, MacArthur BD. Theory of cell fate. Wiley Interdiscip Rev Syst Biol Med 2020;12:e1471. [PMID: 31828979 PMCID: PMC7027507 DOI: 10.1002/wsbm.1471] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/30/2019] [Revised: 10/15/2019] [Accepted: 11/06/2019] [Indexed: 11/17/2022]

106

Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 564] [Impact Index Per Article: 141.0] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open

Affiliation(s)

David Lähnemann Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Johannes Köster Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
Ewa Szczurek Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Davis J. McCarthy Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
Stephanie C. Hicks Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
Mark D. Robinson Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
Catalina A. Vallejos MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK The Alan Turing Institute, British Library, London, UK
Kieran R. Campbell Department of Statistics, University of British Columbia, Vancouver, Canada Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada Data Science Institute, University of British Columbia, Vancouver, Canada
Niko Beerenwinkel Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Ahmed Mahfouz Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Luca Pinello Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA Department of Pathology, Harvard Medical School, Boston, USA Broad Institute of Harvard and MIT, Cambridge, MA USA
Pavel Skums Department of Computer Science, Georgia State University, Atlanta, USA
Alexandros Stamatakis Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Camille Stephan-Otto Attolini Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Spain
Samuel Aparicio Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
Jasmijn Baaijens Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Marleen Balvert Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
Buys de Barbanson Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Antonio Cappuccio Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
Giacomo Corleone Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
Bas E. Dutilh Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
Maria Florescu Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Victor Guryev European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Rens Holmer Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Katharina Jahn Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Thamar Jessurun Lobo European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Emma M. Keizer Biometris, Wageningen University & Research, Wageningen, The Netherlands
Indu Khatri Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
Szymon M. Kielbasa Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
Jan O. Korbel Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Alexey M. Kozlov Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Tzu-Hao Kuo Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Boudewijn P.F. Lelieveldt PRB lab, Delft University of Technology, Delft, The Netherlands Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Ion I. Mandoiu Computer Science & Engineering Department, University of Connecticut, Storrs, USA
John C. Marioni Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
Tobias Marschall Center for Bioinformatics, Saarland University, Saarbrücken, Germany Max Planck Institute for Informatics, Saarbrücken, Germany
Felix Mölder Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Amir Niknejad Computation molecular design, Zuse Institute Berlin, Berlin, Germany Mathematics Department, Mount Saint Vincent, New York, USA
Alicja Rączkowska Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Marcel Reinders Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Jeroen de Ridder Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands
Antoine-Emmanuel Saliba Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
Antonios Somarakis Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Oliver Stegle Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
Fabian J. Theis Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
Huan Yang Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
Alex Zelikovsky Department of Computer Science, Georgia State University, Atlanta, USA The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
Alice C. McHardy Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Benjamin J. Raphael Department of Computer Science, Princeton University, Princeton, USA
Sohrab P. Shah Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
Alexander Schönhuth Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands

107

Geddes TA, Kim T, Nan L, Burchfield JG, Yang JYH, Tao D, Yang P. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinformatics 2019;20:660. [PMID: 31870278 PMCID: PMC6929272 DOI: 10.1186/s12859-019-3179-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/23/2019] [Accepted: 10/28/2019] [Indexed: 01/23/2023] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.

Results

Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used.

Conclusions

Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS
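The ensemble idea summarised in this abstract can be illustrated with a small, dependency-free sketch. It is not the authors' scCCESS implementation: the autoencoder compression step is omitted entirely, plain k-means is run directly on each random feature subspace, and the consensus rule (link two cells when they co-cluster in more than half of the runs, then take connected components) is an assumption chosen for brevity. The function names `kmeans` and `subspace_ensemble` are hypothetical.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on a list of equal-length float lists; returns labels."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def subspace_ensemble(data, k, n_runs=10, n_features=2, seed=0):
    """Cluster `data` in several random feature subspaces and merge the results."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    together = [[0] * n for _ in range(n)]  # how often cells i and j share a cluster
    for _ in range(n_runs):
        feats = rng.sample(range(d), n_features)  # random subspace projection
        projected = [[row[f] for f in feats] for row in data]
        labels = kmeans(projected, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Assumed consensus rule: connected components of the "co-clustered in more
    # than half of the runs" graph. The number of components may differ from k.
    consensus = [-1] * n
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and together[i][j] * 2 > n_runs:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus
```

Calling `subspace_ensemble(matrix, k=2)` on a cells-by-genes list of lists returns one consensus label per cell. A closer reproduction of the published framework would insert a learned low-dimensional encoding between the projection and clustering steps.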

108

Cao Y, Lin Y, Ormerod JT, Yang P, Yang JYH, Lo KK. scDC: single cell differential composition analysis. BMC Bioinformatics 2019;20:721. [PMID: 31870280 PMCID: PMC6929335 DOI: 10.1186/s12859-019-3211-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/07/2019] [Accepted: 11/12/2019] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite detail. However, a number of challenges remain with cell-type composition analysis: none of the existing methods can identify cell types perfectly, and variability related to cell sampling exists in any single cell experiment. This necessitates the development of a method for estimating uncertainty in cell-type composition.

RESULTS

We developed a novel single cell differential composition (scDC) analysis method that performs differential cell-type composition analysis via bootstrap resampling. scDC captures the uncertainty associated with cell-type proportions of each subject via bias-corrected and accelerated bootstrap confidence intervals. We assessed the performance of our method using a number of simulated datasets and synthetic datasets curated from publicly available single cell datasets. In simulated datasets, scDC correctly recovered the true cell-type proportions. In synthetic datasets, the cell-type compositions returned by scDC were highly concordant with reference cell-type compositions from the original data. Since the majority of datasets tested in this study have only 2 to 5 subjects per condition, the addition of confidence intervals enabled better comparisons of compositional differences between subjects and across conditions.

CONCLUSIONS

scDC is a novel statistical method for performing differential cell-type composition analysis for scRNA-seq data. It uses bootstrap resampling to estimate the standard errors associated with cell-type proportion estimates and performs significance testing through GLM and GLMM models. We have made this method available to the scientific community as part of the scdney (Single Cell Data Integrative Analysis) R package, available from https://github.com/SydneyBioX/scdney.
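As a rough illustration of how bootstrap resampling yields an interval for a cell-type proportion, the sketch below resamples the cell labels of one subject with replacement and reads off percentile bounds. It deliberately uses the simpler percentile interval rather than the bias-corrected and accelerated interval described in the abstract, and it does not touch the scdney API; the function name `bootstrap_proportions` and its parameters are hypothetical.

```python
import random
from collections import Counter


def bootstrap_proportions(labels, n_boot=1000, alpha=0.05, seed=0):
    """Return {cell_type: (observed_proportion, lower, upper)} for one subject.

    Percentile bootstrap only; this is a simplification, not the BCa interval.
    """
    rng = random.Random(seed)
    n = len(labels)
    types = sorted(set(labels))
    observed = Counter(labels)
    draws = {t: [] for t in types}
    for _ in range(n_boot):
        # Resample cells with replacement and recompute every proportion.
        resample = Counter(rng.choices(labels, k=n))
        for t in types:
            draws[t].append(resample[t] / n)
    result = {}
    for t in types:
        ordered = sorted(draws[t])
        lower = ordered[int((alpha / 2) * (n_boot - 1))]
        upper = ordered[int((1 - alpha / 2) * (n_boot - 1))]
        result[t] = (observed[t] / n, lower, upper)
    return result
```

For example, `bootstrap_proportions(["T"] * 30 + ["B"] * 10)` gives an observed proportion and an approximate 95% interval for each of the two cell types. With few cells per subject the interval is wide, which is the uncertainty the abstract argues should accompany composition comparisons.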

109

Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019;20:295. [PMID: 31870412 PMCID: PMC6927135 DOI: 10.1186/s13059-019-1861-6] [Citation(s) in RCA: 206] [Impact Index Per Article: 41.2] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 12/23/2022] Open

110

Cheng C, Easton J, Rosencrance C, Li Y, Ju B, Williams J, Mulder HL, Pang Y, Chen W, Chen X. Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data. Nucleic Acids Res 2019;47:e143. [PMID: 31566233 PMCID: PMC6902034 DOI: 10.1093/nar/gkz826] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 05/21/2019] [Revised: 08/30/2019] [Accepted: 09/26/2019] [Indexed: 12/21/2022] Open

111

Krzak M, Raykov Y, Boukouvalas A, Cutillo L, Angelini C. Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods. Front Genet 2019;10:1253. [PMID: 31921297 PMCID: PMC6918801 DOI: 10.3389/fgene.2019.01253] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/19/2019] [Accepted: 11/13/2019] [Indexed: 01/04/2023] Open

Abstract

Single-cell RNA-seq (scRNAseq) is a powerful tool to study heterogeneity of cells. Recently, several clustering based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before applying the clustering algorithm. Individual steps are often controlled by method-specific parameters, permitting the method to be used in different modes on the same datasets, depending on the user choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and the impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and describe the ranges of possibilities that are offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in terms of dimensionality, number of cell populations or levels of noise. Remarkably, the results presented here show that great variability in the performance of the models is strongly attributed to the choice of the user-specific parameter settings. We describe several tendencies in the performance attributed to their modes of usage and different types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to the identification of the number of clusters.

112

Sun S, Zhu J, Ma Y, Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 2019;20:269. [PMID: 31823809 PMCID: PMC6902413 DOI: 10.1186/s13059-019-1898-6] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Received: 05/16/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023] Open

113

Chaudhry F, Isherwood J, Bawa T, Patel D, Gurdziel K, Lanfear DE, Ruden DM, Levy PD. Single-Cell RNA Sequencing of the Cardiovascular System: New Looks for Old Diseases. Front Cardiovasc Med 2019;6:173. [PMID: 31921894 PMCID: PMC6914766 DOI: 10.3389/fcvm.2019.00173] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 08/08/2019] [Accepted: 11/12/2019] [Indexed: 12/18/2022] Open

114

Tarashansky AJ, Xue Y, Li P, Quake SR, Wang B. Self-assembling manifolds in single-cell RNA sequencing data. eLife 2019;8:e48994. [PMID: 31524596 PMCID: PMC6795480 DOI: 10.7554/elife.48994] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 06/03/2019] [Accepted: 09/16/2019] [Indexed: 12/14/2022] Open

115

Abdelaal T, Michielsen L, Cats D, Hoogduin D, Mei H, Reinders MJT, Mahfouz A. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol 2019;20:194. [PMID: 31500660 PMCID: PMC6734286 DOI: 10.1186/s13059-019-1795-z] [Citation(s) in RCA: 305] [Impact Index Per Article: 61.0] [Received: 05/18/2019] [Accepted: 08/17/2019] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Single-cell transcriptomics is rapidly advancing our understanding of the cellular composition of complex tissues and organisms. A major limitation in most analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming and irreproducible. The exponential growth in the number of cells and samples has prompted the adaptation and development of supervised classification methods for automatic cell identification.

RESULTS

Here, we benchmarked 22 classification methods that automatically assign cell identities including single-cell-specific and general-purpose classifiers. The performance of the methods is evaluated using 27 publicly available single-cell RNA sequencing datasets of different sizes, technologies, species, and levels of complexity. We use 2 experimental setups to evaluate the performance of each method for within dataset predictions (intra-dataset) and across datasets (inter-dataset) based on accuracy, percentage of unclassified cells, and computation time. We further evaluate the methods' sensitivity to the input features, number of cells per population, and their performance across different annotation levels and datasets. We find that most classifiers perform well on a variety of datasets with decreased accuracy for complex datasets with overlapping classes or deep annotations. The general-purpose support vector machine classifier has overall the best performance across the different experiments.

CONCLUSIONS

We present a comprehensive evaluation of automatic cell identification methods for single-cell RNA sequencing data. All the code used for the evaluation is available on GitHub (https://github.com/tabdelaal/scRNAseq_Benchmark). Additionally, we provide a Snakemake workflow to facilitate the benchmarking and to support the extension of new methods and new datasets.
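Two of the evaluation criteria named in this abstract, accuracy and the percentage of unclassified cells, can be computed along the following lines. This is a minimal sketch and not code from the benchmark repository: the `"Unassigned"` marker for rejected cells and the choice to compute accuracy only over cells that received a label are assumptions, and `evaluate_predictions` is a hypothetical name.

```python
def evaluate_predictions(true_labels, predicted_labels, unassigned="Unassigned"):
    """Return (accuracy over assigned cells, percentage of unclassified cells)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one cell is required")
    n = len(true_labels)
    # Cells for which the classifier committed to a label (assumed convention).
    assigned = [(t, p) for t, p in zip(true_labels, predicted_labels) if p != unassigned]
    pct_unclassified = 100.0 * (n - len(assigned)) / n
    if assigned:
        accuracy = sum(t == p for t, p in assigned) / len(assigned)
    else:
        accuracy = float("nan")  # nothing was classified, so accuracy is undefined
    return accuracy, pct_unclassified
```

Reporting both numbers together matters because a classifier with a rejection option can raise its accuracy simply by leaving more cells unassigned.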

116

Yu X, Chen YA, Conejo-Garcia JR, Chung CH, Wang X. Estimation of immune cell content in tumor using single-cell RNA-seq reference data. BMC Cancer 2019;19:715. [PMID: 31324168 PMCID: PMC6642583 DOI: 10.1186/s12885-019-5927-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/09/2019] [Accepted: 07/12/2019] [Indexed: 12/12/2022] Open

117

Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, Boulesteix AL, Saeys Y, Robinson MD. Essential guidelines for computational method benchmarking. Genome Biol 2019;20:125. [PMID: 31221194 PMCID: PMC6584985 DOI: 10.1186/s13059-019-1738-8] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Indexed: 12/14/2022] Open

118

Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 2019;15:e8746. [PMID: 31217225 PMCID: PMC6582955 DOI: 10.15252/msb.20188746] [Citation(s) in RCA: 953] [Impact Index Per Article: 190.6] [Received: 11/16/2018] [Revised: 03/15/2019] [Accepted: 04/03/2019] [Indexed: 12/21/2022] Open

119

Crow M, Gillis J. Single cell RNA-sequencing: replicability of cell types. Curr Opin Neurobiol 2019;56:69-77. [PMID: 30654233 PMCID: PMC6551252 DOI: 10.1016/j.conb.2018.12.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/05/2018] [Revised: 12/03/2018] [Accepted: 12/09/2018] [Indexed: 01/09/2023]

120

Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X. scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics 2019;20:347. [PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2018] [Accepted: 04/29/2019] [Indexed: 12/15/2022] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses on scRNA-seq, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing of scRNA-seq data is a critical preliminary step in the analysis of scRNA-seq data.

Results

We presented scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools.

Conclusions

scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources. scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5747-5) contains supplementary material, which is available to authorized users.

121

Sun Z, Chen L, Xin H, Jiang Y, Huang Q, Cillo AR, Tabib T, Kolls JK, Bruno TC, Lafyatis R, Vignali DAA, Chen K, Ding Y, Hu M, Chen W. A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies. Nat Commun 2019;10:1649. [PMID: 30967541 PMCID: PMC6456731 DOI: 10.1038/s41467-019-09639-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 08/06/2018] [Accepted: 03/15/2019] [Indexed: 02/08/2023] Open

Affiliation(s)

Zhe Sun Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Li Chen Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, 36849, USA
Hongyi Xin Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA
Yale Jiang Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA.,School of Medicine, Tsinghua University, Beijing, 100084, China
Qianhui Huang Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
Anthony R Cillo Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA
Tracy Tabib Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Jay K Kolls School of Medicine, Tulane University, New Orleans, LA, 70112, USA
Tullia C Bruno Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA.,Tumor Microenvironment Center, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA
Robert Lafyatis Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Dario A A Vignali Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA.,Tumor Microenvironment Center, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA.,Cancer Immunology and Immunotherapy Program, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA
Kong Chen Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
Ying Ding Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
Ming Hu Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA.
Wei Chen Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA.,Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA.

122

Choi YH, Kim JK. Dissecting Cellular Heterogeneity Using Single-Cell RNA Sequencing. Mol Cells 2019;42:189-199. [PMID: 30764602 PMCID: PMC6449718 DOI: 10.14348/molcells.2019.2446] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 12/11/2019] [Revised: 01/09/2019] [Accepted: 01/09/2019] [Indexed: 12/22/2022] Open

123

Diaz-Mejia JJ, Meng EC, Pico AR, MacParland SA, Ketela T, Pugh TJ, Bader GD, Morris JH. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019;8:ISCB Comm J-296. [PMID: 31508207 PMCID: PMC6720041 DOI: 10.12688/f1000research.18490.3] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 10/09/2019] [Indexed: 01/28/2023] Open

Abstract

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. 
We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.

124

Diaz-Mejia JJ, Meng EC, Pico AR, MacParland SA, Ketela T, Pugh TJ, Bader GD, Morris JH. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019;8:ISCB Comm J-296. [PMID: 31508207 PMCID: PMC6720041 DOI: 10.12688/f1000research.18490.1] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Accepted: 03/08/2019] [Indexed: 12/11/2022] Open

125

Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018;7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.1] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Accepted: 08/07/2018] [Indexed: 01/21/2023] Open

126

Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018;7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.2] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Accepted: 12/14/2018] [Indexed: 12/23/2022] Open