Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018;7:1141. [PMID: 30271584 DOI: 10.12688/f1000research.15666.1] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Accepted: 07/20/2018] [Indexed: 12/21/2022] Open

Cited by Other Article(s)

Sin DD. What Single Cell RNA Sequencing Has Taught Us about Chronic Obstructive Pulmonary Disease. Tuberc Respir Dis (Seoul) 2024;87:252-260. [PMID: 38369875 PMCID: PMC11222093 DOI: 10.4046/trd.2024.0001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 02/17/2024] [Indexed: 02/20/2024] Open

Yarlagadda S, Giorgio TD. A guide to single-cell RNA sequencing analysis using web-based tools for non-bioinformatician. FEBS J 2024;291:2545-2561. [PMID: 38148322 DOI: 10.1111/febs.17036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/14/2023] [Indexed: 12/28/2023]

Ali M, Yang T, He H, Zhang Y. Plant biotechnology research with single-cell transcriptome: recent advancements and prospects. Plant Cell Rep 2024;43:75. [PMID: 38381195 DOI: 10.1007/s00299-024-03168-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024]

Song D, Wang Q, Yan G, Liu T, Sun T, Li JJ. scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics. Nat Biotechnol 2024;42:247-252. [PMID: 37169966 PMCID: PMC11182337 DOI: 10.1038/s41587-023-01772-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 09/20/2022] [Accepted: 03/30/2023] [Indexed: 05/13/2023]

Carbonetto P, Luo K, Sarkar A, Hung A, Tayeb K, Pott S, Stephens M. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership. Genome Biol 2023;24:236. [PMID: 37858253 PMCID: PMC10588049 DOI: 10.1186/s13059-023-03067-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/21/2023] Open

Pan Y, Landis JT, Moorad R, Wu D, Marron JS, Dittmer DP. The Poisson distribution model fits UMI-based single-cell RNA-sequencing data. BMC Bioinformatics 2023;24:256. [PMID: 37330471 PMCID: PMC10276395 DOI: 10.1186/s12859-023-05349-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Accepted: 05/24/2023] [Indexed: 06/19/2023] Open

Gao LL, Bien J, Witten D. Selective Inference for Hierarchical Clustering. J Am Stat Assoc 2022;119:332-342. [PMID: 38660582 PMCID: PMC11036349 DOI: 10.1080/01621459.2022.2116331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2020] [Accepted: 08/16/2022] [Indexed: 10/17/2022]

LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data. Commun Biol 2022;5:577. [PMID: 35688990 PMCID: PMC9187761 DOI: 10.1038/s42003-022-03473-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2021] [Accepted: 05/02/2022] [Indexed: 11/08/2022] Open

Spatially informed cell-type deconvolution for spatial transcriptomics. Nat Biotechnol 2022;40:1349-1359. [PMID: 35501392 PMCID: PMC9464662 DOI: 10.1038/s41587-022-01273-7] [Citation(s) in RCA: 115] [Impact Index Per Article: 57.5] [Received: 06/09/2021] [Accepted: 03/07/2022] [Indexed: 12/16/2022]

Jovic D, Liang X, Zeng H, Lin L, Xu F, Luo Y. Single-cell RNA sequencing technologies and applications: A brief overview. Clin Transl Med 2022;12:e694. [PMID: 35352511 PMCID: PMC8964935 DOI: 10.1002/ctm2.694] [Citation(s) in RCA: 266] [Impact Index Per Article: 133.0] [Received: 08/05/2021] [Revised: 12/09/2021] [Accepted: 12/20/2021] [Indexed: 12/19/2022] Open

Li Z, Feng H. A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data. Sci Rep 2022;12:910. [PMID: 35042860 PMCID: PMC8766435 DOI: 10.1038/s41598-021-04473-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/21/2021] [Accepted: 12/21/2021] [Indexed: 02/01/2023] Open

Bej S, Galow AM, David R, Wolfien M, Wolkenhauer O. Automated annotation of rare-cell types from single-cell RNA-sequencing data through synthetic oversampling. BMC Bioinformatics 2021;22:557. [PMID: 34798805 PMCID: PMC8603509 DOI: 10.1186/s12859-021-04469-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 11/03/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The research landscape of single-cell and single-nuclei RNA-sequencing is evolving rapidly. In particular, the technology has greatly facilitated the detection of rare cells. However, an automated, unbiased, and accurate annotation of rare subpopulations is challenging. Once rare cells are identified in one dataset, it is usually necessary to generate further specific datasets to enrich the analysis (e.g., with samples from other tissues). From a machine learning perspective, the challenge arises from the fact that rare-cell subpopulations constitute an imbalanced classification problem. Here, we introduce a Machine Learning (ML)-based oversampling method that uses gene expression counts of already identified rare cells as input to generate synthetic cells, which are then used to identify similar (rare) cells in other publicly available experiments. We utilize single-cell synthetic oversampling (sc-SynO), which is based on the Localized Random Affine Shadowsampling (LoRAS) algorithm. The algorithm corrects for the overall imbalance ratio of the minority and majority class.
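To make the oversampling idea concrete, the following is a minimal sketch and not the sc-SynO or LoRAS implementation. It assumes the general scheme of drawing noisy copies ("shadowsamples") of a rare cell's nearest neighbours and averaging them with random convex weights. The function name `shadow_oversample` and all of its parameters are hypothetical, chosen only for illustration.

```python
import random


def shadow_oversample(minority, n_synthetic, k=3, noise_sd=0.01, n_shadow=5, seed=0):
    """Generate synthetic points as random convex combinations of noisy
    copies ("shadowsamples") of a minority point's nearest neighbours.

    Illustrative sketch only; names and defaults are assumptions.
    """
    rng = random.Random(seed)
    dim = len(minority[0])

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        anchor = rng.choice(minority)
        # Neighbourhood: the anchor itself plus its nearest minority points.
        hood = sorted(minority, key=lambda p: sq_dist(anchor, p))[:k]
        # Shadowsamples: Gaussian-perturbed copies of neighbourhood points.
        shadows = [
            [x + rng.gauss(0.0, noise_sd) for x in rng.choice(hood)]
            for _ in range(n_shadow)
        ]
        # Random convex weights: non-negative and summing to 1.
        raw = [rng.random() + 1e-12 for _ in shadows]
        total = sum(raw)
        weights = [r / total for r in raw]
        synthetic.append(
            [sum(w * s[j] for w, s in zip(weights, shadows)) for j in range(dim)]
        )
    return synthetic


# Toy usage: three "rare cells" with two features each.
rare_cells = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_cells = shadow_oversample(rare_cells, n_synthetic=10)
```

The synthetic points would then be added to the minority class before training a classifier. With real count data, the added noise can produce small negative values, so a practical version would need to clip or otherwise handle them.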

RESULTS

We demonstrate the effectiveness of our method for three independent use cases, each consisting of already published datasets. The first use case identifies cardiac glial cells in snRNA-Seq data (17 nuclei out of 8635). This use case was designed to take a larger imbalance ratio (~1 to 500) into account and only uses single-nuclei data. The second use case was designed to jointly use snRNA-Seq data and scRNA-Seq on a lower imbalance ratio (~1 to 26) for the training step to likewise investigate the potential of the algorithm to consider both single-cell capture procedures and the impact of "less" rare-cell types. The third dataset refers to the murine data of the Allen Brain Atlas, including more than 1 million cells. For validation purposes only, all datasets have also been analyzed traditionally using common data analysis approaches, such as the Seurat workflow.

CONCLUSIONS

In comparison to baseline testing without oversampling, our approach identifies rare cells with a robust precision-recall balance, including a high accuracy and low false positive detection rate. A practical benefit of our algorithm is that it can be readily implemented in other and existing workflows. The code base in R and Python is publicly available at FairdomHub, as well as GitHub, and can easily be transferred to identify other rare-cell types.

Zappia L, Theis FJ. Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape. Genome Biol 2021;22:301. [PMID: 34715899 PMCID: PMC8555270 DOI: 10.1186/s13059-021-02519-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 08/24/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022] Open

Liu S, Thennavan A, Garay JP, Marron JS, Perou CM. MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data. Genome Biol 2021;22:232. [PMID: 34412669 PMCID: PMC8375188 DOI: 10.1186/s13059-021-02445-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/19/2021] [Accepted: 07/29/2021] [Indexed: 01/02/2023] Open

Lütge M, Pikor NB, Ludewig B. Differentiation and activation of fibroblastic reticular cells. Immunol Rev 2021;302:32-46. [PMID: 34046914 PMCID: PMC8361914 DOI: 10.1111/imr.12981] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/02/2021] [Revised: 04/17/2021] [Accepted: 04/30/2021] [Indexed: 12/29/2022]

Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021;36:89-108. [PMID: 34305304 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Risso D. Normalization of Single-Cell RNA-Seq Data. Methods Mol Biol 2021;2284:303-329. [PMID: 33835450 DOI: 10.1007/978-1-0716-1307-8_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]

Li Y, Xu Q, Wu D, Chen G. Exploring Additional Valuable Information From Single-Cell RNA-Seq Data. Front Cell Dev Biol 2020;8:593007. [PMID: 33335900 PMCID: PMC7736616 DOI: 10.3389/fcell.2020.593007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/04/2020] [Accepted: 10/26/2020] [Indexed: 12/28/2022] Open

Ye P, Ye W, Ye C, Li S, Ye L, Ji G, Wu X. scHinter: imputing dropout events for single-cell RNA-seq data with limited sample size. Bioinformatics 2020;36:789-797. [PMID: 31392316 DOI: 10.1093/bioinformatics/btz627] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2019] [Revised: 07/18/2019] [Accepted: 08/06/2019] [Indexed: 01/18/2023] Open

Abstract

MOTIVATION

Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes.

RESULTS

We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth.

AVAILABILITY AND IMPLEMENTATION

Freely available for download at https://github.com/BMILAB/scHinter.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Kim TH, Zhou X, Chen M. Demystifying "drop-outs" in single-cell UMI data. Genome Biol 2020;21:196. [PMID: 32762710 PMCID: PMC7412673 DOI: 10.1186/s13059-020-02096-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 04/21/2020] [Accepted: 07/08/2020] [Indexed: 01/10/2023] Open

Flexible experimental designs for valid single-cell RNA-sequencing experiments allowing batch effects correction. Nat Commun 2020;11:3274. [PMID: 32612268 PMCID: PMC7330047 DOI: 10.1038/s41467-020-16905-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/08/2019] [Accepted: 05/29/2020] [Indexed: 01/22/2023] Open

Network-Based Single-Cell RNA-Seq Data Imputation Enhances Cell Type Identification. Genes (Basel) 2020;11:377. [PMID: 32244427 PMCID: PMC7230610 DOI: 10.3390/genes11040377] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/07/2020] [Revised: 03/24/2020] [Accepted: 03/24/2020] [Indexed: 12/14/2022] Open

Huh R, Yang Y, Jiang Y, Shen Y, Li Y. SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble. Nucleic Acids Res 2020;48:86-95. [PMID: 31777938 PMCID: PMC6943136 DOI: 10.1093/nar/gkz959] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 05/08/2019] [Revised: 10/03/2019] [Accepted: 10/10/2019] [Indexed: 12/19/2022] Open

Embracing the dropouts in single-cell RNA-seq analysis. Nat Commun 2020;11:1169. [PMID: 32127540 PMCID: PMC7054558 DOI: 10.1038/s41467-020-14976-9] [Citation(s) in RCA: 153] [Impact Index Per Article: 38.3] [Received: 08/13/2019] [Accepted: 02/11/2020] [Indexed: 11/08/2022] Open

Casey MJ, Stumpf PS, MacArthur BD. Theory of cell fate. Wiley Interdiscip Rev Syst Biol Med 2020;12:e1471. [PMID: 31828979 PMCID: PMC7027507 DOI: 10.1002/wsbm.1471] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/30/2019] [Revised: 10/15/2019] [Accepted: 11/06/2019] [Indexed: 11/17/2022]

Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020;21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 564] [Impact Index Per Article: 141.0] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open

Affiliation(s)

David Lähnemann Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Johannes Köster Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
Ewa Szczurek Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Davis J. McCarthy Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
Stephanie C. Hicks Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
Mark D. Robinson Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
Catalina A. Vallejos MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK The Alan Turing Institute, British Library, London, UK
Kieran R. Campbell Department of Statistics, University of British Columbia, Vancouver, Canada Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada Data Science Institute, University of British Columbia, Vancouver, Canada
Niko Beerenwinkel Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Ahmed Mahfouz Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Luca Pinello Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA Department of Pathology, Harvard Medical School, Boston, USA Broad Institute of Harvard and MIT, Cambridge, MA USA
Pavel Skums Department of Computer Science, Georgia State University, Atlanta, USA
Alexandros Stamatakis Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
Camille Stephan-Otto Attolini Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Spain
Samuel Aparicio Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
Jasmijn Baaijens Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
Marleen Balvert Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
Buys de Barbanson Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Antonio Cappuccio Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
Giacomo Corleone Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
Bas E. Dutilh Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
Maria Florescu Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
Victor Guryev European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Rens Holmer Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Katharina Jahn Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
Thamar Jessurun Lobo European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
Emma M. Keizer Biometris, Wageningen University & Research, Wageningen, The Netherlands
Indu Khatri Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
Szymon M. Kielbasa Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
Jan O. Korbel Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Alexey M. Kozlov Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
Tzu-Hao Kuo Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Boudewijn P.F. Lelieveldt PRB lab, Delft University of Technology, Delft, The Netherlands Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Ion I. Mandoiu Computer Science & Engineering Department, University of Connecticut, Storrs, USA
John C. Marioni Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
Tobias Marschall Center for Bioinformatics, Saarland University, Saarbrücken, Germany Max Planck Institute for Informatics, Saarbrücken, Germany
Felix Mölder Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
Amir Niknejad Computation molecular design, Zuse Institute Berlin, Berlin, Germany Mathematics Department, Mount Saint Vincent, New York, USA
Alicja Rączkowska Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
Marcel Reinders Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
Jeroen de Ridder Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands Oncode Institute, Utrecht, The Netherlands
Antoine-Emmanuel Saliba Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
Antonios Somarakis Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
Oliver Stegle Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
Fabian J. Theis Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
Huan Yang Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
Alex Zelikovsky Department of Computer Science, Georgia State University, Atlanta, USA The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
Alice C. McHardy Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
Benjamin J. Raphael Department of Computer Science, Princeton University, Princeton, USA
Sohrab P. Shah Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
Alexander Schönhuth Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands

Geddes TA, Kim T, Nan L, Burchfield JG, Yang JYH, Tao D, Yang P. Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis. BMC Bioinformatics 2019;20:660. [PMID: 31870278 PMCID: PMC6929272 DOI: 10.1186/s12859-019-3179-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/23/2019] [Accepted: 10/28/2019] [Indexed: 01/23/2023] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An essential task in scRNA-seq data analysis is the identification of cell types from complex samples or tissues profiled in an experiment. To this end, clustering has become a key computational technique for grouping cells based on their transcriptome profiles, enabling subsequent cell type identification from each cluster of cells. Due to the high feature-dimensionality of the transcriptome (i.e. the large number of measured genes in each cell) and because only a small fraction of genes are cell type-specific and therefore informative for generating cell type-specific clusters, clustering directly on the original feature/gene dimension may lead to uninformative clusters and hinder correct cell type identification.

Results

Here, we propose an autoencoder-based cluster ensemble framework in which we first take random subspace projections from the data, then compress each random projection to a low-dimensional space using an autoencoder artificial neural network, and finally apply ensemble clustering across all encoded datasets to generate clusters of cells. We employ four evaluation metrics to benchmark clustering performance and our experiments demonstrate that the proposed autoencoder-based cluster ensemble can lead to substantially improved cell type-specific clusters when applied with both the standard k-means clustering algorithm and a state-of-the-art kernel-based clustering algorithm (SIMLR) designed specifically for scRNA-seq data. Compared to directly using these clustering algorithms on the original datasets, the performance improvement in some cases is up to 100%, depending on the evaluation metric used.

Conclusions

Our results suggest that the proposed framework can facilitate more accurate cell type identification as well as other downstream analyses. The code for creating the proposed autoencoder-based cluster ensemble framework is freely available from https://github.com/gedcom/scCCESS

Cao Y, Lin Y, Ormerod JT, Yang P, Yang JYH, Lo KK. scDC: single cell differential composition analysis. BMC Bioinformatics 2019;20:721. [PMID: 31870280 PMCID: PMC6929335 DOI: 10.1186/s12859-019-3211-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 11/07/2019] [Accepted: 11/12/2019] [Indexed: 11/23/2022] Open

Abstract

BACKGROUND

Differences in cell-type composition across subjects and conditions often carry biological significance. Recent advancements in single cell sequencing technologies enable cell types to be identified at the single cell level, and as a result, cell-type composition of tissues can now be studied in exquisite detail. However, a number of challenges remain with cell-type composition analysis: none of the existing methods can identify cell types perfectly, and variability related to cell sampling exists in any single cell experiment. This necessitates the development of methods for estimating uncertainty in cell-type composition.

RESULTS

We developed a novel single cell differential composition (scDC) analysis method that performs differential cell-type composition analysis via bootstrap resampling. scDC captures the uncertainty associated with cell-type proportions of each subject via bias-corrected and accelerated bootstrap confidence intervals. We assessed the performance of our method using a number of simulated datasets and synthetic datasets curated from publicly available single cell datasets. In simulated datasets, scDC correctly recovered the true cell-type proportions. In synthetic datasets, the cell-type compositions returned by scDC were highly concordant with reference cell-type compositions from the original data. Since the majority of datasets tested in this study have only 2 to 5 subjects per condition, the addition of confidence intervals enabled better comparisons of compositional differences between subjects and across conditions.
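The bootstrap idea described above can be illustrated with a minimal sketch. It is not the scDC implementation: it uses the simpler percentile interval in place of the bias-corrected and accelerated (BCa) interval named in the abstract, and the function name `bootstrap_proportion_ci` and its parameters are hypothetical.

```python
import random


def bootstrap_proportion_ci(labels, cell_type, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the proportion of one
    cell type among a subject's cells.

    Illustrative sketch only; scDC itself reports BCa intervals.
    """
    rng = random.Random(seed)
    n = len(labels)
    props = []
    for _ in range(n_boot):
        # Resample cells with replacement and recompute the proportion.
        resample = rng.choices(labels, k=n)
        props.append(sum(1 for x in resample if x == cell_type) / n)
    props.sort()
    lower = props[int((alpha / 2) * n_boot)]
    upper = props[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = labels.count(cell_type) / n
    return point, lower, upper


# Toy usage: one subject with 30 T cells and 70 B cells.
subject_labels = ["T"] * 30 + ["B"] * 70
estimate, ci_low, ci_high = bootstrap_proportion_ci(subject_labels, "T")
```

Resampling cells within each subject captures sampling variability in the proportions, which matters when only a few subjects per condition are available. It does not by itself account for errors in the cell-type labels.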

CONCLUSIONS

scDC is a novel statistical method for performing differential cell-type composition analysis for scRNA-seq data. It uses bootstrap resampling to estimate the standard errors associated with cell-type proportion estimates and performs significance testing through GLM and GLMM models. We have made this method available to the scientific community as part of the scdney (Single Cell Data Integrative Analysis) R package, available from https://github.com/SydneyBioX/scdney.

Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019;20:295. [PMID: 31870412 PMCID: PMC6927135 DOI: 10.1186/s13059-019-1861-6] [Citation(s) in RCA: 206] [Impact Index Per Article: 41.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 12/23/2022] Open

Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019;20:295. [PMID: 31870412 DOI: 10.1101/574574] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 05/24/2023] Open

Cheng C, Easton J, Rosencrance C, Li Y, Ju B, Williams J, Mulder HL, Pang Y, Chen W, Chen X. Latent cellular analysis robustly reveals subtle diversity in large-scale single-cell RNA-seq data. Nucleic Acids Res 2019;47:e143. [PMID: 31566233 PMCID: PMC6902034 DOI: 10.1093/nar/gkz826] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2019] [Revised: 08/30/2019] [Accepted: 09/26/2019] [Indexed: 12/21/2022] Open

Krzak M, Raykov Y, Boukouvalas A, Cutillo L, Angelini C. Benchmark and Parameter Sensitivity Analysis of Single-Cell RNA Sequencing Clustering Methods. Front Genet 2019;10:1253. [PMID: 31921297 PMCID: PMC6918801 DOI: 10.3389/fgene.2019.01253] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Accepted: 11/13/2019] [Indexed: 01/04/2023] Open

Abstract

Single-cell RNA-seq (scRNAseq) is a powerful tool to study heterogeneity of cells. Recently, several clustering-based methods have been proposed to identify distinct cell populations. These methods are based on different statistical models and usually require several additional steps, such as preprocessing or dimension reduction, before the clustering algorithm is applied. Individual steps are often controlled by method-specific parameters, permitting the method to be used in different modes on the same datasets, depending on the user's choices. The large number of possibilities that these methods provide can intimidate non-expert users, since the available choices are not always clearly documented. In addition, to date, no large studies have investigated the role and the impact that these choices can have in different experimental contexts. This work aims to provide new insights into the advantages and drawbacks of scRNAseq clustering methods and describe the ranges of possibilities that are offered to users. In particular, we provide an extensive evaluation of several methods with respect to different modes of usage and parameter settings by applying them to real and simulated datasets that vary in terms of dimensionality, number of cell populations or levels of noise. Remarkably, the results presented here show that great variability in the performance of the models is strongly attributed to the choice of the user-specific parameter settings. We describe several tendencies in the performance attributed to their modes of usage and different types of datasets, and identify which methods are strongly affected by data dimensionality in terms of computational time. Finally, we highlight some open challenges in scRNAseq data clustering, such as those related to the identification of the number of clusters.

Sun S, Zhu J, Ma Y, Zhou X. Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis. Genome Biol 2019;20:269. [PMID: 31823809 PMCID: PMC6902413 DOI: 10.1186/s13059-019-1898-6] [Citation(s) in RCA: 108] [Impact Index Per Article: 21.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2019] [Accepted: 11/22/2019] [Indexed: 01/01/2023] Open

Chaudhry F, Isherwood J, Bawa T, Patel D, Gurdziel K, Lanfear DE, Ruden DM, Levy PD. Single-Cell RNA Sequencing of the Cardiovascular System: New Looks for Old Diseases. Front Cardiovasc Med 2019;6:173. [PMID: 31921894 PMCID: PMC6914766 DOI: 10.3389/fcvm.2019.00173] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2019] [Accepted: 11/12/2019] [Indexed: 12/18/2022] Open

Xu G, Liu Y, Li H, Liu L, Zhang S, Zhang Z. Dissecting the human immune system with single cell RNA sequencing technology. J Leukoc Biol 2019;107:613-623. [PMID: 31803960 DOI: 10.1002/jlb.5mr1019-179r] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2019] [Revised: 10/24/2019] [Accepted: 11/13/2019] [Indexed: 12/23/2022] Open

Affiliation(s)

Gang Xu Institute of Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, the Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, Guangdong Province, China; Guangdong Key Lab of Emerging Infectious Diseases, Shenzhen Third People's Hospital, Longgang District, Shenzhen, China
Yang Liu Institute of Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, the Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, Guangdong Province, China; Guangdong Key Lab of Emerging Infectious Diseases, Shenzhen Third People's Hospital, Longgang District, Shenzhen, China
Hanjie Li Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Lei Liu Institute of Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, the Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, Guangdong Province, China; Guangdong Key Lab of Emerging Infectious Diseases, Shenzhen Third People's Hospital, Longgang District, Shenzhen, China
Shuye Zhang Shanghai Public Health Clinical Center and Institute of Biomedical Sciences, Fudan University, Shanghai, China
Zheng Zhang Institute of Hepatology, National Clinical Research Center for Infectious Disease, Shenzhen Third People's Hospital, the Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, Guangdong Province, China; Guangdong Key Lab of Emerging Infectious Diseases, Shenzhen Third People's Hospital, Longgang District, Shenzhen, China; Key Laboratory of Immunology, Sino-French Hoffmann Institute, School of Basic Medical Sciences; Guangdong Provincial Key Laboratory of Allergy & Clinical Immunology, The Second Affiliated Hospital, Guangzhou Medical University, Guangzhou, China

Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling. Nat Methods 2019;16:1007-1015. [PMID: 31501550 DOI: 10.1038/s41592-019-0529-1] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2019] [Accepted: 07/16/2019] [Indexed: 01/23/2023]

Yu X, Chen YA, Conejo-Garcia JR, Chung CH, Wang X. Estimation of immune cell content in tumor using single-cell RNA-seq reference data. BMC Cancer 2019;19:715. [PMID: 31324168 PMCID: PMC6642583 DOI: 10.1186/s12885-019-5927-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2019] [Accepted: 07/12/2019] [Indexed: 12/12/2022] Open

Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, Boulesteix AL, Saeys Y, Robinson MD. Essential guidelines for computational method benchmarking. Genome Biol 2019;20:125. [PMID: 31221194 PMCID: PMC6584985 DOI: 10.1186/s13059-019-1738-8] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open

Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 2019;15:e8746. [PMID: 31217225 PMCID: PMC6582955 DOI: 10.15252/msb.20188746] [Citation(s) in RCA: 953] [Impact Index Per Article: 190.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2018] [Revised: 03/15/2019] [Accepted: 04/03/2019] [Indexed: 12/21/2022] Open

Crow M, Gillis J. Single cell RNA-sequencing: replicability of cell types. Curr Opin Neurobiol 2019;56:69-77. [PMID: 30654233 PMCID: PMC6551252 DOI: 10.1016/j.conb.2018.12.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2018] [Revised: 12/03/2018] [Accepted: 12/09/2018] [Indexed: 01/09/2023]

Tian L, Dong X, Freytag S, Lê Cao KA, Su S, JalalAbadi A, Amann-Zalcenstein D, Weber TS, Seidi A, Jabbari JS, Naik SH, Ritchie ME. Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments. Nat Methods 2019;16:479-487. [DOI: 10.1038/s41592-019-0425-8] [Citation(s) in RCA: 183] [Impact Index Per Article: 36.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Accepted: 04/18/2019] [Indexed: 11/09/2022]

Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X. scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics 2019;20:347. [PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2018] [Accepted: 04/29/2019] [Indexed: 12/15/2022] Open

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses on scRNA-seq, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing of scRNA-seq data is a critical preliminary step in the analysis of scRNA-seq data.

Results

We presented scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools.

Conclusions

scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources. scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5747-5) contains supplementary material, which is available to authorized users.

Sun Z, Chen L, Xin H, Jiang Y, Huang Q, Cillo AR, Tabib T, Kolls JK, Bruno TC, Lafyatis R, Vignali DAA, Chen K, Ding Y, Hu M, Chen W. A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies. Nat Commun 2019;10:1649. [PMID: 30967541 PMCID: PMC6456731 DOI: 10.1038/s41467-019-09639-3] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2018] [Accepted: 03/15/2019] [Indexed: 02/08/2023] Open

Affiliation(s)

Zhe Sun Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Li Chen Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, 36849, USA
Hongyi Xin Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA
Yale Jiang Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA; School of Medicine, Tsinghua University, Beijing, 100084, China
Qianhui Huang Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI, 48109, USA
Anthony R Cillo Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA
Tracy Tabib Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Jay K Kolls School of Medicine, Tulane University, New Orleans, LA, 70112, USA
Tullia C Bruno Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA; Tumor Microenvironment Center, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA
Robert Lafyatis Division of Rheumatology and Clinical Immunology, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Dario A A Vignali Department of Immunology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15262, USA; Tumor Microenvironment Center, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA; Cancer Immunology and Immunotherapy Program, UPMC Hillman Cancer Center, Pittsburgh, PA, 15232, USA
Kong Chen Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, 15213, USA
Ying Ding Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA
Ming Hu Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, USA
Wei Chen Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary Medicine, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh, Pittsburgh, PA, 15224, USA

Diaz-Mejia JJ, Meng EC, Pico AR, MacParland SA, Ketela T, Pugh TJ, Bader GD, Morris JH. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019;8:ISCB Comm J-296. [PMID: 31508207 PMCID: PMC6720041 DOI: 10.12688/f1000research.18490.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 08/19/2019] [Indexed: 10/15/2023] Open

Abstract

Background: Identification of cell type subpopulations from complex cell mixtures using single-cell RNA-sequencing (scRNA-seq) data includes automated steps from normalization to cell clustering. However, assigning cell type labels to cell clusters is often conducted manually, resulting in limited documentation, low reproducibility and uncontrolled vocabularies. This is partially due to the scarcity of reference cell type signatures and because some methods support limited cell type signatures. Methods: In this study, we benchmarked five methods representing first-generation enrichment analysis (ORA), second-generation approaches (GSEA and GSVA), machine learning tools (CIBERSORT) and network-based neighbor voting (METANEIGHBOR), for the task of assigning cell type labels to cell clusters from scRNA-seq data. We used five scRNA-seq datasets: human liver, 11 Tabula Muris mouse tissues, two human peripheral blood mononuclear cell datasets, and mouse retinal neurons, for which reference cell type signatures were available. The datasets span Drop-seq, 10X Chromium and Seq-Well technologies and range in size from ~3,700 to ~68,000 cells. Results: Our results show that, in general, all five methods perform well in the task as evaluated by receiver operating characteristic curve analysis (average area under the curve (AUC) = 0.91, sd = 0.06), whereas precision-recall analyses show a wide variation depending on the method and dataset (average AUC = 0.53, sd = 0.24). We observed an influence of the number of genes in cell type signatures on performance, with smaller signatures leading more frequently to incorrect results. Conclusions: GSVA was the overall top performer and was more robust in cell type signature subsampling simulations, although different methods performed well using different datasets. METANEIGHBOR and GSVA were the fastest methods. CIBERSORT and METANEIGHBOR were more influenced than the other methods by analyses including only expected cell types. 
We provide an extensible framework that can be used to evaluate other methods and datasets at https://github.com/jdime/scRNAseq_cell_cluster_labeling.
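
The receiver operating characteristic AUC reported in this abstract can be read as the probability that a correct cell type label receives a higher score than an incorrect one. The sketch below computes it directly from that definition. It is an illustrative assumption, not code from the benchmark repository: the function name `roc_auc` and the toy scores are hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Hypothetical helper for illustration.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy enrichment scores for five candidate labels of one cluster (made-up data);
# 1 marks a label that matches the reference annotation.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"ROC AUC = {auc:.3f}")
```

Because this measure only compares positives against negatives pairwise, it can stay high when correct labels are rare, which is one reason precision-recall analyses can vary more widely across methods and datasets.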

Diaz-Mejia JJ, Meng EC, Pico AR, MacParland SA, Ketela T, Pugh TJ, Bader GD, Morris JH. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019;8:ISCB Comm J-296. [PMID: 31508207 PMCID: PMC6720041 DOI: 10.12688/f1000research.18490.3] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 10/09/2019] [Indexed: 01/28/2023] Open

Diaz-Mejia JJ, Meng EC, Pico AR, MacParland SA, Ketela T, Pugh TJ, Bader GD, Morris JH. Evaluation of methods to assign cell type labels to cell clusters from single-cell RNA-sequencing data. F1000Res 2019;8:ISCB Comm J-296. [PMID: 31508207 PMCID: PMC6720041 DOI: 10.12688/f1000research.18490.1] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 03/08/2019] [Indexed: 12/11/2022] Open

Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018;7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.1] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 08/07/2018] [Indexed: 01/21/2023] Open

Freytag S, Tian L, Lönnstedt I, Ng M, Bahlo M. Comparison of clustering tools in R for medium-sized 10x Genomics single-cell RNA-sequencing data. F1000Res 2018;7:1297. [PMID: 30228881 PMCID: PMC6124389 DOI: 10.12688/f1000research.15809.2] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 12/14/2018] [Indexed: 12/23/2022] Open