For: Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Res 2018;7:1141. [PMID: 30271584 PMCID: PMC6134335 DOI: 10.12688/f1000research.15666.3] [Citation(s) in RCA: 120] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 11/04/2020] [Indexed: 02/05/2023] Open

Cited by Other Article(s)

Luo Q, Chen Y, Lan X. COMSE: analysis of single-cell RNA-seq data using community detection-based feature selection. BMC Biol 2024;22:167. [PMID: 39113021 PMCID: PMC11304914 DOI: 10.1186/s12915-024-01963-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2023] [Accepted: 07/10/2024] [Indexed: 08/10/2024] Open

Singh A, Khiabanian H. Feature selection followed by a novel residuals-based normalization that includes variance stabilization simplifies and improves single-cell gene expression analysis. BMC Bioinformatics 2024;25:248. [PMID: 39080559 PMCID: PMC11290295 DOI: 10.1186/s12859-024-05872-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2024] [Accepted: 07/16/2024] [Indexed: 08/02/2024] Open

Fang C, Selega A, Campbell KR. Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance. Genome Biol 2024;25:159. [PMID: 38886757 PMCID: PMC11184819 DOI: 10.1186/s13059-024-03304-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open

Caron DP, Specht WL, Chen D, Wells SB, Szabo PA, Jensen IJ, Farber DL, Sims PA. Multimodal hierarchical classification of CITE-seq data delineates immune cell states across lineages and tissues. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.06.547944. [PMID: 37461466 PMCID: PMC10350048 DOI: 10.1101/2023.07.06.547944] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 07/27/2023]

Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform 2024;25:bbae216. [PMID: 38725155 PMCID: PMC11082074 DOI: 10.1093/bib/bbae216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2024] [Revised: 03/01/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024] Open

Zhang K, Zemke NR, Armand EJ, Ren B. A fast, scalable and versatile tool for analysis of single-cell omics data. Nat Methods 2024;21:217-227. [PMID: 38191932 PMCID: PMC10864184 DOI: 10.1038/s41592-023-02139-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 11/23/2023] [Indexed: 01/10/2024]

Swaminath S, Russell AB. The use of single-cell RNA-seq to study heterogeneity at varying levels of virus-host interactions. PLoS Pathog 2024;20:e1011898. [PMID: 38236826 PMCID: PMC10796064 DOI: 10.1371/journal.ppat.1011898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2024] Open

Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.18.572214. [PMID: 38187768 PMCID: PMC10769271 DOI: 10.1101/2023.12.18.572214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/09/2024]

Karakurt HU, Pir P. SUMA: a lightweight machine learning model-powered shared nearest neighbour-based clustering application interface for scRNA-Seq data. Turk J Biol 2023;47:413-422. [PMID: 38681777 PMCID: PMC11045205 DOI: 10.55730/1300-0152.2675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024] Open

Abstract

Background/aim

Single-cell transcriptomics (scRNA-Seq) explores cellular diversity at the gene expression level. Due to the inherent sparsity and noise in scRNA-Seq data and the uncertainty about the types of sequenced cells, effective clustering and cell type annotation are essential. The graph-based clustering of scRNA-Seq data is a simple yet powerful approach that represents data as a "shared nearest neighbour" graph and clusters the cells using graph clustering algorithms. These algorithms depend on several user-defined parameters. Here we present SUMA, a lightweight tool that uses a random forest model to predict the number of neighbours that yields the best clustering results. Moreover, we integrated our method with other commonly used methods in an RShiny application. SUMA can be used in a local environment (https://github.com/hkarakurt8742/SUMA) or as a browser tool (https://hkarakurt.shinyapps.io/suma/).
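To illustrate the "shared nearest neighbour" idea mentioned in this abstract, here is a minimal sketch, not the SUMA implementation. It assumes a precomputed distance matrix, includes each cell in its own neighbour set, and weights an edge by the Jaccard overlap of the two neighbour sets; the function names `knn_sets` and `snn_graph` are hypothetical. The number of neighbours `k` is the user-defined parameter that the abstract says SUMA predicts.

```python
def knn_sets(dist, k):
    """For each cell, return the set of its k nearest neighbours plus itself.

    Including the cell itself is an assumption of this sketch.
    """
    n = len(dist)
    sets = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        sets.append(set(others[:k]) | {i})
    return sets


def snn_graph(dist, k):
    """Build a shared-nearest-neighbour graph as {(i, j): weight} with i < j.

    The weight is the Jaccard index of the two neighbour sets; pairs that
    share no neighbours get no edge.
    """
    neighbours = knn_sets(dist, k)
    n = len(dist)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(neighbours[i] & neighbours[j])
            if shared:
                edges[(i, j)] = shared / len(neighbours[i] | neighbours[j])
    return edges


# Toy data: six "cells" on a line, forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
edges = snn_graph(dist, k=2)
```

On this toy data every edge stays within a group, so a graph clustering algorithm applied to `edges` would recover the two groups. A larger `k` would eventually connect them, which is why the choice of `k` matters.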

Materials and methods

Publicly available scRNA-Seq datasets and 3 different graph-based clustering algorithms were used to develop SUMA, and a large range for number of neighbours and variant genes was taken into consideration. The quality of clustering was assessed using the adjusted Rand index (ARI) and true labels of each dataset. The data were split into training and test datasets, and the model was built and optimised using Scikit-learn (Python) and randomForest (R) libraries.
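The adjusted Rand index used above to score clusterings against true labels can be computed from pair counts. The following is a minimal standard-library sketch of the usual (Hubert and Arabie) formula; the function name is hypothetical, and returning 1.0 for degenerate inputs is a convention assumed here rather than something the abstract specifies.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on

    # Pair counts within contingency-table cells, rows, and columns.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumed convention for trivial partitions
    return (sum_cells - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, renamed
mixed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # maximally mixed partition
```

Because the index is corrected for chance, it equals 1 for identical partitions regardless of label names and can go below 0 when agreement is worse than expected at random.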

Results

The accuracy of our machine learning model was 0.96, while the AUC of the ROC curve was 0.98. The model indicated that the number of cells in scRNA-Seq data is the most important feature when deciding the number of neighbours.

Conclusion

We developed and evaluated the SUMA model and implemented the method in the SUMAShiny app, which integrates SUMA with different clustering methods and enables nonbioinformatician users to cluster and visualise their scRNA data easily. The SUMAShiny app is available both for desktop and browser use.

Chen YT, Gao LL. Testing for a difference in means of a single feature after clustering. ARXIV 2023:arXiv:2311.16375v1. [PMID: 38076519 PMCID: PMC10705581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]

Atitey K, Motsinger-Reif AA, Anchang B. Model-based evaluation of spatiotemporal data reduction methods with unknown ground truth through optimal visualization and interpretability metrics. Brief Bioinform 2023;25:bbad455. [PMID: 38113074 PMCID: PMC10729792 DOI: 10.1093/bib/bbad455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 11/06/2023] [Accepted: 11/20/2023] [Indexed: 12/21/2023] Open

Abstract

Optimizing and benchmarking data reduction methods for dynamic or spatial visualization and interpretation (DSVI) face challenges due to many factors, including data complexity, lack of ground truth, time-dependent metrics, dimensionality bias and different visual mappings of the same data. Current studies often focus on independent static visualization or interpretability metrics that require ground truth. To overcome this limitation, we propose the MIBCOVIS framework, a comprehensive and interpretable benchmarking and computational approach. MIBCOVIS enhances the visualization and interpretability of high-dimensional data without relying on ground truth by integrating five robust metrics, including a novel time-ordered Markov-based structural metric, into a semi-supervised hierarchical Bayesian model. The framework assesses method accuracy and considers interaction effects among metric features. We apply MIBCOVIS using linear and nonlinear dimensionality reduction methods to evaluate optimal DSVI for four distinct dynamic and spatial biological processes captured by three single-cell data modalities: CyTOF, scRNA-seq and CODEX. These data vary in complexity based on feature dimensionality, unknown cell types and dynamic or spatial differences. Unlike traditional single-summary score approaches, MIBCOVIS compares accuracy distributions across methods. Our findings underscore the joint evaluation of visualization and interpretability, rather than relying on separate metrics. We reveal that prioritizing average performance can obscure method feature performance. Additionally, we explore the impact of data complexity on visualization and interpretability. Specifically, we provide optimal parameters and features and recommend methods, like the optimized variational contractive autoencoder, for targeted DSVI for various data complexities. MIBCOVIS shows promise for evaluating dynamic single-cell atlases and spatiotemporal data reduction models.

Domingo J, Kutsyr-Kolesnyk O, Leon T, Perez-Moraga R, Ayala G, Roson B. A cell abundance analysis based on efficient PAM clustering for a better understanding of the dynamics of endometrial remodelling. BMC Bioinformatics 2023;24:440. [PMID: 37990148 PMCID: PMC10664584 DOI: 10.1186/s12859-023-05569-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Accepted: 11/15/2023] [Indexed: 11/23/2023] Open

Abstract

BACKGROUND

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for investigating cell abundance changes during tissue regeneration and remodeling processes. Differential cell abundance supports the initial clustering of all cells; then, the number of cells per cluster and sample are evaluated, and the dependence of these counts concerning the phenotypic covariates of the samples is studied. Analysis heavily depends on the clustering method. Partitioning Around Medoids (PAM or k-medoids) represents a well-established clustering procedure that leverages the downstream interpretation of clusters by pinpointing real individuals in the dataset as cluster centers (medoids) without reducing dimensions. Of note, PAM suffers from high computational costs and memory requirements.

RESULTS

This paper proposes a method for differential abundance analysis using PAM as a clustering method and negative binomial regression as a statistical model to relate covariates to cluster/cell counts. We used this approach to study the differential cell abundance of human endometrial cell types throughout the natural secretory phase of the menstrual cycle. We developed a new R package, scellpam, that incorporates an efficient parallel C++ implementation of PAM, and applied this package in this study. We compared the PAM-BS clustering method with other methods and evaluated both the computational aspects of its implementation and the quality of the classifications obtained using distinct published datasets with known subpopulations; the results are promising.

CONCLUSIONS

The implementation of PAM-BS, included in the scellpam package, exhibits robust performance in terms of speed and memory usage compared to other related methods. PAM allowed quick and robust clustering of sets of cells with a size ranging from 70,000 to 300,000 cells. The package is available at https://cran.r-project.org/web/packages/scellpam/index.html. Finally, our approach provides important new insights into the transient subpopulations associated with the fertile time frame when applied to the study of changes in the human endometrium during the secretory phase of the menstrual cycle.

Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of single cell multi-omics and spatial data guided by gene regulatory networks and cell-cell interactions. RESEARCH SQUARE 2023:rs.3.rs-3301625. [PMID: 37790516 PMCID: PMC10543280 DOI: 10.21203/rs.3.rs-3301625/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/05/2023]

Zhang K, Zemke NR, Armand EJ, Ren B. SnapATAC2: a fast, scalable and versatile tool for analysis of single-cell omics data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.11.557221. [PMID: 37745443 PMCID: PMC10515871 DOI: 10.1101/2023.09.11.557221] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/26/2023]

Domingo J, Leon T, Dura E. Scellpam: an R package/C++ library to perform parallel partitioning around medoids on scRNAseq data sets. BMC Bioinformatics 2023;24:342. [PMID: 37710192 PMCID: PMC10503022 DOI: 10.1186/s12859-023-05471-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2023] [Accepted: 09/08/2023] [Indexed: 09/16/2023] Open

Abstract

BACKGROUND

Partitioning around medoids (PAM) is one of the most widely used and successful clustering methods in many fields. One of its key advantages is that it only requires a distance or a dissimilarity between the individuals, and the fact that cluster centers are actual points in the data set means they can be taken as reliable representatives of their classes. However, its wider application is hampered by the large amount of memory needed to store the distance matrix (quadratic in the number of individuals), by the high computational cost of computing that distance matrix and, less importantly, by the cost of the clustering algorithm itself.
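The points made above (PAM needs only pairwise dissimilarities, and its cluster centers are actual data points) can be seen in a naive sketch of the classic BUILD and SWAP phases. This is an illustrative, unoptimized version written for this listing, not the parallel FASTPAM1 code in scellpam; the function names are hypothetical, and the full in-memory distance matrix it takes as input is exactly the quadratic cost the abstract describes.

```python
def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Naive PAM on a full distance matrix; returns (medoids, labels)."""
    n = len(dist)

    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))

    # SWAP: apply the best cost-reducing (medoid, non-medoid) exchange
    # until no exchange improves the cost.
    cost = total_cost(dist, medoids)
    while True:
        best_swap = None
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost:
                    cost, best_swap = trial_cost, trial
        if best_swap is None:
            break
        medoids = best_swap

    # Assign each point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Toy data: six points on a line in two groups; only distances are used.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, k=2)
```

Each SWAP pass here recomputes the full cost for every candidate exchange, which is why fast variants and parallel implementations matter at the scale of hundreds of thousands of cells.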

RESULTS

Therefore, new software has been provided that addresses these issues. This software, provided under GPL license and usable as either an R package or a C++ library, calculates in parallel the distance matrix for different distances/dissimilarities ([Formula: see text], [Formula: see text], Pearson, cosine and weighted Euclidean) and also implements a parallel fast version of PAM (FASTPAM1) using any data type to reduce memory usage. Moreover, the parallel implementation uses all the cores available in modern computers which greatly reduces the execution time. Besides its general application, the software is especially useful for processing data of single cell experiments. It has been tested in problems including clustering of single cell experiments with up to 289,000 cells with the expression of about 29,000 genes per cell.

CONCLUSIONS

Comparisons with other current packages in terms of execution time have been made. The method greatly outperforms the available R packages for distance matrix calculation and also improves on the packages that implement PAM itself. The software is available as an R package at https://CRAN.R-project.org/package=scellpam and as C++ libraries at https://github.com/JdMDE/jmatlib and https://github.com/JdMDE/ppamlib. The package is useful for single cell RNA-seq studies but it is also applicable in other contexts where clustering of large data sets is required.

Odaka M, Magnin M, Inoue K. Gene network inference from single-cell omics data and domain knowledge for constructing COVID-19-specific ICAM1-associated pathways. Front Genet 2023;14:1250545. [PMID: 37719701 PMCID: PMC10501835 DOI: 10.3389/fgene.2023.1250545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2023] [Accepted: 08/16/2023] [Indexed: 09/19/2023] Open

Abstract

Introduction: Intercellular adhesion molecule 1 (ICAM-1) is a critical molecule responsible for interactions between cells. Previous studies have suggested that ICAM-1 triggers cell-to-cell transmission of HIV-1 or HTLV-1, that SARS-CoV-2 shares several features with these viruses via interactions between cells, and that SARS-CoV-2 cell-to-cell transmission is associated with COVID-19 severity. From these previous arguments, it is assumed that ICAM-1 can be related to SARS-CoV-2 cell-to-cell transmission in COVID-19 patients. Indeed, the time-dependent change of the ICAM-1 expression level has been detected in COVID-19 patients. However, signaling pathways that consist of ICAM-1 and other molecules interacting with ICAM-1 are not identified in COVID-19. For example, the current COVID-19 Disease Map has no entry for those pathways. Therefore, discovering unknown ICAM1-associated pathways will be indispensable for clarifying the mechanism of COVID-19.

Materials and methods: This study builds ICAM1-associated pathways by gene network inference from single-cell omics data and multiple knowledge bases. First, single-cell omics data analysis extracts coexpressed genes with significant differences in expression levels with spurious correlations removed. Second, knowledge bases validate the models. Finally, mapping the models onto existing pathways identifies new ICAM1-associated pathways.

Results: Comparison of the obtained pathways between different cell types and time points reproduces the known pathways and indicates the following two unknown pathways: (1) upstream pathway that includes proteins in the non-canonical NF-κB pathway and (2) downstream pathway that contains integrins and cytoskeleton or motor proteins for cell transformation.

Discussion: In this way, data-driven and knowledge-based approaches are integrated into gene network inference for ICAM1-associated pathway construction. The results can contribute to repairing and completing the COVID-19 Disease Map, thereby improving our understanding of the mechanism of COVID-19.

Song D, Li K, Ge X, Li JJ. ClusterDE: a post-clustering differential expression (DE) method robust to false-positive inflation caused by double dipping. RESEARCH SQUARE 2023:rs.3.rs-3211191. [PMID: 37577698 PMCID: PMC10418557 DOI: 10.21203/rs.3.rs-3211191/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/15/2023]

Dudchenko O, Ordovas-Montanes J, Bingle CD. Respiratory epithelial cell types, states and fates in the era of single-cell RNA-sequencing. Biochem J 2023;480:921-939. [PMID: 37410389 PMCID: PMC10422933 DOI: 10.1042/bcj20220572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Revised: 06/19/2023] [Accepted: 06/20/2023] [Indexed: 07/07/2023]

Arts JA, Laberthonnière C, Lima Cunha D, Zhou H. Single-Cell RNA Sequencing: Opportunities and Challenges for Studies on Corneal Biology in Health and Disease. Cells 2023;12:1808. [PMID: 37443842 PMCID: PMC10340756 DOI: 10.3390/cells12131808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2023] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/15/2023] Open

Eid SA, Noureldein M, Kim B, Hinder LM, Mendelson FE, Hayes JM, Hur J, Feldman EL. Single-cell RNA-seq uncovers novel metabolic functions of Schwann cells beyond myelination. J Neurochem 2023;166:367-388. [PMID: 37328915 PMCID: PMC11141588 DOI: 10.1111/jnc.15877] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2023] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/18/2023]

Van de Sande B, Lee JS, Mutasa-Gottgens E, Naughton B, Bacon W, Manning J, Wang Y, Pollard J, Mendez M, Hill J, Kumar N, Cao X, Chen X, Khaladkar M, Wen J, Leach A, Ferran E. Applications of single-cell RNA sequencing in drug discovery and development. Nat Rev Drug Discov 2023;22:496-520. [PMID: 37117846 PMCID: PMC10141847 DOI: 10.1038/s41573-023-00688-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/10/2023] [Indexed: 04/30/2023]

Zhu J, Yang Y. scMEB: a fast and clustering-independent method for detecting differentially expressed genes in single-cell RNA-seq data. BMC Genomics 2023;24:280. [PMID: 37231345 DOI: 10.1186/s12864-023-09374-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2022] [Accepted: 05/11/2023] [Indexed: 05/27/2023] Open

Chen YT, Witten DM. Selective inference for k-means clustering. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2023;24:152. [PMID: 38264325 PMCID: PMC10805457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/25/2024]

Zhang S, Li X, Lin J, Lin Q, Wong KC. Review of single-cell RNA-seq data clustering for cell-type identification and characterization. RNA (NEW YORK, N.Y.) 2023;29:517-530. [PMID: 36737104 PMCID: PMC10158997 DOI: 10.1261/rna.078965.121] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/27/2022] [Accepted: 01/03/2023] [Indexed: 05/06/2023]

Qiu Y, Yan C, Zhao P, Zou Q. SSNMDI: a novel joint learning model of semi-supervised non-negative matrix factorization and data imputation for clustering of single-cell RNA-seq data. Brief Bioinform 2023;24:7147025. [PMID: 37122068 DOI: 10.1093/bib/bbad149] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 03/28/2023] [Indexed: 05/02/2023] Open

Crowell HL, Morillo Leonardo SX, Soneson C, Robinson MD. The shaky foundations of simulating single-cell RNA sequencing data. Genome Biol 2023;24:62. [PMID: 36991470 PMCID: PMC10061781 DOI: 10.1186/s13059-023-02904-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Accepted: 03/20/2023] [Indexed: 03/31/2023] Open

Li H, Zhang Z, Squires M, Chen X, Zhang X. scMultiSim: simulation of multi-modality single cell data guided by cell-cell interactions and gene regulatory networks. RESEARCH SQUARE 2023:rs.3.rs-2675530. [PMID: 36993284 PMCID: PMC10055660 DOI: 10.21203/rs.3.rs-2675530/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]

Ding L, Shi H, Qian C, Burdyshaw C, Veloso JP, Khatamian A, Pan Q, Dhungana Y, Xie Z, Risch I, Yang X, Huang X, Yan L, Rusch M, Brewer M, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for identifying hidden drivers from single-cell omics data. RESEARCH SQUARE 2023:rs.3.rs-2476875. [PMID: 36747874 PMCID: PMC9901036 DOI: 10.21203/rs.3.rs-2476875/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]

Affiliation(s)

Liang Ding Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Hao Shi Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Chenxi Qian Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Chad Burdyshaw Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Joao Pedro Veloso Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Alireza Khatamian Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Qingfei Pan Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Yogesh Dhungana Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; Graduate School of Biomedical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Zhen Xie Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; Department of Physiology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
Isabel Risch Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA; Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Xu Yang Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Xin Huang Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Lei Yan Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Michael Rusch Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Michael Brewer Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Koon-Kiu Yan Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Hongbo Chi Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
Jiyang Yu Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

Hsu LL, Culhane AC. Correspondence analysis for dimension reduction, batch integration, and visualization of single-cell RNA-seq data. Sci Rep 2023;13:1197. [PMID: 36681709 PMCID: PMC9867729 DOI: 10.1038/s41598-022-26434-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2022] [Accepted: 12/14/2022] [Indexed: 01/22/2023] Open

Harmanci A, Harmanci AS, Klisch TJ, Patel AJ. XCVATR: detection and characterization of variant impact on the Embeddings of single -cell and bulk RNA-sequencing samples. BMC Genomics 2022;23:841. [PMID: 36539717 PMCID: PMC9764736 DOI: 10.1186/s12864-022-09004-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2021] [Accepted: 11/09/2022] [Indexed: 12/24/2022] Open

Abstract

BACKGROUND

RNA-sequencing has become a standard tool for analyzing gene activity in bulk samples and at the single-cell level. By increasing sample sizes and cell counts, this technique can uncover substantial information about cellular transcriptional states. Beyond quantification of gene expression, RNA-seq can be used for detecting variants, including single nucleotide polymorphisms, small insertions/deletions, and larger variants, such as copy number variants. Notably, joint analysis of variants with cellular transcriptional states may provide insights into the impact of mutations, especially for complex and heterogeneous samples. However, this analysis is often challenging due to a prohibitively high number of variants and cells, which are difficult to summarize and visualize. Further, there is a dearth of methods that assess and summarize the association between detected variants and cellular transcriptional states.

RESULTS

Here, we introduce XCVATR (eXpressed Clusters of Variant Alleles in Transcriptome pRofiles), a method that identifies variants and detects local enrichment of expressed variants within embedding of samples and cells in single-cell and bulk RNA-seq datasets. XCVATR visualizes local "clumps" of small and large-scale variants and searches for patterns of association between each variant and cellular states, as described by the coordinates of cell embedding, which can be computed independently using any type of distance metrics, such as principal component analysis or t-distributed stochastic neighbor embedding. Through simulations and analysis of real datasets, we demonstrate that XCVATR can detect enrichment of expressed variants and provide insight into the transcriptional states of cells and samples. We next sequenced 2 new single cell RNA-seq tumor samples and applied XCVATR. XCVATR revealed subtle differences in CNV impact on tumors.

CONCLUSIONS

XCVATR is publicly available to download from https://github.com/harmancilab/XCVATR .

Su M, Pan T, Chen QZ, Zhou WW, Gong Y, Xu G, Yan HY, Li S, Shi QZ, Zhang Y, He X, Jiang CJ, Fan SC, Li X, Cairns MJ, Wang X, Li YS. Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications. Mil Med Res 2022;9:68. [PMID: 36461064 PMCID: PMC9716519 DOI: 10.1186/s40779-022-00434-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/27/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open

Affiliation(s)

Min Su: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Tao Pan: College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiu-Zhen Chen: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Wei-Wei Zhou: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Yi Gong: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China; Department of Immunology, Nanjing Medical University, Nanjing, 211166, China
Gang Xu: College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Huan-Yu Yan: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Si Li: College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Qiao-Zhen Shi: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Ya Zhang: College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China
Xiao He: Department of Laboratory Medicine, Women and Children's Hospital of Chongqing Medical University, Chongqing, 401174, China
Chun-Jie Jiang: Baylor College of Medicine, Houston, TX, 77030, USA
Shi-Cai Fan: Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518110, Guangdong, China
Xia Li: College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
Murray J Cairns: School of Biomedical Sciences and Pharmacy, Faculty of Health and Medicine, the University of Newcastle, University Drive, Callaghan, NSW, 2308, Australia; Precision Medicine Research Program, Hunter Medical Research Institute, New Lambton Heights, NSW, 2305, Australia
Xi Wang: State Key Laboratory of Reproductive Medicine, Nanjing Medical University, Nanjing, 211166, China
Yong-Sheng Li: College of Biomedical Information and Engineering, the First Affiliated Hospital of Hainan Medical University, Hainan Medical University, Haikou, 571199, Hainan, China


Jiménez‐Santos MJ, García‐Martín S, Fustero‐Torre C, Di Domenico T, Gómez‐López G, Al‐Shahrour F. Bioinformatics roadmap for therapy selection in cancer genomics. Mol Oncol 2022;16:3881-3908. [PMID: 35811332 PMCID: PMC9627786 DOI: 10.1002/1878-0261.13286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 06/22/2022] [Accepted: 07/08/2022] [Indexed: 12/24/2022] Open

Couckuyt A, Seurinck R, Emmaneel A, Quintelier K, Novak D, Van Gassen S, Saeys Y. Challenges in translational machine learning. Hum Genet 2022;141:1451-1466. [PMID: 35246744 PMCID: PMC8896412 DOI: 10.1007/s00439-022-02439-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/09/2021] [Accepted: 02/08/2022] [Indexed: 11/25/2022]

Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol 2022;23:168. [PMID: 35927760 PMCID: PMC9351148 DOI: 10.1186/s13059-022-02734-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Accepted: 07/21/2022] [Indexed: 02/08/2023] Open

Algabri YA, Li L, Liu ZP. scGENA: A Single-Cell Gene Coexpression Network Analysis Framework for Clustering Cell Types and Revealing Biological Mechanisms. Bioengineering (Basel) 2022;9:353. [PMID: 36004879 PMCID: PMC9405199 DOI: 10.3390/bioengineering9080353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/27/2022] [Accepted: 07/27/2022] [Indexed: 11/16/2022] Open

Huang H, Wang Y, Rudin C, Browne EP. Towards a comprehensive evaluation of dimension reduction methods for transcriptomic data visualization. Commun Biol 2022;5:719. [PMID: 35853932 PMCID: PMC9296444 DOI: 10.1038/s42003-022-03628-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/03/2021] [Accepted: 06/23/2022] [Indexed: 12/11/2022] Open

Ellis D, Wu D, Datta S. SAREV: A review on statistical analytics of single-cell RNA sequencing data. Wiley Interdiscip Rev Comput Stat 2022;14:e1558. [PMID: 36034329 PMCID: PMC9400796 DOI: 10.1002/wics.1558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2020] [Accepted: 04/09/2021] [Indexed: 06/15/2023]

Anchang B, Mendez-Giraldez R, Xu X, Archer TK, Chen Q, Hu G, Plevritis SK, Motsinger-Reif AA, Li JL. Visualization, benchmarking and characterization of nested single-cell heterogeneity as dynamic forest mixtures. Brief Bioinform 2022;23:6534382. [PMID: 35192692 PMCID: PMC8921621 DOI: 10.1093/bib/bbac017] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Revised: 11/19/2021] [Accepted: 01/13/2022] [Indexed: 11/13/2022] Open

Abstract

A major topic of debate in developmental biology centers on whether development is continuous, discontinuous, or a mixture of both. Pseudo-time trajectory models, optimal for visualizing cellular progression, model cell transitions as continuous state manifolds; they do not explicitly model real-time, complex, heterogeneous systems and are challenging to benchmark against temporal models. We present a data-driven framework that addresses these limitations, taking temporal single-cell data collected at discrete time points as input and producing a mixture of dependent minimum spanning trees (MSTs) as output, denoted dynamic spanning forest mixtures (DSFMix). DSFMix uses decision-tree models to select genes that account for variations in multimodality, skewness and time. The genes are subsequently used to build the forest using tree agglomerative hierarchical clustering and dynamic branch cutting. We first motivate the use of forest-based algorithms over single-tree approaches for visualizing and characterizing developmental processes. We next benchmark DSFMix against pseudo-time and temporal approaches in terms of feature selection, time correlation, and network similarity. Finally, we demonstrate how DSFMix can be used to visualize, compare and characterize complex relationships during biological processes such as epithelial-mesenchymal transition, spermatogenesis, stem cell pluripotency, early transcriptional response to hormones and immune response to coronavirus disease. Our results indicate that the expression of genes during normal development exhibits a high proportion of non-uniformly distributed profiles that are mostly right-skewed and multimodal, the latter being a characteristic of major steady states during development. Our study also identifies and validates gene signatures driving complex dynamic processes during somatic or germline differentiation.


A Primer for Single-Cell Sequencing in Non-Model Organisms. Genes (Basel) 2022;13:380. [PMID: 35205423 PMCID: PMC8872538 DOI: 10.3390/genes13020380] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/16/2022] [Revised: 02/12/2022] [Accepted: 02/17/2022] [Indexed: 02/05/2023] Open

Yu L, Cao Y, Yang JYH, Yang P. Benchmarking clustering algorithms on estimating the number of cell types from single-cell RNA-sequencing data. Genome Biol 2022;23:49. [PMID: 35135612 PMCID: PMC8822786 DOI: 10.1186/s13059-022-02622-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 08/17/2021] [Accepted: 01/27/2022] [Indexed: 01/24/2023] Open

Baruzzo G, Patuzzi I, Di Camillo B. Beware to ignore the rare: how imputing zero-values can improve the quality of 16S rRNA gene studies results. BMC Bioinformatics 2022;22:618. [PMID: 35130833 PMCID: PMC8822630 DOI: 10.1186/s12859-022-04587-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

16S rRNA-gene sequencing is a valuable approach to characterize the taxonomic content of the whole bacterial population inhabiting a metabolic and spatial niche, providing an important opportunity to study bacteria and their role in many health and environmental mechanisms. The analysis of data produced by amplicon sequencing, however, raises very specific methodological issues that need to be properly addressed to obtain reliable biological conclusions. Among these, 16S count data tend to be very sparse, with many null values reflecting species that are present but went unobserved due to multiplexing constraints. However, current data workflows do not include a step in which the information about unobserved species is recovered.

RESULTS

In this work, we evaluate for the first time the effects of introducing into the 16S data workflow a new preprocessing step, zero-imputation, to recover this lost information. Due to the lack of published zero-imputation methods specifically designed for 16S count data, we considered a set of zero-imputation strategies available for other frameworks, and benchmarked them using in silico 16S count data reflecting different experimental designs. Additionally, we assessed the effect of combining zero-imputation with normalization, i.e. the only preprocessing step in the current 16S workflow. Overall, we benchmarked 35 16S preprocessing pipelines, assessing their ability to handle data sparsity, identify species presence/absence, recover sample proportional abundance distributions, and improve typical downstream analyses such as computation of alpha and beta diversity indices and differential abundance analysis.

CONCLUSIONS

The results clearly show that 16S data analysis greatly benefits from a properly performed zero-imputation step, although the choice of the right zero-imputation method plays a pivotal role. In addition, we identify a set of best-performing pipelines that could serve as a valuable guide for data analysts.


Li Z, Feng H. A neural network-based method for exhaustive cell label assignment using single cell RNA-seq data. Sci Rep 2022;12:910. [PMID: 35042860 PMCID: PMC8766435 DOI: 10.1038/s41598-021-04473-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/21/2021] [Accepted: 12/21/2021] [Indexed: 02/01/2023] Open

Schiebout C, Frost HR. CAMML: Multi-Label Immune Cell-Typing and Stemness Analysis for Single-Cell RNA-sequencing. Pac Symp Biocomput 2022;27:199-210. [PMID: 34890149 PMCID: PMC8669732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]

You Y, Tian L, Su S, Dong X, Jabbari JS, Hickey PF, Ritchie ME. Benchmarking UMI-based single-cell RNA-seq preprocessing workflows. Genome Biol 2021;22:339. [PMID: 34906205 PMCID: PMC8672463 DOI: 10.1186/s13059-021-02552-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/17/2021] [Accepted: 11/22/2021] [Indexed: 12/13/2022] Open

Affiliation(s)

Yue You: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
Luyi Tian: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
Shian Su: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
Xueyi Dong: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia
Jafar S. Jabbari: Australian Genome Research Facility, Victorian Comprehensive Cancer Centre, Melbourne, Australia; Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, The University of Melbourne at The Peter Doherty Institute for Infection and Immunity, Melbourne, Australia
Peter F. Hickey: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia; Single-Cell Open Research Endeavour (SCORE), The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia
Matthew E. Ritchie: Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Australia; Department of Medical Biology, The University of Melbourne, Parkville, Australia; School of Mathematics and Statistics, The University of Melbourne, Parkville, Australia


Ostner J, Carcy S, Müller CL. tascCODA: Bayesian Tree-Aggregated Analysis of Compositional Amplicon and Single-Cell Data. Front Genet 2021;12:766405. [PMID: 34950190 PMCID: PMC8689185 DOI: 10.3389/fgene.2021.766405] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/29/2021] [Accepted: 11/01/2021] [Indexed: 12/11/2022] Open

Morelli L, Giansanti V, Cittaro D. Nested Stochastic Block Models applied to the analysis of single cell data. BMC Bioinformatics 2021;22:576. [PMID: 34847879 PMCID: PMC8630903 DOI: 10.1186/s12859-021-04489-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/12/2021] [Accepted: 11/19/2021] [Indexed: 12/30/2022] Open

Chan A, Jiang W, Blyth E, Yang J, Patrick E. treekoR: identifying cellular-to-phenotype associations by elucidating hierarchical relationships in high-dimensional cytometry data. Genome Biol 2021;22:324. [PMID: 34844647 PMCID: PMC8628061 DOI: 10.1186/s13059-021-02526-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/15/2021] [Accepted: 10/26/2021] [Indexed: 12/13/2022] Open

Bej S, Galow AM, David R, Wolfien M, Wolkenhauer O. Automated annotation of rare-cell types from single-cell RNA-sequencing data through synthetic oversampling. BMC Bioinformatics 2021;22:557. [PMID: 34798805 PMCID: PMC8603509 DOI: 10.1186/s12859-021-04469-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Accepted: 11/03/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

The research landscape of single-cell and single-nuclei RNA-sequencing is evolving rapidly. In particular, the detection of rare cells has been greatly facilitated by this technology. However, an automated, unbiased, and accurate annotation of rare subpopulations is challenging. Once rare cells are identified in one dataset, it is usually necessary to generate further specific datasets to enrich the analysis (e.g., with samples from other tissues). From a machine learning perspective, the challenge arises from the fact that rare-cell subpopulations constitute an imbalanced classification problem. Here we introduce a Machine Learning (ML)-based oversampling method that uses gene expression counts of already identified rare cells as input to generate synthetic cells, which are then used to identify similar (rare) cells in other publicly available experiments. We utilize single-cell synthetic oversampling (sc-SynO), which is based on the Localized Random Affine Shadowsampling (LoRAS) algorithm. The algorithm corrects for the overall imbalance ratio of the minority and majority class.

RESULTS

We demonstrate the effectiveness of our method for three independent use cases, each consisting of already published datasets. The first use case identifies cardiac glial cells in snRNA-Seq data (17 nuclei out of 8635). This use case was designed to take a larger imbalance ratio (~1 to 500) into account and only uses single-nuclei data. The second use case was designed to jointly use snRNA-Seq data and scRNA-Seq on a lower imbalance ratio (~1 to 26) for the training step to likewise investigate the potential of the algorithm to consider both single-cell capture procedures and the impact of "less" rare-cell types. The third dataset refers to the murine data of the Allen Brain Atlas, including more than 1 million cells. For validation purposes only, all datasets have also been analyzed traditionally using common data analysis approaches, such as the Seurat workflow.

CONCLUSIONS

In comparison to baseline testing without oversampling, our approach identifies rare cells with a robust precision-recall balance, including a high accuracy and low false positive detection rate. A practical benefit of our algorithm is that it can be readily implemented in other existing workflows. The code base in R and Python is publicly available at FairdomHub, as well as GitHub, and can easily be transferred to identify other rare-cell types.


Zappia L, Theis FJ. Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape. Genome Biol 2021;22:301. [PMID: 34715899 PMCID: PMC8555270 DOI: 10.1186/s13059-021-02519-4] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 08/24/2021] [Accepted: 10/14/2021] [Indexed: 11/16/2022] Open

Xie B, Jiang Q, Mora A, Li X. Automatic cell type identification methods for single-cell RNA sequencing. Comput Struct Biotechnol J 2021;19:5874-5887. [PMID: 34815832 PMCID: PMC8572862 DOI: 10.1016/j.csbj.2021.10.027] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/27/2021] [Revised: 09/23/2021] [Accepted: 10/18/2021] [Indexed: 11/24/2022] Open