1. Yang J, Wang W, Zhang X. scSemiGCN: boosting cell-type annotation from noise-resistant graph neural networks with extremely limited supervision. Bioinformatics 2024; 40:btae091. [PMID: 38366925 PMCID: PMC10904148 DOI: 10.1093/bioinformatics/btae091] [Received: 10/20/2023] [Revised: 01/14/2024] [Accepted: 02/14/2024] [Indexed: 02/19/2024]
Abstract
MOTIVATION Cell-type annotation is fundamental to revealing cell heterogeneity in single-cell data analysis. Although many methods have been developed, single-cell RNA-sequencing data have a low signal-to-noise ratio and suffer from batch effects and dropout. These problems still hinder the discovery of grouped patterns for cell types, both by unsupervised learning and by its alternative, semi-supervised learning, which uses a few labeled cells as guidance for cell-type annotation. RESULTS We propose scSemiGCN, a robust cell-type annotation method based on graph convolutional networks. Built upon a denoised network structure that characterizes reliable cell-to-cell connections, scSemiGCN generates pseudo labels for unannotated cells. Supervised contrastive learning then refines the noisy single-cell data. Finally, message passing with the refined features over the denoised network structure is conducted for semi-supervised cell-type annotation. Comparisons with six methods on several datasets under extremely limited supervision validate the effectiveness and efficiency of scSemiGCN for cell-type annotation. AVAILABILITY AND IMPLEMENTATION Implementation of scSemiGCN is available at https://github.com/Jane9898/scSemiGCN.
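The final step described above, semi-supervised message passing over a cell graph, can be illustrated with a generic graph-convolution layer. The sketch below is a minimal NumPy example. It assumes a standard symmetric-normalized GCN propagation rule and a cross-entropy loss computed only on labeled cells. It is not scSemiGCN's implementation, and it omits the network denoising, pseudo-labeling and supervised contrastive steps. The toy graph, features, weights and helper names (`normalized_adjacency`, `gcn_layer`, `masked_cross_entropy`) are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the propagation matrix of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])       # self-loops keep each cell's own signal
    deg = a_hat.sum(axis=1)                  # node degrees (>= 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(adj_norm: np.ndarray, features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One round of message passing: aggregate normalized neighbour features, project, ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


def masked_cross_entropy(logits: np.ndarray, labels: np.ndarray, labeled_mask: np.ndarray) -> float:
    """Cross-entropy averaged over the labeled cells only (the semi-supervised setting)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    idx = np.flatnonzero(labeled_mask)
    return float(-log_probs[idx, labels[idx]].mean())


# Hypothetical toy graph: cells 0-2 form one clique, cells 3-4 another.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))   # 5 cells x 4 (already refined) features
weight = rng.normal(size=(4, 2))     # would be learned in a real model; 2 output classes

hidden = gcn_layer(normalized_adjacency(adj), features, weight)
labels = np.array([0, 0, 0, 1, 1])                          # true types, only partly observed
labeled_mask = np.array([True, False, False, True, False])  # only cells 0 and 3 are annotated
loss = masked_cross_entropy(hidden, labels, labeled_mask)   # 2-dim output used as class scores
print(hidden.shape)  # (5, 2)
print(loss)
```

The loss is computed only on the labeled cells. The propagation step, however, mixes features along graph edges. In a trained model, gradient updates would therefore also shape predictions for unlabeled neighbours. This is the usual intuition for exploiting graph structure when supervision is extremely limited.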
Affiliation(s)
- Jue Yang
  - School of Mathematics, Sun Yat-sen University, Guangzhou 510000, China
- Weiwen Wang
  - Department of Mathematics, School of Information Science and Technology, Jinan University, Guangzhou 510000, China
- Xiwen Zhang
  - Department of Bioinformatics, College of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510000, China

2. Cao Y, Tran A, Kim H, Robertson N, Lin Y, Torkel M, Yang P, Patrick E, Ghazanfar S, Yang J. Thinking process templates for constructing data stories with SCDNEY. F1000Res 2023; 12:261. [PMID: 38434622 PMCID: PMC10905113 DOI: 10.12688/f1000research.130623.1] [Accepted: 12/08/2023] [Indexed: 03/05/2024]
Abstract
Background Globally, scientists now have the ability to generate vast amounts of high-throughput biomedical data that carry critical information for important clinical and public health applications. This data revolution in biology is now creating a plethora of new single-cell datasets. Concurrently, there have been significant methodological advances in single-cell research. Integrating these two resources to create tailor-made, efficient, and purpose-specific data analysis approaches can help accelerate scientific discovery. Methods We developed a series of living workshops for building data stories, using Single-cell data integrative analysis (scdney). scdney is a wrapper package containing a collection of single-cell analysis R packages that cover data integration, cell type annotation, higher-order testing and more. Results Here, we illustrate two specific workshops. The first workshop examines how to characterise the identity and/or state of cells and the relationships between them, known as phenotyping. The second workshop focuses on extracting higher-order features from cells to predict disease progression. Conclusions Through these workshops, we not only showcase current solutions but also highlight critical thinking points. In particular, we highlight the Thinking Process Template, which provides a structured framework for the decision-making process behind such single-cell analyses. Furthermore, our workshops will incorporate dynamic contributions from the community in a collaborative learning approach, hence the term 'living'.
Affiliation(s)
- Yue Cao
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Andy Tran
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Hani Kim
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
  - Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
- Nick Robertson
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Yingxin Lin
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Marni Torkel
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Pengyi Yang
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
  - Children's Medical Research Institute, The University of Sydney, Westmead, NSW, 2145, Australia
- Ellis Patrick
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Shila Ghazanfar
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia
- Jean Yang
  - Laboratory of Data Discovery for Health Limited (D24H), Science Park, Hong Kong SAR, China
  - Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, 2006, Australia

3. Maden SK, Kwon SH, Huuki-Myers LA, Collado-Torres L, Hicks SC, Maynard KR. Challenges and opportunities to computationally deconvolve heterogeneous tissue with varying cell sizes using single-cell RNA-sequencing datasets. Genome Biol 2023; 24:288. [PMID: 38098055 PMCID: PMC10722720 DOI: 10.1186/s13059-023-03123-4] [Received: 05/11/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Deconvolution of cell mixtures in "bulk" transcriptomic samples from homogenized human tissue is important for understanding disease pathologies. However, several experimental and computational challenges impede transcriptomics-based deconvolution approaches that use single-cell/nucleus RNA-seq reference atlases. Cells from the brain and blood have substantially different sizes, total mRNA, and transcriptional activities, and existing approaches may quantify total mRNA instead of cell type proportions. Further, standards are lacking for the use of cell reference atlases and for integrative analyses of single-cell and spatial transcriptomics data. We discuss how to approach these key challenges with orthogonal "gold standard" datasets for evaluating deconvolution methods.
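A small calculation makes the cell-size issue concrete. Suppose a deconvolution method effectively estimates the fraction of bulk mRNA contributed by each cell type. Converting that into cell-type proportions then requires dividing by each type's relative mRNA content per cell and renormalizing. The sketch below assumes those per-cell mRNA amounts are known, although in practice obtaining them is itself one of the challenges discussed here. The function name and numbers are hypothetical.

```python
import numpy as np


def mrna_to_cell_proportions(mrna_fractions, mrna_per_cell):
    """Rescale mRNA-weighted fractions into cell-type proportions.

    Assumes each cell of type k contributes mrna_per_cell[k] (relative units) to the
    bulk pool, so the number of type-k cells is proportional to
    mrna_fractions[k] / mrna_per_cell[k].
    """
    fractions = np.asarray(mrna_fractions, dtype=float)
    sizes = np.asarray(mrna_per_cell, dtype=float)
    relative_cells = fractions / sizes
    return relative_cells / relative_cells.sum()


# Hypothetical mixture: 60% of the mRNA comes from type A, whose cells carry 3x the
# mRNA of type B cells, so type A is actually the minority of cells.
props = mrna_to_cell_proportions([0.6, 0.4], [3.0, 1.0])
print(props)  # approximately [0.333, 0.667]
```

In this toy case, reporting the mRNA fractions directly would overstate the large-cell type (0.6 versus about 0.33 of cells). This is the kind of bias the abstract warns about.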
Affiliation(s)
- Sean K Maden
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Sang Ho Kwon
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Louise A Huuki-Myers
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Leonardo Collado-Torres
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, USA
- Kristen R Maynard
  - Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA
  - The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, USA

4. Nie X, Qin D, Zhou X, Duo H, Hao Y, Li B, Liang G. Clustering ensemble in scRNA-seq data analysis: Methods, applications and challenges. Comput Biol Med 2023; 159:106939. [PMID: 37075602 DOI: 10.1016/j.compbiomed.2023.106939] [Received: 01/20/2023] [Revised: 03/31/2023] [Accepted: 04/14/2023] [Indexed: 04/21/2023]
Abstract
With the rapid development of single-cell RNA-sequencing techniques, various computational methods and tools have been proposed to analyze these high-throughput data, accelerating the discovery of potential biological information. As one of the core steps of single-cell transcriptome data analysis, clustering plays a crucial role in identifying cell types and interpreting cellular heterogeneity. However, different clustering methods produce divergent results, and these unstable partitions can reduce the accuracy of the analysis to some extent. To overcome this challenge and obtain more accurate results, clustering ensembles are now frequently applied to the cluster analysis of single-cell transcriptome datasets, and their results are generally more reliable than those of most single clustering partitions. In this review, we summarize the applications and challenges of clustering ensemble methods in single-cell transcriptome data analysis, and we provide constructive thoughts and references for researchers in this field.
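One widely used way to build a clustering ensemble is the co-association (evidence accumulation) approach. It counts how often each pair of cells lands in the same cluster across the base clusterings and then derives a consensus partition from that matrix. The sketch below is one minimal version. It assumes a simple consensus rule: cells are linked when they are co-clustered in more than half of the base partitions, and the connected components become the consensus clusters. The review covers many other consensus functions, and the toy partitions are hypothetical.

```python
import numpy as np


def co_association(partitions):
    """Fraction of base clusterings in which each pair of cells shares a cluster."""
    partitions = np.asarray(partitions)           # shape: (n_partitions, n_cells)
    n_parts, n_cells = partitions.shape
    co = np.zeros((n_cells, n_cells))
    for labels in partitions:
        co += labels[:, None] == labels[None, :]  # adds 1 for every co-clustered pair
    return co / n_parts


def consensus_labels(co, threshold=0.5):
    """Link cells co-clustered in more than `threshold` of partitions; label connected components."""
    n = co.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                              # depth-first search over linked cells
            i = stack.pop()
            for j in np.flatnonzero(co[i] > threshold):
                if labels[j] < 0:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical base clusterings of six cells (the third uses swapped label IDs).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
co = co_association(partitions)
consensus = consensus_labels(co)
print(consensus)  # [0 0 0 1 1 1]
```

The co-association matrix only records whether pairs share a cluster. It is therefore unaffected by arbitrary label permutations between base clusterings, which is why the third partition, despite its swapped IDs, still agrees with the first.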
Affiliation(s)
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Dan Qin
  - Department of Biology, College of Science, Northeastern University, Boston, MA, 02115, USA
- Xinyi Zhou
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China

5. Kim HJ, O’Hara-Wright M, Kim D, Loi TH, Lim BY, Jamieson RV, Gonzalez-Cordero A, Yang P. Comprehensive characterization of fetal and mature retinal cell identity to assess the fidelity of retinal organoids. Stem Cell Reports 2023; 18:175-189. [PMID: 36630901 PMCID: PMC9860116 DOI: 10.1016/j.stemcr.2022.12.002] [Received: 04/29/2022] [Revised: 12/07/2022] [Accepted: 12/07/2022] [Indexed: 01/12/2023]
Abstract
Characterizing cell identity in complex tissues such as the human retina is essential for studying its development and disease. While retinal organoids derived from pluripotent stem cells have been widely used to model development and disease of the human retina, there is a lack of studies that have systematically evaluated the molecular and cellular fidelity of the organoids derived from various culture protocols in recapitulating their in vivo counterpart. To this end, we performed an extensive meta-atlas characterization of cellular identities of the human eye, covering a wide range of developmental stages. The resulting map uncovered previously unknown biomarkers of major retinal cell types and those associated with cell-type-specific maturation. Using our retinal-cell-identity map from the fetal and adult tissues, we systematically assessed the fidelity of the retinal organoids in mimicking the human eye, enabling us to comprehensively benchmark the current protocols for retinal organoid generation.
Affiliation(s)
- Hani Jieun Kim
  - Computational Systems Biology Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- Michelle O’Hara-Wright
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
  - Stem Cell Medicine Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Daniel Kim
  - Computational Systems Biology Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
- To Ha Loi
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
  - Eye Genetics Research Unit, Children's Medical Research Institute, Sydney Children's Hospitals Network, Save Sight Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Benjamin Y. Lim
  - Stem Cell Medicine Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Robyn V. Jamieson
  - Specialty of Genomic Medicine, Faculty of Medicine and Health, University of Sydney, Westmead, NSW 2145, Australia
  - Eye Genetics Research Unit, Children's Medical Research Institute, Sydney Children's Hospitals Network, Save Sight Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Anai Gonzalez-Cordero
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia
  - Stem Cell Medicine Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
- Pengyi Yang
  - Computational Systems Biology Group, Children's Medical Research Institute, The University of Sydney, Westmead, NSW 2145, Australia
  - School of Mathematics and Statistics, The University of Sydney, Sydney, NSW 2006, Australia
  - School of Medical Sciences, Faculty of Medicine and Health, The University of Sydney, Camperdown, NSW 2006, Australia

6. Huang Y, Chang H, Chen X, Meng J, Han M, Huang T, Yuan L, Zhang G. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quant Biol 2023. [DOI: 10.15302/j-qb-022-0311] [Indexed: 03/11/2023]

7. CASSL: A cell-type annotation method for single cell transcriptomics data using semi-supervised learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03440-4] [Indexed: 11/02/2022]

8. OUP accepted manuscript. Brief Funct Genomics 2022; 21:159-176. [DOI: 10.1093/bfgp/elac002] [Received: 09/21/2021] [Revised: 01/20/2022] [Accepted: 01/25/2022] [Indexed: 11/14/2022]

9. Qin G, Du L, Ma Y, Yin Y, Wang L. Gene biomarker prediction in glioma by integrating scRNA-seq data and gene regulatory network. BMC Med Genomics 2021; 14:287. [PMID: 34863158 PMCID: PMC8643020 DOI: 10.1186/s12920-021-01115-6] [Received: 07/27/2020] [Accepted: 11/01/2021] [Indexed: 12/22/2022]
Abstract
Background Although great efforts have been made to study the occurrence and development of glioma, its molecular mechanisms are still unclear. Single-cell sequencing technology provides researchers with a new perspective for exploring the pathogenesis of tumors. This can in turn help inform treatment and prognosis decisions for patients with tumors. Methods In this study, we proposed an algorithmic framework to explore the molecular mechanisms of glioma by integrating single-cell gene expression profiles and gene regulatory relationships. First, because malignant cells from different glioma samples differ greatly, we analyzed the expression status of malignant cells for each sample separately. We then identified tumor consensus genes by constructing and analyzing cell-specific networks. Second, to comprehensively characterize glioma, we integrated transcriptional regulatory relationships and consensus genes to construct a tumor-specific regulatory network. Third, we performed a hybrid clustering analysis to identify glioma cell types. Finally, candidate tumor gene biomarkers were identified based on the cell types and known glioma-related genes. Results Our method identified six cell types, for which we performed functional and biological pathway enrichment analyses. The candidate tumor gene biomarkers were analyzed through survival analysis and verified against the literature in PubMed. Conclusions The results showed that these candidate tumor gene biomarkers were closely related to glioma and could provide clues for the diagnosis and prognosis of patients with glioma. In addition, four of the candidate tumor gene biomarkers (NDUFS5, NDUFA1, NDUFA13, and NDUFB8) belong to the NADH ubiquinone oxidoreductase subunit gene family, so we inferred that this gene family may be strongly related to glioma.
Affiliation(s)
- Guimin Qin
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Longting Du
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Yuying Ma
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Yu Yin
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Liming Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, 710071, China

10. Mangiola S, Doyle MA, Papenfuss AT. Interfacing Seurat with the R tidy universe. Bioinformatics 2021; 37:4100-4107. [PMID: 34028547 PMCID: PMC9502154 DOI: 10.1093/bioinformatics/btab404] [Received: 01/13/2021] [Revised: 05/19/2021] [Accepted: 05/22/2021] [Indexed: 11/15/2022]
Abstract
Motivation Seurat is one of the most popular software suites for the analysis of single-cell RNA sequencing data. Given the popularity of the tidyverse ecosystem, which offers a large set of data display, query, manipulation, integration and visualization utilities, there is a great opportunity to interface the Seurat object with the tidyverse. Such an interface lets the large data science community of tidyverse users operate with a familiar grammar. Results To provide Seurat with a tidyverse-oriented interface without compromising efficiency, we developed tidyseurat, a lightweight adapter to the tidyverse. tidyseurat displays cell information as a tibble abstraction. This allows Seurat to be interfaced intuitively with the dplyr, tidyr, ggplot2 and plotly packages for efficient data manipulation, integration and visualization. Iterative analyses on data subsets are enabled by interfacing with the popular nest-map framework. Availability and implementation The software is freely available at cran.r-project.org/web/packages/tidyseurat and github.com/stemangiola/tidyseurat. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stefano Mangiola
  - Bioinformatics Division, The Walter and Eliza Hall Institute, Parkville, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Victoria, Australia
- Maria A Doyle
  - Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
- Anthony T Papenfuss
  - Bioinformatics Division, The Walter and Eliza Hall Institute, Parkville, Victoria, Australia
  - Department of Medical Biology, University of Melbourne, Melbourne, Victoria, Australia
  - Peter MacCallum Cancer Centre, Melbourne, VIC, 3000, Australia
  - Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne, Victoria, Australia
  - School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC 3010, Australia

11. Gustafsson J, Robinson J, Inda-Díaz JS, Björnson E, Jörnsten R, Nielsen J. DSAVE: Detection of misclassified cells in single-cell RNA-Seq data. PLoS One 2020; 15:e0243360. [PMID: 33270740 PMCID: PMC7714356 DOI: 10.1371/journal.pone.0243360] [Received: 07/03/2020] [Accepted: 11/19/2020] [Indexed: 11/19/2022]
Abstract
Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
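The down-sampling idea can be sketched briefly. To compare cells or populations at equal sequencing depth, each cell's molecules are subsampled without replacement down to a common total. This removes differences in sampling noise that come from library size. The example below is a minimal NumPy illustration. It assumes a UMI count matrix and down-samples every cell to the smallest library size. It is not DSAVE's code and does not reproduce its log-likelihood metric. The helper names are hypothetical and the toy counts are simulated.

```python
import numpy as np


def downsample_cell(counts, target_total, rng):
    """Draw target_total molecules without replacement from one cell's gene counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target_total:
        return counts.copy()                     # already at or below the target depth
    return rng.multivariate_hypergeometric(counts, int(target_total))


def downsample_matrix(counts, target_total, seed=0):
    """Apply downsample_cell to every row (cell) of a cells x genes count matrix."""
    rng = np.random.default_rng(seed)
    return np.vstack([downsample_cell(row, target_total, rng) for row in counts])


# Simulated UMI counts for 4 cells x 50 genes with increasing sequencing depth.
sim_rng = np.random.default_rng(1)
depth = np.array([1.0, 2.0, 4.0, 8.0])[:, None]
counts = sim_rng.poisson(lam=2.0 * depth, size=(4, 50))

target = int(counts.sum(axis=1).min())           # equalise to the shallowest cell
down = downsample_matrix(counts, target)
print(down.sum(axis=1))                          # every cell now holds `target` molecules
```

Sampling without replacement (hypergeometric) gives every cell exactly the same total. Binomial thinning would match totals only in expectation.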
Affiliation(s)
- Johan Gustafsson
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
- Jonathan Robinson
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
- Juan S. Inda-Díaz
  - Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
- Elias Björnson
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Department of Molecular and Clinical Medicine, Wallenberg Laboratory for Cardiovascular and Metabolic Research, University of Gothenburg, Gothenburg, Sweden
- Rebecka Jörnsten
  - Mathematical Sciences, University of Gothenburg and Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
  - Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
  - Wallenberg Center for Protein Research, Chalmers University of Technology, Gothenburg, Sweden
  - BioInnovation Institute, Copenhagen, Denmark

12.
Abstract
Biomedical data are now generated at an exponential rate, creating datasets of ultra-high dimensionality and complexity for analysis. An indicative example is the emerging single-cell RNA-sequencing (scRNA-seq) technology, which isolates and measures individual cells. The ultra-high dimensionality and complexity of scRNA-seq data pose a major analytical challenge. To address this, we study the generalization of MRPV, a recently published ensemble classification algorithm that combines multiple ultra-low-dimensional random projected spaces with a voting scheme, and we demonstrate its ability to enhance the performance of base classifiers. We empirically show that a reliable ensemble classification technique can be designed using random projected subspaces with an extremely small, fixed number of dimensions, without following the restrictions of the classical random projection method. MRPV can therefore perform classification tasks efficiently and rapidly, even for data of extremely high dimensionality. Furthermore, through experimental analysis of six scRNA-seq datasets, we provide evidence that the most critical advantage of MRPV is the dramatic reduction in data dimensionality. This reduction allows the use of computationally demanding classifiers that are otherwise considered impractical in real-life applications. The scalability, simplicity, and capabilities of the proposed framework make it a useful tool for single-cell RNA-seq data, which are characterized by ultra-high dimensionality. MRPV is available on GitHub as a MATLAB implementation.
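The core idea of combining many ultra-low-dimensional random projections with a vote can be sketched briefly. The example below makes several illustrative assumptions: Gaussian random projections to two dimensions, a nearest-centroid base classifier and simple majority voting. These choices, the parameter names (`n_projections`, `dim`), the helper names and the simulated data are not MRPV's published configuration or its MATLAB code.

```python
import numpy as np


def nearest_centroid_predict(x_train, y_train, x_test):
    """Base classifier: assign each test point to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in classes])
    sq_dists = ((x_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dists.argmin(axis=1)]


def random_projection_vote(x_train, y_train, x_test, n_projections=25, dim=2, seed=0):
    """Majority vote of base classifiers, each trained in its own random `dim`-D space."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    votes = np.zeros((x_test.shape[0], classes.size), dtype=int)
    rows = np.arange(x_test.shape[0])
    for _ in range(n_projections):
        proj = rng.normal(size=(x_train.shape[1], dim)) / np.sqrt(dim)  # Gaussian projection
        pred = nearest_centroid_predict(x_train @ proj, y_train, x_test @ proj)
        votes[rows, np.searchsorted(classes, pred)] += 1
    return classes[votes.argmax(axis=1)]


def simulate(rng, n_per_class, n_genes=1000):
    """Two simulated 'cell types' in n_genes dimensions, differing by a mean shift."""
    x = np.vstack([rng.normal(0.0, 1.0, size=(n_per_class, n_genes)),
                   rng.normal(1.5, 1.0, size=(n_per_class, n_genes))])
    y = np.repeat([0, 1], n_per_class)
    return x, y


data_rng = np.random.default_rng(42)
x_train, y_train = simulate(data_rng, 20)
x_test, y_test = simulate(data_rng, 20)
pred = random_projection_vote(x_train, y_train, x_test)
accuracy = (pred == y_test).mean()
print(f"accuracy: {accuracy:.2f}")
```

Each base classifier sees only `dim` features, so even a costly classifier could be substituted cheaply. In this sketch, the vote across many independent projections is what recovers accuracy lost by any single two-dimensional view.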

13. Choi JH, In Kim H, Woo HG. scTyper: a comprehensive pipeline for the cell typing analysis of single-cell RNA-seq data. BMC Bioinformatics 2020; 21:342. [PMID: 32753029 PMCID: PMC7430822 DOI: 10.1186/s12859-020-03700-5] [Received: 01/21/2020] [Accepted: 07/23/2020] [Indexed: 01/02/2023]
Abstract
Background Recent advances in single-cell RNA sequencing (scRNA-seq) technology have enabled the identification of individual cell types, such as epithelial cells, immune cells, and fibroblasts, in tissue samples containing complex cell populations. Cell typing is one of the key challenges in scRNA-seq data analysis and is usually achieved by estimating the expression of cell marker genes. However, there is no standard practice for cell typing, which often results in variable and inaccurate outcomes. Results We have developed scTyper, a comprehensive and user-friendly R-based package for scRNA-seq analysis and cell typing. scTyper also provides a database of cell type markers, scTyper.db, which contains 213 cell marker sets collected from the literature. These marker sets include, but are not limited to, markers for malignant cells, cancer-associated fibroblasts, and tumor-infiltrating T cells. Additionally, scTyper provides three customized methods for estimating cell-type marker expression: nearest template prediction (NTP), gene set enrichment analysis (GSEA), and average expression values. An improved modification of the DNA copy number inference method (inferCNV) has also been implemented and can be used for malignant cell typing. The package also supports the data preprocessing pipelines of Cell Ranger from 10X Genomics and the Seurat package. A summary reporting system is also implemented, which may help users perform reproducible analyses. Conclusions scTyper provides a comprehensive and user-friendly analysis pipeline for cell typing of scRNA-seq data with a curated cell marker database, scTyper.db.
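Of the three marker-scoring options listed, the average-expression approach is the simplest to illustrate. Each cell is scored by the mean expression of each marker set and assigned the highest-scoring type. The sketch below is written in Python for illustration, whereas scTyper itself is an R package, and it is not scTyper's code. The marker sets are small hypothetical examples using commonly cited markers, not entries from scTyper.db, and the helper names are assumptions.

```python
import numpy as np


def average_marker_scores(expr, genes, marker_sets):
    """Mean expression of each marker set per cell.

    expr: cells x genes matrix (e.g. log-normalised values); genes: column names;
    marker_sets: dict mapping a cell-type name to a list of marker genes.
    Markers absent from `genes` are ignored; a set with no matching genes scores NaN.
    """
    col = {g: i for i, g in enumerate(genes)}
    names = list(marker_sets)
    scores = np.full((expr.shape[0], len(names)), np.nan)
    for k, name in enumerate(names):
        idx = [col[g] for g in marker_sets[name] if g in col]
        if idx:
            scores[:, k] = expr[:, idx].mean(axis=1)
    return names, scores


def assign_cell_types(names, scores):
    """Label each cell with the marker set that has the highest average score."""
    return [names[i] for i in np.nanargmax(scores, axis=1)]


genes = ["EPCAM", "KRT8", "CD3E", "CD3D", "COL1A1", "DCN"]
marker_sets = {                       # hypothetical, illustrative marker sets
    "epithelial": ["EPCAM", "KRT8"],
    "T cell": ["CD3E", "CD3D"],
    "fibroblast": ["COL1A1", "DCN"],
}
expr = np.array([
    [5.0, 4.0, 0.0, 0.0, 0.0, 1.0],   # epithelial-like cell
    [0.0, 0.0, 6.0, 5.0, 0.0, 0.0],   # T-cell-like cell
    [0.0, 1.0, 0.0, 0.0, 3.0, 4.0],   # fibroblast-like cell
])
names, scores = average_marker_scores(expr, genes, marker_sets)
print(assign_cell_types(names, scores))  # ['epithelial', 'T cell', 'fibroblast']
```

Averaging is easy to interpret but is sensitive to marker sets of different sizes and to highly expressed genes. Template- or enrichment-based scores such as NTP or GSEA are alternatives aimed at making such comparisons more robust.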
Affiliation(s)
- Ji-Hye Choi
  - Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea
- Hye In Kim
  - Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea
- Hyun Goo Woo
  - Department of Physiology, Ajou University School of Medicine, 164 Worldcup-ro, Yeongtong-gu, Suwon, 16499, Republic of Korea
  - Department of Biomedical Science, Graduate School, Ajou University, Suwon, Republic of Korea