1. Sun Y, Kong L, Huang J, Deng H, Bian X, Li X, Cui F, Dou L, Cao C, Zou Q, Zhang Z. A comprehensive survey of dimensionality reduction and clustering methods for single-cell and spatial transcriptomics data. Brief Funct Genomics 2024:elae023. [PMID: 38860675 DOI: 10.1093/bfgp/elae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/12/2024] Open
Abstract
In recent years, the application of single-cell transcriptomics and spatial transcriptomics analysis techniques has become increasingly widespread. Whether dealing with single-cell transcriptomic or spatial transcriptomic data, dimensionality reduction and clustering are indispensable. Both single-cell and spatial transcriptomic data are often high-dimensional, making the analysis and visualization of such data challenging. Through dimensionality reduction, it becomes possible to visualize the data in a lower-dimensional space, allowing for the observation of relationships and differences between cell subpopulations. Clustering enables the grouping of similar cells into the same cluster, aiding in the identification of distinct cell subpopulations and revealing cellular diversity, providing guidance for downstream analyses. In this review, we systematically summarized the most widely recognized algorithms employed for the dimensionality reduction and clustering analysis of single-cell transcriptomic and spatial transcriptomic data. This endeavor provides valuable insights and ideas that can contribute to the development of novel tools in this rapidly evolving field.
Affiliation(s)
- Yidi Sun: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lingling Kong: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Jiayi Huang: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Hongyan Deng: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xinling Bian: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Xingfeng Li: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Feifei Cui: School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Lijun Dou: Genomic Medicine Institute, Lerner Research Institute, Cleveland, OH 44106, United States
- Chen Cao: School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing 210029, China
- Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Zilong Zhang: School of Computer Science and Technology, Hainan University, Haikou 570228, China
2. Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023; 24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or fully utilizing the relationships between cells, leaving a considerable scope for improving the clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that fully incorporates nonlinear dimension reduction, negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells, respectively.
Affiliation(s)
- Tianyuan Lei: College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Ruoyu Chen: Moorestown High School, Moorestown, NJ 08057, USA
- Shaoqiang Zhang: College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
- Yong Chen: Department of Biological and Biomedical Sciences, Rowan University, NJ 08028, USA
3. Venkat A, Bhaskar D, Krishnaswamy S. Multiscale geometric and topological analyses for characterizing and predicting immune responses from single cell data. Trends Immunol 2023; 44:551-563. [PMID: 37301677 DOI: 10.1016/j.it.2023.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Revised: 05/03/2023] [Accepted: 05/04/2023] [Indexed: 06/12/2023]
Abstract
Single cell genomics has revolutionized our ability to map immune heterogeneity and responses. With the influx of large-scale data sets from diverse modalities, the resolution achieved has supported the long-held notion that immune cells are naturally organized into hierarchical relationships, characterized at multiple levels. Such a multigranular structure corresponds to key geometric and topological features. Given that differences between an effective and ineffective immunological response may not be found at one level, there is vested interest in characterizing and predicting outcomes from such features. In this review, we highlight single cell methods and principles for learning geometric and topological properties of data at multiple scales, discussing their contributions to immunology. Ultimately, multiscale approaches go beyond classical clustering, revealing a more comprehensive picture of cellular heterogeneity.
Affiliation(s)
- Aarthi Venkat: Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA
- Smita Krishnaswamy: Computational Biology and Bioinformatics Program, Yale University, New Haven, CT, USA; Department of Genetics, Yale University, New Haven, CT, USA; Department of Computer Science, Yale University, New Haven, CT, USA
4. Shen X, Li M, Shao K, Li Y, Ge Z. Post-ischemic inflammatory response in the brain: Targeting immune cell in ischemic stroke therapy. Front Mol Neurosci 2023; 16:1076016. [PMID: 37078089 PMCID: PMC10106693 DOI: 10.3389/fnmol.2023.1076016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 03/13/2023] [Indexed: 04/05/2023] Open
Abstract
An ischemic stroke occurs when the blood supply is obstructed to the vascular basin, causing the death of nerve cells and forming the ischemic core. Subsequently, the brain enters the stage of reconstruction and repair. The whole process includes cellular brain damage, inflammatory reaction, blood–brain barrier destruction, and nerve repair. During this process, the proportion and function of neurons, immune cells, glial cells, endothelial cells, and other cells change. Identifying potential differences in gene expression between cell types or heterogeneity between cells of the same type helps to understand the cellular changes that occur in the brain and the context of disease. The recent emergence of single-cell sequencing technology has promoted the exploration of single-cell diversity and the elucidation of the molecular mechanism of ischemic stroke, thus providing new ideas and directions for the diagnosis and clinical treatment of ischemic stroke.
Affiliation(s)
- Xueyang Shen: Department of Neurology, Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China
- Mingming Li: Department of Neurology, Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China; Gansu Provincial Neurology Clinical Medical Research Center, The Second Hospital of Lanzhou University, Lanzhou, China; Expert Workstation of Academician Wang Longde, The Second Hospital of Lanzhou University, Lanzhou, China
- Kangmei Shao: Department of Neurology, Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China
- Yongnan Li: Department of Cardiac Surgery, Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China
- Zhaoming Ge: Department of Neurology, Lanzhou University Second Hospital, Lanzhou University, Lanzhou, China; Gansu Provincial Neurology Clinical Medical Research Center, The Second Hospital of Lanzhou University, Lanzhou, China; Expert Workstation of Academician Wang Longde, The Second Hospital of Lanzhou University, Lanzhou, China
- *Correspondence: Zhaoming Ge
5. Ren J, Zhang Q, Zhou Y, Hu Y, Lyu X, Fang H, Yang J, Yu R, Shi X, Li Q. A downsampling Method Enables Robust Clustering and Integration of Single-Cell Transcriptome Data. J Biomed Inform 2022; 130:104093. [DOI: 10.1016/j.jbi.2022.104093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Revised: 04/06/2022] [Accepted: 05/03/2022] [Indexed: 11/27/2022]
6. Wang CY, Gao YL, Liu JX, Kong XZ, Zheng CH. Single-Cell RNA Sequencing Data Clustering by Low-Rank Subspace Ensemble Framework. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1154-1164. [PMID: 33026977 DOI: 10.1109/tcbb.2020.3029187] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology reveals the gene expression status and gene structure of individual cells, reflecting the heterogeneity and diversity of cells. Traditional methods of scRNA-seq data analysis treat the data as lying in a single subspace, which hides structural information in other subspaces. In this paper, we propose a low-rank subspace ensemble clustering framework (LRSEC) to analyze scRNA-seq data. Assuming that the scRNA-seq data exist in multiple subspaces, the low-rank model is used to find the lowest rank representation of the data in the subspace. It is worth noting that the penalty factor of the low-rank kernel function is uncertain, and different penalty factors correspond to different low-rank structures. Moreover, it is difficult for a single clustering model to find the cellular structure of all datasets. To strengthen the correlation between model solutions, we construct a new ensemble clustering framework, LRSEC, by using the low-rank model as the basic learner. The LRSEC framework captures the global structure of data through low-rank subspaces and achieves better clustering performance than a single clustering model. We validate the performance of the LRSEC framework on seven small datasets and one large dataset and obtain satisfactory results.
7. Wu W, Liu Z, Ma X. jSRC: a flexible and accurate joint learning algorithm for clustering of single-cell RNA-sequencing data. Brief Bioinform 2021; 22:bbaa433. [PMID: 33535230 PMCID: PMC7953970 DOI: 10.1093/bib/bbaa433] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/29/2020] [Revised: 12/21/2020] [Accepted: 12/24/2020] [Indexed: 02/01/2023] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) explores the transcriptome of genes at cell level, which sheds light on the heterogeneity and dynamics of cell populations. Advances in biotechnologies make it possible to generate scRNA-seq profiles for large numbers of cells, requiring effective and efficient clustering algorithms to identify cell types and informative genes. Although great efforts have been devoted to clustering of scRNA-seq, the accuracy, scalability and interpretability of available algorithms are not satisfactory. In this study, we address these problems by developing a joint learning algorithm [a.k.a. joints sparse representation and clustering (jSRC)], where the dimension reduction (DR) and clustering are integrated. Specifically, DR is employed for scalability, and joint learning improves accuracy. To increase the interpretability of patterns, we assume that cells within the same type have similar expression patterns, where the sparse representation is imposed on features. We transform clustering of scRNA-seq into an optimization problem and then derive the update rules to optimize the objective of jSRC. Fifteen scRNA-seq datasets from various tissues and organisms are adopted to validate the performance of jSRC, where the number of single cells varies from 49 to 110 824. The experimental results demonstrate that jSRC significantly outperforms 12 state-of-the-art methods in terms of various measurements (an average improvement of 20.29%) with less running time. Furthermore, jSRC is efficient and robust across different scRNA-seq datasets from various tissues. Finally, jSRC also accurately identifies dynamic cell types associated with progression of COVID-19. The proposed model and methods provide an effective strategy to analyze scRNA-seq data (the software is coded using MATLAB and is free for academic purposes; https://github.com/xkmaxidian/jSRC).
Affiliation(s)
- Wenming Wu: School of Computer Science and Technology, Xidian University, Xi’an, 710071, China
- Zaiyi Liu: Department of Radiology, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Zhongshan Road, Guangzhou, 510080, China
- Xiaoke Ma: School of Computer Science and Technology, Xidian University, Xi’an, 710071, China
8. Peng M, Wamsley B, Elkins AG, Geschwind DH, Wei Y, Roeder K. Cell type hierarchy reconstruction via reconciliation of multi-resolution cluster tree. Nucleic Acids Res 2021; 49:e91. [PMID: 34125905 PMCID: PMC8450107 DOI: 10.1093/nar/gkab481] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/14/2021] [Revised: 05/12/2021] [Accepted: 05/18/2021] [Indexed: 02/01/2023] Open
Abstract
A wealth of clustering algorithms are available for single-cell RNA sequencing (scRNA-seq) data to enable the identification of functionally distinct subpopulations that each possess a different pattern of gene expression activity. Implementation of these methods requires a choice of resolution parameter to determine the number of clusters, and critical judgment from the researchers is required to determine the desired resolution. This supervised process takes significant time and effort. Moreover, it can be difficult to compare and characterize the evolution of cell clusters from results obtained at one single resolution. To overcome these challenges, we built Multi-resolution Reconciled Tree (MRtree), a highly flexible tree-construction algorithm that generates a cluster hierarchy from flat clustering results attained for a range of resolutions. Because MRtree can be coupled with most scRNA-seq clustering algorithms, it inherits the robustness and versatility of a flat clustering approach, while maintaining the hierarchical structure of cells. The constructed trees from multiple scRNA-seq datasets effectively reflect the extent of transcriptional distinctions among cell groups and align well with levels of functional specializations among cells. Importantly, application to fetal brain cells identified subtypes of cells determined mainly by maturation states, spatial location and terminal specification.
Affiliation(s)
- Minshi Peng: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Brie Wamsley: Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Andrew G Elkins: Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Daniel H Geschwind: Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA; Program in Neurobehavioral Genetics and Center for Autism Research and Treatment Semel Institute and Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
- Yuting Wei: Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kathryn Roeder: To whom correspondence should be addressed. Tel: +1 412 268 577; Fax: +1 412 268 7828
9. Lu T, Mar JC. Investigating transcriptome-wide sex dimorphism by multi-level analysis of single-cell RNA sequencing data in ten mouse cell types. Biol Sex Differ 2020; 11:61. [PMID: 33153500 PMCID: PMC7643324 DOI: 10.1186/s13293-020-00335-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/05/2020] [Accepted: 10/11/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND: It is a long established fact that sex is an important factor that influences the transcriptional regulatory processes of an organism. However, understanding sex-based differences in gene expression has been limited because existing studies typically sequence and analyze bulk tissue from female or male individuals. Such analyses average cell-specific gene expression levels where cell-to-cell variation can easily be concealed. We therefore sought to utilize data generated by the rapidly developing single cell RNA sequencing (scRNA-seq) technology to explore sex dimorphism and its functional consequences at the single cell level. METHODS: Our study included scRNA-seq data of ten well-defined cell types from the brain and heart of female and male young adult mice in the publicly available tissue atlas dataset, Tabula Muris. We combined standard differential expression analysis with the identification of differential distributions in single cell transcriptomes to test for sex-based gene expression differences in each cell type. The marker genes that had sex-specific inter-cellular changes in gene expression formed the basis for further characterization of the cellular functions that were differentially regulated between the female and male cells. We also inferred activities of transcription factor-driven gene regulatory networks by leveraging knowledge of multidimensional protein-to-genome and protein-to-protein interactions and analyzed pathways that were potential modulators of sex differentiation and dimorphism. RESULTS: For each cell type in this study, we identified marker genes with significantly different mean expression levels or inter-cellular distribution characteristics between female and male cells. These marker genes were enriched in pathways that were closely related to the biological functions of each cell type. We also identified sub-cell types that possibly carry out distinct biological functions that displayed discrepancies between female and male cells. Additionally, we found that while genes under differential transcriptional regulation exhibited strong cell type specificity, six core transcription factor families responsible for most sex-dimorphic transcriptional regulation activities were conserved across the cell types, including ASCL2, EGR, GABPA, KLF/SP, RXRα, and ZF. CONCLUSIONS: We explored novel gene expression-based biomarkers, functional cell group compositions, and transcriptional regulatory networks associated with sex dimorphism with a novel computational pipeline. Our findings indicated that sex dimorphism might be widespread across the transcriptomes of cell types, cell type-specific, and impactful for regulating cellular activities.
Affiliation(s)
- Tianyuan Lu: Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia; Quantitative Life Sciences Program, McGill University, Montreal, QC, H3A 0G4, Canada
- Jessica C Mar: Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, 4072, Australia
10. Peng L, Tian X, Tian G, Xu J, Huang X, Weng Y, Yang J, Zhou L. Single-cell RNA-seq clustering: datasets, models, and algorithms. RNA Biol 2020; 17:765-783. [PMID: 32116127 PMCID: PMC7549635 DOI: 10.1080/15476286.2020.1728961] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/03/2019] [Revised: 01/10/2020] [Accepted: 01/11/2020] [Indexed: 12/13/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies offer numerous opportunities for novel and potentially unexpected biological discoveries. scRNA-seq clustering helps elucidate cell-to-cell heterogeneity and uncover cell subgroups and cell dynamics at the group level. Two important aspects of scRNA-seq data analysis were introduced and discussed in the present review: relevant datasets and analytical tools. In particular, we reviewed popular scRNA-seq datasets and discussed scRNA-seq clustering models including K-means clustering, hierarchical clustering, consensus clustering, and so on. Seven state-of-the-art scRNA clustering methods were compared on five publicly available datasets. Two primary evaluation metrics, the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI), were used to evaluate these methods. Although unsupervised models can effectively cluster scRNA-seq data, these methods also face challenges. Some suggestions were provided for future research directions.
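For readers unfamiliar with the first of the two evaluation metrics named in this abstract, the following is a minimal Python sketch of the standard Adjusted Rand Index computed from a contingency table. It is an illustration only, not the evaluation code used in the review; the function name `adjusted_rand_index` and the toy label lists are hypothetical. NMI is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Standard ARI between two flat partitions of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)

    # Contingency counts: number of cells for each (true, predicted) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs that fall together in each table cell / row / column.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two cells: partitions trivially agree

    expected = sum_true * sum_pred / total   # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2    # upper bound on agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_pairs - expected) / (max_index - expected)


# Toy example with hypothetical labels: same grouping, different label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index depends only on which cells are grouped together, relabeling the clusters does not change the score, which is why it suits comparisons between unsupervised clusterings and reference annotations.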
Affiliation(s)
- Lihong Peng: School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiongfei Tian: School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian: Geneis (Beijing) Co. Ltd, Beijing, China
- Junlin Xu: College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xin Huang: School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yanbin Weng: School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou: School of Computer Science, Hunan University of Technology, Zhuzhou, China
11. de Kanter JK, Lijnzaad P, Candelli T, Margaritis T, Holstege FCP. CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing. Nucleic Acids Res 2019; 47:e95. [PMID: 31226206 PMCID: PMC6895264 DOI: 10.1093/nar/gkz543] [Citation(s) in RCA: 119] [Impact Index Per Article: 23.8] [Received: 02/22/2019] [Revised: 06/05/2019] [Accepted: 06/08/2019] [Indexed: 01/06/2023] Open
Abstract
Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate cell type identification algorithm that is rapid and selective, including the possibility of intermediate or unassigned categories. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH’s accuracy is as good as existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between that of acinar and ductal cell types. Having the possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.
Affiliation(s)
- Jurrian K de Kanter: Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Philip Lijnzaad: Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Tito Candelli: Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Thanasis Margaritis: Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Frank C P Holstege: Princess Máxima Center for Pediatric Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
12. Hajkarim MC, Won KJ. Single Cell RNA-Sequencing for the Study of Atherosclerosis. J Lipid Atheroscler 2019; 8:152-161. [PMID: 32821705 PMCID: PMC7379113 DOI: 10.12997/jla.2019.8.2.152] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/03/2019] [Revised: 06/28/2019] [Accepted: 07/16/2019] [Indexed: 12/12/2022] Open
Abstract
Atherosclerosis is a major cause of coronary artery disease and stroke. A massive and new type of data has finally arrived in the field of atherosclerosis: single cell RNA sequencing (scRNAseq). Recently, scRNAseq has been successfully applied to the study of atherosclerosis to identify previously uncharacterized cell populations. scRNAseq is an effective approach to evaluate heterogeneous cell populations by measuring the transcriptomic profiles at the single cell level. Besides the studies of atherosclerosis, scRNAseq is being employed in various areas of biology, including cancer research and organ development. In order to analyze these new massive datasets, various analytic approaches have been developed. This review aims to enhance the understanding of this new technology by exploring how the single cell transcriptome has been applied to the study of atherosclerosis and by discussing potential analyses using scRNAseq.
Affiliation(s)
- Morteza Chalabi Hajkarim: Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark; Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), Faculty of Health Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kyoung Jae Won: Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark; Novo Nordisk Foundation Center for Stem Cell Biology (DanStem), Faculty of Health Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
13. He Z, Zhang J, Yuan X, Xi J, Liu Z, Zhang Y. Stratification of Breast Cancer by Integrating Gene Expression Data and Clinical Variables. Molecules 2019; 24:molecules24030631. [PMID: 30754661 PMCID: PMC6385100 DOI: 10.3390/molecules24030631] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/31/2018] [Revised: 01/26/2019] [Accepted: 02/03/2019] [Indexed: 11/25/2022] Open
Abstract
Breast cancer is a heterogeneous disease. Although gene expression profiling has led to the definition of several subtypes of breast cancer, the precise discovery of the subtypes remains a challenge. Clinical data is another promising source. In this study, clinical variables are integrated with gene expression data for the stratification of breast cancer. We adopt two phases: gene selection and clustering, where the integration is in the gene selection phase; only genes whose expression is most relevant to each clinical variable and least redundant among themselves are selected for further clustering. In practice, we simply utilize maximum relevance minimum redundancy (mRMR) for gene selection and k-means for clustering. We compare the results of our method with those of two commonly used expression-only breast cancer stratification methods: prediction analysis of microarray 50 (PAM50) and highest variability (HV). The result is that our method outperforms them in identifying subtypes significantly associated with five-year survival and recurrence time. Specifically, our method identified recurrence-associated breast cancer subtypes that were not identified by PAM50 and HV. Additionally, our analysis discovered three survival-associated luminal-A subgroups and two survival-associated luminal-B subgroups. The study indicates that screening clinically relevant gene expressions yields improved breast cancer stratification.
Affiliation(s)
- Zongzhen He: School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Junying Zhang: School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Xiguo Yuan: School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Jianing Xi: School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Zhaowen Liu: Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
- Yuanyuan Zhang: School of Computer Engineering, Qingdao University of Technology, Qingdao 266033, China