1
Katoh M, Nomura S, Yamada S, Ito M, Hayashi H, Katagiri M, Heryed T, Fujiwara T, Takeda N, Nishida M, Sugaya M, Kato M, Osawa T, Abe H, Sakurai Y, Ko T, Fujita K, Zhang B, Hatsuse S, Yamada T, Inoue S, Dai Z, Kubota M, Sawami K, Ono M, Morita H, Kubota Y, Mizuno S, Takahashi S, Nakanishi M, Ushiku T, Nakagami H, Aburatani H, Komuro I. Vaccine Therapy for Heart Failure Targeting the Inflammatory Cytokine Igfbp7. Circulation 2024; 150:374-389. [PMID: 38991046 DOI: 10.1161/circulationaha.123.064719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 05/29/2024] [Indexed: 07/13/2024]
Abstract
BACKGROUND The heart comprises many types of cells such as cardiomyocytes, endothelial cells (ECs), fibroblasts, smooth muscle cells, pericytes, and blood cells. Every cell type responds to various stressors (eg, hemodynamic overload and ischemia) and changes its properties and interrelationships among cells. To date, heart failure research has focused mainly on cardiomyocytes; however, other types of cells and their cell-to-cell interactions might also be important in the pathogenesis of heart failure. METHODS Pressure overload was imposed on mice by transverse aortic constriction and the vascular structure of the heart was examined using a tissue transparency technique. Functional and molecular analyses including single-cell RNA sequencing were performed on the hearts of wild-type mice and EC-specific gene knockout mice. Metabolites in heart tissue were measured by capillary electrophoresis-time of flight-mass spectrometry system. The vaccine was prepared by conjugating the synthesized epitope peptides with keyhole limpet hemocyanin and administered to mice with aluminum hydroxide as an adjuvant. Tissue samples from heart failure patients were used for single-nucleus RNA sequencing to examine gene expression in ECs and perform pathway analysis in cardiomyocytes. RESULTS Pressure overload induced the development of intricately entwined blood vessels in murine hearts, leading to the accumulation of replication stress and DNA damage in cardiac ECs. Inhibition of cell proliferation by a cyclin-dependent kinase inhibitor reduced DNA damage in ECs and ameliorated transverse aortic constriction-induced cardiac dysfunction. Single-cell RNA sequencing analysis revealed upregulation of Igfbp7 (insulin-like growth factor-binding protein 7) expression in the senescent ECs and downregulation of insulin signaling and oxidative phosphorylation in cardiomyocytes of murine and human failing hearts. 
Overexpression of Igfbp7 in the murine heart using AAV9 (adeno-associated virus serotype 9) exacerbated cardiac dysfunction, while EC-specific deletion of Igfbp7 and the vaccine targeting Igfbp7 ameliorated cardiac dysfunction with increased oxidative phosphorylation in cardiomyocytes under pressure overload. CONCLUSIONS Igfbp7 produced by senescent ECs causes cardiac dysfunction and vaccine therapy targeting Igfbp7 may be useful to prevent the development of heart failure.
Affiliation(s)
- Manami Katoh
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Frontier Cardiovascular Science (M.Katoh, T.K., S.I., S.N., I.K.), The University of Tokyo, Japan
- Genome Science Division (M.Katoh, S.N., H. Aburatani), The University of Tokyo, Japan
- Seitaro Nomura
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Frontier Cardiovascular Science (M.Katoh, T.K., S.I., S.N., I.K.), The University of Tokyo, Japan
- Genome Science Division (M.Katoh, S.N., H. Aburatani), The University of Tokyo, Japan
- Shintaro Yamada
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Masamichi Ito
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Hiroki Hayashi
- Department of Health Development and Medicine, Graduate School of Medicine, Osaka University, Suita, Japan (H.H., H.N.)
- Mikako Katagiri
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Tuolisi Heryed
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Takayuki Fujiwara
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Norifumi Takeda
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Miyuki Nishida
- Division of Integrative Nutriomics and Oncology, Research Center for Advanced Science and Technology (M. Nishida, M.S., M.K., T.O.), The University of Tokyo, Japan
- Maki Sugaya
- Division of Integrative Nutriomics and Oncology, Research Center for Advanced Science and Technology (M. Nishida, M.S., M.K., T.O.), The University of Tokyo, Japan
- Miki Kato
- Division of Integrative Nutriomics and Oncology, Research Center for Advanced Science and Technology (M. Nishida, M.S., M.K., T.O.), The University of Tokyo, Japan
- Tsuyoshi Osawa
- Division of Integrative Nutriomics and Oncology, Research Center for Advanced Science and Technology (M. Nishida, M.S., M.K., T.O.), The University of Tokyo, Japan
- Hiroyuki Abe
- Pathology (H. Abe, T.U.), The University of Tokyo, Japan
- Yoshitaka Sakurai
- Diabetes and Metabolic Diseases, Graduate School of Medicine (Y.S.), The University of Tokyo, Japan
- Toshiyuki Ko
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Frontier Cardiovascular Science (M.Katoh, T.K., S.I., S.N., I.K.), The University of Tokyo, Japan
- Kanna Fujita
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Bo Zhang
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Satoshi Hatsuse
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Takanobu Yamada
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Shunsuke Inoue
- Frontier Cardiovascular Science (M.Katoh, T.K., S.I., S.N., I.K.), The University of Tokyo, Japan
- Zhehao Dai
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Masayuki Kubota
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Kousuke Sawami
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Minoru Ono
- Cardiothoracic Surgery (M.O.), The University of Tokyo, Japan
- Hiroyuki Morita
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Yoshiaki Kubota
- Department of Anatomy, Keio University School of Medicine, Tokyo, Japan (Y.K.)
- Seiya Mizuno
- Laboratory Animal Resource Center, Transborder Medical Research Center, Institute of Medicine, University of Tsukuba, Ibaraki, Japan (S.M., S.T.)
- Satoru Takahashi
- Laboratory Animal Resource Center, Transborder Medical Research Center, Institute of Medicine, University of Tsukuba, Ibaraki, Japan (S.M., S.T.)
- Makoto Nakanishi
- Division of Cancer Cell Biology, The Institute of Medical Science (M. Nakanishi), The University of Tokyo, Japan
- Tetsuo Ushiku
- Pathology (H. Abe, T.U.), The University of Tokyo, Japan
- Hironori Nakagami
- Departments of Cardiovascular Medicine (M.Katoh, S.N., S.Y., M.I., M.Katagiri, T.H., T.F., N.T., T.K., K.F., B.Z., S.H., T.Y., S.I., Z.D., M.Kubota, K.S., H.M., I.K.), The University of Tokyo, Japan
- Hiroyuki Aburatani
- Genome Science Division (M.Katoh, S.N., H. Aburatani), The University of Tokyo, Japan
- Issei Komuro
- Frontier Cardiovascular Science (M.Katoh, T.K., S.I., S.N., I.K.), The University of Tokyo, Japan
- Laboratory Animal Resource Center, Transborder Medical Research Center, Institute of Medicine, University of Tsukuba, Ibaraki, Japan (S.M., S.T.)
2
Lodi MK, Lodi M, Osei K, Ranganathan V, Hwang P, Ghosh P. CHAI: Consensus Clustering Through Similarity Matrix Integration for Cell-Type Identification. bioRxiv: The Preprint Server for Biology 2024:2024.03.19.585758. [PMID: 38562750 PMCID: PMC10983883 DOI: 10.1101/2024.03.19.585758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Several methods have been developed to computationally predict cell types from single-cell RNA sequencing (scRNAseq) data. As more methods are developed, a common problem for investigators has been identifying the best method for their specific use case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell type identification), a wisdom-of-crowds approach for scRNAseq clustering. CHAI presents two competing methods, CHAI-AvgSim and CHAI-SNF, which aggregate the clustering results from seven state-of-the-art clustering methods. Both methods demonstrate improved performance on a diverse selection of benchmarking datasets and also outperform a previous consensus clustering method. We demonstrate CHAI's practical use by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we show that CHAI-SNF performs better when spatial transcriptomics data are included. CHAI is intuitive and easily customizable; users can add their own clustering methods to the pipeline or down-select just the ones they want to use for the clustering aggregation. CHAI is available as an open source R package on GitHub: https://github.com/lodimk2/chai.
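The idea of aggregating several clusterings through similarity matrices can be illustrated with a small sketch. This is not the CHAI implementation (which is an R package); it is a minimal Python illustration that assumes each method's result is turned into a binary co-membership matrix, that the matrices are averaged, and that a final partition is read off with a simple threshold on the averaged matrix. The function names, the threshold of 0.5, and the connected-components final step are all hypothetical choices made for this example.

```python
from typing import List, Sequence


def co_association(labels: Sequence[int]) -> List[List[float]]:
    """Binary similarity matrix: 1.0 where two cells share a cluster label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def average_similarity(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Element-wise mean of the co-association matrices of all partitions."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same cells")
    mats = [co_association(p) for p in partitions]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def consensus_labels(sim: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical final step: connected components of the graph that links
    two cells whenever their averaged similarity exceeds the threshold."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy clusterings of six cells; label values need not match across methods.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0, 2],  # places cell 4 with cells 2 and 3
    [0, 0, 0, 1, 2, 2],  # places cell 2 with cells 0 and 1
]
sim = average_similarity(partitions)
consensus = consensus_labels(sim)
```

Because the co-membership matrix depends only on which cells share a label, the sketch is insensitive to how each method numbers its clusters, which is what makes matrix-level aggregation convenient.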
Affiliation(s)
- Musaddiq K Lodi
- Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284
- Muzammil Lodi
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
- Kezie Osei
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284
- Priscilla Hwang
- Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA 23284
- Preetam Ghosh
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
3
Wang YM, Sun Y, Wang B, Wu Z, He XY, Zhao Y. Transfer learning for clustering single-cell RNA-seq data crossing-species and batch, case on uterine fibroids. Brief Bioinform 2023; 25:bbad426. [PMID: 37991248 PMCID: PMC10664408 DOI: 10.1093/bib/bbad426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Revised: 10/12/2023] [Accepted: 10/30/2023] [Indexed: 11/23/2023] [Open Access]
Abstract
The high dimensionality and sparsity of the gene expression matrix in single-cell RNA-sequencing (scRNA-seq) data, coupled with significant noise generated by shallow sequencing, pose a great challenge for cell clustering methods. While numerous computational methods have been proposed, the majority of existing approaches center on processing the target dataset itself. This approach disregards the wealth of knowledge present within other species and batches of scRNA-seq data. In light of this, our paper proposes a novel method named graph-based deep embedding clustering (GDEC) that leverages transfer learning across species and batches. GDEC integrates graph convolutional networks, effectively overcoming the challenges posed by sparse gene expression matrices. Additionally, the incorporation of DEC in GDEC enables the partitioning of cell clusters within a lower-dimensional space, thereby mitigating the adverse effects of noise on clustering outcomes. GDEC constructs a model based on existing scRNA-seq datasets and then applies transfer learning techniques to fine-tune the model using a limited amount of prior knowledge gleaned from the target dataset. This empowers GDEC to adeptly cluster scRNA-seq data across different species and batches. Through cross-species and cross-batch clustering experiments, we conducted a comparative analysis between GDEC and conventional packages. Furthermore, we applied GDEC to the scRNA-seq data of uterine fibroids. Compared with results obtained from the Seurat package, GDEC unveiled a novel cell type (epithelial cells) and identified a notable number of new pathways among various cell types, thus underscoring the enhanced analytical capabilities of GDEC. Availability and implementation: https://github.com/YuzhiSun/GDEC/tree/main.
Affiliation(s)
- Yu Mei Wang
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
- Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Yuzhi Sun
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Beiying Wang
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
- Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Zhiping Wu
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
- Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Xiao Ying He
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China
- Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Yuansong Zhao
- University of Texas Health Science Center at Houston, 77030-5400, USA
4
Wang LP, Liu JX, Shang JL, Kong XZ, Guan BX, Wang J. KGLRR: A low-rank representation K-means with graph regularization constraint method for Single-cell type identification. Comput Biol Chem 2023; 104:107862. [PMID: 37031647 DOI: 10.1016/j.compbiolchem.2023.107862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023]
Abstract
Single-cell RNA sequencing technology provides a tremendous opportunity for studying disease mechanisms at the single-cell level. Cell type identification is a key step in the research of disease mechanisms. Many clustering algorithms have been proposed to identify cell types. Most clustering algorithms perform similarity calculation before cell clustering. Because clustering and similarity calculation are independent, a low-rank matrix obtained only by similarity calculation may be unable to fully reveal the patterns in single-cell data. In this study, to capture accurate single-cell clustering information, we propose a novel method based on a low-rank representation model, called KGLRR, that combines the low-rank representation approach with K-means clustering. The cluster centroid is updated as the cell dimension decreases to better form new clusters and improve the quality of clustering information. In addition, the low-rank representation model ignores local geometric information, so a graph regularization constraint is introduced. KGLRR is tested on both simulated and real single-cell datasets to validate the effectiveness of the new method. The experimental results show that KGLRR is more robust and accurate in cell type identification than other advanced algorithms.
Affiliation(s)
- Lin-Ping Wang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Jun-Liang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Xiang-Zhen Kong
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Bo-Xin Guan
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
- Juan Wang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
5
Rudar J, Golding GB, Kremer SC, Hajibabaei M. Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta Diversity in Medically Relevant 16S Amplicon Sequencing Data. Microbiol Spectr 2023; 11:e0206522. [PMID: 36877086 PMCID: PMC10100742 DOI: 10.1128/spectrum.02065-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 02/11/2023] [Indexed: 03/07/2023] [Open Access]
Abstract
Developing an understanding of how microbial communities vary across conditions is an important analytical step. We used 16S rRNA data isolated from human stool samples to investigate whether learned dissimilarities, such as those produced using unsupervised decision tree ensembles, can be used to improve the analysis of the composition of bacterial communities in patients suffering from Crohn's disease and adenomas/colorectal cancers. We also introduce a workflow capable of learning dissimilarities, projecting them into a lower dimensional space, and identifying features that impact the location of samples in the projections. For example, when used with the centered log ratio transformation, our new workflow (TreeOrdination) could identify differences in the microbial communities of Crohn's disease patients and healthy controls. Further investigation of our models elucidated the global impact amplicon sequence variants (ASVs) had on the locations of samples in the projected space and how each ASV impacted individual samples in this space. Furthermore, this approach can be used to integrate patient data easily into the model and results in models that generalize well to unseen data. Models employing multivariate splits can improve the analysis of complex high-throughput sequencing data sets because they are better able to learn about the underlying structure of the data set. IMPORTANCE There is an ever-increasing level of interest in accurately modeling and understanding the roles that commensal organisms play in human health and disease. We show that learned representations can be used to create informative ordinations. We also demonstrate that the application of modern model introspection algorithms can be used to investigate and quantify the impacts of taxa in these ordinations, and that the taxa identified by these approaches have been associated with immune-mediated inflammatory diseases and colorectal cancer.
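The centered log ratio (CLR) transformation mentioned in this abstract has a standard definition: each component's logarithm minus the mean of the logarithms in the same sample. The sketch below shows that definition for one sample of counts. It is not taken from TreeOrdination; the function name and the pseudocount of 0.5 used to handle zero counts are assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log ratio: log of each value minus the mean log of the sample.

    The pseudocount (an assumption of this sketch) avoids log(0) for
    amplicon sequence variants that were not observed in the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# One hypothetical sample with four ASV counts, one of them zero.
sample = [120, 30, 0, 50]
transformed = clr(sample)

# Without a pseudocount the transform ignores sequencing depth:
# multiplying every count by 10 leaves the CLR values unchanged.
shallow = clr([1, 2, 4], pseudocount=0.0)
deep = clr([10, 20, 40], pseudocount=0.0)
```

CLR values always sum to zero within a sample, so they describe relative rather than absolute abundance, which suits compositional 16S data.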
Affiliation(s)
- Josip Rudar
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- G. Brian Golding
- Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Stefan C. Kremer
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Mehrdad Hajibabaei
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
6
Wang Y, Yu Z, Li S, Bian C, Liang Y, Wong KC, Li X. scBGEDA: deep single-cell clustering analysis via a dual denoising autoencoder with bipartite graph ensemble clustering. Bioinformatics 2023; 39:7025496. [PMID: 36734596 PMCID: PMC9925104 DOI: 10.1093/bioinformatics/btad075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 12/08/2022] [Accepted: 02/02/2023] [Indexed: 02/04/2023] [Open Access]
Abstract
MOTIVATION Single-cell RNA sequencing (scRNA-seq) is an increasingly popular technique for transcriptomic analysis of gene expression at the single-cell level. Cell-type clustering is the first crucial task in the analysis of scRNA-seq data that facilitates accurate identification of cell types and the study of the characteristics of their transcripts. Recently, several computational models based on deep autoencoders and ensemble clustering have been developed to analyze scRNA-seq data. However, current deep autoencoders are not sufficient to learn the latent representations of scRNA-seq data, and obtaining consensus partitions from these feature representations remains under-explored. RESULTS To address this challenge, we propose a single-cell deep clustering model via a dual denoising autoencoder with bipartite graph ensemble clustering, called scBGEDA, to identify specific cell populations in single-cell transcriptome profiles. First, a single-cell dual denoising autoencoder network is proposed to project the data into a compressed low-dimensional space and to learn feature representations by jointly optimizing the zero-inflated negative binomial reconstruction loss and the denoising reconstruction loss. Then, a bipartite graph ensemble clustering algorithm is designed to exploit the relationships between cells and the learned latent embedded space by means of a graph-based consensus function. Multiple comparison experiments were conducted on 20 scRNA-seq datasets from different sequencing platforms using a variety of clustering metrics. The experimental results indicated that scBGEDA outperforms other state-of-the-art methods on these datasets, and also demonstrated its scalability to large-scale scRNA-seq datasets. 
Moreover, scBGEDA was able to identify cell-type specific marker genes and provide functional genomic analysis by quantifying the influence of genes on cell clusters, bringing new insights into identifying cell types and characterizing the scRNA-seq data from different perspectives. AVAILABILITY AND IMPLEMENTATION The source code of scBGEDA is available at https://github.com/wangyh082/scBGEDA. The software and the supporting data can be downloaded from https://figshare.com/articles/software/scBGEDA/19657911. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
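The zero-inflated negative binomial (ZINB) reconstruction loss named in this abstract is, in its usual form, the negative log-likelihood of an observed count under a ZINB distribution. The sketch below evaluates that quantity for a single count using the standard mean/dispersion parameterization. It is not the scBGEDA code: the function names and the parameter names `mu` (mean), `theta` (dispersion), and `pi` (dropout probability) are assumptions of this example, and the denoising loss that the model optimizes alongside it is not shown.

```python
import math


def nb_log_pmf(x: int, mu: float, theta: float) -> float:
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion theta (variance = mu + mu**2 / theta)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_neg_log_likelihood(x: int, mu: float, theta: float, pi: float) -> float:
    """Negative log-likelihood of one count under a ZINB model.

    A zero can come from the dropout component (probability pi) or from the
    negative binomial itself; a positive count can only come from the latter.
    Assumes mu > 0, theta > 0 and 0 <= pi < 1.
    """
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        return -math.log(pi + (1.0 - pi) * math.exp(nb))
    return -(math.log(1.0 - pi) + nb)


# Probability of observing a zero with mu=3, theta=2, pi=0.3.
p_zero = math.exp(-zinb_neg_log_likelihood(0, mu=3.0, theta=2.0, pi=0.3))

# The implied probabilities should sum to (almost exactly) one.
total = sum(math.exp(-zinb_neg_log_likelihood(x, 3.0, 2.0, 0.3))
            for x in range(500))
```

In an autoencoder setting the three parameters would be produced per gene and per cell by the decoder, and the loss would be averaged over all entries of the count matrix.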
Affiliation(s)
- Yunhe Wang
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Zhuohan Yu
- School of Artificial Intelligence, Jilin University, Jilin, China
- Shaochuan Li
- School of Artificial Intelligence, Jilin University, Jilin, China
- Chuang Bian
- School of Artificial Intelligence, Jilin University, Jilin, China
- Yanchun Liang
- Zhuhai Laboratory of Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Science and Technology, Zhuhai, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
- Xiangtao Li
- School of Artificial Intelligence, Jilin University, Jilin, China
7
Sen Puliparambil B, Tomal JH, Yan Y. A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data. Biology 2022; 11:biology11101495. [PMID: 36290397 PMCID: PMC9598401 DOI: 10.3390/biology11101495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 09/21/2022] [Accepted: 09/30/2022] [Indexed: 11/05/2022]
Abstract
With the emergence of single-cell RNA sequencing (scRNA-seq) technology, scientists are able to examine gene expression at single-cell resolution. Analysis of scRNA-seq data has its own challenges, which stem from its high dimensionality. Machine learning offers a way to select genes (features) from high-dimensional scRNA-seq data. Even though there exist multiple machine learning methods that appear to be suitable for feature selection, such as penalized regression, there is no rigorous comparison of their performance across data sets, each of which poses its own challenges. Therefore, in this paper, we analyzed and compared multiple penalized regression methods for scRNA-seq data. Given the scRNA-seq data sets we analyzed, the results show that sparse group lasso (SGL) outperforms the other six methods (ridge, lasso, elastic net, drop lasso, group lasso, and big lasso) in terms of area under the receiver operating characteristic curve (AUC) and computation time. Building on these findings, we proposed a new algorithm for feature selection using penalized regression methods. The proposed algorithm works by selecting a small subset of genes and applying SGL to select the differentially expressed genes in scRNA-seq data. By using hierarchical clustering to group genes, the proposed method bypasses the need for domain-specific knowledge for gene grouping information. In addition, the proposed algorithm provided consistently better AUC for the data sets used.
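For readers unfamiliar with sparse group lasso, its penalty combines a lasso term over all coefficients with a group lasso term over predefined gene groups. The sketch below computes that penalty in its commonly used form, with each group's Euclidean norm weighted by the square root of the group size. It is an illustration only, not the authors' algorithm: the function name, the parameter names `lam` and `alpha`, and the toy gene groups are assumptions, and the model fitting and the hierarchical clustering that would produce the groups are not shown.

```python
import math
from typing import List, Sequence


def sparse_group_lasso_penalty(beta: Sequence[float],
                               groups: Sequence[Sequence[int]],
                               lam: float,
                               alpha: float) -> float:
    """lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2).

    beta   : regression coefficients, one per gene
    groups : lists of coefficient indices, one list per gene group
    lam    : overall penalty strength
    alpha  : mixing weight; 1.0 gives the lasso, 0.0 the group lasso
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for members in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in members))
        group_term += math.sqrt(len(members)) * norm
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Four hypothetical genes in two groups; the second group is entirely zero.
beta: List[float] = [3.0, 4.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]

lasso_only = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=1.0)
group_only = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.0)
mixed = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)
```

The group term can zero out whole groups of genes at once while the lasso term still allows individual genes within a retained group to drop out, which is why the method needs a gene grouping as input.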
Affiliation(s)
- Bhavithry Sen Puliparambil
- Master of Science in Data Science Program, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
- Jabed H. Tomal
- Department of Mathematics and Statistics, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
- Yan Yan
- Department of Computing Science, Thompson Rivers University, 805 TRU Way, Kamloops, BC V2C 0C8, Canada
8
Mcloughlin A, Huang H. Shared Differential Expression-Based Distance Reflects Global Cell Type Relationships in Single-Cell RNA Sequencing Data. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 2022; 29:867-879. [PMID: 35793527 PMCID: PMC9419948 DOI: 10.1089/cmb.2021.0652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Unsupervised cell clustering on the basis of meaningful biological variation in single-cell RNA sequencing (scRNA seq) data has received significant attention, as it assists with ontological subpopulation identification among the data. A key step in the clustering process is to compute distances between the cells under a specified distance measure. Although particular distance measures may successfully separate cells into biologically relevant clusters, they may fail to retain the global structure of the data, such as the relative similarity between cell clusters. In this article, we modify a biologically motivated distance measure, SIDEseq, for use in aggregate comparisons of cell types in large single-cell assays, and demonstrate that, across simulated and real scRNA seq data, the distance matrix more consistently retains global cell type relationships than commonly used distance measures for scRNA seq clustering. We call the modified distance measure "SIDEREF." We explore spectral dimension reduction of the SIDEREF distance matrix as a means of noise filtering, similar to principal components analysis applied directly to expression data. We utilize a summary measure of relative cell type distances to better display the cell group relationships. SIDEREF visualizations more consistently reflect global structures in the data than other commonly considered distance measures. We utilize relative cell type distances and the SIDEREF distance measure to uncover compositional differences between annotated leukocyte cell groups in a compendium of Mus musculus scRNA seq assays comprising 12 tissues. SIDEREF and the associated analysis are openly available on GitHub.
Affiliation(s)
- Aidan Mcloughlin
- Division of Biostatistics and Department of Statistics, University of California, Berkeley, Berkeley, California, USA
- Haiyan Huang
- Division of Biostatistics, University of California, Berkeley, Berkeley, California, USA
9
Feng W, Schriever H, Jiang S, Bais A, Wu H, Kostka D, Li G. Computational profiling of hiPSC-derived heart organoids reveals chamber defects associated with NKX2-5 deficiency. Commun Biol 2022; 5:399. [PMID: 35488063 PMCID: PMC9054831 DOI: 10.1038/s42003-022-03346-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 02/23/2021] [Accepted: 04/10/2022] [Indexed: 11/29/2022] [Open Access]
Abstract
Heart organoids have the potential to generate primary heart-like anatomical structures and hold great promise as in vitro models for cardiac disease. However, their properties have not yet been fully studied, which hinders their widespread application. Here we report the development of differentiation systems for ventricular and atrial heart organoids, enabling the study of heart diseases with chamber defects. We show that our systems generate chamber-specific organoids comprising the major cardiac cell types, and we use single cell RNA sequencing together with sample multiplexing to characterize the cells we generate. To that end, we developed a machine learning label transfer approach leveraging cell type, chamber, and laterality annotations available for primary human fetal heart cells. We then used this model to analyze organoid cells from an isogenic line carrying an Ebstein’s anomaly associated genetic variant in NKX2-5, and we successfully recapitulated the disease’s atrialized ventricular defects. In summary, we have established a workflow integrating heart organoids and computational analysis to model heart development in normal and disease states. A human cardiac organoid system, coupled with single cell RNA sequencing and machine learning for transcriptional phenotyping, was developed. This allowed investigation of a genetic variant associated with Ebstein’s anomaly, a congenital heart disease with chamber defects.
Affiliation(s)
- Wei Feng
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Hannah Schriever
- Joint Carnegie Mellon, University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA, USA
| | - Shan Jiang
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Abha Bais
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Haodi Wu
- Vascular Medicine Institute Division of Cardiology, University of Pittsburgh Department of Medicine, Pittsburgh, PA, USA
| | - Dennis Kostka
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA; Joint Carnegie Mellon, University of Pittsburgh Ph.D. Program in Computational Biology, Pittsburgh, PA, USA; Department of Computational & Systems Biology and Pittsburgh Center for Evolutionary Biology and Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| | - Guang Li
- Department of Developmental Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
| |
|
10
|
Helmy M, Agrawal R, Ali J, Soudy M, Bui TT, Selvarajoo K. GeneCloudOmics: A Data Analytic Cloud Platform for High-Throughput Gene Expression Analysis. FRONTIERS IN BIOINFORMATICS 2021; 1:693836. [PMID: 36303746 PMCID: PMC9581002 DOI: 10.3389/fbinf.2021.693836] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Accepted: 10/14/2021] [Indexed: 11/18/2022]
Abstract
Gene expression profiling techniques, such as DNA microarray and RNA-Sequencing, have had a significant impact on our understanding of biological systems. They contribute to almost all aspects of biomedical research, including studying developmental biology, host-parasite relationships, disease progression and drug effects. However, high-throughput data generation makes it challenging for many wet-lab experimentalists to analyze and take full advantage of such rich and complex data. Here we present GeneCloudOmics, an easy-to-use web server for high-throughput gene expression analysis that extends the functionality of our previous ABioTrans with several new tools, including protein dataset analysis, and a web interface. GeneCloudOmics supports both microarray and RNA-Seq data analysis with a comprehensive range of data analytics tools in one package, which no other current standalone software or web-based tool offers. In total, GeneCloudOmics provides the user access to 23 different data analytical and bioinformatics tasks including reads normalization, scatter plots, linear/non-linear correlations, PCA, clustering (hierarchical, k-means, t-SNE, SOM), differential expression analyses, pathway enrichments, evolutionary analyses, pathological analyses, and protein-protein interaction (PPI) identifications. Furthermore, GeneCloudOmics allows the direct import of gene expression data from the NCBI Gene Expression Omnibus database. The user can perform all tasks rapidly through an intuitive graphical user interface that overcomes the hassle of coding, installing tools/packages/libraries and dealing with operating system compatibility and version issues, complications that make data analysis tasks challenging for biologists. Thus, GeneCloudOmics is a one-stop open-source tool for gene expression data analysis and visualization. It is freely available at http://combio-sifbi.org/GeneCloudOmics.
Affiliation(s)
- Mohamed Helmy
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Department of Computer Science, Lakehead University, Thunder Bay, ON, Canada
| | - Rahul Agrawal
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
| | - Javed Ali
- Department of Geology and Geophysics, Indian Institute of Technology (IIT) Kharagpur, Kharagpur, India
| | - Mohamed Soudy
- Proteomics and Metabolomics Unit, Children Cancer Hospital (CCHE-57357), Cairo, Egypt
| | - Thuy Tien Bui
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
| | - Kumar Selvarajoo
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Singapore Institute of Food and Biotechnology Innovation (SIFBI), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Synthetic Biology for Clinical and Technological Innovation (SynCTI), National University of Singapore (NUS), Singapore, Singapore
- *Correspondence: Kumar Selvarajoo,
| |
|
11
|
Zhao Y, Fang ZY, Lin CX, Deng C, Xu YP, Li HD. RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest. Front Genet 2021; 12:665843. [PMID: 34386033 PMCID: PMC8354212 DOI: 10.3389/fgene.2021.665843] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/09/2021] [Accepted: 04/01/2021] [Indexed: 11/13/2022]
Abstract
In recent years, single cell RNA-seq (scRNA-seq) has become increasingly popular in fields such as biology and medical research. Analyzing scRNA-seq data can reveal complex cell populations and infer single-cell trajectories in cell development. Clustering is one of the most important methods for analyzing scRNA-seq data. In this paper, we focus on improving scRNA-seq clustering through gene selection, which also reduces the dimensionality of scRNA-seq data. Studies have shown that gene selection for scRNA-seq data can improve clustering accuracy. Therefore, it is important to select genes with cell type specificity. Gene selection not only helps to reduce the dimensionality of scRNA-seq data, but can also improve cell type identification in combination with clustering methods. Here, we propose RFCell, a supervised gene selection method based on permutation and random forest classification. We first use RFCell and three existing gene selection methods to select gene sets on 10 scRNA-seq data sets. Then, three classical clustering algorithms are used to cluster the cells using the gene sets obtained by these gene selection methods. We found that the gene selection performance of RFCell was better than that of the other gene selection methods.
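The permutation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the RFCell implementation: a nearest-centroid classifier stands in for the random forest, accuracy is measured on the training cells themselves, and the function names, the toy matrix `X`, and the labels `y` are all hypothetical. A gene is scored by how much classification accuracy drops when its column is shuffled across cells.

```python
import random


def nearest_centroid_fit(X, y):
    """Compute one mean expression vector (centroid) per cell-type label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Assign a cell to the label of the closest centroid (squared Euclidean)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == lab for r, lab in zip(X, y)) / len(y)


def permutation_importance(X, y, n_repeats=20, seed=0):
    """Mean accuracy drop per gene after shuffling that gene across cells."""
    rng = random.Random(seed)
    centroids = nearest_centroid_fit(X, y)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the matrix with only gene j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (hypothetical): gene 0 separates the two types, gene 1 is constant.
X = [[10.0, 1.0]] * 4 + [[0.0, 1.0]] * 4
y = ["A"] * 4 + ["B"] * 4
importances = permutation_importance(X, y)
```

In this toy setting the constant gene receives an importance of zero, while shuffling the discriminative gene lowers accuracy. A selection step would then keep the genes with the largest drops.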
Affiliation(s)
- Yuan Zhao
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, China
| | - Cui-Xiang Lin
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Chao Deng
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Yun-Pei Xu
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Hong-Dong Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| |
|
12
|
Fang ZY, Lin CX, Xu YP, Li HD, Xu QS. REBET: a method to determine the number of cell clusters based on batch effect removal. Brief Bioinform 2021; 22:6299206. [PMID: 34131702 DOI: 10.1093/bib/bbab204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2021] [Revised: 04/20/2021] [Accepted: 03/12/2021] [Indexed: 01/01/2023]
Abstract
In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches. They occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. In this method, cells are first partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperform existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
Affiliation(s)
- Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Cui-Xiang Lin
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China; School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Yun-Pei Xu
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China; School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China; School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China
| |
|
13
|
Liang Z, Li M, Zheng R, Tian Y, Yan X, Chen J, Wu FX, Wang J. SSRE: Cell Type Detection Based on Sparse Subspace Representation and Similarity Enhancement. GENOMICS PROTEOMICS & BIOINFORMATICS 2021; 19:282-291. [PMID: 33647482 PMCID: PMC8602764 DOI: 10.1016/j.gpb.2020.09.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/30/2019] [Revised: 08/13/2020] [Accepted: 10/29/2020] [Indexed: 11/25/2022]
Abstract
Accurate identification of cell types from single-cell RNA sequencing (scRNA-seq) data plays a critical role in a variety of scRNA-seq analysis studies. This task corresponds to solving an unsupervised clustering problem, in which the similarity measurement between cells affects the result significantly. Although many approaches for cell type identification have been proposed, the accuracy still needs to be improved. In this study, we propose a novel single-cell clustering framework based on similarity learning, called SSRE. SSRE models the relationships between cells based on the subspace assumption and generates a sparse representation of the cell-to-cell similarity. The sparse representation retains the most similar neighbors for each cell. In addition, three classical pairwise similarities are incorporated with a gene selection and enhancement strategy to further improve the effectiveness of SSRE. Tested on ten real scRNA-seq datasets and five simulated datasets, SSRE achieved superior performance in most cases compared to several state-of-the-art single-cell clustering methods. SSRE can also be extended to visualization of scRNA-seq data and identification of differentially expressed genes. The MATLAB and Python implementations of SSRE are available at https://github.com/CSUBioGroup/SSRE.
Affiliation(s)
- Zhenlan Liang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China.
| | - Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Yu Tian
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Xuhua Yan
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| | - Jin Chen
- College of Medicine, University of Kentucky, Lexington, KY 40536, USA
| | - Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
| |
|
14
|
Pasquini G, Rojo Arias JE, Schäfer P, Busskamp V. Automated methods for cell type annotation on scRNA-seq data. Comput Struct Biotechnol J 2021; 19:961-969. [PMID: 33613863 PMCID: PMC7873570 DOI: 10.1016/j.csbj.2021.01.015] [Citation(s) in RCA: 84] [Impact Index Per Article: 28.0] [Received: 10/30/2020] [Revised: 01/13/2021] [Accepted: 01/13/2021] [Indexed: 12/22/2022]
Abstract
The advent of single-cell sequencing started a new era of transcriptomic and genomic research, advancing our knowledge of the cellular heterogeneity and dynamics. Cell type annotation is a crucial step in analyzing single-cell RNA sequencing data, yet manual annotation is time-consuming and partially subjective. As an alternative, tools have been developed for automatic cell type identification. Different strategies have emerged to ultimately associate gene expression profiles of single cells with a cell type either by using curated marker gene databases, correlating reference expression data, or transferring labels by supervised classification. In this review, we present an overview of the available tools and the underlying approaches to perform automated cell type annotations on scRNA-seq data.
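Of the three strategies listed in this abstract, the marker-gene approach is the simplest to sketch. The example below is only an illustration under stated assumptions: the `MARKERS` table is a hand-written toy rather than a curated database, and the function name `annotate_cell`, the mean-expression score, and the `min_score` cutoff are hypothetical choices, not taken from any tool in the review.

```python
# Toy marker table (hypothetical); real tools draw on curated marker databases.
MARKERS = {
    "cardiomyocyte": ["TNNT2", "MYH6"],
    "endothelial": ["PECAM1", "CDH5"],
    "fibroblast": ["COL1A1", "DCN"],
}


def annotate_cell(expression, markers, min_score=0.5):
    """Label one cell by the marker set with the highest mean expression.

    `expression` maps gene symbol -> normalized expression for a single cell.
    Genes missing from the cell count as 0. If no marker set reaches
    `min_score`, the cell is reported as "unassigned".
    """
    scores = {}
    for cell_type, genes in markers.items():
        values = [expression.get(gene, 0.0) for gene in genes]
        scores[cell_type] = sum(values) / len(values) if values else 0.0
    best = max(scores, key=scores.get)
    label = best if scores[best] >= min_score else "unassigned"
    return label, scores


# Example cell (made-up values) with high cardiomyocyte marker expression.
cell = {"TNNT2": 5.0, "MYH6": 3.0, "PECAM1": 0.2}
label, scores = annotate_cell(cell, MARKERS)
```

The "unassigned" fallback reflects a common design choice: a cell that matches no marker set well is flagged for manual review rather than forced into the nearest label.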
Affiliation(s)
- Giovanni Pasquini
- Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
- Universitäts-Augenklinik Bonn, University of Bonn, Department of Ophthalmology, Bonn 53127, Germany
| | - Jesus Eduardo Rojo Arias
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
| | - Patrick Schäfer
- Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
| | - Volker Busskamp
- Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
- Universitäts-Augenklinik Bonn, University of Bonn, Department of Ophthalmology, Bonn 53127, Germany
| |
|
15
|
Li J, Jiang W, Han H, Liu J, Liu B, Wang Y. ScGSLC: An unsupervised graph similarity learning framework for single-cell RNA-seq data clustering. Comput Biol Chem 2020; 90:107415. [PMID: 33307360 DOI: 10.1016/j.compbiolchem.2020.107415] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/28/2020] [Revised: 09/30/2020] [Accepted: 10/06/2020] [Indexed: 01/18/2023]
Abstract
Accurate clustering of cells from single-cell RNA sequencing (scRNA-seq) data is an essential step for biological analyses such as putative cell type identification. However, scRNA-seq data are high-dimensional and highly sparse, which makes traditional clustering methods less effective at reflecting the similarity between cells. Since genetic networks fundamentally define the functions of cells and deep learning shows strong advantages in network representation learning, we propose ScGSLC, a novel scRNA-seq clustering framework based on graph similarity learning. ScGSLC integrates scRNA-seq data and a protein-protein interaction network into a graph. ScGSLC then employs a graph convolutional network to embed the graphs and clusters the cells by the calculated similarity between graphs. Unsupervised clustering results on nine public data sets demonstrate that ScGSLC performs better than state-of-the-art methods.
Affiliation(s)
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China.
| | - Wei Jiang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Henry Han
- Department of Computer and Information Science, Fordham University, New York, NY 10023, USA; School of Computer Science, Qinghai Normal University, Xining 810008, China
| | - Jing Liu
- South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, Guangdong 510530, China
| | - Bo Liu
- Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China; Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China.
| |
|
16
|
Olah M, Menon V, Habib N, Taga MF, Ma Y, Yung CJ, Cimpean M, Khairallah A, Coronas-Samano G, Sankowski R, Grün D, Kroshilina AA, Dionne D, Sarkis RA, Cosgrove GR, Helgager J, Golden JA, Pennell PB, Prinz M, Vonsattel JPG, Teich AF, Schneider JA, Bennett DA, Regev A, Elyaman W, Bradshaw EM, De Jager PL. Single cell RNA sequencing of human microglia uncovers a subset associated with Alzheimer's disease. Nat Commun 2020; 11:6129. [PMID: 33257666 PMCID: PMC7704703 DOI: 10.1038/s41467-020-19737-2] [Citation(s) in RCA: 336] [Impact Index Per Article: 84.0] [Received: 08/31/2019] [Accepted: 10/26/2020] [Indexed: 01/05/2023]
Abstract
The extent of microglial heterogeneity in humans remains a central yet poorly explored question in light of the development of therapies targeting this cell type. Here, we investigate the population structure of live microglia purified from human cerebral cortex samples obtained at autopsy and during neurosurgical procedures. Using single cell RNA sequencing, we find that some subsets are enriched for disease-related genes and RNA signatures. We confirm the presence of four of these microglial subpopulations histologically and illustrate the utility of our data by characterizing further microglial cluster 7, enriched for genes depleted in the cortex of individuals with Alzheimer's disease (AD). Histologically, these cluster 7 microglia are reduced in frequency in AD tissue, and we validate this observation in an independent set of single nucleus data. Thus, our live human microglia identify a range of subtypes, and we prioritize one of these as being altered in AD.
Affiliation(s)
- Marta Olah
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Vilas Menon
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Naomi Habib
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
- Edmond & Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Mariko F Taga
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Yiyi Ma
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Christina J Yung
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
| | - Maria Cimpean
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
| | - Anthony Khairallah
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
| | - Guillermo Coronas-Samano
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Roman Sankowski
- Institute of Neuropathology, Medical Faculty, University of Freiburg, Freiburg, Germany
- Berta-Ottenstein-Programme for Clinician Scientists, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Dominic Grün
- Max-Planck-Institute of Immunobiology and Epigenetics, Freiburg, Germany
| | - Alexandra A Kroshilina
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA
| | | | - Rani A Sarkis
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
| | - Garth R Cosgrove
- Department of Neurosurgery, Brigham and Women's Hospital, Boston, MA, USA
| | - Jeffrey Helgager
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Jeffrey A Golden
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Page B Pennell
- Department of Neurology, Brigham and Women's Hospital, Boston, MA, USA
| | - Marco Prinz
- Institute of Neuropathology, Medical Faculty, University of Freiburg, Freiburg, Germany
- Signaling Research Centers BIOSS and CIBSS, University of Freiburg, Freiburg, Germany
- Center for NeuroModulation, Faculty of Medicine, University of Freiburg, Freiburg, Germany
| | - Jean Paul G Vonsattel
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Andrew F Teich
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Department of Pathology and Cell Biology, Columbia University Medical Center, New York, NY, USA
| | - Julie A Schneider
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - David A Bennett
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
| | - Aviv Regev
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Howard Hughes Medical Institute, Department of Biology, MIT, Cambridge, MA, 02140, USA
- Genentech, 1 DNA Way, South San Francisco, CA, 94080, USA
| | - Wassim Elyaman
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Elizabeth M Bradshaw
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA
- Department of Neurology, Columbia University Medical Center, New York, NY, USA
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA
| | - Philip L De Jager
- Center for Translational and Computational Neuroimmunology, Columbia University Medical Center, New York, NY, USA.
- Taub Institute for Research on Alzheimer's Disease and Aging Brain, Columbia University Medical Center, New York, NY, USA.
- Department of Neurology, Columbia University Medical Center, New York, NY, USA.
- Cell Circuits Program, Broad Institute, Cambridge, MA, USA.
| |
|
17
|
Sun YS, Ou-Yang L, Dai DQ. LRSK: a low-rank self-representation K-means method for clustering single-cell RNA-sequencing data. Mol Omics 2020; 16:465-473. [PMID: 32572422 DOI: 10.1039/d0mo00034e] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023]
Abstract
The development of single-cell RNA-sequencing (scRNA-seq) technologies brings tremendous opportunities for quantitative research and analyses at the cellular level. In particular, as a crucial task of scRNA-seq analysis, single cell clustering shines a light on natural groupings of cells to give new insights into biological mechanisms and disease studies. However, it remains a challenge to identify cell clusters from large cell mixtures effectively and accurately. In this paper, we propose a novel adaptive joint clustering framework, named the low-rank self-representation K-means method (LRSK), to learn the data representation matrix and cluster indicator matrix jointly from scRNA-seq data. Specifically, instead of calculating the similarities among cells from the original data, we seek a low-rank representation of the original data to better reflect the underlying relationships among cells. Moreover, an Augmented Lagrangian Multiplier (ALM) based optimization algorithm is adopted to solve this problem. Experimental results on various scRNA-seq datasets and case studies demonstrate that our method performs better than other state-of-the-art single cell clustering algorithms. The analysis of unlabeled large single-cell liver cancer sequencing data further shows that our prediction results are more reasonable and interpretable.
Affiliation(s)
- Ye-Sen Sun
- Intelligent Data Center, School of Mathematics, Sun Yat-sen University, Guangzhou, China.
| | | | | |
|
18
|
Xie R, Li J, Wang J, Dai W, Leier A, Marquez-Lago TT, Akutsu T, Lithgow T, Song J, Zhang Y. DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy. Brief Bioinform 2020; 22:5864586. [PMID: 32599617 DOI: 10.1093/bib/bbaa125] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 10/18/2019] [Revised: 05/22/2020] [Accepted: 05/22/2020] [Indexed: 12/14/2022]
Abstract
Virulence factors (VFs) enable pathogens to infect their hosts. A wealth of individual, disease-focused studies has identified a wide variety of VFs, and the growing mass of bacterial genome sequence data provides an opportunity for computational methods aimed at predicting VFs. Despite their attractive advantages and performance improvements, the existing methods have some limitations and drawbacks. Firstly, as the characteristics and mechanisms of VFs are continually evolving with the emergence of antibiotic resistance, it is more and more difficult to identify novel VFs using existing tools that were previously developed based on the outdated data sets; secondly, few systematic feature engineering efforts have been made to examine the utility of different types of features for model performances, as the majority of tools only focused on extracting very few types of features. By addressing the aforementioned issues, the accuracy of VF predictors can likely be significantly improved. This, in turn, would be particularly useful in the context of genome wide predictions of VFs. In this work, we present a deep learning (DL)-based hybrid framework (termed DeepVF) that is utilizing the stacking strategy to achieve more accurate identification of VFs. Using an enlarged, up-to-date dataset, DeepVF comprehensively explores a wide range of heterogeneous features with popular machine learning algorithms. Specifically, four classical algorithms, including random forest, support vector machines, extreme gradient boosting and multilayer perceptron, and three DL algorithms, including convolutional neural networks, long short-term memory networks and deep neural networks are employed to train 62 baseline models using these features. In order to integrate their individual strengths, DeepVF effectively combines these baseline models to construct the final meta model using the stacking strategy. 
Extensive benchmarking experiments demonstrate the effectiveness of DeepVF: it achieves a more accurate and stable performance compared with baseline models on the benchmark dataset and clearly outperforms state-of-the-art VF predictors on the independent test. Using the proposed hybrid ensemble model, a user-friendly online predictor of DeepVF (http://deepvf.erc.monash.edu/) is implemented. Furthermore, its utility, from the user's viewpoint, is compared with that of existing toolkits. We believe that DeepVF will be exploited as a useful tool for screening and identifying potential VFs from protein-coding gene sequences in bacterial genomes.
Affiliation(s)
- Ruopeng Xie
- Bioinformatics Lab at Guilin University of Electronic Technology
| | - Jiahui Li
- Bioinformatics Lab at Guilin University of Electronic Technology
| | - Jiawei Wang
- Biomedicine Discovery Institute and the Department of Microbiology at Monash University, Australia
| | - Wei Dai
- School of Computer Science and Information Security, Guilin University of Electronic Technology, China
| | - André Leier
- Department of Genetics and the Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham (UAB) School of Medicine, USA
| | - Tatiana T Marquez-Lago
- Department of Genetics and the Department of Cell, Developmental and Integrative Biology, University of Alabama at Birmingham (UAB) School of Medicine, USA
| | | | - Trevor Lithgow
- Biomedicine Discovery Institute and the Director of the Centre to Impact AMR at Monash University, Australia
| | - Jiangning Song
- Group Leader in the Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Yanju Zhang
- Leiden Institute of Advanced Computer Science, Leiden University
| |
Collapse
|
19
|
Wang H, Sham P, Tong T, Pang H. Pathway-Based Single-Cell RNA-Seq Classification, Clustering, and Construction of Gene-Gene Interactions Networks Using Random Forests. IEEE J Biomed Health Inform 2020; 24:1814-1822. [DOI: 10.1109/jbhi.2019.2944865] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/09/2022]
|
20
|
Peng L, Tian X, Tian G, Xu J, Huang X, Weng Y, Yang J, Zhou L. Single-cell RNA-seq clustering: datasets, models, and algorithms. RNA Biol 2020; 17:765-783. [PMID: 32116127 PMCID: PMC7549635 DOI: 10.1080/15476286.2020.1728961] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 11/03/2019] [Revised: 01/10/2020] [Accepted: 01/11/2020] [Indexed: 12/13/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) technologies offer numerous opportunities for novel and potentially unexpected biological discoveries. scRNA-seq clustering helps elucidate cell-to-cell heterogeneity and uncover cell subgroups and cell dynamics at the group level. Two important aspects of scRNA-seq data analysis are introduced and discussed in the present review: relevant datasets and analytical tools. In particular, we reviewed popular scRNA-seq datasets and discussed scRNA-seq clustering models including K-means clustering, hierarchical clustering, consensus clustering, and so on. Seven state-of-the-art scRNA-seq clustering methods were compared on five publicly available datasets. Two primary evaluation metrics, the Adjusted Rand Index (ARI) and the Normalized Mutual Information (NMI), were used to evaluate these methods. Although unsupervised models can effectively cluster scRNA-seq data, these methods also face challenges. Some suggestions are provided for future research directions.
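The two evaluation metrics named in this abstract can be illustrated with a minimal standard-library sketch. It is not taken from the review: the function names are hypothetical, and the NMI variant shown normalizes by the arithmetic mean of the two entropies, which is only one of several conventions in use.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same cells."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("labelings must have the same length")
    if n < 2:
        return 1.0  # no pairs to disagree on
    joint = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    # Count pairs of cells that share a cluster in both, in the first,
    # and in the second labeling.
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (sum_joint - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("labelings must have the same length")
    if n == 0:
        return 1.0
    joint = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(c * n / (rows[u] * cols[v]))
        for (u, v), c in joint.items()
    )
    mean_entropy = (_entropy(rows, n) + _entropy(cols, n)) / 2
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    # Same partition with renamed clusters: both metrics ignore label names.
    print(adjusted_rand_index(truth, [1, 1, 0, 0]))
    print(normalized_mutual_info(truth, [1, 1, 0, 0]))
```

Both metrics compare partitions rather than label names, so a clustering that recovers the reference cell types under different cluster identifiers still receives the maximum score.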
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiongfei Tian
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
  - Geneis (Beijing) Co. Ltd, Beijing, China
- Junlin Xu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xin Huang
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yanbin Weng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
|
21
|
Zheng R, Liang Z, Chen X, Tian Y, Cao C, Li M. An Adaptive Sparse Subspace Clustering for Cell Type Identification. Front Genet 2020; 11:407. [PMID: 32425984 PMCID: PMC7212354 DOI: 10.3389/fgene.2020.00407] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/23/2019] [Accepted: 03/31/2020] [Indexed: 01/04/2023] Open
Abstract
The rapid development of single-cell transcriptome sequencing technology has provided us with a cell-level perspective to study biological problems. Identification of cell types is one of the fundamental issues in computational analysis of single-cell data. Because of the large amount of noise from single-cell technologies and the high dimension of expression profiles, traditional clustering methods are poorly suited to this task. To address the problem, we have designed an adaptive sparse subspace clustering method, called AdaptiveSSC, to identify cell types. AdaptiveSSC is based on the assumption that the expression profiles of cells of the same type lie in the same subspace, so that one cell can be expressed as a linear combination of the other cells. Moreover, it uses a data-driven adaptive sparse constraint to construct the similarity matrix. Comparison results on 10 scRNA-seq datasets show that AdaptiveSSC outperforms original subspace clustering and other state-of-the-art methods in most cases. Moreover, the learned similarity matrix can also be integrated with a modified t-SNE to obtain an improved visualization result.
Affiliation(s)
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Zhenlan Liang
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Xiang Chen
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Yu Tian
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Chen Cao
  - Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, AB, Canada
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
|
22
|
Nguyen B, Rubbens P, Kerckhof FM, Boon N, De Baets B, Waegeman W. Learning Single-Cell Distances from Cytometry Data. Cytometry A 2019; 95:782-791. [PMID: 31099963 DOI: 10.1002/cyto.a.23792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/31/2019] [Revised: 03/31/2019] [Accepted: 04/23/2019] [Indexed: 12/27/2022]
Abstract
Recent years have seen an increased interest in employing data analysis techniques for the automated identification of cell populations in the field of cytometry. These techniques highly depend on the use of a distance metric, a function that quantifies the distances between single-cell measurements. In most cases, researchers simply use the Euclidean distance metric. In this article, we exploit the availability of single-cell labels to find an optimal Mahalanobis distance metric derived from the data. We show that such a Mahalanobis distance metric results in an improved identification of cell populations compared with the Euclidean distance metric. Once determined, it can be used for the analysis of multiple samples that were measured under the same experimental setup. We illustrate this approach for cytometry data from two different origins, that is, flow cytometry applied to microbial cells and mass cytometry for the analysis of human blood cells. We also illustrate that such a distance metric results in an improved identification of cell populations when clustering methods are employed. Generally, these results imply that the performance of data analysis techniques can be improved by using a more advanced distance metric. © 2019 International Society for Advancement of Cytometry.
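To make the contrast between the two distance metrics concrete, the following minimal sketch evaluates a Mahalanobis-type distance d(x, y) = sqrt((x - y)^T M (x - y)) for a given positive semidefinite matrix M. It does not reproduce how the article learns M from labeled cells; the function name and the example matrices are illustrative assumptions only.

```python
from math import sqrt


def mahalanobis_distance(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a positive semidefinite M.

    M is passed in as a list of rows. Learning M from labeled cells, as
    the article describes, is outside the scope of this sketch.
    """
    diff = [xi - yi for xi, yi in zip(x, y)]
    m_diff = [sum(row[j] * diff[j] for j in range(len(diff))) for row in M]
    quad = sum(d * md for d, md in zip(diff, m_diff))
    return sqrt(max(quad, 0.0))  # guard against tiny negative round-off


if __name__ == "__main__":
    cell_a = [0.0, 0.0]
    cell_b = [3.0, 4.0]

    # With the identity matrix the metric reduces to Euclidean distance.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(mahalanobis_distance(cell_a, cell_b, identity))

    # A hypothetical metric that stretches the first marker and shrinks
    # the second, so differences in the first marker count for more.
    weighted = [[4.0, 0.0], [0.0, 0.25]]
    print(mahalanobis_distance(cell_a, cell_b, weighted))
```

Because the matrix is fixed once determined, the same M can be reused for every pair of measurements, which matches the abstract's point that a learned metric can be applied to further samples from the same experimental setup.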
Affiliation(s)
- Bac Nguyen
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Peter Rubbens
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Frederiek-Maarten Kerckhof
  - Center for Microbial Ecology and Technology, Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
- Nico Boon
  - Center for Microbial Ecology and Technology, Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
- Bernard De Baets
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
- Willem Waegeman
  - KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
|
23
|
Ye W, Ji G, Ye P, Long Y, Xiao X, Li S, Su Y, Wu X. scNPF: an integrative framework assisted by network propagation and network fusion for preprocessing of single-cell RNA-seq data. BMC Genomics 2019; 20:347. [PMID: 31068142 PMCID: PMC6505295 DOI: 10.1186/s12864-019-5747-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/21/2018] [Accepted: 04/29/2019] [Indexed: 12/15/2022] Open
Abstract
Background Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful tool for profiling genome-scale transcriptomes of individual cells and capturing transcriptome-wide cell-to-cell variability. However, scRNA-seq technologies suffer from high levels of technical noise and variability, hindering reliable quantification of lowly and moderately expressed genes. Since most downstream analyses of scRNA-seq data, such as cell type clustering and differential expression analysis, rely on the gene-cell expression matrix, preprocessing is a critical preliminary step in the analysis of scRNA-seq data. Results We presented scNPF, an integrative scRNA-seq preprocessing framework assisted by network propagation and network fusion, for recovering gene expression loss, correcting gene expression measurements, and learning similarities between cells. scNPF leverages the context-specific topology inherent in the given data and the prior knowledge derived from publicly available molecular gene-gene interaction networks to augment gene-gene relationships in a data-driven manner. We have demonstrated the great potential of scNPF in scRNA-seq preprocessing for accurately recovering gene expression values and learning cell similarity networks. Comprehensive evaluation of scNPF across a wide spectrum of scRNA-seq data sets showed that scNPF achieved comparable or higher performance than the competing approaches according to various metrics of internal validation and clustering accuracy. We have made scNPF an easy-to-use R package, which can be used as a versatile preprocessing plug-in for most existing scRNA-seq analysis pipelines or tools. Conclusions scNPF is a universal tool for preprocessing of scRNA-seq data, which jointly incorporates the global topology of prior interaction networks and the context-specific information encapsulated in the scRNA-seq data to capture both shared and complementary knowledge from diverse data sources.
scNPF could be used to recover gene signatures and learn cell-to-cell similarities from emerging scRNA-seq data to facilitate downstream analyses such as dimension reduction, cell type clustering, and visualization.
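The abstract does not spell out the propagation rule, so the following is only a generic sketch of network propagation as a random walk with restart on a small gene-gene network, using the standard library. The function names, the restart probability, and the toy network are assumptions for illustration and are not taken from the scNPF package.

```python
def column_normalize(adjacency):
    """Scale each column of a square adjacency matrix to sum to 1."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    return [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(adjacency, initial, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until it settles.

    `initial` holds the starting signal per gene (for example, observed
    expression scaled to sum to 1); `restart` is an assumed tuning value.
    """
    n = len(adjacency)
    W = column_normalize(adjacency)
    current = list(initial)
    for _ in range(max_iter):
        updated = [
            (1 - restart) * sum(W[i][j] * current[j] for j in range(n))
            + restart * initial[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    return current


if __name__ == "__main__":
    # Toy chain network gene0 - gene1 - gene2; only gene0 carries signal.
    chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    smoothed = propagate(chain, [1.0, 0.0, 0.0])
    print(smoothed)
```

In this toy case the signal observed only on the first gene spreads to its neighbors with decreasing strength, which is the intuition behind using an interaction network to recover expression values lost to dropout.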
Affiliation(s)
- Wenbin Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Guoli Ji
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
- Pengchao Ye
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Yuqi Long
  - Software Quality Testing Engineering Research Center, China Electronic Product Reliability and Environmental Testing Research Institute, Guangzhou, 510610, China
- Xuesong Xiao
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Shuchao Li
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
- Yaru Su
  - College of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350116, China
- Xiaohui Wu
  - Department of Automation, Xiamen University, Xiamen, 361005, China
  - Xiamen Research Institute of National Center of Healthcare Big Data, Xiamen, China
  - Innovation Center for Cell Biology, Xiamen University, Xiamen, 361005, China
|
24
|
Liu R, Zhang G, Yang Z. Towards rapid prediction of drug-resistant cancer cell phenotypes: single cell mass spectrometry combined with machine learning. Chem Commun (Camb) 2019; 55:616-619. [PMID: 30525135 DOI: 10.1039/c8cc08296k] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 12/11/2022]
Abstract
The combination of single-cell mass spectrometry and machine learning is demonstrated, for the first time, to achieve rapid and reliable prediction of the phenotype of unknown single cells based on their metabolomic profiles, with experimental validation. This approach can potentially be applied to predict drug-resistant phenotypes prior to chemotherapy.
Affiliation(s)
- Renmeng Liu
  - Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73019, USA.
|
25
|
Zhu X, Li HD, Xu Y, Guo L, Wu FX, Duan G, Wang J. A Hybrid Clustering Algorithm for Identifying Cell Types from Single-Cell RNA-Seq Data. Genes (Basel) 2019; 10:E98. [PMID: 30700040 PMCID: PMC6409843 DOI: 10.3390/genes10020098] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/23/2018] [Revised: 01/24/2019] [Accepted: 01/25/2019] [Indexed: 02/01/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has recently brought new insight into cell differentiation processes and functional variation in cell subtypes from homogeneous cell populations. A lack of prior knowledge makes unsupervised machine learning methods, such as clustering, suitable for analyzing scRNA-seq data. However, there are several limitations to overcome, including high dimensionality, clustering result instability, and parameter adjustment complexity. In this study, we propose a method that combines structure entropy and k-nearest neighbor to identify cell subpopulations in scRNA-seq data. In contrast to existing clustering methods for identifying cell subtypes, minimized structure entropy results in natural communities without specifying the number of clusters. To investigate the performance of our model, we applied it to eight scRNA-seq datasets and compared our method with three existing methods (nonnegative matrix factorization, single-cell interpretation via multikernel learning, and structural entropy minimization principle). The experimental results showed that our approach achieves, on average, better performance on these datasets than the benchmark methods.
Affiliation(s)
- Xiaoshu Zhu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi 537000, China.
- Hong-Dong Li
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
- Yunpei Xu
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
- Lilu Guo
  - School of Computer Science and Engineering, Yulin Normal University, Yulin, Guangxi 537000, China.
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SKS7N5A9, Canada.
- Guihua Duan
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China.
|