1
Ediriwickrema A, Nakauchi Y, Fan AC, Köhnke T, Hu X, Luca BA, Kim Y, Ramakrishnan S, Nakamoto M, Karigane D, Linde MH, Azizi A, Newman AM, Gentles AJ, Majeti R. A single cell framework identifies functionally and molecularly distinct multipotent progenitors in adult human hematopoiesis. bioRxiv 2024:2024.05.07.592983. [PMID: 38766031 PMCID: PMC11100686 DOI: 10.1101/2024.05.07.592983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Hematopoietic multipotent progenitors (MPPs) regulate blood cell production to appropriately meet the biological demands of the human body. Human MPPs remain ill-defined whereas mouse MPPs have been well characterized with distinct immunophenotypes and lineage potencies. Using multiomic single cell analyses and complementary functional assays, we identified new human MPPs and oligopotent progenitor populations within Lin-CD34+CD38dim/lo adult bone marrow with distinct biomolecular and functional properties. These populations were prospectively isolated based on expression of CD69, CLL1, and CD2 in addition to classical markers like CD90 and CD45RA. We show that within the canonical Lin-CD34+CD38dim/loCD90-CD45RA- MPP population, there is a CD69+ MPP with long-term engraftment and multilineage differentiation potential, a CLL1+ myeloid-biased MPP, and a CLL1-CD69- erythroid-biased MPP. We also show that the canonical Lin-CD34+CD38dim/loCD90-CD45RA+ LMPP population can be separated into a CD2+ LMPP with lymphoid and myeloid potential, a CD2- LMPP with high lymphoid potential, and a CLL1+ GMP with minimal lymphoid potential. We used these new HSPC profiles to study human and mouse bone marrow cells and observed limited cell type-specific homology between humans and mice, as well as cell type-specific changes associated with aging. By identifying and functionally characterizing new adult MPP sub-populations, we provide an updated reference and framework for future studies in human hematopoiesis.
2
Tao W, Yu Z, Han JDJ. Single-cell senescence identification reveals senescence heterogeneity, trajectory, and modulators. Cell Metab 2024; 36:1126-1143.e5. [PMID: 38604170 DOI: 10.1016/j.cmet.2024.03.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Revised: 12/15/2023] [Accepted: 03/13/2024] [Indexed: 04/13/2024]
Abstract
Cellular senescence underlies many aging-related pathologies, but its heterogeneity poses challenges for studying and targeting senescent cells. We present here a machine learning program, senescent cell identification (SenCID), which accurately identifies senescent cells in both bulk and single-cell transcriptomes. Trained on 602 samples from 52 senescence transcriptome datasets spanning 30 cell types, SenCID identifies six major senescence identities (SIDs). Different SIDs exhibit different senescence baselines, stemness, gene functions, and responses to senolytics. SenCID enables the reconstruction of senescent trajectories under normal aging, chronic diseases, and COVID-19. Additionally, when applied to single-cell Perturb-seq data, SenCID helps reveal a hierarchy of senescence modulators. Overall, SenCID is an essential tool for precise single-cell analysis of cellular senescence, enabling targeted interventions against senescent cells.
Affiliation(s)
- Wanyu Tao
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, China
- Zhengqing Yu
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, China
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, China; Peking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, China.
3
Li C, Ye G, Jiang Y, Wang Z, Yu H, Yang M. Artificial Intelligence in battling infectious diseases: A transformative role. J Med Virol 2024; 96:e29355. [PMID: 38179882 DOI: 10.1002/jmv.29355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/01/2023] [Accepted: 12/17/2023] [Indexed: 01/06/2024]
Abstract
It is widely acknowledged that infectious diseases have wrought immense havoc on human society, being regarded as adversaries that humanity cannot elude. In recent years, the advancement of Artificial Intelligence (AI) technology has ushered in a revolutionary era in the realm of infectious disease prevention and control. This evolution encompasses early warning of outbreaks, contact tracing, infection diagnosis, drug discovery, and the facilitation of drug design, alongside other facets of epidemic management. This article presents an overview of the utilization of AI systems in the field of infectious diseases, with a specific focus on their role during the COVID-19 pandemic. The article also highlights the contemporary challenges that AI confronts within this domain and posits strategies for their mitigation. There exists an imperative to further harness the potential applications of AI across multiple domains to augment its capacity in effectively addressing future disease outbreaks.
Affiliation(s)
- Chunhui Li
  - School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Guoguo Ye
  - Shenzhen Key Laboratory of Pathogen and Immunity, National Clinical Research Center for Infectious Disease, The Third People's Hospital of Shenzhen, Second Hospital Affiliated to Southern University of Science and Technology, Shenzhen, China
- Yinghan Jiang
  - School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Zhiming Wang
  - School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
- Haiyang Yu
  - Hangzhou Yalla Information Technology Service Co., Ltd., Hangzhou, People's Republic of China
- Minghui Yang
  - School of Life Science, Advanced Research Institute of Multidisciplinary Science, Key Laboratory of Molecular Medicine and Biotherapy, Beijing Institute of Technology, Beijing, People's Republic of China
4
Heller G, Fuereder T, Grandits AM, Wieser R. New perspectives on biology, disease progression, and therapy response of head and neck cancer gained from single cell RNA sequencing and spatial transcriptomics. Oncol Res 2023; 32:1-17. [PMID: 38188682 PMCID: PMC10767240 DOI: 10.32604/or.2023.044774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/12/2023] [Indexed: 01/09/2024]
Abstract
Head and neck squamous cell carcinoma (HNSCC) is one of the most frequent cancers worldwide. The main risk factors are consumption of tobacco products and alcohol, as well as infection with human papilloma virus. Approved therapeutic options comprise surgery, radiation, chemotherapy, targeted therapy through epidermal growth factor receptor inhibition, and immunotherapy, but outcome has remained unsatisfactory due to recurrence rates of ~50% and the frequent occurrence of second primaries. The availability of the human genome sequence at the beginning of the millennium heralded the omics era, in which rapid technological progress has advanced our knowledge of the molecular biology of malignant diseases, including HNSCC, at an unprecedented pace. Initially, microarray-based methods, followed by approaches based on next-generation sequencing, were applied to study the genetics, epigenetics, and gene expression patterns of bulk tumors. More recently, the advent of single-cell RNA sequencing (scRNAseq) and spatial transcriptomics methods has facilitated the investigation of the heterogeneity between and within different cell populations in the tumor microenvironment (e.g., cancer cells, fibroblasts, immune cells, endothelial cells), led to the discovery of novel cell types, and advanced the discovery of cell-cell communication within tumors. This review provides an overview of scRNAseq, spatial transcriptomics, and the associated bioinformatics methods, and summarizes how their application has promoted our understanding of the emergence, composition, progression, and therapy responsiveness of, and intercellular signaling within, HNSCC.
Affiliation(s)
- Gerwin Heller
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Thorsten Fuereder
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
- Rotraud Wieser
  - Division of Oncology, Department of Medicine I, Medical University of Vienna, Vienna, 1090, Austria
  - Ludwig Boltzmann Institute for Hematology and Oncology, Medical University of Vienna, Vienna, 1090, Austria
5
Fiannaca A, La Rosa M, La Paglia L, Gaglio S, Urso A. GOWDL: gene ontology-driven wide and deep learning model for cell typing of scRNA-seq data. Brief Bioinform 2023; 24:bbad332. [PMID: 37756593 PMCID: PMC10530315 DOI: 10.1093/bib/bbad332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 08/17/2023] [Accepted: 09/04/2023] [Indexed: 09/29/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) allows for obtaining genomic and transcriptomic profiles of individual cells. These data make it possible to characterize tissues at the cell level. In this context, one of the main analyses exploiting scRNA-seq data is identifying the cell types within a tissue to estimate the quantitative composition of cell populations. Due to the massive amount of available scRNA-seq data, automatic classification approaches for cell typing, based on the most recent deep learning technology, are needed. Here, we present the gene ontology-driven wide and deep learning (GOWDL) model for classifying cell types in several tissues. GOWDL implements a hybrid architecture that considers the functional annotations found in Gene Ontology and the marker genes typical of specific cell types. We performed cross-validation and independent external testing, comparing our algorithm with 12 other state-of-the-art predictors. Classification scores demonstrated that GOWDL reached the best results over five different tissues, except for recall, where it scored about 92% versus 97% for the best tool. Finally, we presented a case study on classifying immune cell populations in breast cancer using a hierarchical approach based on GOWDL.
Affiliation(s)
- Antonino Fiannaca
  - ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Massimo La Rosa
  - ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Laura La Paglia
  - ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
- Salvatore Gaglio
  - ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
  - Dipartimento di Ingegneria, Università degli studi di Palermo, Viale Delle Scienze, ed. 6, 90128, Palermo, Italy
- Alfonso Urso
  - ICAR-CNR, National Research Council of Italy, Via Ugo La Malfa 153, 90146, Palermo, Italy
6
Madadi Y, Monavarfeshani A, Chen H, Stamer WD, Williams RW, Yousefi S. Artificial Intelligence Models for Cell Type and Subtype Identification Based on Single-Cell RNA Sequencing Data in Vision Science. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2837-2852. [PMID: 37294649 PMCID: PMC10631573 DOI: 10.1109/tcbb.2023.3284795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) provides a high throughput, quantitative and unbiased framework for scientists in many research fields to identify and characterize cell types within heterogeneous cell populations from various tissues. However, scRNA-seq based identification of discrete cell-types is still labor intensive and depends on prior molecular knowledge. Artificial intelligence has provided faster, more accurate, and user-friendly approaches for cell-type identification. In this review, we discuss recent advances in cell-type identification methods using artificial intelligence techniques based on single-cell and single-nucleus RNA sequencing data in vision science. The main purpose of this review paper is to assist vision scientists not only to select suitable datasets for their problems, but also to be aware of the appropriate computational tools to perform their analysis. Developing novel methods for scRNA-seq data analysis remains to be addressed in future studies.
7
Jiao L, Wang G, Dai H, Li X, Wang S, Song T. scTransSort: Transformers for Intelligent Annotation of Cell Types by Gene Embeddings. Biomolecules 2023; 13:611. [PMID: 37189359 DOI: 10.3390/biom13040611] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/29/2022] [Revised: 03/05/2023] [Accepted: 03/10/2023] [Indexed: 03/31/2023]
Abstract
Single-cell transcriptomics is rapidly advancing our understanding of the composition of complex tissues and biological cells, and single-cell RNA sequencing (scRNA-seq) holds great potential for identifying and characterizing the cell composition of complex tissues. Cell type identification by analyzing scRNA-seq data is mostly limited by time-consuming and irreproducible manual annotation. As scRNA-seq technology scales to thousands of cells per experiment, the exponential increase in the number of cell samples makes manual annotation more difficult. On the other hand, the sparsity of gene transcriptome data remains a major challenge. This paper applies the idea of the transformer to single-cell classification tasks based on scRNA-seq data. We propose scTransSort, a cell-type annotation method pretrained with single-cell transcriptomics data. scTransSort represents genes as gene expression embedding blocks to reduce both the sparsity of the data used for cell type identification and the computational complexity. A distinguishing feature of scTransSort is its intelligent information extraction for unordered data: it automatically extracts valid features of cell types without the need for manually labeled features or additional references. In experiments on cells from 35 human and 26 mouse tissues, scTransSort achieved high accuracy and performance in cell type identification and demonstrated high robustness and generalization ability.
8
Lee J, Kim M, Kang K, Yang CS, Yoon S. Hierarchical cell-type identifier accurately distinguishes immune-cell subtypes enabling precise profiling of tissue microenvironment with single-cell RNA-sequencing. Brief Bioinform 2023; 24:6995373. [PMID: 36681937 PMCID: PMC10025442 DOI: 10.1093/bib/bbad006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 12/22/2022] [Accepted: 01/02/2023] [Indexed: 01/23/2023]
Abstract
Single-cell RNA-seq has enabled in-depth study of the tissue micro-environment and immune-profiling, where a crucial step is to annotate cell identity. Immune cells play key roles in many diseases, whereas their activities are hard to track due to their diverse and highly variable nature. Existing cell-type identifiers had limited performance for this purpose. We present HiCAT, a hierarchical, marker-based cell-type identifier utilising gene set analysis for statistical scoring for given markers. It features successive identification of major-type, minor-type and subsets utilising subset markers structured in a three-level taxonomy tree. Comparison with manual annotation and pairwise match test showed HiCAT outperforms others in major- and minor-type identification. For subsets, we qualitatively evaluated the marker expression profile, demonstrating that HiCAT provides the clearest immune-cell landscape. HiCAT was also used for immune-cell profiling in ulcerative colitis and discovered distinct features of the disease in macrophage and T-cell subsets that could not be identified previously.
Affiliation(s)
- Joongho Lee
  - Dept. of Computer Science, College of SW Convergence, Dankook University, Yongin-si, Korea, 16890
- Minsoo Kim
  - Dept. of Computer Science, College of SW Convergence, Dankook University, Yongin-si, Korea, 16890
- Keunsoo Kang
  - Dept. of Microbiology, College of Natural Sciences, Dankook University, Cheonan-si, Korea, 31116
- Chul-Su Yang
  - Dept. of Molecular and Life Science, Center for Bionano Intelligence Education and Research, Hanyang University, Ansan, Korea, 15588
- Seokhyun Yoon
  - Dept. of Electronics & Electrical Eng., College of Engineering, Dankook University, Yongin-si, Korea, 16890
9
Wang K, Li Z, You ZH, Han P, Nie R. Adversarial dense graph convolutional networks for single-cell classification. Bioinformatics 2023; 39:6994183. [PMID: 36661313 PMCID: PMC9919433 DOI: 10.1093/bioinformatics/btad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2022] [Revised: 12/30/2022] [Accepted: 01/19/2023] [Indexed: 01/21/2023]
Abstract
MOTIVATION: In single-cell transcriptomics applications, effective identification of cell types in multicellular organisms and in-depth study of the relationships between genes has become one of the main goals of bioinformatics research. However, data heterogeneity and random noise pose significant difficulties for scRNA-seq data analysis. RESULTS: We have proposed an adversarial dense graph convolutional network architecture for single-cell classification. Specifically, to enhance the representation of higher-order features and the organic combination between features, dense connectivity mechanism and attention-based feature aggregation are introduced for feature learning in convolutional neural networks. To preserve the features of the original data, we use a feature reconstruction module to assist the goal of single-cell classification. In addition, HNNVAT uses virtual adversarial training to improve the generalization and robustness. Experimental results show that our model outperforms the existing classical methods in terms of classification accuracy on benchmark datasets. AVAILABILITY AND IMPLEMENTATION: The source code of HNNVAT is available at https://github.com/DisscLab/HNNVAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Kangwei Wang
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhengwei Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
- Pengyong Han
  - Central Lab, Changzhi Medical College, Changzhi 046000, China
- Ru Nie
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
10
Chen J, Xu H, Tao W, Chen Z, Zhao Y, Han JDJ. Transformer for one stop interpretable cell type annotation. Nat Commun 2023; 14:223. [PMID: 36641532 PMCID: PMC9840170 DOI: 10.1038/s41467-023-35923-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/29/2022] [Accepted: 01/09/2023] [Indexed: 01/15/2023]
Abstract
Consistent annotation transfer from reference dataset to query dataset is fundamental to the development and reproducibility of single-cell research. Compared with traditional annotation methods, deep learning based methods are faster and more automated. A series of useful single cell analysis tools based on autoencoder architecture have been developed but these struggle to strike a balance between depth and interpretability. Here, we present TOSICA, a multi-head self-attention deep learning model based on Transformer that enables interpretable cell type annotation using biologically understandable entities, such as pathways or regulons. We show that TOSICA achieves fast and accurate one-stop annotation and batch-insensitive integration while providing biologically interpretable insights for understanding cellular behavior during development and disease progressions. We demonstrate TOSICA's advantages by applying it to scRNA-seq data of tumor-infiltrating immune cells, and CD14+ monocytes in COVID-19 to reveal rare cell types, heterogeneity and dynamic trajectories associated with disease progression and severity.
Affiliation(s)
- Jiawei Chen
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Hao Xu
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Wanyu Tao
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Zhaoxiong Chen
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Yuxuan Zhao
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China
- Jing-Dong J Han
  - Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
11
Wu H, Gonzalez Villalobos R, Yao X, Reilly D, Chen T, Rankin M, Myshkin E, Breyer MD, Humphreys BD. Mapping the single-cell transcriptomic response of murine diabetic kidney disease to therapies. Cell Metab 2022; 34:1064-1078.e6. [PMID: 35709763 PMCID: PMC9262852 DOI: 10.1016/j.cmet.2022.05.010] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 10/06/2021] [Revised: 03/21/2022] [Accepted: 05/24/2022] [Indexed: 11/29/2022]
Abstract
Diabetic kidney disease (DKD) occurs in ∼40% of patients with diabetes and causes kidney failure, cardiovascular disease, and premature death. We analyzed the response of a murine DKD model to five treatment regimens using single-cell RNA sequencing (scRNA-seq). Our atlas of ∼1 million cells revealed a heterogeneous response of all kidney cell types both to DKD and its treatment. Both monotherapy and combination therapies targeted differing cell types and induced distinct and non-overlapping transcriptional changes. The early effects of sodium-glucose cotransporter-2 inhibitors (SGLT2i) on the S1 segment of the proximal tubule suggest that this drug class induces fasting mimicry and hypoxia responses. Diabetes downregulated the spliceosome regulator serine/arginine-rich splicing factor 7 (Srsf7) in proximal tubule that was specifically rescued by SGLT2i. In vitro proximal tubule knockdown of Srsf7 induced a pro-inflammatory phenotype, implicating alternative splicing as a driver of DKD and suggesting SGLT2i regulation of proximal tubule alternative splicing as a potential mechanism of action for this drug class.
Affiliation(s)
- Haojia Wu
  - Division of Nephrology, Department of Medicine, Washington University, St. Louis, MO, USA
- Xiang Yao
  - Tox LJ Janssen Research & Development, La Jolla, CA, USA
- Tao Chen
  - PSTS Janssen Research & Development, Shanghai, China
- Benjamin D Humphreys
  - Division of Nephrology, Department of Medicine, Washington University, St. Louis, MO, USA; Department of Developmental Biology, Washington University, St. Louis, MO, USA.
12
Yan H, Lee J, Song Q, Li Q, Schiefelbein J, Zhao B, Li S. Identification of new marker genes from plant single-cell RNA-seq data using interpretable machine learning methods. New Phytol 2022; 234:1507-1520. [PMID: 35211979 PMCID: PMC9314150 DOI: 10.1111/nph.18053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/09/2021] [Accepted: 02/06/2022] [Indexed: 05/16/2023]
Abstract
An essential step in the analysis of single-cell RNA sequencing data is to classify cells into specific cell types using marker genes. In this study, we have developed a machine learning pipeline called single-cell predictive marker (SPmarker) to identify novel cell-type marker genes in the Arabidopsis root. Unlike traditional approaches, our method uses interpretable machine learning models to select marker genes. We have demonstrated that our method can: assign cell types based on cells that were labelled using published methods; project cell types identified by trajectory analysis from one data set to other data sets; and assign cell types based on internal GFP markers. Using SPmarker, we have identified hundreds of new marker genes that were not identified before. As compared to known marker genes, the new marker genes have more orthologous genes identifiable in the corresponding rice single-cell clusters. The new root hair marker genes also include 172 genes with orthologs expressed in root hair cells in five non-Arabidopsis species, which expands the number of marker genes for this cell type by 35-154%. Our results represent a new approach to identifying cell-type marker genes from scRNA-seq data and pave the way for cross-species mapping of scRNA-seq data in plants.
Affiliation(s)
- Haidong Yan
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- Jiyoung Lee
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
- Qi Song
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
- Qi Li
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- John Schiefelbein
  - Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Bingyu Zhao
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
- Song Li
  - School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24060, USA
  - Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24060, USA
13
Yin Q, Liu Q, Fu Z, Zeng W, Zhang B, Zhang X, Jiang R, Lv H. scGraph: a graph neural network-based approach to automatically identify cell types. Bioinformatics 2022; 38:2996-3003. [PMID: 35394015 DOI: 10.1093/bioinformatics/btac199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2021] [Revised: 12/13/2021] [Accepted: 04/07/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Single cell technologies have played a crucial role in revolutionizing biological research over the past decade, strengthening our understanding of cell differentiation, development, and regulation from a single-cell level perspective. Single-cell RNA sequencing (scRNA-seq) is one of the most common single cell technologies, which enables probing transcriptional states in thousands of cells in one experiment. Identification of cell types from scRNA-seq measurements is a fundamental and crucial question to answer. Most previous studies directly take gene expression as input while ignoring the comprehensive gene-gene interactions. RESULTS: We propose scGraph, an automatic cell identification algorithm leveraging gene interaction relationships to enhance the performance of cell type identification. scGraph is based on a graph neural network to aggregate the information of interacting genes. In a series of experiments, we demonstrate that scGraph is accurate and outperforms eight comparison methods in the task of cell type identification. Moreover, scGraph automatically learns the gene interaction relationships from biological data, and the pathway enrichment analysis shows findings consistent with previous analyses, providing insights into the analysis of regulatory mechanisms. AVAILABILITY: scGraph is freely available at https://github.com/QijinYin/scGraph and https://figshare.com/articles/software/scGraph/17157743. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Qijin Yin
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Qiao Liu
- Department of Statistics, Stanford University Stanford, CA 94305
| | - Zhuoran Fu
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Wanwen Zeng
- Department of Statistics, Stanford University Stanford, CA 94305.,College of Software, Nankai University, Tianjin, 300350, China
| | - Boheng Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
| | - Xuegong Zhang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Rui Jiang
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Hairong Lv
- Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Fuzhou Institute of Data Technology, Changle, Fuzhou, 350200, China
14
Cao X, Xing L, Majd E, He H, Gu J, Zhang X. A Systematic Evaluation of Supervised Machine Learning Algorithms for Cell Phenotype Classification Using Single-Cell RNA Sequencing Data. Front Genet 2022; 13:836798. [PMID: 35281805 PMCID: PMC8905542 DOI: 10.3389/fgene.2022.836798] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/15/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
The new technology of single-cell RNA sequencing (scRNA-seq) can yield valuable insights into gene expression and give critical information about the cellular compositions of complex tissues. In recent years, vast numbers of scRNA-seq datasets have been generated and made publicly available, and this has enabled researchers to train supervised machine learning models for predicting or classifying various cell-level phenotypes. This has led to the development of many new methods for analyzing scRNA-seq data. Despite the popularity of such applications, there has as yet been no systematic investigation of the performance of these supervised algorithms using predictors from various sizes of scRNA-seq datasets. In this study, 13 popular supervised machine learning algorithms for cell phenotype classification were evaluated using published real and simulated datasets with diverse cell sizes. This benchmark comprises two parts. In the first, real datasets were used to assess the computing speed and cell phenotype classification performance of popular supervised algorithms. The classification performances were evaluated using the area under the receiver operating characteristic curve, F1-score, Precision, Recall, and false-positive rate. In the second part, we evaluated gene-selection performance using published simulated datasets with a known list of real genes. The results showed that ElasticNet with interactions performed the best for small and medium-sized datasets. The NaiveBayes classifier was found to be another appropriate method for medium-sized datasets. With large datasets, the performance of the XGBoost algorithm was found to be excellent. Ensemble algorithms were not found to be significantly superior to individual machine learning methods. Including interactions in the ElasticNet algorithm caused a significant performance improvement for small datasets. 
The linear discriminant analysis algorithm was found to be the best choice when speed is critical; it is the fastest method, it can scale to handle large sample sizes, and its performance is not much worse than the top performers.
Affiliation(s)
- Xiaowen Cao
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Li Xing
- Department of Mathematics and Statistics, University of Saskatchewan, Saskatoon, SK, Canada
- Elham Majd
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
- Hua He
- School of Science, Hebei University of Technology, Tianjin, China
- Junhua Gu
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Xuekui Zhang
- Department of Mathematics and Statistics, University of Victoria, Victoria, BC, Canada
15
Nguyen V, Griss J. scAnnotatR: framework to accurately classify cell types in single-cell RNA-sequencing data. BMC Bioinformatics 2022; 23:44. [PMID: 35038984 PMCID: PMC8762856 DOI: 10.1186/s12859-022-04574-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/30/2021] [Accepted: 01/11/2022] [Indexed: 12/02/2022]
Abstract
Background: Automatic cell type identification is essential to alleviate a key bottleneck in scRNA-seq data analysis. While most existing classification tools show good sensitivity and specificity, they often fail to leave unclassified those cells whose type is missing from the reference. Additionally, many tools do not scale to the continuously increasing size of current scRNA-seq datasets. Therefore, additional tools are needed to solve these challenges. Results: scAnnotatR is a novel R package that provides a complete framework to classify cells in scRNA-seq datasets using pre-trained classifiers. It supports both Seurat and Bioconductor’s SingleCellExperiment and is thereby compatible with the vast majority of R-based analysis workflows. scAnnotatR uses hierarchically organised SVMs, each of which distinguishes a specific cell type from all others. It shows comparable or even superior accuracy, sensitivity and specificity compared to existing tools while being able to leave unknown cell types unclassified. Moreover, scAnnotatR is the only one of the best performing tools able to process datasets containing more than 600,000 cells. Conclusions: scAnnotatR is freely available on GitHub (https://github.com/grisslab/scAnnotatR) and through Bioconductor (from version 3.14). It is consistently among the best performing tools in terms of classification accuracy while scaling to the largest datasets. Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04574-5.
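The idea of hierarchical one-vs-rest classification with a reject option can be illustrated with a minimal Python sketch. This is not scAnnotatR's API (the package itself is written in R); the `PARENT` hierarchy, the function names, the per-type scores and the 0.5 threshold are all hypothetical and chosen only for illustration. Each cell type is assumed to have its own binary classifier score, and a child type is accepted only if its parent is accepted as well.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: child type -> parent type (None marks a top-level type).
PARENT: Dict[str, Optional[str]] = {
    "T cell": None,
    "B cell": None,
    "CD8 T cell": "T cell",
}


def depth(cell_type: str) -> int:
    """Number of ancestors above a type in the hierarchy."""
    count, node = 0, PARENT[cell_type]
    while node is not None:
        count, node = count + 1, PARENT[node]
    return count


def accepted(cell_type: str, probs: Dict[str, float], threshold: float) -> bool:
    """A type passes only if its own score and every ancestor's score reach the threshold."""
    node: Optional[str] = cell_type
    while node is not None:
        if probs.get(node, 0.0) < threshold:
            return False
        node = PARENT[node]
    return True


def annotate(probs: Dict[str, float], threshold: float = 0.5) -> str:
    """Return the most specific accepted type, or "unknown" if nothing passes."""
    candidates = [t for t in PARENT if accepted(t, probs, threshold)]
    if not candidates:
        return "unknown"
    # Prefer deeper (more specific) types, then the higher score.
    return max(candidates, key=lambda t: (depth(t), probs.get(t, 0.0)))
```

With these assumptions, a cell whose scores all fall below the threshold is labelled "unknown" rather than being forced into the nearest reference type, which is the behaviour the abstract describes as leaving cells unclassified.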
Affiliation(s)
- Vy Nguyen
- Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria
- Johannes Griss
- Department of Dermatology, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria.
16
Zeng Y, Wei Z, Pan Z, Lu Y, Yang Y. A robust and scalable graph neural network for accurate single-cell classification. Brief Bioinform 2022; 23:6501353. [PMID: 35018408 DOI: 10.1093/bib/bbab570] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2021] [Revised: 12/01/2021] [Accepted: 12/11/2021] [Indexed: 12/25/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step in the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated, and it is promising to transfer labels from the annotated datasets to newly generated datasets. One powerful way to transfer labels is to learn cell relations through a graph neural network (GNN), but traditional GNNs struggle to process millions of cells because of the expensive message-passing procedure at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single-cell classification (GraphCS), where the graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets for propagation of shared information. To overcome the slow information propagation of GNNs at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity over cell numbers. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, cross-species and cross-omics scRNA-seq datasets. More importantly, our model provides high speed and scalability on large datasets, and can achieve superior performance for 1 million cells within 50 min.
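The pre-calculation step can be sketched in a few lines of Python. This is an exact, dense toy version of Generalized PageRank diffusion, the weighted sum of k-hop propagated features, and not the approximate, sublinear algorithm the abstract refers to. The function names, the self-loop normalization and the example weights are assumptions made for illustration, not details taken from GraphCS.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(adj: Matrix) -> Matrix:
    """Add self-loops and scale each row to sum to 1 (a random-walk matrix P)."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in looped]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product; adequate for a toy graph."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def diffuse(adj: Matrix, features: Matrix, weights: List[float]) -> Matrix:
    """Return sum_k weights[k] * P^k X, computed once before any training epoch."""
    p = row_normalize(adj)
    hop = [row[:] for row in features]                   # P^0 X
    out = [[weights[0] * v for v in row] for row in hop]
    for w in weights[1:]:
        hop = matmul(p, hop)                             # advance one hop: P^k X
        out = [[o + w * h for o, h in zip(orow, hrow)] for orow, hrow in zip(out, hop)]
    return out


# Toy graph: cells 0 and 1 are neighbours, cell 2 is isolated.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
smoothed = diffuse(adj, features, weights=[0.5, 0.5])
```

Because the diffused features are fixed after this step, a downstream classifier can be trained on `smoothed` with ordinary mini-batches and no message passing per epoch, which is the motivation the abstract gives for pre-computing them.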
Affiliation(s)
- Yuansong Zeng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhuoyi Wei
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zixiang Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou 510000, China
17
Zhang Y, Zhang F, Wang Z, Wu S, Tian W. scMAGIC: accurately annotating single cells using two rounds of reference-based classification. Nucleic Acids Res 2022; 50:e43. [PMID: 34986249 PMCID: PMC9071478 DOI: 10.1093/nar/gkab1275] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/20/2021] [Revised: 11/08/2021] [Accepted: 12/14/2021] [Indexed: 11/21/2022]
Abstract
Here, we introduce scMAGIC (Single Cell annotation using MArker Genes Identification and two rounds of reference-based Classification [RBC]), a novel method that uses well-annotated single-cell RNA sequencing (scRNA-seq) data as the reference to assist in the classification of query scRNA-seq data. A key innovation in scMAGIC is the introduction of a second-round RBC in which those query cells whose cell identities are confidently validated in the first round are used as a new reference to again classify query cells, therefore eliminating the batch effects between the reference and the query data. scMAGIC significantly outperforms 13 competing RBC methods with their optimal parameter settings across 86 benchmark tests, especially when the cell types in the query dataset are not completely covered by the reference dataset and when there exist significant batch effects between the reference and the query datasets. Moreover, when no reference dataset is available, scMAGIC can annotate query cells with reasonably high accuracy by using an atlas dataset as the reference.
Affiliation(s)
- Yu Zhang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- Feng Zhang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- Department of Histoembryology, Genetics and Developmental Biology, Shanghai Key Laboratory of Reproductive Medicine, Key Laboratory of Cell Differentiation and Apoptosis of Chinese Ministry of Education, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Zekun Wang
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- Siyi Wu
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- Weidong Tian
- State Key Laboratory of Genetic Engineering, Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai 200438, P.R. China
- Qilu Children's Hospital of Shandong University, No 23976 Jingshi Road, Jinan, Shandong, China
- Children's Hospital of Fudan University, Shanghai 201102, China
18
Song Q, Liu L. Single-Cell RNA-Seq Technologies and Computational Analysis Tools: Application in Cancer Research. Methods Mol Biol 2022; 2413:245-255. [PMID: 35044670 DOI: 10.1007/978-1-0716-1896-7_23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
The recent maturation of single-cell RNA sequencing (scRNA-seq) provides unique opportunities for researchers to uncover new and potentially unexpected biological discoveries and to understand the complexity of tissues by transcriptomic profiling in individual cells. This review introduces the latest scRNA-seq techniques and platforms as well as their advantages and disadvantages. Moreover, we review computational tools and pipelines for analyzing scRNA-seq data, and their applications in cancer research, highlighting the important role of scRNA-seq techniques in this area.
Affiliation(s)
- Qianqian Song
- Department of Cancer Biology, Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, NC, USA
- Liang Liu
- Department of Cancer Biology, Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, NC, USA.
- Center for Cancer Genomics and Precision Oncology, Wake Forest Baptist Comprehensive Cancer Center, Winston-Salem, NC, USA.
19
Yin Q, Wang Y, Guan J, Ji G. scIAE: an integrative autoencoder-based ensemble classification framework for single-cell RNA-seq data. Brief Bioinform 2021; 23:6463428. [PMID: 34913057 DOI: 10.1093/bib/bbab508] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/15/2021] [Revised: 10/28/2021] [Accepted: 11/04/2021] [Indexed: 12/12/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) allows quantitative analysis of gene expression at the level of single cells, which is beneficial for studying cell heterogeneity. The recognition of cell types facilitates the construction of cell atlases of complex tissues or organisms, which is the basis of almost all downstream scRNA-seq data analyses. Using disease-related scRNA-seq data to predict disease status can facilitate specific diagnosis and personalized treatment of disease. Since single-cell gene expression data are high-dimensional and sparse with dropouts, we propose scIAE, an integrative autoencoder-based ensemble classification framework, which first performs multiple random projections and applies integrative and devisable autoencoders (integrating stacked, denoising and sparse autoencoders) to obtain compressed representations. Base classifiers are then built on the lower-dimensional representations, and the predictions from all base models are integrated. The comparison of scIAE with common feature extraction methods shows that scIAE is effective and robust, independent of the choice of dimension, which is beneficial to subsequent cell classification. By testing scIAE on different types of data and comparing it with existing general and single-cell-specific classification methods, we show that scIAE has strong classification power in cell type annotation within datasets, across batches, across platforms and across species, and also in disease status prediction. The architecture of scIAE is flexible and devisable, and it is available at https://github.com/JGuan-lab/scIAE.
Affiliation(s)
- Qingyang Yin
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
- Yang Wang
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
- Guoli Ji
- Department of Automation, Xiamen University, Xiamen, Fujian 361102, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361102, China
20
Chambers B, Shah I. Evaluating adaptive stress response gene signatures using transcriptomics. Computational Toxicology (Amsterdam, Netherlands) 2021; 20:1-9. [PMID: 37829472 PMCID: PMC10569130 DOI: 10.1016/j.comtox.2021.100179] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/14/2023]
Abstract
Stress response pathways (SRPs) mitigate the cellular effects of chemicals, but excessive perturbation can lead to adverse outcomes. Here, we investigated a computational approach to evaluate SRP activity from transcriptomic data using gene set enrichment analysis (GSEA). We extracted published gene signatures for DNA damage response (DDR), unfolded protein response (UPR), heat shock response (HSR), response to hypoxia (HPX), metal-associated response (MTL), and oxidative stress response (OSR) from the Molecular Signatures Database (MSigDB). Next, we used a gene-frequency approach to build consensus SRP signatures of varying lengths from 50 to 477 genes. We then prepared a reference dataset from perturbagens associated with SRPs from the literature with their transcriptomic profiles retrieved from public repositories. Lastly, we used receiver operating characteristic analysis to evaluate the GSEA scores from matching transcriptomic reference profiles to SRP signatures. Our consensus signatures performed better than or as well as published signatures for 4 out of the 6 SRPs, with the best consensus signature area under the curve (% performance relative to median of published signatures) of 1.00 for DDR (109%), 0.86 for UPR (169%), 0.99 for HSR (103%), 1.00 for HPX (104%), 0.74 for MTL (150%) and 0.83 for OSR (148%). The best matches between transcriptomic profiles and SRP signatures correctly classified perturbagens in 78% and 88% of the cases by first and second rank, respectively. We believe this approach can characterize SRP activity for new chemicals using transcriptomics with further evaluation.
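The two quantities reported above, an area under the ROC curve and its value relative to the median of a set of baseline signatures, can be sketched as follows. This is a minimal illustration assuming enrichment scores are already computed; the function names and all numbers in the example are made up and are not taken from the study.

```python
from statistics import median
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a positive profile outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative profile")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def percent_of_median(value: float, baseline: Sequence[float]) -> float:
    """Express a score as a percentage of the median of the baseline scores."""
    return 100.0 * value / median(baseline)


# Illustrative enrichment scores; label 1 marks profiles from perturbagens of the pathway.
scores = [2.1, 1.7, 0.4, 1.9, 0.2, 0.9]
labels = [1, 1, 0, 0, 0, 1]
auc = roc_auc(scores, labels)
relative = percent_of_median(0.9, [0.5, 0.6, 0.75])
```

The pairwise formulation is equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs; it is quadratic in the number of profiles, which is acceptable for small reference sets.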
Affiliation(s)
- Bryant Chambers
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
- Imran Shah
- Center for Computational Toxicology and Exposure, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA
21
Xie B, Jiang Q, Mora A, Li X. Automatic cell type identification methods for single-cell RNA sequencing. Comput Struct Biotechnol J 2021; 19:5874-5887. [PMID: 34815832 PMCID: PMC8572862 DOI: 10.1016/j.csbj.2021.10.027] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/27/2021] [Revised: 09/23/2021] [Accepted: 10/18/2021] [Indexed: 11/24/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for scientists of many research disciplines due to its ability to elucidate the heterogeneous and complex cell-type compositions of different tissues and cell populations. Traditional cell-type identification methods for scRNA-seq data analysis rely on manual annotation, which is time-consuming and knowledge-dependent. By contrast, automatic cell-type identification methods may have the advantages of being fast, accurate, and more user-friendly. Here, we discuss and evaluate thirty-two published automatic methods for scRNA-seq data analysis in terms of their prediction accuracy, F1-score, unlabeling rate and running time. We highlight the advantages and disadvantages of these methods and provide recommendations for method choice depending on the available information. The challenges and future applications of these automatic methods are further discussed. In addition, we provide a free scRNA-seq data analysis package encompassing the discussed automatic methods to ease their use in real-world applications.
Affiliation(s)
- Bingbing Xie
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou 510060, Guangdong, China
- Qin Jiang
- Affiliated Eye Hospital of Nanjing Medical University, Nanjing, China
- Antonio Mora
- Joint School of Life Sciences, Guangzhou Medical University and Guangzhou Institutes of Biomedicine and Health (Chinese Academy of Sciences), Xinzao, Panyu District, Guangzhou 511436, Guangdong, China
- Xuri Li
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou 510060, Guangdong, China
22
Cortal A, Martignetti L, Six E, Rausell A. Gene signature extraction and cell identity recognition at the single-cell level with Cell-ID. Nat Biotechnol 2021; 39:1095-1102. [PMID: 33927417 DOI: 10.1038/s41587-021-00896-6] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 08/05/2020] [Accepted: 03/15/2021] [Indexed: 02/08/2023]
Abstract
Because of the stochasticity associated with high-throughput single-cell sequencing, current methods for exploring cell-type diversity rely on clustering-based computational approaches in which heterogeneity is characterized at cell subpopulation rather than at full single-cell resolution. Here we present Cell-ID, a clustering-free multivariate statistical method for the robust extraction of per-cell gene signatures from single-cell sequencing data. We applied Cell-ID to data from multiple human and mouse samples, including blood cells, pancreatic islets and airway, intestinal and olfactory epithelium, as well as to comprehensive mouse cell atlas datasets. We demonstrate that Cell-ID signatures are reproducible across different donors, tissues of origin, species and single-cell omics technologies, and can be used for automatic cell-type annotation and cell matching across datasets. Cell-ID improves biological interpretation at individual cell level, enabling discovery of previously uncharacterized rare cell types or cell states. Cell-ID is distributed as an open-source R software package.
Affiliation(s)
- Akira Cortal
- Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Loredana Martignetti
- Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Emmanuelle Six
- Laboratory of Human Lymphohematopoiesis, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Antonio Rausell
- Clinical Bioinformatics Laboratory, Université de Paris, INSERM UMR1163, Imagine Institute, Paris, France
- Molecular Genetics Service, AP-HP, Necker Hospital for Sick Children, Paris, France
23
Shi Q, Li X, Peng Q, Zhang C, Chen L. scDA: Single cell discriminant analysis for single-cell RNA sequencing data. Comput Struct Biotechnol J 2021; 19:3234-3244. [PMID: 34141142 PMCID: PMC8187165 DOI: 10.1016/j.csbj.2021.05.046] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/29/2021] [Revised: 05/25/2021] [Accepted: 05/25/2021] [Indexed: 11/30/2022]
Abstract
- A cell-to-cell representation graph can be constructed.
- Cell groups and discriminant metagenes can be identified simultaneously.
- scDA is less sensitive to drop-out events and capable of labeling a mass of cells after learning even from a small set of data.
- scDA can avoid unnecessary re-clustering, and is a combined approach that performs clustering and classification simultaneously.
Single-cell RNA-sequencing (scRNA-seq) techniques provide unprecedented opportunities to investigate phenotypic and molecular heterogeneity in complex biological systems. However, profiling massive numbers of cells makes it computationally challenging to characterize diverse cell populations accurately and efficiently. Single cell discriminant analysis (scDA) solves this problem by simultaneously identifying cell groups and discriminant metagenes based on the construction of a cell-by-cell representation graph, and then using them to annotate unlabeled cells in the data. We demonstrate that scDA is effective at determining cell types, revealing the overall variability between cells in eleven data sets. scDA also outperforms several state-of-the-art methods when inferring the labels of new samples. In particular, we found scDA to be less sensitive to drop-out events and capable of labeling a mass of cells within or across datasets after learning even from a small set of data. The scDA approach offers a new way to efficiently analyze scRNA-seq profiles of large size or from different batches. scDA is implemented and freely available at https://github.com/ZCCQQWork/scDA.
Affiliation(s)
- Qianqian Shi
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Xinxing Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Qirui Peng
- Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Chuanchao Zhang
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- Luonan Chen
- Key Laboratory of Systems Health Science of Zhejiang Province, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou 310024, China
- State Key Laboratory of Cell Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai 200031, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
24
Duan B, Chen S, Chen X, Zhu C, Tang C, Wang S, Gao Y, Fu S, Liu Q. Integrating multiple references for single-cell assignment. Nucleic Acids Res 2021; 49:e80. [PMID: 34037791 PMCID: PMC8373058 DOI: 10.1093/nar/gkab380] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/18/2021] [Revised: 04/13/2021] [Accepted: 04/27/2021] [Indexed: 01/09/2023]
Abstract
Efficient single-cell assignment is essential for single-cell sequencing data analysis. With the explosive growth of single-cell sequencing data, multiple single-cell sequencing data sources are available for the same kind of tissue, which can be integrated to further improve single-cell assignment; however, an efficient integration strategy is still lacking due to the great challenges of data heterogeneity existing in multiple references. To this end, we present mtSC, a flexible single-cell assignment framework that integrates multiple references based on multitask deep metric learning designed specifically for cell type identification within tissues with multiple single-cell sequencing data as references. We evaluated mtSC on a comprehensive set of publicly available benchmark datasets and demonstrated its state-of-the-art effectiveness for integrative single-cell assignment with multiple references.
Collapse
Affiliation(s)
- Bin Duan
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaoqi Chen
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiaohan Chen
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chenyu Zhu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chen Tang
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shuguang Wang
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Yicheng Gao
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaliu Fu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
25
Michielsen L, Reinders MJT, Mahfouz A. Hierarchical progressive learning of cell identities in single-cell data. Nat Commun 2021; 12:2799. [PMID: 33990598 PMCID: PMC8121839 DOI: 10.1038/s41467-021-23196-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/26/2020] [Accepted: 04/16/2021] [Indexed: 12/11/2022]
Abstract
Supervised methods are increasingly used to identify cell populations in single-cell data. Yet, current methods are limited in their ability to learn from multiple datasets simultaneously, are hampered by the annotation of datasets at different resolutions, and do not preserve annotations when retrained on new datasets. The latter point is especially important as researchers cannot rely on downstream analysis performed using earlier versions of the dataset. Here, we present scHPL, a hierarchical progressive learning method which allows continuous learning from single-cell data by leveraging the different resolutions of annotations across multiple datasets to learn and continuously update a classification tree. We evaluate the classification and tree learning performance using simulated as well as real datasets and show that scHPL can successfully learn known cellular hierarchies from multiple datasets while preserving the original annotations. scHPL is available at https://github.com/lcmmichielsen/scHPL.
Affiliation(s)
- Lieke Michielsen
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
- Marcel J T Reinders
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
- Ahmed Mahfouz
  - Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
  - Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
|
26
|
Huang Q, Liu Y, Du Y, Garmire LX. Evaluation of Cell Type Annotation R Packages on Single-cell RNA-seq Data. Genomics Proteomics Bioinformatics 2021; 19:267-281. [PMID: 33359678 PMCID: PMC8602772 DOI: 10.1016/j.gpb.2020.07.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 10/30/2019] [Revised: 07/16/2020] [Accepted: 10/27/2020] [Indexed: 01/13/2023]
Abstract
Annotating cell types is a critical step in single-cell RNA sequencing (scRNA-seq) data analysis. Some supervised or semi-supervised classification methods have recently emerged to enable automated cell type identification. However, comprehensive evaluations of these methods are lacking. Moreover, it is not clear whether some classification methods originally designed for analyzing other bulk omics data are adaptable to scRNA-seq analysis. In this study, we evaluated ten cell type annotation methods publicly available as R packages. Eight of them are popular methods developed specifically for single-cell research, including Seurat, scmap, SingleR, CHETAH, SingleCellNet, scID, Garnett, and SCINA. The other two methods were repurposed from deconvoluting DNA methylation data, i.e., linear constrained projection (CP) and robust partial correlations (RPC). We conducted systematic comparisons on a wide variety of public scRNA-seq datasets as well as simulation data. We assessed the accuracy through intra-dataset and inter-dataset predictions; the robustness over practical challenges such as gene filtering, high similarity among cell types, and increased cell type classes; as well as the detection of rare and unknown cell types. Overall, methods such as Seurat, SingleR, CP, RPC, and SingleCellNet performed well, with Seurat being the best at annotating major cell types. Additionally, Seurat, SingleR, CP, and RPC were more robust against downsampling. However, Seurat did have a major drawback at predicting rare cell populations, and it was suboptimal at differentiating cell types highly similar to each other, compared to SingleR and RPC. All the code and data are available from https://github.com/qianhuiSenn/scRNA_cell_deconv_benchmark.
Affiliation(s)
- Qianhui Huang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Yu Liu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
- Yuheng Du
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48105, USA
|
27
|
Maseda F, Cang Z, Nie Q. DEEPsc: A Deep Learning-Based Map Connecting Single-Cell Transcriptomics and Spatial Imaging Data. Front Genet 2021; 12:636743. [PMID: 33833776 PMCID: PMC8021700 DOI: 10.3389/fgene.2021.636743] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/02/2020] [Accepted: 02/23/2021] [Indexed: 11/13/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data provides unprecedented information on cell fate decisions; however, the spatial arrangement of cells is often lost. Several recent computational methods have been developed to impute spatial information onto a scRNA-seq dataset through analyzing known spatial expression patterns of a small subset of genes known as a reference atlas. However, there is a lack of comprehensive analysis of the accuracy, precision, and robustness of the mappings, along with the generalizability of these methods, which are often designed for specific systems. We present a system-adaptive deep learning-based method (DEEPsc) to impute spatial information onto a scRNA-seq dataset from a given spatial reference atlas. By introducing a comprehensive set of metrics that evaluate the spatial mapping methods, we compare DEEPsc with four existing methods on four biological systems. We find that while DEEPsc has comparable accuracy to other methods, an improved balance between precision and robustness is achieved. DEEPsc provides a data-adaptive tool to connect scRNA-seq datasets and spatial imaging datasets to analyze cell fate decisions. Our implementation with a uniform API can serve as a portal with access to all the methods investigated in this work for spatial exploration of cell fate decisions in scRNA-seq data. All methods evaluated in this work are implemented as open-source software with a uniform interface.
Affiliation(s)
- Floyd Maseda
  - Department of Mathematics, University of California, Irvine, Irvine, CA, United States
- Zixuan Cang
  - Department of Mathematics, University of California, Irvine, Irvine, CA, United States
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, United States
- Qing Nie
  - Department of Mathematics, University of California, Irvine, Irvine, CA, United States
  - The NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, Irvine, CA, United States
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, United States
|
28
|
Pasquini G, Rojo Arias JE, Schäfer P, Busskamp V. Automated methods for cell type annotation on scRNA-seq data. Comput Struct Biotechnol J 2021; 19:961-969. [PMID: 33613863 PMCID: PMC7873570 DOI: 10.1016/j.csbj.2021.01.015] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 10/30/2020] [Revised: 01/13/2021] [Accepted: 01/13/2021] [Indexed: 12/22/2022] Open
Abstract
The advent of single-cell sequencing started a new era of transcriptomic and genomic research, advancing our knowledge of the cellular heterogeneity and dynamics. Cell type annotation is a crucial step in analyzing single-cell RNA sequencing data, yet manual annotation is time-consuming and partially subjective. As an alternative, tools have been developed for automatic cell type identification. Different strategies have emerged to ultimately associate gene expression profiles of single cells with a cell type either by using curated marker gene databases, correlating reference expression data, or transferring labels by supervised classification. In this review, we present an overview of the available tools and the underlying approaches to perform automated cell type annotations on scRNA-seq data.
Affiliation(s)
- Giovanni Pasquini
  - Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
  - Universitäts-Augenklinik Bonn, University of Bonn, Department of Ophthalmology, Bonn 53127, Germany
- Jesus Eduardo Rojo Arias
  - Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK
- Patrick Schäfer
  - Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
- Volker Busskamp
  - Technische Universität Dresden, Center for Molecular and Cellular Bioengineering (CMCB), Center for Regenerative Therapies Dresden (CRTD), Dresden 01307, Germany
  - Universitäts-Augenklinik Bonn, University of Bonn, Department of Ophthalmology, Bonn 53127, Germany
|
29
|
Cahan P, Cacchiarelli D, Dunn SJ, Hemberg M, de Sousa Lopes SMC, Morris SA, Rackham OJL, Del Sol A, Wells CA. Computational Stem Cell Biology: Open Questions and Guiding Principles. Cell Stem Cell 2021; 28:20-32. [PMID: 33417869 PMCID: PMC7799393 DOI: 10.1016/j.stem.2020.12.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
Computational biology is enabling an explosive growth in our understanding of stem cells and our ability to use them for disease modeling, regenerative medicine, and drug discovery. We discuss four topics that exemplify applications of computation to stem cell biology: cell typing, lineage tracing, trajectory inference, and regulatory networks. We use these examples to articulate principles that have guided computational biology broadly and call for renewed attention to these principles as computation becomes increasingly important in stem cell biology. We also discuss important challenges for this field with the hope that it will inspire more to join this exciting area.
Affiliation(s)
- Patrick Cahan
  - Institute for Cell Engineering, Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins School of Medicine, Baltimore, MD 21205, USA
- Davide Cacchiarelli
  - Telethon Institute of Genetics and Medicine (TIGEM), Armenise/Harvard Laboratory of Integrative Genomics, Pozzuoli, Italy; Department of Translational Medicine, University of Naples "Federico II," Naples, Italy
- Sara-Jane Dunn
  - DeepMind, 14-18 Handyside Street, London N1C 4DN, UK; Wellcome-MRC Cambridge Stem Cell Institute, University of Cambridge, Jeffrey Cheah Biomedical Centre, Puddicombe Way, Cambridge Biomedical Campus, Cambridge CB2 0AW, UK
- Martin Hemberg
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Samantha A Morris
  - Department of Developmental Biology, Department of Genetics, Center of Regenerative Medicine, Washington University School of Medicine, St. Louis, MO 63110, USA
- Owen J L Rackham
  - Centre for Computational Biology and The Program for Cardiovascular and Metabolic Disorders, Duke-NUS Medical School, Singapore, Singapore
- Antonio Del Sol
  - Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, 6 Avenue du Swing, Belvaux 4366, Luxembourg; CIC bioGUNE, Bizkaia Technology Park, 801 Building, 48160 Derio, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao 48013, Spain
- Christine A Wells
  - Centre for Stem Cell Systems, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Melbourne, VIC 3010, Australia
|
30
|
Boufea K, Gonzalez-Huici V, Lindberg M, Olova NN, Symeonides S, Oikonomidou O, Batada NN. Single-cell RNA sequencing of human breast tumour-infiltrating immune cells reveals a γδ T-cell subtype associated with good clinical outcome. Life Sci Alliance 2020; 4:e202000680. [PMID: 33268347 PMCID: PMC7723295 DOI: 10.26508/lsa.202000680] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 02/18/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 01/10/2023] Open
Abstract
The association of increased levels of tumour-infiltrating gamma-delta (γδ) T cells with favorable prognosis across many cancer types and their ability to recognize stress antigens in an MHC-unrestricted manner has led to an increased interest in exploiting them for cancer immunotherapy. We performed single-cell RNA sequencing (scRNA-seq) of peripheral blood γδ T cells from healthy adult donors and from fresh tumour biopsies of breast cancer patients. We identified five γδ T cell subtypes in blood and three γδ T cell subtypes in breast tumour. These subtypes differed in the expression of genes contributing to effector functions such as antigen presentation, cytotoxicity, and IL17A and IFNγ production. Compared with the blood γδ T cells, the breast tumour-infiltrating γδ T cells were more activated, expressed higher levels of cytotoxic genes, yet were immunosuppressed. One subtype in the breast tumour that was IFNγ-positive had no obvious similarity to any of the subtypes observed in the blood γδ T cells and was the only subtype associated with improved overall survival of breast cancer patients. Taken together, our study has identified markers of subtypes of human blood γδ T cells and uncovered a tumour-infiltrating γδ T cell subtype associated with improved overall cancer survival.
Affiliation(s)
- Katerina Boufea
  - Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, Scotland
- Victor Gonzalez-Huici
  - Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, Scotland
- Marcus Lindberg
  - Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, Scotland
- Nelly N Olova
  - MRC Human Genetics Unit, University of Edinburgh, Edinburgh, Scotland
- Stefan Symeonides
  - Cancer Research UK Edinburgh Centre, University of Edinburgh, Western General Hospital, Edinburgh, Scotland
- Olga Oikonomidou
  - Cancer Research UK Edinburgh Centre, University of Edinburgh, Western General Hospital, Edinburgh, Scotland
- Nizar N Batada
  - Centre for Genomic and Experimental Medicine, University of Edinburgh, Edinburgh, Scotland
|
31
|
Duan B, Zhu C, Chuai G, Tang C, Chen X, Chen S, Fu S, Li G, Liu Q. Learning for single-cell assignment. Sci Adv 2020; 6:eabd0855. [PMID: 33127686 PMCID: PMC7608777 DOI: 10.1126/sciadv.abd0855] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/01/2020] [Accepted: 09/15/2020] [Indexed: 06/11/2023]
Abstract
Efficient single-cell assignment without prior marker gene annotations is essential for single-cell sequencing data analysis. Current methods, however, have limited effectiveness for distinct single-cell assignment. They fail to achieve a well-generalized performance in different tasks because of the inherent heterogeneity of different single-cell sequencing datasets and different single-cell types. Furthermore, current methods are inefficient at identifying novel cell types that are absent in the reference datasets. To this end, we present scLearn, a learning-based framework that automatically infers quantitative measurement/similarity and threshold that can be used for different single-cell assignment tasks, achieving a well-generalized assignment performance on different single-cell types. We evaluated scLearn on a comprehensive set of publicly available benchmark datasets. We proved that scLearn outperformed the comparable existing methods for single-cell assignment from various aspects, demonstrating state-of-the-art effectiveness with a reliable and generalized single-cell type identification and categorizing ability.
Affiliation(s)
- Bin Duan
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chenyu Zhu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Guohui Chuai
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Chen Tang
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Xiaohan Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaoqi Chen
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shaliu Fu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Gaoyang Li
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Qi Liu
  - Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
|