1
Alsaggaf I, Buchan D, Wan C. Improving cell type identification with Gaussian noise-augmented single-cell RNA-seq contrastive learning. Brief Funct Genomics 2024; 23:441-451. [PMID: 38242863 DOI: 10.1093/bfgp/elad059] [Received: 07/01/2023] [Revised: 12/14/2023] [Accepted: 12/18/2023] [Indexed: 01/21/2024] Open
Abstract
Cell type identification is an important task for single-cell RNA-sequencing (scRNA-seq) data analysis. Many prediction methods have recently been proposed, but the predictive accuracy of difficult cell type identification tasks is still low. In this work, we proposed a novel Gaussian noise augmentation-based scRNA-seq contrastive learning method (GsRCL) to learn a type of discriminative feature representations for cell type identification tasks. A large-scale computational evaluation suggests that GsRCL successfully outperformed other state-of-the-art predictive methods on difficult cell type identification tasks, while the conventional random genes masking augmentation-based contrastive learning method also improved the accuracy of easy cell type identification tasks in general.
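The two augmentation strategies named in this abstract can be illustrated with a minimal sketch. It is not the GsRCL implementation: the function names, the noise scale `sigma`, and the `mask_fraction` value are hypothetical, and the abstract does not state which settings the authors used. It only shows how two differently perturbed "views" of the same cells could be produced for a contrastive objective.

```python
import numpy as np

def gaussian_noise_view(x, sigma=0.1, rng=None):
    """Return a copy of x with i.i.d. Gaussian noise added (sigma is an assumed value)."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(loc=0.0, scale=sigma, size=x.shape)

def random_gene_mask_view(x, mask_fraction=0.2, rng=None):
    """Return a copy of x with a random subset of entries set to zero."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= mask_fraction  # True where the value is kept
    return x * keep

# Toy expression matrix: 4 cells x 6 genes (synthetic values, not real data).
rng = np.random.default_rng(0)
cells = rng.poisson(2.0, size=(4, 6)).astype(float)

# Two noisy views of the same cells form the positive pairs in contrastive learning.
view_a = gaussian_noise_view(cells, rng=rng)
view_b = gaussian_noise_view(cells, rng=rng)
masked = random_gene_mask_view(cells, rng=rng)
```

Each row of `view_a` and the matching row of `view_b` would be treated as a positive pair, while rows from different cells serve as negatives.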
Affiliation(s)
- Ibrahim Alsaggaf
  - School of Computing and Mathematical Sciences, Birkbeck, University of London, Malet Street, WC1E 7HX, London, United Kingdom
- Daniel Buchan
  - Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, United Kingdom
- Cen Wan
  - School of Computing and Mathematical Sciences, Birkbeck, University of London, Malet Street, WC1E 7HX, London, United Kingdom
2
Gondal MN, Shah SUR, Chinnaiyan AM, Cieslik M. A systematic overview of single-cell transcriptomics databases, their use cases, and limitations. Frontiers in Bioinformatics 2024; 4:1417428. [PMID: 39040140 PMCID: PMC11260681 DOI: 10.3389/fbinf.2024.1417428] [Received: 04/14/2024] [Accepted: 06/11/2024] [Indexed: 07/24/2024] Open
Abstract
Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of transcriptomic data that populates several online databases and repositories. Here, we systematically examined large-scale scRNA-seq databases, categorizing them based on their scope and purpose such as general, tissue-specific databases, disease-specific databases, cancer-focused databases, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing this data to make robust discoveries and identify novel biological insights. Such platforms can help bridge the gap between computational and wet lab scientists through user-friendly web-based interfaces needed for democratizing access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.
Affiliation(s)
- Mahnoor N. Gondal
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
  - Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
- Saad Ur Rehman Shah
  - Gies College of Business, University of Illinois Business College, Champaign, MI, United States
- Arul M. Chinnaiyan
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
  - Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
  - Department of Pathology, University of Michigan, Ann Arbor, MI, United States
  - Department of Urology, University of Michigan, Ann Arbor, MI, United States
  - Howard Hughes Medical Institute, Ann Arbor, MI, United States
  - University of Michigan Rogel Cancer Center, Ann Arbor, MI, United States
- Marcin Cieslik
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
  - Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI, United States
  - Department of Pathology, University of Michigan, Ann Arbor, MI, United States
  - University of Michigan Rogel Cancer Center, Ann Arbor, MI, United States
3
de Winter N, Ji J, Sintou A, Forte E, Lee M, Noseda M, Li A, Koenig AL, Lavine KJ, Hayat S, Rosenthal N, Emanueli C, Srivastava PK, Sattler S. Persistent transcriptional changes in cardiac adaptive immune cells following myocardial infarction: New evidence from the re-analysis of publicly available single cell and nuclei RNA-sequencing data sets. J Mol Cell Cardiol 2024; 192:48-64. [PMID: 38734060 DOI: 10.1016/j.yjmcc.2024.04.016] [Received: 07/14/2023] [Revised: 03/17/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
INTRODUCTION Chronic immunopathology contributes to the development of heart failure after a myocardial infarction. Both T and B cells of the adaptive immune system are present in the myocardium and have been suggested to be involved in post-MI immunopathology. METHODS We analyzed the B and T cell populations isolated from previously published single cell RNA-sequencing data sets (PMID: 32130914, PMID: 35948637, PMID: 32971526 and PMID: 35926050), of the mouse and human heart, using differential expression analysis, functional enrichment analysis, gene regulatory inferences, and integration with autoimmune and cardiovascular GWAS. RESULTS Already at baseline, mature effector B and T cells are present in the human and mouse heart, having increased activity in transcription factors maintaining tolerance (e.g. DEAF1, JDP2, SPI-B). Following MI, T cells upregulate pro-inflammatory transcript levels (e.g. Cd11, Gzmk, Prf1), while B cells upregulate activation markers (e.g. Il6, Il1rn, Ccl6) and collagen (e.g. Col5a2, Col4a1, Col1a2). Importantly, pro-inflammatory and fibrotic transcription factors (e.g. NFKB1, CREM, REL) remain active in T cells, while B cells maintain elevated activity in transcription factors related to immunoglobulin production (e.g. ERG, REL) in both mouse and human post-MI hearts. Notably, genes differentially expressed in post-MI T and B cells are associated with cardiovascular and autoimmune disease. CONCLUSION These findings highlight the varied and time-dependent dynamic roles of post-MI T and B cells. They appear ready-to-go and are activated immediately after MI, thus participate in the acute wound healing response. However, they subsequently remain in a state of pro-inflammatory activation contributing to persistent immunopathology.
Affiliation(s)
- Natasha de Winter
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Jiahui Ji
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Amalia Sintou
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Elvira Forte
  - The Jackson Laboratory, Bar Harbor, United States
- Michael Lee
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Michela Noseda
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
- Aoxue Li
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Medicine Solna, Division of Cardiovascular Medicine, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Andrew L Koenig
  - Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
- Kory J Lavine
  - Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
- Nadia Rosenthal
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; The Jackson Laboratory, Bar Harbor, United States
- Costanza Emanueli
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
- Prashant K Srivastava
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Susanne Sattler
  - National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Cardiology, Medical University of Graz, Austria; Division of Pharmacology, Otto Loewi Research Center, Medical University of Graz, Austria
4
Li J, Choi J, Cheng X, Ma J, Pema S, Sanes JR, Mardon G, Frankfort BJ, Tran NM, Li Y, Chen R. Comprehensive single-cell atlas of the mouse retina. iScience 2024; 27:109916. [PMID: 38812536 PMCID: PMC11134544 DOI: 10.1016/j.isci.2024.109916] [Received: 02/19/2024] [Revised: 03/18/2024] [Accepted: 05/03/2024] [Indexed: 05/31/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has advanced our understanding of cellular heterogeneity by characterizing cell types across tissues and species. While several mouse retinal scRNA-seq datasets exist, each dataset is either limited in cell numbers or focused on specific cell classes, thereby hindering comprehensive gene expression analysis across all retina types. To fill the gap, we generated the largest retinal scRNA-seq dataset to date, comprising approximately 190,000 single cells from C57BL/6J mouse retinas, enriched for rare population cells via antibody-based magnetic cell sorting. Integrating this dataset with public datasets, we constructed the Mouse Retina Cell Atlas (MRCA) for wild-type mice, encompassing over 330,000 cells, characterizing 12 major classes and 138 cell types. The MRCA consolidates existing knowledge, identifies new cell types, and is publicly accessible via CELLxGENE, UCSC Cell Browser, and the Broad Single Cell Portal, providing a user-friendly resource for the mouse retina research community.
Affiliation(s)
- Jin Li
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Jongsu Choi
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
- Xuesen Cheng
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Justin Ma
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
- Shahil Pema
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Joshua R. Sanes
  - Center for Brain Science and Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02130, USA
- Graeme Mardon
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX 77030, USA
  - Departments of Ophthalmology and Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Benjamin J. Frankfort
  - Departments of Ophthalmology and Neuroscience, Baylor College of Medicine, Houston, TX 77030, USA
- Nicholas M. Tran
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yumei Li
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
- Rui Chen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
  - Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
5
Deol K, Weber GM, Yu YW. SlowMoMan: a web app for discovery of important features along user-drawn trajectories in 2D embeddings. Bioinformatics Advances 2024; 4:vbae095. [PMID: 38962404 PMCID: PMC11220466 DOI: 10.1093/bioadv/vbae095] [Received: 05/01/2024] [Accepted: 06/19/2024] [Indexed: 07/05/2024]
Abstract
Motivation Nonlinear low-dimensional embeddings allow humans to visualize high-dimensional data, as is often seen in bioinformatics, where datasets may have tens of thousands of dimensions. However, relating the axes of a nonlinear embedding to the original dimensions is a nontrivial problem. In particular, humans may identify patterns or interesting subsections in the embedding, but cannot easily identify what those patterns correspond to in the original data. Results Thus, we present SlowMoMan (SLOW Motions on MANifolds), a web application which allows the user to draw a one-dimensional path onto a 2D embedding. Then, by back-projecting the manifold to the original, high-dimensional space, we sort the original features such that those most discriminative along the manifold are ranked highly. We show a number of pertinent use cases for our tool, including trajectory inference, spatial transcriptomics, and automatic cell classification. Availability and implementation Software: https://yunwilliamyu.github.io/SlowMoMan/; Code: https://github.com/yunwilliamyu/SlowMoMan.
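The idea of ranking original features along a user-drawn path can be sketched roughly as follows. This is an illustrative simplification and not SlowMoMan's actual scoring: here each path point is mapped back to its nearest embedded sample, and features are ranked by their variance along the path, which is an assumed stand-in for the tool's own discriminativeness measure. The function name and toy data are hypothetical.

```python
import numpy as np

def rank_features_along_path(embedding, features, path):
    """Rank feature indices by how much they vary along a 2D path.

    embedding: (n_samples, 2) coordinates of samples in the 2D embedding
    features:  (n_samples, n_features) original high-dimensional data
    path:      (n_points, 2) coordinates of the drawn path
    """
    # Squared distance from every path point to every embedded sample.
    d = ((path[:, None, :] - embedding[None, :, :]) ** 2).sum(axis=2)
    nearest = d.argmin(axis=1)        # closest sample for each path point
    profile = features[nearest]       # feature values encountered along the path
    score = profile.var(axis=0)       # assumed score: variance along the path
    return np.argsort(score)[::-1]    # highest-scoring features first

# Toy data: feature 0 changes along the x-axis, feature 1 is constant.
embedding = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
features = np.array([[0.0, 5.0], [1.0, 5.0], [2.0, 5.0]])
path = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
ranking = rank_features_along_path(embedding, features, path)
```

In this toy case the feature that changes along the path is ranked ahead of the constant one.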
Affiliation(s)
- Kiran Deol
  - Department of Computer Science, University of Alberta, Edmonton, Alberta T6G 2R3, Canada
- Griffin M Weber
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Yun William Yu
  - Computer and Mathematical Sciences, University of Toronto at Scarborough, Toronto, Ontario M1C 1A4, Canada
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, United States
6
Fang C, Selega A, Campbell KR. Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance. Genome Biol 2024; 25:159. [PMID: 38886757 PMCID: PMC11184819 DOI: 10.1186/s13059-024-03304-9] [Received: 01/04/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
BACKGROUND The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.
Affiliation(s)
- Cindy Fang
  - Lunenfeld-Tanenbaum Research Institute, Toronto, Canada
  - Program in Bioinformatics and Computational Biology, University of Toronto, Toronto, Canada
  - Present address: Department of Biostatistics, Johns Hopkins University, Baltimore, USA
- Alina Selega
  - Lunenfeld-Tanenbaum Research Institute, Toronto, Canada
  - Vector Institute, Toronto, Canada
- Kieran R Campbell
  - Lunenfeld-Tanenbaum Research Institute, Toronto, Canada
  - Vector Institute, Toronto, Canada
  - Departments of Molecular Genetics, Statistical Sciences, Computer Science, University of Toronto, Toronto, Canada
  - Ontario Institute for Cancer Research, Toronto, Canada
7
Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genomics 2024; 4:100581. [PMID: 38823397 PMCID: PMC11228957 DOI: 10.1016/j.xgen.2024.100581] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Julian Lehrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O'Farrell
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Benedict Paten
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mircea Teodorescu
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Vanessa D Jonsson
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mohammed A Mostajo-Radji
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
8
Wang C, Acosta D, McNutt M, Bian J, Ma A, Fu H, Ma Q. A single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). Nat Commun 2024; 15:4710. [PMID: 38844475 PMCID: PMC11156951 DOI: 10.1038/s41467-024-49133-z] [Received: 09/04/2023] [Accepted: 05/21/2024] [Indexed: 06/09/2024] Open
Abstract
Alzheimer's Disease (AD) pathology has been increasingly explored through single-cell and single-nucleus RNA-sequencing (scRNA-seq & snRNA-seq) and spatial transcriptomics (ST). However, the surge in data demands a comprehensive, user-friendly repository. Addressing this, we introduce a single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). It offers a broader spectrum of AD-related datasets, an optimized analytical pipeline, and improved usability. The database encompasses 1,053 samples (277 integrated datasets) from 67 AD-related scRNA-seq & snRNA-seq studies, totaling 7,332,202 cells. Additionally, it archives 381 ST datasets from 18 human and mouse brain studies. Each dataset is annotated with details such as species, gender, brain region, disease/control status, age, and AD Braak stages. ssREAD also provides an analysis suite for cell clustering, identification of differentially expressed and spatially variable genes, cell-type-specific marker genes and regulons, and spot deconvolution for integrative analysis. ssREAD is freely available at https://bmblx.bmi.osumc.edu/ssread/.
Affiliation(s)
- Cankun Wang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Diana Acosta
  - Department of Neuroscience, The Ohio State University, Columbus, OH, 43210, USA
- Megan McNutt
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Jiang Bian
  - Department of Health Outcomes & Biomedical Informatics, University of Florida, Gainesville, FL, 32606, USA
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
- Hongjun Fu
  - Department of Neuroscience, The Ohio State University, Columbus, OH, 43210, USA
  - Chronic Brain Injury Program, The Ohio State University, Columbus, OH, 43210, USA
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH, 43210, USA
9
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe (https://github.com/duohongrui/simpipe; https://doi.org/10.5281/zenodo.11178409), and an online tool Simsite (https://www.ciblab.net/software/simshiny/) for data simulation. CONCLUSIONS No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
  - Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
  - Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Lei Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Xiner Nie
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Xiaoxi Zhang
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Guizhao Liang
  - Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
- Mingwei Liu
  - Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
- Youjin Hao
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Bo Li
  - College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
10
Wu R, Horimoto Y, Oshi M, Benesch MGK, Khoury T, Takabe K, Ishikawa T. Emerging measurements for tumor-infiltrating lymphocytes in breast cancer. Jpn J Clin Oncol 2024; 54:620-629. [PMID: 38521965 PMCID: PMC11144297 DOI: 10.1093/jjco/hyae033] [Received: 12/18/2023] [Accepted: 03/01/2024] [Indexed: 03/25/2024] Open
Abstract
Tumor-infiltrating lymphocytes are a general term for lymphocytes or immune cells infiltrating the tumor microenvironment. Numerous studies have demonstrated tumor-infiltrating lymphocytes to be robust prognostic and predictive biomarkers in breast cancer. Recently, immune checkpoint inhibitors, which directly target tumor-infiltrating lymphocytes, have become part of standard of care treatment for triple-negative breast cancer. Surprisingly, tumor-infiltrating lymphocytes quantified by conventional methods do not predict response to immune checkpoint inhibitors, which highlights the heterogeneity of tumor-infiltrating lymphocytes and the complexity of the immune network in the tumor microenvironment. Tumor-infiltrating lymphocytes are composed of diverse immune cell populations, including cytotoxic CD8-positive T lymphocytes, B cells and myeloid cells. Traditionally, tumor-infiltrating lymphocytes in tumor stroma have been evaluated by histology. However, the standardization of this approach is limited, necessitating the use of various novel technologies to elucidate the heterogeneity in the tumor microenvironment. This review outlines the evaluation methods for tumor-infiltrating lymphocytes from conventional pathological approaches that evaluate intratumoral and stromal tumor-infiltrating lymphocytes such as immunohistochemistry, to the more recent advancements in computer tissue imaging using artificial intelligence, flow cytometry sorting and multi-omics analyses using high-throughput assays to estimate tumor-infiltrating lymphocytes from bulk tumor using immune signatures or deconvolution tools. We also discuss higher resolution technologies that enable the analysis of tumor-infiltrating lymphocytes heterogeneity such as single-cell analysis and spatial transcriptomics. As we approach the era of personalized medicine, it is important for clinicians to understand these technologies.
Collapse
Affiliation(s)
- Rongrong Wu
- Department of Breast Surgery and Oncology, Tokyo Medical University, Tokyo, Japan
- Department of Surgical Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | - Yoshiya Horimoto
- Department of Breast Surgery and Oncology, Tokyo Medical University, Tokyo, Japan
- Department of Breast Oncology, Juntendo University Hospital, Tokyo, Japan
| | - Masanori Oshi
- Department of Surgical Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Department of Gastroenterological Surgery, Yokohama City University Graduate School of Medicine, Yokohama, Japan
| | - Matthew G K Benesch
- Department of Surgical Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | - Thaer Khoury
- Department of Pathology & Laboratory Medicine, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
| | - Kazuaki Takabe
- Department of Breast Surgery and Oncology, Tokyo Medical University, Tokyo, Japan
- Department of Surgical Oncology, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Department of Gastroenterological Surgery, Yokohama City University Graduate School of Medicine, Yokohama, Japan
- Department of Surgery, University at Buffalo Jacobs School of Medicine and Biomedical Sciences, The State University of New York, Buffalo, NY, USA
- Department of Surgery, Niigata University Graduate School of Medical and Dental Sciences, Niigata, Japan
- Department of Breast Surgery, Fukushima Medical University, Fukushima, Japan
| | - Takashi Ishikawa
- Department of Breast Surgery and Oncology, Tokyo Medical University, Tokyo, Japan
| |
|
11
|
Ma Y, Pei Y. NDMNN: A novel deep residual network based MNN method to remove batch effects from scRNA-seq data. J Bioinform Comput Biol 2024; 22:2450015. [PMID: 39036845 DOI: 10.1142/s021972002450015x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/23/2024]
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has generated vast amounts of data. However, these data often exhibit batch effects due to various factors such as different time points, experimental personnel, and instruments used, which can obscure the biological differences in the data itself. Based on the characteristics of scRNA-seq data, we designed a dense deep residual network model, referred to as NDnetwork. Subsequently, we combined the NDnetwork model with the MNN method to correct batch effects in scRNA-seq data, and named it the NDMNN method. Comprehensive experimental results demonstrate that the NDMNN method outperforms existing commonly used methods for correcting batch effects in scRNA-seq data. As the scale of single-cell sequencing continues to expand, we believe that NDMNN will be a valuable tool for researchers in the biological community for correcting batch effects in their studies. The source code and experimental results of the NDMNN method can be found at https://github.com/mustang-hub/NDMNN.
Affiliation(s)
- Yupeng Ma
- Software Engineering, Tiangong University, Tianjin, P. R. China
| | - Yongzhen Pei
- School of Mathematical Sciences, Tiangong University, Tianjin, P. R. China
| |
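The mutual nearest neighbors (MNN) idea named in the NDMNN abstract above can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the GitHub URL in the abstract); the function names, the toy coordinates, and the choice of Euclidean distance with `k=1` are all assumptions made for illustration. A cell pair across two batches counts as an MNN pair when each cell is among the other's k nearest neighbors in the opposite batch.

```python
import math


def k_nearest(query, reference, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Return (i, j) pairs where a_i and b_j are in each other's k-nearest sets."""
    a_to_b = k_nearest(batch_a, batch_b, k)
    b_to_a = k_nearest(batch_b, batch_a, k)
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]


# Hypothetical 2-D "expression" coordinates for cells in two batches.
batch_a = [(0.0, 0.0), (5.0, 5.0), (9.0, 0.0)]
batch_b = [(0.5, 0.2), (5.4, 5.1)]

pairs = mutual_nearest_neighbors(batch_a, batch_b, k=1)
print(pairs)
```

In this toy example the third cell of `batch_a` has a nearest neighbor in `batch_b`, but that neighbor prefers a different cell, so the pair is not mutual and is left out. MNN-based correction methods generally use such pairs as anchors for estimating the batch shift.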
|
12
|
Cho J, Baik B, Nguyen HCT, Park D, Nam D. Characterizing efficient feature selection for single-cell expression analysis. Brief Bioinform 2024; 25:bbae317. [PMID: 38975891 PMCID: PMC11229035 DOI: 10.1093/bib/bbae317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 03/31/2024] [Accepted: 06/17/2024] [Indexed: 07/09/2024] Open
Abstract
Unsupervised feature selection is a critical step for efficient and accurate analysis of single-cell RNA-seq data. Previous benchmarks used two different criteria to compare feature selection methods: (i) proportion of ground-truth marker genes included in the selected features and (ii) accuracy of cell clustering using ground-truth cell types. Here, we systematically compare the performance of 11 feature selection methods for both criteria. We first demonstrate the discordance between these criteria and suggest using the latter. We then compare how the mean expression of the selected genes is distributed across feature selection methods. We show that lowly expressed genes exhibit very high coefficients of variation and are mostly excluded by high-performance methods. In particular, high-deviation- and high-expression-based methods outperform the methods widely used in the Seurat package in clustering cells and data visualization. We further show they also enable a clear separation of the same cell type from different tissues as well as accurate estimation of cell trajectories.
Affiliation(s)
- Juok Cho
- Department of Biomedical Engineering, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Ulsan 44919, Republic of Korea
| | - Bukyung Baik
- Department of Biological Sciences, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Ulsan 44919, Republic of Korea
| | - Hai C T Nguyen
- Department of Biological Sciences, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Ulsan 44919, Republic of Korea
| | - Daeui Park
- Department of Predictive Toxicology, Korea Institute of Toxicology, 141, Gajeong-ro, Daejeon 34114, Republic of Korea
| | - Dougu Nam
- Department of Biological Sciences, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Ulsan 44919, Republic of Korea
- Department of Mathematical Sciences, Ulsan National Institute of Science and Technology (UNIST), 50, UNIST-gil, Ulsan 44919, Republic of Korea
| |
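The observation in the abstract above, that lowly expressed genes tend to show high coefficients of variation (CV), can be seen in a small worked example. The sketch below is illustrative only: the gene names and counts are made up, and using the population standard deviation divided by the mean is one common definition of CV, not necessarily the one used in the paper.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (NaN when the mean is zero)."""
    mean = statistics.fmean(values)
    if mean == 0:
        return float("nan")
    return statistics.pstdev(values) / mean


# Hypothetical counts for two genes across five cells.
counts = {
    "gene_high": [100, 110, 90, 105, 95],  # highly expressed, modest spread
    "gene_low": [0, 0, 1, 0, 0],           # lowly expressed, mostly zeros
}

cv = {gene: coefficient_of_variation(values) for gene, values in counts.items()}
for gene, value in cv.items():
    print(f"{gene}: CV = {value:.3f}")
```

A single nonzero count in an otherwise silent gene is enough to push its CV far above that of a stably expressed gene, which is why selecting features purely by CV can favor sparse, noisy genes.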
|
13
|
Gan D, Zhu Y, Lu X, Li J. SCIPAC: quantitative estimation of cell-phenotype associations. Genome Biol 2024; 25:119. [PMID: 38741183 PMCID: PMC11089691 DOI: 10.1186/s13059-024-03263-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 04/30/2024] [Indexed: 05/16/2024] Open
Abstract
Numerous algorithms have been proposed to identify cell types in single-cell RNA sequencing data, yet a fundamental problem remains: determining associations between cells and phenotypes such as cancer. We develop SCIPAC, the first algorithm that quantitatively estimates the association between each cell in single-cell data and a phenotype. SCIPAC also provides a p-value for each association and applies to data with virtually any type of phenotype. We demonstrate SCIPAC's accuracy in simulated data. On four real cancerous or noncancerous datasets, insights from SCIPAC help interpret the data and generate new hypotheses. SCIPAC requires minimal tuning and is computationally very fast.
Affiliation(s)
- Dailin Gan
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA
| | - Yini Zhu
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, Integrated Biomedical Sciences Graduate Program, University of Notre Dame, Notre Dame, 46556, IN, USA
| | - Xin Lu
- Department of Biological Sciences, Boler-Parseghian Center for Rare and Neglected Diseases, Harper Cancer Research Institute, Integrated Biomedical Sciences Graduate Program, University of Notre Dame, Notre Dame, 46556, IN, USA
- Tumor Microenvironment and Metastasis Program, Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Indianapolis, 46202, IN, USA
| | - Jun Li
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, 46556, IN, USA.
| |
|
14
|
Ediriwickrema A, Nakauchi Y, Fan AC, Köhnke T, Hu X, Luca BA, Kim Y, Ramakrishnan S, Nakamoto M, Karigane D, Linde MH, Azizi A, Newman AM, Gentles AJ, Majeti R. A single cell framework identifies functionally and molecularly distinct multipotent progenitors in adult human hematopoiesis. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.07.592983. [PMID: 38766031 PMCID: PMC11100686 DOI: 10.1101/2024.05.07.592983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Hematopoietic multipotent progenitors (MPPs) regulate blood cell production to appropriately meet the biological demands of the human body. Human MPPs remain ill-defined whereas mouse MPPs have been well characterized with distinct immunophenotypes and lineage potencies. Using multiomic single cell analyses and complementary functional assays, we identified new human MPPs and oligopotent progenitor populations within Lin-CD34+CD38dim/lo adult bone marrow with distinct biomolecular and functional properties. These populations were prospectively isolated based on expression of CD69, CLL1, and CD2 in addition to classical markers like CD90 and CD45RA. We show that within the canonical Lin-CD34+CD38dim/loCD90-CD45RA- MPP population, there is a CD69+ MPP with long-term engraftment and multilineage differentiation potential, a CLL1+ myeloid-biased MPP, and a CLL1-CD69- erythroid-biased MPP. We also show that the canonical Lin-CD34+CD38dim/loCD90-CD45RA+ LMPP population can be separated into a CD2+ LMPP with lymphoid and myeloid potential, a CD2- LMPP with high lymphoid potential, and a CLL1+ GMP with minimal lymphoid potential. We used these new HSPC profiles to study human and mouse bone marrow cells and observe limited cell type specific homology between humans and mice and cell type specific changes associated with aging. By identifying and functionally characterizing new adult MPP sub-populations, we provide an updated reference and framework for future studies in human hematopoiesis.
|
15
|
Xu Z, Liu F, Ding Y, Pan T, Wu YH, Liu J, Bado IL, Zhang W, Wu L, Gao Y, Hao X, Yu L, Edwards DG, Chan HL, Aguirre S, Dieffenbach MW, Chen E, Shen Y, Hoffman D, Dominguez LB, Rivas CH, Chen X, Wang H, Gugala Z, Satcher RL, Zhang XHF. Unbiased metastatic niche-labeling identifies estrogen receptor-positive macrophages as a barrier of T cell infiltration during bone colonization. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.07.593016. [PMID: 38765966 PMCID: PMC11100675 DOI: 10.1101/2024.05.07.593016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Microenvironment niches determine cellular fates of metastatic cancer cells. However, robust and unbiased approaches to identify niche components and their molecular profiles are lacking. We established Sortase A-Based Microenvironment Niche Tagging (SAMENT), which selectively labels cells encountered by cancer cells during metastatic colonization. SAMENT was applied to multiple cancer models colonizing the same organ and the same cancer to different organs. Common metastatic niche features include macrophage enrichment and T cell depletion. Macrophage niches are phenotypically diverse between different organs. In bone, macrophages express the estrogen receptor alpha (ERα) and exhibit active ERα signaling in male and female hosts. Conditional knockout of Esr1 in macrophages significantly retarded bone colonization by allowing T cell infiltration. ERα expression was also discovered in human bone metastases of both genders. Collectively, we identified a unique population of ERα+ macrophages in the metastatic niche and functionally tied ERα signaling in macrophages to T cell exclusion during metastatic colonization. Highlights: SAMENT is a robust metastatic niche-labeling approach amenable to single-cell omics. Metastatic niches are typically enriched with macrophages and depleted of T cells. Direct interaction with cancer cells induces ERα expression in niche macrophages. Knockout of Esr1 in macrophages allows T cell infiltration and retards bone colonization.
|
16
|
Xu J, Huang D, Zhang X. scmFormer Integrates Large-Scale Single-Cell Proteomics and Transcriptomics Data by Multi-Task Transformer. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2307835. [PMID: 38483032 PMCID: PMC11109621 DOI: 10.1002/advs.202307835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Revised: 01/24/2024] [Indexed: 05/23/2024]
Abstract
Transformer-based models have revolutionized single cell RNA-seq (scRNA-seq) data analysis. However, their applicability is challenged by the complexity and scale of single-cell multi-omics data. Here, a novel single-cell multi-modal/multi-task transformer (scmFormer) is proposed to fill the existing gap in integrating single-cell proteomics with other omics data. Through systematic benchmarking, it is demonstrated that scmFormer excels in integrating large-scale single-cell multimodal data and heterogeneous multi-batch paired multi-omics data, while preserving shared information across batches and distinct biological information. scmFormer achieves a 54.5% higher average F1 score than the second-best method in transferring cell-type labels from single-cell transcriptomics to proteomics data. Using COVID-19 datasets, it is shown that scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer is also shown to perform better than existing methods at generating the unmeasured modality and to be well-suited for spatial multi-omic data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single-cell multi-omics data.
Affiliation(s)
- Jing Xu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - De‐Shuang Huang
- Eastern Institute for Advanced Study, Eastern Institute of Technology, Ningbo, 315200, China
| | - Xiujun Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, 430074, China
- Center of Economic Botany, Core Botanical Gardens, Chinese Academy of Sciences, Wuhan, 430074, China
| |
|
17
|
Wang C, Acosta D, McNutt M, Bian J, Ma A, Fu H, Ma Q. A Single-cell and Spatial RNA-seq Database for Alzheimer's Disease (ssREAD). BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.08.556944. [PMID: 37745592 PMCID: PMC10515769 DOI: 10.1101/2023.09.08.556944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Alzheimer's Disease (AD) pathology has been increasingly explored through single-cell and single-nucleus RNA-sequencing (scRNA-seq & snRNA-seq) and spatial transcriptomics (ST). However, the surge in data demands a comprehensive, user-friendly repository. Addressing this, we introduce a single-cell and spatial RNA-seq database for Alzheimer's disease (ssREAD). It offers a broader spectrum of AD-related datasets, an optimized analytical pipeline, and improved usability. The database encompasses 1,053 samples (277 integrated datasets) from 67 AD-related scRNA-seq & snRNA-seq studies, totaling 7,332,202 cells. Additionally, it archives 381 ST datasets from 18 human and mouse brain studies. Each dataset is annotated with details such as species, gender, brain region, disease/control status, age, and AD Braak stages. ssREAD also provides an analysis suite for cell clustering, identification of differentially expressed and spatially variable genes, cell-type-specific marker genes and regulons, and spot deconvolution for integrative analysis. ssREAD is freely available at https://bmblx.bmi.osumc.edu/ssread/.
Affiliation(s)
- Cankun Wang
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Diana Acosta
- Department of Neuroscience, The Ohio State University, OH 43210, USA
| | - Megan McNutt
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Jiang Bian
- Department of Health Outcomes & Biomedical Informatics, University of Florida, FL 32606, USA
| | - Anjun Ma
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| | - Hongjun Fu
- Department of Neuroscience, The Ohio State University, OH 43210, USA
- Chronic Brain Injury Program, The Ohio State University, OH 43210, USA
| | - Qin Ma
- Department of Biomedical Informatics, The Ohio State University, OH 43210, USA
| |
|
18
|
Gondal MN, Shah SUR, Chinnaiyan AM, Cieslik M. A Systematic Overview of Single-Cell Transcriptomics Databases, their Use cases, and Limitations. ARXIV 2024:arXiv:2404.10545v1. [PMID: 38699169 PMCID: PMC11065044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Rapid advancements in high-throughput single-cell RNA-seq (scRNA-seq) technologies and experimental protocols have led to the generation of vast amounts of genomic data that populate several online databases and repositories. Here, we systematically examined large-scale scRNA-seq databases, categorizing them based on their scope and purpose, such as general databases, tissue-specific databases, disease-specific databases, cancer-focused databases, and cell type-focused databases. Next, we discuss the technical and methodological challenges associated with curating large-scale scRNA-seq databases, along with current computational solutions. We argue that understanding scRNA-seq databases, including their limitations and assumptions, is crucial for effectively utilizing these data to make robust discoveries and identify novel biological insights. Furthermore, we propose that bridging the gap between computational and wet lab scientists through user-friendly web-based platforms is needed for democratizing access to single-cell data. These platforms would facilitate interdisciplinary research, enabling researchers from various disciplines to collaborate effectively. This review underscores the importance of leveraging computational approaches to unravel the complexities of single-cell data and offers a promising direction for future research in the field.
Affiliation(s)
- Mahnoor N. Gondal
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI USA
| | - Saad Ur Rehman Shah
- Gies College of Business, University of Illinois Business College, Champaign, IL USA
| | - Arul M. Chinnaiyan
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI USA
- Department of Pathology, University of Michigan, Ann Arbor, MI USA
- Department of Urology, University of Michigan, Ann Arbor, MI USA
- Howard Hughes Medical Institute, Ann Arbor, MI USA
- University of Michigan Rogel Cancer Center, Ann Arbor, MI USA
| | - Marcin Cieslik
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI USA
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI USA
- Department of Pathology, University of Michigan, Ann Arbor, MI USA
- University of Michigan Rogel Cancer Center, Ann Arbor, MI USA
| |
|
19
|
Cao Y, Zhao X, Tang S, Jiang Q, Li S, Li S, Chen S. scButterfly: a versatile single-cell cross-modality translation method via dual-aligned variational autoencoders. Nat Commun 2024; 15:2973. [PMID: 38582890 PMCID: PMC10998864 DOI: 10.1038/s41467-024-47418-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 03/28/2024] [Indexed: 04/08/2024] Open
Abstract
Recent advancements for simultaneously profiling multi-omics modalities within individual cells have enabled the interrogation of cellular heterogeneity and molecular hierarchy. However, technical limitations lead to highly noisy multi-modal data and substantial costs. Although computational methods have been proposed to translate single-cell data across modalities, broad applications of the methods still remain impeded by formidable challenges. Here, we propose scButterfly, a versatile single-cell cross-modality translation method based on dual-aligned variational autoencoders and data augmentation schemes. With comprehensive experiments on multiple datasets, we provide compelling evidence of scButterfly's superiority over baseline methods in preserving cellular heterogeneity while translating datasets of various contexts and in revealing cell type-specific biological insights. Besides, we demonstrate the extensive applications of scButterfly for integrative multi-omics analysis of single-modality data, data enhancement of poor-quality single-cell multi-omics, and automatic cell type annotation of scATAC-seq data. Moreover, scButterfly can be generalized to unpaired data training, perturbation-response analysis, and consecutive translation.
Affiliation(s)
- Yichuan Cao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Xiamiao Zhao
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Songming Tang
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Qun Jiang
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division of BNRIST, Department of Automation, Tsinghua University, 100084, Beijing, China
| | - Sijie Li
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China
| | - Siyu Li
- School of Statistics and Data Science, Nankai University, Tianjin, 300071, China
| | - Shengquan Chen
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, China.
| |
|
20
|
Trepte P, Secker C, Olivet J, Blavier J, Kostova S, Maseko SB, Minia I, Silva Ramos E, Cassonnet P, Golusik S, Zenkner M, Beetz S, Liebich MJ, Scharek N, Schütz A, Sperling M, Lisurek M, Wang Y, Spirohn K, Hao T, Calderwood MA, Hill DE, Landthaler M, Choi SG, Twizere JC, Vidal M, Wanker EE. AI-guided pipeline for protein-protein interaction drug discovery identifies a SARS-CoV-2 inhibitor. Mol Syst Biol 2024; 20:428-457. [PMID: 38467836 PMCID: PMC10987651 DOI: 10.1038/s44320-024-00019-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 01/22/2024] [Accepted: 01/23/2024] [Indexed: 03/13/2024] Open
Abstract
Protein-protein interactions (PPIs) offer great opportunities to expand the druggable proteome and therapeutically tackle various diseases, but remain challenging targets for drug discovery. Here, we provide a comprehensive pipeline that combines experimental and computational tools to identify and validate PPI targets and perform early-stage drug discovery. We have developed a machine learning approach that prioritizes interactions by analyzing quantitative data from binary PPI assays or AlphaFold-Multimer predictions. Using the quantitative assay LuTHy together with our machine learning algorithm, we identified high-confidence interactions among SARS-CoV-2 proteins for which we predicted three-dimensional structures using AlphaFold-Multimer. We employed VirtualFlow to target the contact interface of the NSP10-NSP16 SARS-CoV-2 methyltransferase complex by ultra-large virtual drug screening. Thereby, we identified a compound that binds to NSP10 and inhibits its interaction with NSP16, while also disrupting the methyltransferase activity of the complex, and SARS-CoV-2 replication. Overall, this pipeline will help to prioritize PPI targets to accelerate the discovery of early-stage drug candidates targeting protein complexes and pathways.
Affiliation(s)
- Philipp Trepte
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
- Brain Development and Disease, Institute of Molecular Biotechnology of the Austrian Academy of Sciences, 1030, Vienna, Austria.
| | - Christopher Secker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
- Zuse Institute Berlin, Berlin, Germany.
| | - Julien Olivet
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Structural Biology Unit, Laboratory of Virology and Chemotherapy, Rega Institute for Medical Research, Department of Microbiology, Immunology and Transplantation, Katholieke Universiteit Leuven, 3000, Leuven, Belgium
| | - Jeremy Blavier
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
| | - Simona Kostova
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Sibusiso B Maseko
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium
| | - Igor Minia
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
| | - Eduardo Silva Ramos
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Patricia Cassonnet
- Département de Virologie, Unité de Génétique Moléculaire des Virus à ARN (GMVR), Institut Pasteur, Centre National de la Recherche Scientifique (CNRS), Université de Paris, Paris, France
| | - Sabrina Golusik
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Martina Zenkner
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Stephanie Beetz
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Mara J Liebich
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Nadine Scharek
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Anja Schütz
- Protein Production & Characterization, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany
| | - Marcel Sperling
- Multifunctional Colloids and Coating, Fraunhofer Institute for Applied Polymer Research (IAP), 14476, Potsdam-Golm, Germany
| | - Michael Lisurek
- Structural Chemistry and Computational Biophysics, Leibniz-Institut für Molekulare Pharmakologie (FMP), 13125, Berlin, Germany
| | - Yang Wang
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Kerstin Spirohn
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Tong Hao
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Michael A Calderwood
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - David E Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA
| | - Markus Landthaler
- RNA Biology and Posttranscriptional Regulation, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, 13125, Berlin, Germany
- Institute of Biology, Humboldt-Universität zu Berlin, 13125, Berlin, Germany
| | - Soon Gang Choi
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
| | - Jean-Claude Twizere
- Laboratory of Viral Interactomes, Interdisciplinary Cluster for Applied Genoproteomics (GIGA)-Molecular Biology of Diseases, University of Liège, 4000, Liège, Belgium.
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- TERRA Teaching and Research Center, Gembloux Agro-Bio Tech, University of Liège, 5030, Gembloux, Belgium.
- Laboratory of Algal Synthetic and Systems Biology, Division of Science and Math, New York University Abu Dhabi, Abu Dhabi, UAE.
| | - Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
- Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, 02115, USA.
| | - Erich E Wanker
- Proteomics and Molecular Mechanisms of Neurodegenerative Diseases, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, 13125, Berlin, Germany.
| |
|
21
|
Huang X, Liu R, Yang S, Chen X, Li H. scAnnoX: an R package integrating multiple public tools for single-cell annotation. PeerJ 2024; 12:e17184. [PMID: 38560451 PMCID: PMC10981883 DOI: 10.7717/peerj.17184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 03/11/2024] [Indexed: 04/04/2024] Open
Abstract
Background: Single-cell annotation plays a crucial role in the analysis of single-cell genomics data. Despite the existence of numerous single-cell annotation algorithms, a comprehensive tool for integrating and comparing these algorithms is still lacking. Methods: This study investigated a range of widely adopted single-cell annotation algorithms. Ten single-cell annotation algorithms were selected based on the classification of either reference dataset-dependent or marker gene-dependent approaches. These algorithms included SingleR, Seurat, sciBet, scmap, CHETAH, scSorter, sc.type, cellID, scCATCH, and SCINA. Building upon these algorithms, we developed an R package named scAnnoX for the integration and comparative analysis of single-cell annotation algorithms. Results: The development of the scAnnoX software package provides a cohesive framework for annotating cells in scRNA-seq data, enabling researchers to more efficiently perform comparative analyses among the cell type annotations contained in scRNA-seq datasets. The integrated environment of scAnnoX streamlines the testing, evaluation, and comparison processes among various algorithms. Among the ten annotation tools evaluated, SingleR, Seurat, sciBet, and scSorter emerged as top-performing algorithms in terms of prediction accuracy, with SingleR and sciBet demonstrating particularly superior performance, offering guidance for users. Interested parties can access the scAnnoX package at https://github.com/XQ-hub/scAnnoX.
Affiliation(s)
- Xiaoqian Huang
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
| | - Ruiqi Liu
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
| | - Shiwei Yang
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
| | - Xiaozhou Chen
- School of Mathematics and Computer Science, Yunnan Minzu University, Kunming, Yunnan Province, China
| | - Huamei Li
- Department of Hepatobiliary Surgery, the Affiliated Drum Tower Hospital, Medical School, Nanjing University, Nanjing, Jiangsu Province, China
| |
|
22
|
Ding J, Liu R, Wen H, Tang W, Li Z, Venegas J, Su R, Molho D, Jin W, Wang Y, Lu Q, Li L, Zuo W, Chang Y, Xie Y, Tang J. DANCE: a deep learning library and benchmark platform for single-cell analysis. Genome Biol 2024; 25:72. [PMID: 38504331 PMCID: PMC10949782 DOI: 10.1186/s13059-024-03211-z] [Received: 02/07/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024] Open
Abstract
DANCE is the first standard, generic, and extensible benchmark platform for accessing and evaluating computational methods across the spectrum of benchmark datasets for numerous single-cell analysis tasks. Currently, DANCE supports 3 modules and 8 popular tasks with 32 state-of-the-art methods on 21 benchmark datasets. Users can reproduce the results of supported algorithms across major benchmark datasets with minimal effort, for example with a single command line. In addition, DANCE provides an ecosystem of deep learning architectures and tools for researchers to facilitate their own model development. DANCE is an open-source Python package that welcomes all kinds of contributions.
Affiliation(s)
- Jiayuan Ding
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
| | - Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Hongzhi Wen
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
| | - Wenzhuo Tang
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
| | - Zhaoheng Li
- Department of Biostatistics, University of Washington, Seattle, USA
| | - Julian Venegas
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Runze Su
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
- Department of Statistics and Probability, Michigan State University, East Lansing, USA
| | - Dylan Molho
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA
| | - Wei Jin
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
| | - Yixin Wang
- Department of Bioengineering, Stanford University, Palo Alto, USA
| | - Qiaolin Lu
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Lingxiao Li
- Department of Computer Science, Boston University, Boston, USA
| | - Wangyang Zuo
- Department of Computer Science, Zhejiang University of Technology, Zhejiang, China
| | - Yi Chang
- School of Artificial Intelligence, Jilin University, Jilin, China
| | - Yuying Xie
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, USA.
- Department of Statistics and Probability, Michigan State University, East Lansing, USA.
| | - Jiliang Tang
- Department of Computer Science and Engineering, Michigan State University, East Lansing, USA.
| |
|
23
|
Zhang Z, Schaefer C, Jiang W, Lu Z, Lee J, Sziraki A, Abdulraouf A, Wick B, Haeussler M, Li Z, Molla G, Satija R, Zhou W, Cao J. A Panoramic View of Cell Population Dynamics in Mammalian Aging. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.01.583001. [PMID: 38496474 PMCID: PMC10942312 DOI: 10.1101/2024.03.01.583001] [Indexed: 03/19/2024]
Abstract
To elucidate the aging-associated cellular population dynamics throughout the body, here we present PanSci, a single-cell transcriptome atlas profiling over 20 million cells from 623 mouse tissue samples, encompassing a range of organs across different life stages, sexes, and genotypes. This comprehensive dataset allowed us to identify more than 3,000 unique cellular states and catalog over 200 distinct aging-associated cell populations experiencing significant depletion or expansion. Our panoramic analysis uncovered temporally structured, organ- and lineage-specific shifts of cellular dynamics during lifespan progression. Moreover, we investigated aging-associated alterations in immune cell populations, revealing both widespread shifts and organ-specific changes. We further explored the regulatory roles of the immune system on aging and pinpointed specific age-related cell population expansions that are lymphocyte-dependent. The breadth and depth of our 'cell-omics' methodology not only enhance our comprehension of cellular aging but also lay the groundwork for exploring the complex regulatory networks among varied cell types in the context of aging and aging-associated diseases.
Affiliation(s)
- Zehao Zhang
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
- The David Rockefeller Graduate Program in Bioscience, The Rockefeller University, New York, NY, USA
| | - Chloe Schaefer
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| | - Weirong Jiang
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| | - Ziyu Lu
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
- The David Rockefeller Graduate Program in Bioscience, The Rockefeller University, New York, NY, USA
| | - Jasper Lee
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| | - Andras Sziraki
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
- The David Rockefeller Graduate Program in Bioscience, The Rockefeller University, New York, NY, USA
| | - Abdulraouf Abdulraouf
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
- The Tri-Institutional M.D-Ph.D Program, New York, NY, USA
| | - Brittney Wick
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | - Zhuoyan Li
- New York Genome Center, New York, NY, USA
| | | | - Rahul Satija
- New York Genome Center, New York, NY, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, USA
| | - Wei Zhou
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| | - Junyue Cao
- Laboratory of Single Cell Genomics and Population Dynamics, The Rockefeller University, New York, NY, USA
| |
|
24
|
Theunissen L, Mortier T, Saeys Y, Waegeman W. Uncertainty-aware single-cell annotation with a hierarchical reject option. Bioinformatics 2024; 40:btae128. [PMID: 38441258 PMCID: PMC10957513 DOI: 10.1093/bioinformatics/btae128] [Received: 09/18/2023] [Revised: 02/23/2024] [Accepted: 03/01/2024] [Indexed: 03/23/2024] Open
Abstract
MOTIVATION Automatic cell type annotation methods assign cell type labels to new datasets by extracting relationships from a reference RNA-seq dataset. However, due to the limited resolution of gene expression features, there is always uncertainty present in the label assignment. To enhance the reliability and robustness of annotation, most machine learning methods address this uncertainty by providing a full reject option, i.e. when the predicted confidence score of a cell type label falls below a user-defined threshold, no label is assigned and no prediction is made. As a better alternative, some methods deploy hierarchical models and consider a so-called partial rejection by returning internal nodes of the hierarchy as label assignment. However, because a detailed experimental analysis of various rejection approaches is missing in the literature, there is currently no consensus on best practices. RESULTS We evaluate three annotation approaches (i) full rejection, (ii) partial rejection, and (iii) no rejection for both flat and hierarchical probabilistic classifiers. Our findings indicate that hierarchical classifiers are superior when rejection is applied, with partial rejection being the preferred rejection approach, as it preserves a significant amount of label information. For optimal rejection implementation, the rejection threshold should be determined through careful examination of a method's rejection behavior. Without rejection, flat and hierarchical annotation perform equally well, as long as the cell type hierarchy accurately captures transcriptomic relationships. AVAILABILITY AND IMPLEMENTATION Code is freely available at https://github.com/Latheuni/Hierarchical_reject and https://doi.org/10.5281/zenodo.10697468.
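The difference between full and partial rejection described in this abstract can be illustrated with a minimal Python sketch. The toy hierarchy, the cell type names, the probabilities, and the function names below are all assumptions made for illustration; they are not taken from the authors' code, and the paper's classifiers and thresholding rules may differ.

```python
# Hypothetical cell type hierarchy (child -> parent); not from the paper.
PARENT = {
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "B cell": "Lymphocyte",
    "T cell": "Lymphocyte",
    "Lymphocyte": "root",
}


def full_reject(leaf_probs, threshold):
    """Return the most probable leaf label, or None if it is below the threshold."""
    label, prob = max(leaf_probs.items(), key=lambda kv: kv[1])
    return label if prob >= threshold else None


def node_probability(node, leaf_probs):
    """Sum the probabilities of all leaves that sit at or below `node`."""
    total = 0.0
    for leaf, prob in leaf_probs.items():
        current = leaf
        while current != "root":
            if current == node:
                total += prob
                break
            current = PARENT[current]
    return total


def partial_reject(leaf_probs, threshold):
    """Climb from the best leaf until a node is confident enough.

    Returns an internal node when the leaf is too uncertain, and None only
    if no node below the root reaches the threshold.
    """
    node, _ = max(leaf_probs.items(), key=lambda kv: kv[1])
    while node != "root" and node_probability(node, leaf_probs) < threshold:
        node = PARENT[node]
    return None if node == "root" else node


# Made-up predicted probabilities for one cell.
probs = {"CD4 T cell": 0.45, "CD8 T cell": 0.40, "B cell": 0.15}
print(full_reject(probs, 0.7))     # no leaf is confident enough
print(partial_reject(probs, 0.7))  # falls back to the parent node
```

In this toy case full rejection discards the cell entirely, while partial rejection still reports that it is a T cell, which is the sense in which partial rejection preserves label information.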
Affiliation(s)
- Lauren Theunissen
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Thomas Mortier
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| | - Yvan Saeys
- Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, Ghent, Belgium
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
| | - Willem Waegeman
- Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium
| |
|
25
|
Garmire LX, Li Y, Huang Q, Xu C, Teichmann SA, Kaminski N, Pellegrini M, Nguyen Q, Teschendorff AE. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods 2024; 21:391-400. [PMID: 38374264 DOI: 10.1038/s41592-023-02166-6] [Received: 11/04/2022] [Accepted: 12/26/2023] [Indexed: 02/21/2024]
Abstract
Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we list four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.
Affiliation(s)
- Lana X Garmire
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
| | - Yijun Li
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Qianhui Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
| | | | - Naftali Kaminski
- Pulmonary, Critical Care & Sleep Medicine, Yale University School of Medicine, New Haven, CT, USA
| | - Matteo Pellegrini
- Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
| | - Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland and QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
| | - Andrew E Teschendorff
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- UCL Cancer Institute, University College London, London, UK
| |
|
26
|
Maan H, Zhang L, Yu C, Geuenich MJ, Campbell KR, Wang B. Characterizing the impacts of dataset imbalance on single-cell data integration. Nat Biotechnol 2024:10.1038/s41587-023-02097-9. [PMID: 38429430 DOI: 10.1038/s41587-023-02097-9] [Received: 11/17/2022] [Accepted: 12/13/2023] [Indexed: 03/03/2024]
Abstract
Computational methods for integrating single-cell transcriptomic data from multiple samples and conditions do not generally account for imbalances in the cell types measured in different datasets. In this study, we examined how differences in the cell types present, the number of cells per cell type and the cell type proportions across samples affect downstream analyses after integration. The Iniquitate pipeline assesses the robustness of integration results after perturbing the degree of imbalance between datasets. Benchmarking of five state-of-the-art single-cell RNA sequencing integration techniques in 2,600 integration experiments indicates that sample imbalance has substantial impacts on downstream analyses and the biological interpretation of integration results. Imbalance perturbation led to statistically significant variation in unsupervised clustering, cell type classification, differential expression and marker gene annotation, query-to-reference mapping and trajectory inference. We quantified the impacts of imbalance through two newly introduced properties: aggregate cell type support and minimum cell type center distance. To better characterize and mitigate impacts of imbalance, we introduce balanced clustering metrics and imbalanced integration guidelines for integration method users.
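One simple way to perturb the degree of imbalance between datasets is to downsample, or fully remove, a single cell type in one batch before integration. The sketch below shows that idea under stated assumptions: the function names, labels, and fractions are hypothetical, and the abstract does not describe how the Iniquitate pipeline actually implements its perturbations.

```python
import random
from collections import Counter


def downsample_cell_type(labels, cell_type, keep_fraction, seed=0):
    """Return indices of cells kept after downsampling one cell type.

    keep_fraction=0.0 removes the cell type from the batch entirely.
    """
    rng = random.Random(seed)
    target = [i for i, lab in enumerate(labels) if lab == cell_type]
    n_keep = round(len(target) * keep_fraction)
    kept_target = set(rng.sample(target, n_keep))
    return [i for i, lab in enumerate(labels)
            if lab != cell_type or i in kept_target]


def proportions(labels):
    """Fraction of cells belonging to each cell type."""
    counts = Counter(labels)
    return {cell_type: n / len(labels) for cell_type, n in counts.items()}


# Toy batch with two equally represented cell types (made-up labels).
batch = ["B cell"] * 10 + ["T cell"] * 10
kept = downsample_cell_type(batch, "T cell", keep_fraction=0.1)
perturbed = [batch[i] for i in kept]
print(proportions(batch))
print(proportions(perturbed))
```

Running an integration method on both the original and the perturbed batch, then comparing downstream results, is the kind of robustness check the abstract refers to.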
Affiliation(s)
- Hassaan Maan
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
| | - Lin Zhang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, British Columbia, Canada
| | - Chengxin Yu
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
| | - Michael J Geuenich
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada
| | - Kieran R Campbell
- Vector Institute, Toronto, Ontario, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
| | - Bo Wang
- Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada.
- Vector Institute, Toronto, Ontario, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada.
| |
|
27
|
Fang C, Dziedzic A, Zhang L, Oliva L, Verma A, Razak F, Papernot N, Wang B. Decentralised, collaborative, and privacy-preserving machine learning for multi-hospital data. EBioMedicine 2024; 101:105006. [PMID: 38377795 PMCID: PMC10884342 DOI: 10.1016/j.ebiom.2024.105006] [Received: 10/15/2023] [Revised: 01/26/2024] [Accepted: 01/28/2024] [Indexed: 02/22/2024] Open
Abstract
BACKGROUND Machine Learning (ML) has demonstrated great potential in medical data analysis. Large datasets collected from diverse sources and settings are essential for ML models in healthcare to achieve better accuracy and generalizability. Sharing data across different healthcare institutions or jurisdictions is challenging because of complex and varying privacy and regulatory requirements. Hence, it is hard but crucial to allow multiple parties to collaboratively train an ML model leveraging the private datasets available at each party without the need for direct sharing of those datasets or compromising the privacy of the datasets through collaboration. METHODS In this paper, we address this challenge by proposing Decentralized, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH). This framework offers the following key benefits: (1) it allows different parties to collaboratively train an ML model without transferring their private datasets (i.e., no data centralization); (2) it safeguards patients' privacy by limiting the potential privacy leakage arising from any contents shared across the parties during the training process; and (3) it facilitates the ML model training without relying on a centralized party/server. FINDINGS We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets: patient mortality prediction using electronic health records, cell-type classification using single-cell human genomes, and pathology identification using chest radiology images. The ML models trained with the DeCaPH framework have less than a 3.2% drop in model performance compared with those trained by the non-privacy-preserving collaborative framework. Meanwhile, the average vulnerability to privacy attacks of the models trained with DeCaPH decreased by up to 16%. In addition, models trained with our DeCaPH framework achieve better performance than those models trained solely with the private datasets from individual parties without collaboration and those trained with the previous privacy-preserving collaborative training framework under the same privacy guarantee by up to 70% and 18.2% respectively. INTERPRETATION We demonstrate that the ML models trained with the DeCaPH framework have an improved utility-privacy trade-off, showing DeCaPH enables the models to have good performance while preserving the privacy of the training data points. In addition, the ML models trained with the DeCaPH framework in general outperform those trained solely with the private datasets from individual parties, showing that DeCaPH enhances the model generalizability. FUNDING This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC, RGPIN-2020-06189 and DGECR-2020-00294), Canadian Institute for Advanced Research (CIFAR) AI Catalyst Grants, CIFAR AI Chair programs, Temerty Professor of AI Research and Education in Medicine, University of Toronto, Amazon, Apple, DARPA through the GARD project, Intel, Meta, the Ontario Early Researcher Award, and the Sloan Foundation. Resources used in preparing this research were provided, in part, by the Province of Ontario, the Government of Canada through CIFAR, and companies sponsoring the Vector Institute.
Affiliation(s)
- Congyu Fang
- Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada
| | - Adam Dziedzic
- Vector Institute, Toronto, Canada; CISPA Helmholtz Center for Information Security, Germany; Department of Electrical and Computer Engineering, University of Toronto, Canada
| | - Lin Zhang
- Peter Munk Cardiac Centre, University Health Network, Canada; Simon Fraser University, Canada
| | - Laura Oliva
- Peter Munk Cardiac Centre, University Health Network, Canada
| | - Amol Verma
- St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada
| | - Fahad Razak
- St. Michael's Hospital, Unity Health Toronto, Canada; Department of Medicine, University of Toronto, Canada; Institute of Health Policy, Management and Evaluation, University of Toronto, Canada
| | - Nicolas Papernot
- Department of Computer Science, University of Toronto, Canada; Vector Institute, Toronto, Canada; Department of Electrical and Computer Engineering, University of Toronto, Canada.
| | - Bo Wang
- Department of Computer Science, University of Toronto, Canada; Peter Munk Cardiac Centre, University Health Network, Canada; Vector Institute, Toronto, Canada; Department of Laboratory Medicine and Pathobiology, Temerty Faculty of Medicine, University of Toronto, Canada.
| |
|
28
|
Ali M, Yang T, He H, Zhang Y. Plant biotechnology research with single-cell transcriptome: recent advancements and prospects. PLANT CELL REPORTS 2024; 43:75. [PMID: 38381195 DOI: 10.1007/s00299-024-03168-0] [Received: 09/12/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024]
Abstract
KEY MESSAGE Single-cell transcriptomic techniques have emerged as powerful tools in plant biology, offering high-resolution insights into gene expression at the individual cell level. This review highlights the rapid expansion of single-cell technologies in plants, their potential in understanding plant development, and their role in advancing plant biotechnology research. Single-cell techniques have emerged as powerful tools to enhance our understanding of biological systems, providing high-resolution transcriptomic analysis at the single-cell level. In plant biology, the adoption of single-cell transcriptomics has seen rapid expansion of available technologies and applications. This review article focuses on the latest advancements in the field of single-cell transcriptomics in plants and discusses the potential role of these approaches in plant development and in expediting plant biotechnology research in the near future. Furthermore, the inherent challenges and limitations of single-cell technology are critically examined, with a view to overcoming them.
Affiliation(s)
- Muhammad Ali
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- Peking University-Institute of Advanced Agricultural Sciences, Weifang, China
| | - Tianxia Yang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing, China
| | - Hai He
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
| | - Yu Zhang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China.
| |
|
29
|
Sun H, Qu H, Duan K, Du W. scMGCN: A Multi-View Graph Convolutional Network for Cell Type Identification in scRNA-seq Data. Int J Mol Sci 2024; 25:2234. [PMID: 38396909 PMCID: PMC10889820 DOI: 10.3390/ijms25042234] [Received: 12/06/2023] [Revised: 02/07/2024] [Accepted: 02/09/2024] [Indexed: 02/25/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) data reveal the complexity and diversity of cellular ecosystems and molecular interactions in various biomedical research. Hence, identifying cell types from large-scale scRNA-seq data using existing annotations is challenging and requires stable and interpretable methods. However, the current cell type identification methods have limited performance, mainly due to the intrinsic heterogeneity among cell populations and extrinsic differences between datasets. Here, we present a robust graph artificial intelligence model, a multi-view graph convolutional network model (scMGCN) that integrates multiple graph structures from raw scRNA-seq data and applies graph convolutional networks with attention mechanisms to learn cell embeddings and predict cell labels. We evaluate our model on single-dataset, cross-species, and cross-platform experiments and compare it with other state-of-the-art methods. Our results show that scMGCN outperforms the other methods regarding stability, accuracy, and robustness to batch effects. Our main contributions are as follows: Firstly, we introduce multi-view learning and multiple graph construction methods to capture comprehensive cellular information from scRNA-seq data. Secondly, we construct a scMGCN that combines graph convolutional networks with attention mechanisms to extract shared, high-order information from cells. Finally, we demonstrate the effectiveness and superiority of the scMGCN on various datasets.
Affiliation(s)
| | | | | | - Wei Du
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (H.S.); (H.Q.); (K.D.)
| |
|
30
|
Majd H, Cesiulis A, Samuel RM, Richter MN, Elder N, Guyer RA, Hao MM, Stamp LA, Goldstein AM, Fattahi F. A call for a unified and multimodal definition of cellular identity in the enteric nervous system. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.15.575794. [PMID: 38293133 PMCID: PMC10827084 DOI: 10.1101/2024.01.15.575794] [Indexed: 02/01/2024]
Abstract
The enteric nervous system (ENS) is a tantalizing frontier in neuroscience. With the recent emergence of single cell transcriptomic technologies, this rare and poorly understood tissue has begun to be better characterized in recent years. A precise functional mapping of enteric neuron diversity is critical for understanding ENS biology and enteric neuropathies. Nonetheless, this pursuit has faced considerable technical challenges. By leveraging different methods to compare available primary mouse and human ENS datasets, we underscore the urgent need for careful identity annotation, achieved through the harmonization and advancements of wet lab and computational techniques. We took different approaches including differential gene expression, module scoring, co-expression and correlation analysis, unbiased biological function hierarchical clustering, data integration and label transfer to compare and contrast functional annotations of several independently reported ENS datasets. These analyses highlight substantial discrepancies stemming from an overreliance on transcriptomics data without adequate validation in tissues. To achieve a comprehensive understanding of enteric neuron identity and their functional context, it is imperative to expand tissue sources and incorporate innovative technologies such as multiplexed imaging, electrophysiology, spatial transcriptomics, as well as comprehensive profiling of epigenome, proteome, and metabolome. Harnessing human pluripotent stem cell (hPSC) models provides unique opportunities for delineating lineage trees of the human ENS, and offers unparalleled advantages, including their scalability and compatibility with genetic manipulation and unbiased screens. We encourage a paradigm shift in our comprehension of cellular complexity and function in the ENS by calling for large-scale collaborative efforts and research investments.
Affiliation(s)
- Homa Majd
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Andrius Cesiulis
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Ryan M Samuel
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Mikayla N Richter
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Nicholas Elder
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
| | - Richard A Guyer
- Department of Pediatric Surgery, Massachusetts General Hospital, Boston, MA, USA
| | - Marlene M. Hao
- Department of Anatomy and Physiology, the University of Melbourne, Parkville, VIC, Australia
| | - Lincon A. Stamp
- Department of Anatomy and Physiology, the University of Melbourne, Parkville, VIC, Australia
| | - Allan M Goldstein
- Department of Pediatric Surgery, Massachusetts General Hospital, Boston, MA, USA
| | - Faranak Fattahi
- Department of Cellular and Molecular Pharmacology, University of California, San Francisco, San Francisco, CA 94158, USA
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, 94143, USA
- Program in Craniofacial Biology, University of California, San Francisco, California, USA
- Lead contact
| |
|
31
|
Geuenich MJ, Gong DW, Campbell KR. The impacts of active and self-supervised learning on efficient annotation of single-cell expression data. Nat Commun 2024; 15:1014. [PMID: 38307875 PMCID: PMC10837127 DOI: 10.1038/s41467-024-45198-y] [Received: 06/22/2023] [Accepted: 01/16/2024] [Indexed: 02/04/2024] Open
Abstract
A crucial step in the analysis of single-cell data is annotating cells to cell types and states. While a myriad of approaches has been proposed, manual labeling of cells to create training datasets remains tedious and time-consuming. In the field of machine learning, active and self-supervised learning methods have been proposed to improve the performance of a classifier while reducing both annotation time and label budget. However, the benefits of such strategies for single-cell annotation have yet to be evaluated in realistic settings. Here, we perform a comprehensive benchmarking of active and self-supervised labeling strategies across a range of single-cell technologies and cell type annotation algorithms. We quantify the benefits of active learning and self-supervised strategies in the presence of cell type imbalance and variable similarity. We introduce adaptive reweighting, a heuristic procedure tailored to single-cell data (including a marker-aware version) that shows competitive performance with existing approaches. In addition, we demonstrate that having prior knowledge of cell type markers improves annotation accuracy. Finally, we summarize our findings into a set of recommendations for those implementing cell type annotation procedures or platforms. An R package implementing the heuristic approaches introduced in this work may be found at https://github.com/camlab-bioml/leader.
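Active learning reduces the label budget by asking an annotator to label only the cells the current classifier is least sure about. A common selection rule is entropy-based uncertainty sampling, sketched below in Python. This is a generic illustration with made-up probabilities and hypothetical function names; it is not the adaptive reweighting heuristic introduced in the paper, and it is not the API of the authors' R package.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(pred_probs, budget):
    """Return indices of the `budget` cells with the most uncertain predictions."""
    ranked = sorted(range(len(pred_probs)),
                    key=lambda i: entropy(pred_probs[i]),
                    reverse=True)
    return ranked[:budget]


# Made-up class probabilities for three unlabeled cells.
pred_probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.60, 0.30, 0.10],  # moderately uncertain
]
print(select_for_labeling(pred_probs, budget=2))
```

After the selected cells are labeled, the classifier is retrained and the selection step is repeated until the label budget is exhausted.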
Affiliation(s)
- Michael J Geuenich
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
| | - Dae-Won Gong
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada
| | - Kieran R Campbell
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, ON, M5G 1X5, Canada.
- Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A8, Canada.
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5S 3G3, Canada.
- Department of Computer Science, University of Toronto, Toronto, ON, M5T 3A1, Canada.
- Ontario Institute of Cancer Research, Toronto, ON, M5G 1M1, Canada.
- Vector Institute, Toronto, ON, M5G 1M1, Canada.
| |
|
32
|
Wang X, Chai Z, Li S, Liu Y, Li C, Jiang Y, Liu Q. CTISL: a dynamic stacking multi-class classification approach for identifying cell types from single-cell RNA-seq data. Bioinformatics 2024; 40:btae063. [PMID: 38317054 PMCID: PMC10873586 DOI: 10.1093/bioinformatics/btae063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 02/07/2024] Open
Abstract
MOTIVATION Effective identification of cell types is of critical importance in single-cell RNA-sequencing (scRNA-seq) data analysis. To date, many supervised machine learning-based predictors have been implemented to identify cell types from scRNA-seq datasets. Despite the technical advances of these state-of-the-art tools, most existing predictors were single classifiers, of which the performances can still be significantly improved. It is therefore highly desirable to employ the ensemble learning strategy to develop more accurate computational models for robust and comprehensive identification of cell types on scRNA-seq datasets. RESULTS We propose a two-layer stacking model, termed CTISL (Cell Type Identification by Stacking ensemble Learning), which integrates multiple classifiers to identify cell types. In the first layer, given a reference scRNA-seq dataset with known cell types, CTISL dynamically combines multiple cell-type-specific classifiers (i.e. support-vector machine and logistic regression) as the base learners to deliver the outcomes for the input of a meta-classifier in the second layer. We conducted a total of 24 benchmarking experiments on 17 human and mouse scRNA-seq datasets to evaluate and compare the prediction performance of CTISL and other state-of-the-art predictors. The experiment results demonstrate that CTISL achieves superior or competitive performance compared to these state-of-the-art approaches. We anticipate that CTISL can serve as a useful and reliable tool for cost-effective identification of cell types from scRNA-seq datasets. AVAILABILITY AND IMPLEMENTATION The webserver and source code are freely available at http://bigdata.biocie.cn/CTISLweb/home and https://zenodo.org/records/10568906, respectively.
Affiliation(s)
- Xiao Wang
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Ziyi Chai
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Shaohua Li
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
| | - Yan Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Chen Li
- Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yu Jiang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China
| | - Quanzhong Liu
- Department of Software Engineering, College of Information Engineering, Northwest A&F University, Yangling 712100, China
- Shaanxi Engineering Research Center of Agricultural Information Intelligent Perception and Analysis, Northwest A&F University, Yangling 712100, China
| |
|
33
|
Xian W, Asad M, Wu S, Bai Z, Li F, Lu J, Zu G, Brintnell E, Chen H, Mao Y, Zhou G, Liao B, Wu J, Wang E, You L. Distinct immune escape and microenvironment between RG-like and pri-OPC-like glioma revealed by single-cell RNA-seq analysis. Front Med 2024; 18:147-168. [PMID: 37955814 DOI: 10.1007/s11684-023-1017-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2023] [Accepted: 06/24/2023] [Indexed: 11/14/2023]
Abstract
The association of neurogenesis and gliogenesis with glioma remains unclear. By conducting single-cell RNA-seq analyses on 26 gliomas, we reported their classification into primitive oligodendrocyte precursor cell (pri-OPC)-like and radial glia (RG)-like tumors and validated it in a public cohort and TCGA glioma. The RG-like tumors exhibited wild-type isocitrate dehydrogenase and tended to carry EGFR mutations, and the pri-OPC-like ones were prone to carrying TP53 mutations. Tumor subclones only in pri-OPC-like tumors showed substantially down-regulated MHC-I genes, suggesting their distinct immune evasion programs. Furthermore, the two subgroups appeared to extensively modulate glioma-infiltrating lymphocytes in distinct manners. Some specific genes not expressed in normal immune cells were found in glioma-infiltrating lymphocytes. For example, glial/glioma stem cell markers OLIG1/PTPRZ1 and B cell-specific receptors IGLC2/IGKC were expressed in pri-OPC-like and RG-like glioma-infiltrating lymphocytes, respectively. Their expression was positively correlated with that of immune checkpoint genes (e.g., LGALS3) and with poor survival, as validated by the increased expression of LGALS3 upon IGKC overexpression in Jurkat cells. This finding indicated a potential inhibitory role in tumor-infiltrating lymphocytes and could provide a new way of cancer immune evasion.
Affiliation(s)
- Weiwei Xian
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
| | - Mohammad Asad
- Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
| | - Shuai Wu
- Glioma Surgery Division, Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, 200040, China
| | - Zhixin Bai
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
| | - Fengjiao Li
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
| | - Junfeng Lu
- Glioma Surgery Division, Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, 200040, China
| | - Gaoyu Zu
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
| | - Erin Brintnell
- Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada
| | - Hong Chen
- Department of Pathology, Huashan Hospital, Fudan University, Shanghai, 200040, China
| | - Ying Mao
- Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, 200040, China
| | - Guomin Zhou
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China
- Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Fudan University, Shanghai, 200040, China
| | - Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, 570100, China
| | - Jinsong Wu
- Glioma Surgery Division, Department of Neurosurgery, Huashan Hospital, Fudan University, Shanghai, 200040, China.
| | - Edwin Wang
- Cumming School of Medicine, University of Calgary, Calgary, Alberta, T2N 4N1, Canada.
| | - Linya You
- Department of Human Anatomy & Histoembryology, School of Basic Medical Sciences, Fudan University, Shanghai, 200032, China.
- Shanghai Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention, Fudan University, Shanghai, 200040, China.
| |
|
34
|
Mihai IS, Chafle S, Henriksson J. Representing and extracting knowledge from single-cell data. Biophys Rev 2024; 16:29-56. [PMID: 38495441 PMCID: PMC10937862 DOI: 10.1007/s12551-023-01091-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/22/2023] [Accepted: 06/28/2023] [Indexed: 03/19/2024] Open
Abstract
Single-cell analysis is currently one of the most high-resolution techniques to study biology. The large complex datasets that have been generated have spurred numerous developments in computational biology, in particular the use of advanced statistics and machine learning. This review attempts to explain the deeper theoretical concepts that underpin current state-of-the-art analysis methods. Single-cell analysis is covered from cell, through instruments, to current and upcoming models. The aim of this review is to spread concepts which are not yet in common use, especially from topology and generative processes, and how new statistical models can be developed to capture more of biology. This opens epistemological questions regarding our ontology and models, and some pointers will be given to how natural language processing (NLP) may help overcome our cognitive limitations for understanding single-cell data.
Affiliation(s)
- Ionut Sebastian Mihai
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
- Industrial Doctoral School, Umeå University, Umeå, Sweden
| | - Sarang Chafle
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
| | - Johan Henriksson
- The Laboratory for Molecular Infection Medicine Sweden (MIMS), Umeå, Sweden
- Umeå Centre for Microbial Research (UCMR), Department of Molecular Biology, Umeå University, Umeå, Sweden
| |
|
35
|
Li J, Choi J, Cheng X, Ma J, Pema S, Sanes JR, Mardon G, Frankfort BJ, Tran NM, Li Y, Chen R. Comprehensive single-cell atlas of the mouse retina. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.24.577060. [PMID: 38328114 PMCID: PMC10849744 DOI: 10.1101/2024.01.24.577060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has advanced our understanding of cellular heterogeneity at the single-cell resolution by classifying and characterizing cell types in multiple tissues and species. While several mouse retinal scRNA-seq reference datasets have been published, each dataset either has a relatively small number of cells or is focused on specific cell classes, and thus is suboptimal for assessing gene expression patterns across all retina types at the same time. To establish a unified and comprehensive reference for the mouse retina, we first generated the largest retinal scRNA-seq dataset to date, comprising approximately 190,000 single cells from C57BL/6J mouse whole retinas. This dataset was generated through the targeted enrichment of rare population cells via antibody-based magnetic cell sorting. By integrating this new dataset with public datasets, we conducted an integrated analysis to construct the Mouse Retina Cell Atlas (MRCA) for wild-type mice, which encompasses over 330,000 single cells. The MRCA characterizes 12 major classes and 138 cell types. It captured consensus cell type characterization from public datasets and identified additional new cell types. To facilitate the public use of the MRCA, we have deposited it in CELLxGENE, UCSC Cell Browser, and the Broad Single Cell Portal for visualization and gene expression exploration. The comprehensive MRCA serves as an easy-to-use, one-stop data resource for the mouse retina communities.
Affiliation(s)
- Jin Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Jongsu Choi
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Xuesen Cheng
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Justin Ma
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Shahil Pema
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Joshua R. Sanes
- Center for Brain Science and Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02130, USA
| | - Graeme Mardon
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Pathology and Immunology, Baylor College of Medicine, Houston, Texas 77030, USA
- Departments of Ophthalmology and Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Benjamin J. Frankfort
- Departments of Ophthalmology and Neuroscience, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Nicholas M. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Yumei Li
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Rui Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
| |
|
36
|
Zhai Y, Chen L, Deng M. scEVOLVE: cell-type incremental annotation without forgetting for single-cell RNA-seq data. Brief Bioinform 2024; 25:bbae039. [PMID: 38366803 PMCID: PMC10939389 DOI: 10.1093/bib/bbae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 01/03/2024] [Accepted: 01/09/2024] [Indexed: 02/18/2024] Open
Abstract
The evolution in single-cell RNA sequencing (scRNA-seq) technology has opened a new avenue for researchers to inspect cellular heterogeneity with single-cell precision. One crucial aspect of this technology is cell-type annotation, which is fundamental for any subsequent analysis in single-cell data mining. Recently, the scientific community has seen a surge in the development of automatic annotation methods aimed at this task. However, these methods generally operate at a steady-state total cell-type capacity, significantly restricting the cell annotation systems' capacity for continuous knowledge acquisition. Furthermore, creating a unified scRNA-seq annotation system remains challenged by the need to progressively expand its understanding of ever-increasing cell-type concepts derived from a continuous data stream. In response to these challenges, this paper presents a novel and challenging setting for annotation, namely cell-type incremental annotation. This concept is designed to perpetually enhance cell-type knowledge, gleaned from continuously incoming data. This task encounters difficulty with data stream samples that can only be observed once, leading to catastrophic forgetting. To address this problem, we introduce our breakthrough methodology termed scEVOLVE, an incremental annotation method. This innovative approach is built upon the methodology of contrastive sample replay combined with the fundamental principle of partition confidence maximization. Specifically, we initially retain and replay sections of the old data in each subsequent training phase, then establish a unique prototypical learning objective to mitigate the cell-type imbalance problem, as an alternative to using cross-entropy. To effectively emulate a model that trains concurrently with complete data, we introduce a cell-type decorrelation strategy that efficiently scatters feature representations of each cell type uniformly.
We constructed the scEVOLVE framework with simplicity and ease of integration into most deep softmax-based single-cell annotation methods. Thorough experiments conducted on a range of meticulously constructed benchmarks consistently prove that our methodology can incrementally learn numerous cell types over an extended period, outperforming other strategies that fail quickly. As far as our knowledge extends, this is the first attempt to propose and formulate an end-to-end algorithm framework to address this new, practical task. Additionally, scEVOLVE, coded in Python using the Pytorch machine-learning library, is freely accessible at https://github.com/aimeeyaoyao/scEVOLVE.
Affiliation(s)
- Yuyao Zhai
- School of Mathematical Sciences, Peking University, Beijing, China
| | - Liang Chen
- Huawei Technologies Co., Ltd., Beijing, China
| | - Minghua Deng
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Center for Quantitative Biology, Peking University, Beijing, China
| |
|
37
|
Mason K, Sathe A, Hess PR, Rong J, Wu CY, Furth E, Susztak K, Levinsohn J, Ji HP, Zhang N. Niche-DE: niche-differential gene expression analysis in spatial transcriptomics data identifies context-dependent cell-cell interactions. Genome Biol 2024; 25:14. [PMID: 38217002 PMCID: PMC10785550 DOI: 10.1186/s13059-023-03159-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 12/22/2023] [Indexed: 01/14/2024] Open
Abstract
Existing methods for analysis of spatial transcriptomic data focus on delineating the global gene expression variations of cell types across the tissue, rather than local gene expression changes driven by cell-cell interactions. We propose a new statistical procedure called niche-differential expression (niche-DE) analysis that identifies cell-type-specific niche-associated genes, which are differentially expressed within a specific cell type in the context of specific spatial niches. We further develop niche-LR, a method to reveal ligand-receptor signaling mechanisms that underlie niche-differential gene expression patterns. Niche-DE and niche-LR are applicable to low-resolution spot-based spatial transcriptomics data and data that is single-cell or subcellular in resolution.
Affiliation(s)
- Kaishu Mason
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, USA
| | - Anuja Sathe
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Paul R Hess
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, USA
| | - Jiazhen Rong
- Genomics and Computational Biology Graduate Program, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Chi-Yun Wu
- The Gladstone Institute, San Francisco, USA
| | - Emma Furth
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA
| | - Katalin Susztak
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Jonathan Levinsohn
- Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Nancy Zhang
- Department of Statistics and Data Science, The Wharton School, University of Pennsylvania, Philadelphia, USA.
| |
|
38
|
Jiang Y, Hu Z, Lynch AW, Jiang J, Zhu A, Zhang Y, Xie Y, Li R, Zhou N, Meyer CA, Cejas P, Brown M, Long HW, Qiu X. scATAnno: Automated Cell Type Annotation for single-cell ATAC Sequencing Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.06.01.543296. [PMID: 37333088 PMCID: PMC10274707 DOI: 10.1101/2023.06.01.543296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
The recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key task is to determine cell types based on epigenetic profiling. We introduce scATAnno, a workflow designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow can generate scATAC-seq reference atlases from publicly available datasets, and enable accurate cell type annotation by integrating query data with reference atlases, without the aid of scRNA-seq profiling. To enhance annotation accuracy, we have incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect unknown cell populations within the query data. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cell (PBMC), basal cell carcinoma (BCC) and Triple Negative Breast Cancer (TNBC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a powerful tool for cell type annotation in scATAC-seq data and can aid in the interpretation of new scATAC-seq datasets in complex biological systems.
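The KNN-based uncertainty idea mentioned in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `knn_annotate`, the toy 2-D coordinates, the score definition (one minus the share of neighbours agreeing with the majority label), and the `max_uncertainty` cutoff are hypothetical and are not taken from scATAnno, whose actual scores may be defined differently.

```python
import math
from collections import Counter


def knn_annotate(query, ref_points, ref_labels, k=3, max_uncertainty=0.4):
    """Label one query cell from its k nearest reference cells.

    Hypothetical score: uncertainty = 1 - (majority votes / k).
    Cells above max_uncertainty are reported as "unknown".
    """
    k = min(k, len(ref_points))
    # Rank reference cells by Euclidean distance to the query cell.
    order = sorted(range(len(ref_points)),
                   key=lambda i: math.dist(query, ref_points[i]))
    votes = Counter(ref_labels[i] for i in order[:k])
    label, count = votes.most_common(1)[0]
    uncertainty = 1.0 - count / k
    if uncertainty > max_uncertainty:
        label = "unknown"
    return label, uncertainty


# Toy reference atlas: two well-separated populations in a 2-D embedding.
ref_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
ref_labels = ["T cell", "T cell", "T cell",
              "B cell", "B cell", "B cell"]

# A query inside one population gets a confident label.
confident = knn_annotate((0.05, 0.05), ref_points, ref_labels)
# A query halfway between populations splits its votes and is flagged.
ambiguous = knn_annotate((2.5, 2.5), ref_points, ref_labels, k=6)
```

In this toy setting, a query cell whose neighbours all share one label receives zero uncertainty, while a cell whose neighbours are evenly split is marked "unknown" rather than forced into a reference cell type.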
|
39
|
Zhang Y, Sun H, Zhang W, Fu T, Huang S, Mou M, Zhang J, Gao J, Ge Y, Yang Q, Zhu F. CellSTAR: a comprehensive resource for single-cell transcriptomic annotation. Nucleic Acids Res 2024; 52:D859-D870. [PMID: 37855686 PMCID: PMC10767908 DOI: 10.1093/nar/gkad874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/12/2023] [Accepted: 09/27/2023] [Indexed: 10/20/2023] Open
Abstract
Large-scale studies of single-cell sequencing and biological experiments have successfully revealed expression patterns that distinguish different cell types in tissues, emphasizing the importance of studying cellular heterogeneity and accurately annotating cell types. Analysis of gene expression profiles in these experiments provides two essential types of data for cell type annotation: annotated references and canonical markers. In this study, the first comprehensive database of single-cell transcriptomic annotation resource (CellSTAR) was thus developed. It is unique in (a) offering the comprehensive expertly annotated reference data for annotating hundreds of cell types for the first time and (b) enabling the collective consideration of reference data and marker genes by incorporating tens of thousands of markers. Given its unique features, CellSTAR is expected to attract broad research interests from the technological innovations in single-cell transcriptomics, the studies of cellular heterogeneity & dynamics, and so on. It is now publicly accessible without any login requirement at: https://idrblab.org/cellstar.
Affiliation(s)
- Ying Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Huaicheng Sun
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Wei Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Tingting Fu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Shijie Huang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Minjie Mou
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jinsong Zhang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jianqing Gao
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yichao Ge
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou 310058, China
- Department of Bioinformatics, School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
|
40
|
Møller AF, Madsen JGS. JOINTLY: interpretable joint clustering of single-cell transcriptomes. Nat Commun 2023; 14:8473. [PMID: 38123569 PMCID: PMC10733431 DOI: 10.1038/s41467-023-44279-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023] Open
Abstract
Single-cell and single-nucleus RNA-sequencing (sxRNA-seq) is increasingly being used to characterise the transcriptomic state of cell types at homeostasis, during development and in disease. However, this is a challenging task, as biological effects can be masked by technical variation. Here, we present JOINTLY, an algorithm enabling joint clustering of sxRNA-seq datasets across batches. JOINTLY performs on par or better than state-of-the-art batch integration methods in clustering tasks and outperforms other intrinsically interpretable methods. We demonstrate that JOINTLY is robust against over-correction while retaining subtle cell state differences between biological conditions and highlight how the interpretation of JOINTLY can be used to annotate cell types and identify active signalling programs across cell types and pseudo-time. Finally, we use JOINTLY to construct a reference atlas of white adipose tissue (WATLAS), an expandable and comprehensive community resource, in which we describe four adipocyte subpopulations and map compositional changes in obesity and between depots.
Affiliation(s)
- Andreas Fønss Møller
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Sino-Danish College (SDC), University of Chinese Academy of Sciences, Beijing, China
| | - Jesper Grud Skat Madsen
- Institute of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.
- Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
- Center for Functional Genomics and Tissue Plasticity (ATLAS), Odense M, 5230, Denmark.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
| |
|
41
|
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023] Open
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data in supervised learning to annotate the cell types of new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partially due to difficulties in dealing with technical differences between datasets, as well as not considering the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Wei-Lin Hu
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Jian-Qiang Li
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, ShenZhen University, 3688 Nanhai Avenue, Shenzhen, China
| | - Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| |
|
42
|
Ghaddar B, De S. Hierarchical and automated cell-type annotation and inference of cancer cell of origin with Census. Bioinformatics 2023; 39:btad714. [PMID: 38011649 PMCID: PMC10713118 DOI: 10.1093/bioinformatics/btad714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 10/26/2023] [Accepted: 11/25/2023] [Indexed: 11/29/2023] Open
Abstract
MOTIVATION Cell-type annotation is a time-consuming yet critical first step in the analysis of single-cell RNA-seq data, especially when multiple similar cell subtypes with overlapping marker genes are present. Existing automated annotation methods have a number of limitations, including requiring large reference datasets, high computation time, shallow annotation resolution, and difficulty in identifying cancer cells or their most likely cell of origin. RESULTS We developed Census, a biologically intuitive and fully automated cell-type identification method for single-cell RNA-seq data that can deeply annotate normal cells in mammalian tissues and identify malignant cells and their likely cell of origin. Motivated by the inherently stratified developmental programs of cellular differentiation, Census infers hierarchical cell-type relationships and uses gradient-boosted decision trees that capitalize on nodal cell-type relationships to achieve high prediction speed and accuracy. When benchmarked on 44 atlas-scale normal and cancer, human and mouse tissues, Census significantly outperforms state-of-the-art methods across multiple metrics and naturally predicts the cell-of-origin of different cancers. Census is pretrained on the Tabula Sapiens to classify 175 cell-types from 24 organs; however, users can seamlessly train their own models for customized applications. AVAILABILITY AND IMPLEMENTATION Census is available at Zenodo https://zenodo.org/records/7017103 and on our GitHub https://github.com/sjdlabgroup/Census.
Affiliation(s)
- Bassel Ghaddar
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| | - Subhajyoti De
- Center for Systems and Computational Biology, Rutgers Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ 08901, United States
| |
|
43
|
Dezem FS, Marção M, Ben-Cheikh B, Nikulina N, Omotoso A, Burnett D, Coelho P, Hurley J, Gomez C, Phan-Everson T, Ong G, Martelotto L, Lewis ZR, George S, Braubach O, Malta TM, Plummer J. A machine learning one-class logistic regression model to predict stemness for single cell transcriptomics and spatial omics. BMC Genomics 2023; 24:717. [PMID: 38017371 PMCID: PMC10683105 DOI: 10.1186/s12864-023-09722-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Accepted: 10/07/2023] [Indexed: 11/30/2023] Open
Abstract
Cell annotation is a crucial methodological component of interpreting single cell and spatial omics data. These approaches were developed for single cell analysis but are often biased, manually curated and yet unproven in spatial omics. Here we apply a stemness model for assessing oncogenic states to single cell and spatial omics cancer datasets. This one-class logistic regression machine learning algorithm is used to extract transcriptomic features from non-transformed stem cells to identify dedifferentiated cell states in tumors. We found that this method identifies single cell states in metastatic tumor cell populations without the requirement of cell annotation. This machine learning model identified stem-like cell populations not identified in single cell or spatial transcriptomic analysis using existing methods. For the first time, we demonstrate the application of an ML tool across five emerging spatial transcriptomic and proteomic technologies to identify oncogenic stem-like cell types in the tumor microenvironment.
Affiliation(s)
- Felipe Segato Dezem
- Center for Spatial Omics, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Maycon Marção
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Bassem Ben-Cheikh
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Nadya Nikulina
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Ayodele Omotoso
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Destiny Burnett
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Priscila Coelho
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Judith Hurley
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Carmen Gomez
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | | | - Giang Ong
- Nanostring Technologies, Seattle, WA, USA
| | | | | | - Sophia George
- Department of Obstetrics, Gynecology and Reproductive Sciences, University of Miami Miller School of Medicine, Miami, FL, USA
- Sylvester Comprehensive Cancer Center, UHealth Medical Systems, Miami, FL, USA
| | - Oliver Braubach
- Akoya Biosciences, The Spatial Biology Company, Marlborough, MA, USA
| | - Tathiane M Malta
- Department of Clinical Analysis, Toxicology and Food Sciences, School of Pharmaceutical Sciences of Ribeirao Preto, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Jasmine Plummer
- Center for Spatial Omics, St Jude Children's Research Hospital, Memphis, TN, USA.
- Department of Developmental Neurobiology, St Jude Children's Research Hospital, Memphis, TN, USA.
- Department of Cellular & Molecular Biology, St Jude Children's Research Hospital, Memphis, TN, USA.
- Comprehensive Cancer Center, St Jude Children's Research Hospital, Memphis, TN, USA.
| |
|
44
|
Yin Q, Chen L. CellTICS: an explainable neural network for cell-type identification and interpretation based on single-cell RNA-seq data. Brief Bioinform 2023; 25:bbad449. [PMID: 38061196 PMCID: PMC10703497 DOI: 10.1093/bib/bbad449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 10/30/2023] [Accepted: 11/14/2023] [Indexed: 12/18/2023] Open
Abstract
Identifying cell types is crucial for understanding the functional units of an organism. Machine learning has shown promising performance in identifying cell types, but many existing methods lack biological significance due to poor interpretability. However, it is of the utmost importance to understand what makes cells share the same function and form a specific cell type, motivating us to propose a biologically interpretable method. CellTICS prioritizes marker genes with cell-type-specific expression, using a hierarchy of biological pathways for neural network construction, and applying a multi-predictive-layer strategy to predict cell and sub-cell types. CellTICS usually outperforms existing methods in prediction accuracy. Moreover, CellTICS can reveal pathways that define a cell type or a cell type under specific physiological conditions, such as disease or aging. The nonlinear nature of neural networks enables us to identify many novel pathways. Interestingly, some of the pathways identified by CellTICS exhibit differential expression "variability" rather than differential expression across cell types, indicating that expression stochasticity within a pathway could be an important feature characteristic of a cell type. Overall, CellTICS provides a biologically interpretable method for identifying and characterizing cell types, shedding light on the underlying pathways that define cellular heterogeneity and its role in organismal function. CellTICS is available at https://github.com/qyyin0516/CellTICS.
Affiliation(s)
- Qingyang Yin
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
| | - Liang Chen
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089, United States
| |
|
45
|
Wang S, Shen B, Guo L, Shang M, Liu J, Sun Q, Shen B. scFed: federated learning for cell type classification with scRNA-seq. Brief Bioinform 2023; 25:bbad507. [PMID: 38221903 PMCID: PMC10788680 DOI: 10.1093/bib/bbad507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 12/03/2023] [Accepted: 12/12/2023] [Indexed: 01/16/2024] Open
Abstract
The advent of single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular heterogeneity and complexity in biological tissues. However, the nature of large, sparse scRNA-seq datasets and privacy regulations present challenges for efficient cell identification. Federated learning provides a solution, allowing efficient and private data use. Here, we introduce scFed, a unified federated learning framework that allows for benchmarking of four classification algorithms, including single-cell-specific and general-purpose classifiers, without violating data privacy. We evaluated scFed using eight publicly available scRNA-seq datasets with diverse sizes, species and technologies, assessing its performance via intra-dataset and inter-dataset experimental setups. We find that scFed performs well on a variety of datasets, with accuracy competitive with centralized models. Though the Transformer-based model excels in centralized training, its performance slightly lags behind that of the single-cell-specific model within the scFed framework, and it raises a notable time complexity concern. Our study not only helps select suitable cell identification methods but also highlights federated learning's potential for privacy-preserving, collaborative biomedical research.
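As a rough illustration of the federated setup described in the abstract above, the sketch below shows weighted federated averaging (FedAvg), a generic aggregation rule in which a server combines locally trained parameters without ever seeing the raw data. The abstract does not state which aggregation rule scFed uses, so this is an assumption; the function name `federated_average`, the toy weight vectors, and the site sizes are all hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of ints, number of local training cells per client.
    Generic FedAvg-style aggregation; not taken from the scFed code base.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two sites train the same two-parameter model locally.
site_weights = [[1.0, 0.0], [3.0, 2.0]]
site_sizes = [100, 300]  # the second site holds three times as many cells

# Only the parameters and sample counts leave each site, never the expression data.
global_weights = federated_average(site_weights, site_sizes)
print(global_weights)
```

Weighting by sample count lets larger sites contribute proportionally more to the shared model; an unweighted mean would be a reasonable alternative when site sizes are comparable.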
Affiliation(s)
- Shuang Wang
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, 610212, Chengdu, China
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, 310053, Hangzhou, China
| | - Bochen Shen
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, 310053, Hangzhou, China
| | - Lanting Guo
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, 310053, Hangzhou, China
| | - Mengqi Shang
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, 310053, Hangzhou, China
| | - Jinze Liu
- Department of Biostatistics, Virginia Commonwealth University, 23298, Richmond, VA, USA
| | - Qi Sun
- Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, 310053, Hangzhou, China
| | - Bairong Shen
- Joint Laboratory of Artificial Intelligence for Critical Care Medicine, Department of Critical Care Medicine and Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, 610212, Chengdu, China
| |
|
46
|
Fawaz A, Ferraresi A, Isidoro C. Systems Biology in Cancer Diagnosis Integrating Omics Technologies and Artificial Intelligence to Support Physician Decision Making. J Pers Med 2023; 13:1590. [PMID: 38003905 PMCID: PMC10672164 DOI: 10.3390/jpm13111590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 11/07/2023] [Accepted: 11/08/2023] [Indexed: 11/26/2023] Open
Abstract
Cancer is the second major cause of disease-related death worldwide, and its accurate early diagnosis and therapeutic intervention are fundamental for saving the patient's life. Cancer, as a complex and heterogeneous disorder, results from the disruption and alteration of a wide variety of biological entities, including genes, proteins, mRNAs, miRNAs, and metabolites, that eventually emerge as clinical symptoms. Traditionally, diagnosis is based on clinical examination, blood tests for biomarkers, the histopathology of a biopsy, and imaging (MRI, CT, PET, and US). Additionally, omics biotechnologies help to further characterize the genome, metabolome, and microbiome traits of the patient that could have an impact on the prognosis and the patient's response to the therapy. The integration of all these data relies on gathering several experts and may require considerable time, and, unfortunately, it is not without the risk of error in the interpretation and therefore in the decision. Systems biology algorithms exploit Artificial Intelligence (AI) combined with omics technologies to perform a rapid and accurate analysis and integration of the patient's big data, and support the physician in making a diagnosis and tailoring the most appropriate therapeutic intervention. However, AI is not free from possible diagnostic and prognostic errors in the interpretation of images or biochemical-clinical data. Here, we first describe the methods used by systems biology for combining AI with omics and then discuss the potential, challenges, limitations, and critical issues in using AI in cancer research.
Affiliation(s)
| | | | - Ciro Isidoro
- Laboratory of Molecular Pathology, Department of Health Sciences, Università del Piemonte Orientale, 28100 Novara, Italy
| |
|
47
|
Quan F, Liang X, Cheng M, Yang H, Liu K, He S, Sun S, Deng M, He Y, Liu W, Wang S, Zhao S, Deng L, Hou X, Zhang X, Xiao Y. Annotation of cell types (ACT): a convenient web server for cell type annotation. Genome Med 2023; 15:91. [PMID: 37924118 PMCID: PMC10623726 DOI: 10.1186/s13073-023-01249-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/18/2023] [Indexed: 11/06/2023] Open
Abstract
BACKGROUND The advancement of single-cell sequencing has improved our ability to solve biological questions. Cell type annotation is of vital importance to this process, allowing for the analysis and interpretation of enormous single-cell datasets. At present, however, manual cell annotation, which is the predominant approach, remains limited by both speed and the requirement of expert knowledge. METHODS To address these challenges, we constructed a hierarchically organized marker map by manually curating over 26,000 cell marker entries from about 7000 publications. We then developed WISE, a weighted and integrated gene set enrichment method, to integrate the prevalence of canonical markers and ordered differentially expressed genes of specific cell types in the marker map. Benchmarking analysis suggested that our method outperformed state-of-the-art methods. RESULTS By integrating the marker map and WISE, we developed a user-friendly and convenient web server, ACT (http://xteam.xbio.top/ACT/ or http://biocc.hrbmu.edu.cn/ACT/), which only takes a simple list of upregulated genes as input and provides interactive hierarchy maps, together with well-designed charts and statistical information, to accelerate the assignment of cell identities and make the results comparable to expert manual annotation. In addition, a pan-tissue marker map was constructed to assist in cell assignments in less-studied tissues. Applying ACT to three case studies showed that all cell clusters were quickly and accurately annotated, and multi-level and more refined cell types were identified. CONCLUSIONS We developed a knowledge-based resource and a corresponding method, together with an intuitive graphical web interface, for cell type annotation. We believe that ACT, emerging as a powerful tool for cell type annotation, would be widely used in single-cell research and considerably accelerate the process of cell type identification.
Affiliation(s)
- Fei Quan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xin Liang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Mingjiang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Huan Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Kun Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shengyuan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shangqin Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Menglan Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Yanzhen He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Wei Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shuai Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Shuxiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Lantian Deng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xiaobo Hou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China
| | - Xinxin Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
| | - Yun Xiao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150086, China.
| |
|
48
|
Yampolskaya M, Herriges MJ, Ikonomou L, Kotton DN, Mehta P. scTOP: physics-inspired order parameters for cellular identification and visualization. Development 2023; 150:dev201873. [PMID: 37756586 PMCID: PMC10629677 DOI: 10.1242/dev.201873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 09/11/2023] [Indexed: 09/29/2023]
Abstract
Advances in single-cell RNA sequencing provide an unprecedented window into cellular identity. The abundance of data requires new theoretical and computational frameworks to analyze the dynamics of differentiation and integrate knowledge from cell atlases. We present 'single-cell Type Order Parameters' (scTOP): a statistical, physics-inspired approach for quantifying cell identity given a reference basis of cell types. scTOP can accurately classify cells, visualize developmental trajectories and assess the fidelity of engineered cells. Importantly, scTOP does this without feature selection, statistical fitting or dimensional reduction (e.g. uniform manifold approximation and projection, principal component analysis, etc.). We illustrate the power of scTOP using human and mouse datasets. By reanalyzing mouse lung data, we characterize a transient hybrid alveolar type 1/alveolar type 2 cell population. Visualizations of lineage tracing hematopoiesis data using scTOP confirm that a single clone can give rise to multiple mature cell types. We assess the transcriptional similarity between endogenous and donor-derived cells in the context of murine pulmonary cell transplantation. Our results suggest that physics-inspired order parameters can be an important tool for understanding differentiation and characterizing engineered cells. scTOP is available as an easy-to-use Python package.
Affiliation(s)
| | - Michael J. Herriges
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
| | - Laertis Ikonomou
- Department of Oral Biology, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, University at Buffalo, The State University of New York, Buffalo, NY 14215, USA
| | - Darrell N. Kotton
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- The Pulmonary Center and Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
| | - Pankaj Mehta
- Department of Physics, Boston University, Boston, MA 02215, USA
- Center for Regenerative Medicine of Boston University and Boston Medical Center, Boston, MA 02118, USA
- Faculty of Computing and Data Science, Boston University, Boston, MA 02215, USA
- Biological Design Center, Boston University, Boston, MA 02215, USA
| |
|
49
|
Yang L, Ng YE, Sun H, Li Y, Chini LCS, LeBrasseur NK, Chen J, Zhang X. Single-cell Mayo Map (scMayoMap): an easy-to-use tool for cell type annotation in single-cell RNA-sequencing data analysis. BMC Biol 2023; 21:223. [PMID: 37858214 PMCID: PMC10588107 DOI: 10.1186/s12915-023-01728-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) has become a widely used tool for both basic and translational biomedical research. In scRNA-seq data analysis, cell type annotation is an essential but challenging step. In the past few years, several annotation tools have been developed. These methods require either labeled training/reference datasets, which are not always available, or a list of predefined cell subset markers, which are subject to biases. Thus, a user-friendly and precise annotation tool is still critically needed. RESULTS We curated a comprehensive cell marker database named scMayoMapDatabase and developed a companion R package scMayoMap, an easy-to-use single-cell annotation tool, to provide fast and accurate cell type annotation. The effectiveness of scMayoMap was demonstrated in 48 independent scRNA-seq datasets across different platforms and tissues. Additionally, the scMayoMapDatabase can be integrated with other tools and further improve their performance. CONCLUSIONS scMayoMap and scMayoMapDatabase will help investigators to define the cell types in their scRNA-seq data in a streamlined and user-friendly way.
Affiliation(s)
- Lu Yang
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, 55905, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
| | - Yan Er Ng
- Robert and Arlene Kogod Center On Aging, Mayo Clinic, Rochester, MN, 55905, USA
| | - Haipeng Sun
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08901, USA
| | - Ying Li
- Department of Quantitative Health Sciences, Mayo Clinic, Jacksonville, FL, 32224, USA
| | - Lucas C S Chini
- Robert and Arlene Kogod Center On Aging, Mayo Clinic, Rochester, MN, 55905, USA
| | - Nathan K LeBrasseur
- Robert and Arlene Kogod Center On Aging, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Physical Medicine and Rehabilitation, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Jun Chen
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, 55905, USA.
- Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Xu Zhang
- Robert and Arlene Kogod Center On Aging, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, 55905, USA.
| |
|
50
|
Zheng H, Vijg J, Fard AT, Mar JC. Measuring cell-to-cell expression variability in single-cell RNA-sequencing data: a comparative analysis and applications to B cell aging. Genome Biol 2023; 24:238. [PMID: 37864221 PMCID: PMC10588274 DOI: 10.1186/s13059-023-03036-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 08/11/2023] [Indexed: 10/22/2023] Open
Abstract
BACKGROUND Single-cell RNA-sequencing (scRNA-seq) technologies enable the capture of gene expression heterogeneity and consequently facilitate the study of cell-to-cell variability at the cell type level. Although different methods have been proposed to quantify cell-to-cell variability, it is unclear what the optimal statistical approach is, especially in light of challenging data structures that are unique to scRNA-seq data like zero inflation. RESULTS We systematically evaluate the performance of 14 different variability metrics that are commonly applied to transcriptomic data for measuring cell-to-cell variability. Leveraging simulations and real datasets, we benchmark the metric performance based on data-specific features, sparsity and sequencing platform, biological properties, and the ability to recapitulate true levels of biological variability based on known gene sets. Next, we use scran, the metric with the strongest all-round performance, to investigate changes in cell-to-cell variability that occur during B cell differentiation and the aging processes. The analysis of primary cell types from hematopoietic stem cells (HSCs) and B lymphopoiesis reveals unique gene signatures with consistent patterns of variable and stable expression profiles during B cell differentiation which highlights the significance of these methods. Identifying differentially variable genes between young and old cells elucidates the regulatory changes that may be overlooked by solely focusing on mean expression changes and we investigate this in the context of regulatory networks. CONCLUSIONS We highlight the importance of capturing cell-to-cell gene expression variability in a complex biological process like differentiation and aging and emphasize the value of these findings at the level of individual cell types.
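To make the idea of a per-gene variability metric concrete, the sketch below computes two generic dispersion measures, the coefficient of variation (standard deviation divided by mean) and the Fano factor (variance divided by mean), for each gene across cells. These are offered only as familiar examples; the abstract does not list the 14 metrics evaluated, and this is not the scran method it highlights. The function name `variability_metrics`, the gene names, and the toy counts are hypothetical.

```python
from statistics import mean, pvariance


def variability_metrics(counts):
    """Return (mean, variance, CV, Fano factor) for one gene's counts across cells.

    Uses the population variance. CV and Fano factor are undefined for a
    gene with zero mean expression, so NaN is returned in that case.
    """
    mu = mean(counts)
    var = pvariance(counts, mu)
    if mu == 0:
        return mu, var, float("nan"), float("nan")
    cv = var ** 0.5 / mu   # coefficient of variation
    fano = var / mu        # Fano factor
    return mu, var, cv, fano


# Hypothetical toy data: two genes with the same mean but different spread.
toy_counts = {
    "gene_stable": [5, 5, 5, 5, 5, 5],
    "gene_variable": [0, 0, 0, 10, 10, 10],
}

metrics = {gene: variability_metrics(c) for gene, c in toy_counts.items()}
for gene, (mu, var, cv, fano) in metrics.items():
    print(gene, mu, var, cv, fano)
```

Both toy genes have the same mean expression, so a comparison of means alone would not distinguish them, whereas either dispersion measure does. This is the kind of difference that the differentially variable gene analysis described in the abstract is aimed at.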
Affiliation(s)
- Huiwen Zheng
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
| | - Jan Vijg
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
| | - Atefeh Taherian Fard
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
| | - Jessica Cara Mar
- Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia.
| |
|