1
Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genomics 2024; 4:100581. [PMID: 38823397 PMCID: PMC11228957 DOI: 10.1016/j.xgen.2024.100581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Julian Lehrer
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O'Farrell
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Benedict Paten
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mircea Teodorescu
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Vanessa D Jonsson
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mohammed A Mostajo-Radji
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
2
Gonzalez-Ferrer J, Lehrer J, O’Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. Unraveling Neuronal Identities Using SIMS: A Deep Learning Label Transfer Tool for Single-Cell RNA Sequencing Analysis. bioRxiv: the preprint server for biology 2023:2023.02.28.529615. [PMID: 36909548 PMCID: PMC10002667 DOI: 10.1101/2023.02.28.529615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet, classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmark SIMS against common single-cell label transfer tools and demonstrate that it performs as well as or better than state-of-the-art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus with remarkably high accuracy. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement, shedding light on genetic changes affecting specific cell types across development. Finally, we apply SIMS to single-cell datasets of cortical organoids to predict cell identities and unveil genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. When cell types are obscured by stress signals, label transfer from primary tissue improves the accuracy of cortical organoid annotations, serving as a reliable ground truth. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
  - These authors contributed equally to this work
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Julian Lehrer
  - These authors contributed equally to this work
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Ash O’Farrell
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Benedict Paten
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Mircea Teodorescu
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Electrical and Computer Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Vanessa D. Jonsson
  - Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Co-senior authors
- Mohammed A. Mostajo-Radji
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
  - Co-senior authors
3
Adult Upper Cortical Layer Specific Transcription Factor CUX2 Is Expressed in Transient Subplate and Marginal Zone Neurons of the Developing Human Brain. Cells 2021; 10:cells10020415. [PMID: 33671178 PMCID: PMC7922267 DOI: 10.3390/cells10020415] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/04/2020] [Revised: 02/10/2021] [Accepted: 02/13/2021] [Indexed: 12/18/2022] Open
Abstract
Cut-Like Homeobox 2 (Cux2) is a transcription factor involved in dendrite and spine development and in synapse formation of projection neurons located in the upper neocortical layers of the mouse. Therefore, Cux2 is often used as an upper layer marker in the mouse brain. However, expression of its orthologue CUX2 remains unexplored in the human fetal neocortex. Here, we show that CUX2 protein is expressed in transient compartments of the developing neocortical anlage during the main fetal phases of neocortical laminar development in the human brain. During the early fetal phase, when neurons of the upper cortical layers are still radially migrating to reach their final place in the cortical anlage, CUX2 was expressed in the marginal zone (MZ), deep cortical plate, and pre-subplate. During midgestation, CUX2 was still expressed in the migrating upper cortical neurons as well as in the subplate (SP) and MZ neurons. At term, CUX2 was expressed in the gyral white matter along with its expected expression in the upper layer neurons. In sum, CUX2 was expressed in migratory neurons of prospective superficial layers and in the diverse subpopulation of transient postmigratory SP and MZ neurons. Therefore, our findings indicate that CUX2 is a novel marker of distinct transient but critical histogenetic events during corticogenesis. Given the Cux2 functions reported in animal models, our data further suggest that the expression of CUX2 in postmigratory SP and MZ neurons is associated with their unique dendritic and synaptogenesis characteristics.
4
Kostović I. The enigmatic fetal subplate compartment forms an early tangential cortical nexus and provides the framework for construction of cortical connectivity. Prog Neurobiol 2020; 194:101883. [PMID: 32659318 DOI: 10.1016/j.pneurobio.2020.101883] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 03/21/2020] [Revised: 06/05/2020] [Accepted: 07/06/2020] [Indexed: 12/19/2022]
Abstract
The most prominent transient compartment of the primate fetal cortex is the deep, cell-sparse, synapse-containing subplate compartment (SPC). The developmental role of the SPC and its extraordinary size in humans remain enigmatic. This paper evaluates evidence on the development and connectivity of the SPC and discusses its role in the pathogenesis of neurodevelopmental disorders. A synthesis of data shows that the subplate becomes a prominent compartment by its expansion from the deep cortical plate (CP), appearing well-delineated on MR scans and forming a tangential nexus across the hemisphere, consisting of an extracellular matrix, randomly distributed postmigratory neurons, and multiple branches of thalamic and long corticocortical axons. The SPC generates early spontaneous non-synaptic and synaptic activity and mediates the cortical response upon thalamic stimulation. The subplate nexus provides large-scale interareal connectivity, possibly underlying fMR resting-state activity, before corticocortical pathways are established. In the late fetal phase, when synapses appear within the CP, the transient SPC coexists with permanent circuitry. The histogenetic role of the SPC is to provide an interactive milieu and the capacity for guidance, sorting, "waiting" and target selection of thalamocortical and corticocortical pathways. The new evolutionary role of the SPC and its remnant white matter neurons is linked to the increasing number of associative pathways in the human neocortex. These roles attributed to the SPC are regulated by spatiotemporal gene expression during critical periods, when pathogenic factors may disturb vulnerable circuitry of the SPC, causing neurodevelopmental cognitive circuitry disorders.
Affiliation(s)
- Ivica Kostović
  - Croatian Institute for Brain Research, School of Medicine, University of Zagreb, Scientific Centre of Excellence for Basic, Clinical and Translational Neuroscience, Salata 12, 10000 Zagreb, Croatia