1
Wei X, Ma W, Wu Z, Wu H. Target-Oriented Reference Construction for supervised cell-type identification in scRNA-seq. Res Sq 2024:rs.3.rs-4559348. [PMID: 38978578 PMCID: PMC11230472 DOI: 10.21203/rs.3.rs-4559348/v1] [Indexed: 07/10/2024]
Abstract
Cell-type identification is the most crucial step in single-cell RNA-seq (scRNA-seq) data analysis, for which supervised cell-type identification methods are a desirable solution owing to their accuracy and efficiency. The performance of such methods is highly dependent on the quality of the reference data. Even though there are many supervised cell-type identification tools, there is no method for selecting and constructing reference data. Here we develop Target-Oriented Reference Construction (TORC), a widely applicable strategy for constructing a reference given a target dataset in scRNA-seq supervised cell-type identification. TORC alleviates the differences in data distribution and cell-type composition between the reference and the target. Extensive benchmarks on simulated and real data demonstrate consistent improvements in cell-type identification from TORC. TORC is freely available at https://github.com/weix21/TORC.
Affiliation(s)
- Hao Wu
- Shenzhen University of Advanced Technology
2
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024:10.1038/s12276-024-01243-w. [PMID: 38871816 DOI: 10.1038/s12276-024-01243-w] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024]
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
3
Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. SIMS: A deep-learning label transfer tool for single-cell RNA sequencing analysis. Cell Genom 2024; 4:100581. [PMID: 38823397 PMCID: PMC11228957 DOI: 10.1016/j.xgen.2024.100581] [Received: 11/17/2023] [Revised: 04/02/2024] [Accepted: 05/09/2024] [Indexed: 06/03/2024]
Abstract
Cell atlases serve as vital references for automating cell labeling in new samples, yet existing classification algorithms struggle with accuracy. Here we introduce SIMS (scalable, interpretable machine learning for single cell), a low-code data-efficient pipeline for single-cell RNA classification. We benchmark SIMS against datasets from different tissues and species. We demonstrate SIMS's efficacy in classifying cells in the brain, achieving high accuracy even with small training sets (<3,500 cells) and across different samples. SIMS accurately predicts neuronal subtypes in the developing brain, shedding light on genetic changes during neuronal differentiation and postmitotic fate refinement. Finally, we apply SIMS to single-cell RNA datasets of cortical organoids to predict cell identities and uncover genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Julian Lehrer
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Ash O'Farrell
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Benedict Paten
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mircea Teodorescu
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Electrical and Computer Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- David Haussler
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Vanessa D Jonsson
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Department of Applied Mathematics, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
- Mohammed A Mostajo-Radji
- Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95060, USA; Live Cell Biotechnology Discovery Lab, University of California, Santa Cruz, Santa Cruz, CA 95060, USA
4
McLean AK, Reynolds G, Pratt AG. Leveraging Multi-Tissue, Single-Cell Atlases as Tools to Elucidate Shared Mechanisms of Immune-Mediated Inflammatory Diseases. Biomedicines 2024; 12:1297. [PMID: 38927506 PMCID: PMC11201400 DOI: 10.3390/biomedicines12061297] [Received: 05/13/2024] [Revised: 06/05/2024] [Accepted: 06/08/2024] [Indexed: 06/28/2024]
Abstract
The observation that certain therapeutic strategies for targeting inflammation benefit patients with distinct immune-mediated inflammatory diseases (IMIDs) is exemplified by the success of TNF blockade in conditions including rheumatoid arthritis, ulcerative colitis, and skin psoriasis, albeit only for subsets of individuals with each condition. This suggests intersecting "nodes" in inflammatory networks at a molecular and cellular level may drive and/or maintain IMIDs, being "shared" between traditionally distinct diagnoses without mapping neatly to a single clinical phenotype. In line with this proposition, integrative tumour tissue analyses in oncology have highlighted novel cell states acting across diverse cancers, with important implications for precision medicine. Drawing upon advances in the oncology field, this narrative review will first summarise learnings from the Human Cell Atlas in health as a platform for interrogating IMID tissues. It will then review cross-disease studies to date that inform this endeavour before considering future directions in the field.
Affiliation(s)
- Anthony K. McLean
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Gary Reynolds
- Center for Immunology and Inflammatory Diseases, Massachusetts General Hospital, Boston, MA 02114, USA
- Arthur G. Pratt
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Musculoskeletal Unit, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne NE7 7DN, UK
5
Palmer JA, Rosenthal N, Teichmann SA, Litvinukova M. Revisiting Cardiac Biology in the Era of Single Cell and Spatial Omics. Circ Res 2024; 134:1681-1702. [PMID: 38843288 PMCID: PMC11149945 DOI: 10.1161/circresaha.124.323672] [Received: 02/05/2024] [Revised: 04/16/2024] [Accepted: 04/24/2024] [Indexed: 06/09/2024]
Abstract
Throughout our lifetime, each beat of the heart requires the coordinated action of multiple cardiac cell types. Understanding cardiac cell biology, its intricate microenvironments, and the mechanisms that govern their function in health and disease is crucial to designing novel therapeutic and behavioral interventions. Recent advances in single-cell and spatial omics technologies have significantly propelled this understanding, offering novel insights into cellular diversity and function and into the complex interactions of cardiac tissue. This review provides a comprehensive overview of the cellular landscape of the heart, bridging the gap between suspension-based and emerging in situ approaches, focusing on the experimental and computational challenges, comparative analyses of mouse and human cardiac systems, and the rising contextualization of cardiac cells within their niches. As we explore the heart at this unprecedented resolution, integrating insights from both mouse and human studies will pave the way for novel diagnostic tools and therapeutic interventions, ultimately improving outcomes for patients with cardiovascular diseases.
Affiliation(s)
- Jack A. Palmer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom (J.A.P., S.A.T.)
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus (J.A.P., S.A.T.), University of Cambridge, United Kingdom
- Nadia Rosenthal
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME (N.R.)
- National Heart and Lung Institute, Imperial College London, United Kingdom (N.R.)
- Sarah A. Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, United Kingdom (J.A.P., S.A.T.)
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus (J.A.P., S.A.T.), University of Cambridge, United Kingdom
- Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory (S.A.T.), University of Cambridge, United Kingdom
- Monika Litvinukova
- University Hospital Würzburg, Germany (M.L.)
- Würzburg Institute of Systems Immunology, Max Planck Research Group at the Julius-Maximilians-Universität Würzburg, Germany (M.L.)
- Helmholtz Pioneer Campus, Helmholtz Munich, Germany (M.L.)
6
Zeng Y, Luo M, Shangguan N, Shi P, Feng J, Xu J, Chen K, Lu Y, Yu W, Yang Y. Deciphering cell types by integrating scATAC-seq data with genome sequences. Nat Comput Sci 2024; 4:285-298. [PMID: 38600256 DOI: 10.1038/s43588-024-00622-7] [Received: 11/01/2023] [Accepted: 03/18/2024] [Indexed: 04/12/2024]
Abstract
The single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) technology provides insight into gene regulation and epigenetic heterogeneity at single-cell resolution, but cell annotation from scATAC-seq remains challenging due to the high dimensionality and extreme sparsity of the data. Existing cell annotation methods mostly focus on the cell-peak matrix without fully utilizing the underlying genomic sequence. Here we propose a method, SANGO, for accurate single-cell annotation by integrating genome sequences around the accessibility peaks within scATAC data. The genome sequences of peaks are encoded into low-dimensional embeddings, and then iteratively used to reconstruct the peak statistics of cells through a fully connected network. The learned weights are considered as regulatory modes to represent cells, and utilized to align the query cells and the annotated cells in the reference data through a graph transformer network for cell annotations. SANGO was demonstrated to consistently outperform competing methods on 55 paired scATAC-seq datasets across samples, platforms and tissues. SANGO was also shown to be able to detect unknown tumor cells through attention edge weights learned by the graph transformer. Moreover, from the annotated cells, we found cell-type-specific peaks that provide functional insights and biological signals through expression enrichment analysis, cis-regulatory chromatin interaction analysis and motif enrichment analysis.
Affiliation(s)
- Yuansong Zeng
- School of Big Data and Software Engineering, Chongqing University, Chongqing, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Mai Luo
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Ningyuan Shangguan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Peiyu Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Junxi Feng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jin Xu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Ken Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Weijiang Yu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China
7
Kaur H, Jha P, Ochatt SJ, Kumar V. Single-cell transcriptomics is revolutionizing the improvement of plant biotechnology research: recent advances and future opportunities. Crit Rev Biotechnol 2024; 44:202-217. [PMID: 36775666 DOI: 10.1080/07388551.2023.2165900] [Received: 08/07/2022] [Revised: 11/04/2022] [Accepted: 12/08/2022] [Indexed: 02/14/2023]
Abstract
Single-cell approaches are a promising way to obtain high-resolution transcriptomic data and have the potential to revolutionize the study of plant growth and development. Recent years have seen unprecedented technological advances in plant biology for studying the transcriptional information of individual cells by single-cell RNA sequencing (scRNA-seq). This review focuses on the modern advancements of single-cell transcriptomics in plants over the past few years. In addition, it offers new insight into how these emerging methods will expedite research in plant biotechnology in the near future. Lastly, the various technological hurdles and inherent limitations of single-cell technology that must be overcome to realize these potential gains in knowledge are critically analyzed and discussed.
Affiliation(s)
- Harmeet Kaur
- Division of Research and Development, Plant Biotechnology Lab, Lovely Professional University, Phagwara, Punjab, India
- Department of Biotechnology, Lovely Faculty of Technology and Sciences, Lovely Professional University, Phagwara, Punjab, India
- Priyanka Jha
- Department of Biotechnology, Lovely Faculty of Technology and Sciences, Lovely Professional University, Phagwara, Punjab, India
- Department of Research Facilitation, Division of Research and Development, Lovely Professional University, Phagwara, Punjab, India
- Sergio J Ochatt
- Agroécologie, Institut Agro Dijon, INRAE, Univ. Bourgogne Franche-Comté, Dijon, France
- Vijay Kumar
- Division of Research and Development, Plant Biotechnology Lab, Lovely Professional University, Phagwara, Punjab, India
- Department of Biotechnology, Lovely Faculty of Technology and Sciences, Lovely Professional University, Phagwara, Punjab, India
8
Ali M, Yang T, He H, Zhang Y. Plant biotechnology research with single-cell transcriptome: recent advancements and prospects. Plant Cell Rep 2024; 43:75. [PMID: 38381195 DOI: 10.1007/s00299-024-03168-0] [Received: 09/12/2023] [Accepted: 02/05/2024] [Indexed: 02/22/2024]
Abstract
KEY MESSAGE: Single-cell transcriptomic techniques have emerged as powerful tools in plant biology, offering high-resolution insights into gene expression at the individual cell level. This review highlights the rapid expansion of single-cell technologies in plants, their potential in understanding plant development, and their role in advancing plant biotechnology research. Single-cell techniques have emerged as powerful tools to enhance our understanding of biological systems, providing high-resolution transcriptomic analysis at the single-cell level. In plant biology, the adoption of single-cell transcriptomics has seen rapid expansion of available technologies and applications. This review article focuses on the latest advancements in the field of single-cell transcriptomics in plants and discusses the potential role of these approaches in plant development and in expediting plant biotechnology research in the near future. Furthermore, the inherent challenges and limitations of single-cell technology are critically examined, with a view to overcoming them and enhancing our knowledge and understanding.
Affiliation(s)
- Muhammad Ali
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- Peking University-Institute of Advanced Agricultural Sciences, Weifang, China
- Tianxia Yang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- State Key Laboratory of Maize Bio-breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding (MOE), China Agricultural University, Beijing, China
- Hai He
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
- Yu Zhang
- School of Agriculture, Sun Yat-Sen University, Shenzhen, 518107, China
9
Park Y, Muttray NP, Hauschild AC. Species-agnostic transfer learning for cross-species transcriptomics data integration without gene orthology. Brief Bioinform 2024; 25:bbae004. [PMID: 38305455 PMCID: PMC10835749 DOI: 10.1093/bib/bbae004] [Received: 09/13/2023] [Revised: 11/24/2023] [Accepted: 12/10/2023] [Indexed: 02/03/2024]
Abstract
Model organisms such as mice and zebrafish play a crucial role in biomedical research, as novel hypotheses are often developed or validated in them. However, due to biological differences between species, translating these findings into human applications remains challenging. Moreover, commonly used orthologous gene information is often incomplete and entails a significant information loss during gene-ID conversion. To address these issues, we present a novel methodology for species-agnostic transfer learning with heterogeneous domain adaptation. We extended the cross-domain structure-preserving projection toward out-of-sample prediction. Our approach not only allows knowledge integration and translation across various species without relying on gene orthology but also identifies similar Gene Ontology (GO) terms among the most influential genes composing the latent space for integration. Subsequently, during the alignment of latent spaces, each composed of species-specific genes, it is possible to identify functional annotations of genes missing from public orthology databases. We evaluated our approach with four different single-cell sequencing datasets focusing on cell-type prediction and compared it against related machine-learning approaches. In summary, the developed model outperforms related methods working without prior knowledge when predicting unseen cell types based on other species' data. The results demonstrate that our novel approach allows knowledge transfer beyond species barriers without depending on known gene orthology, instead utilizing the entire gene sets.
Affiliation(s)
- Youngjun Park
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- International Max Planck Research Schools for Genome Science, Georg-August-Universität Göttingen, Göttingen, Germany
- Nils P Muttray
- Applied Statistics, Georg-August-Universität Göttingen, Göttingen, Germany
- Anne-Christin Hauschild
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Campus-Institute Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
10
Zhou S, Li Y, Wu W, Li L. scMMT: a multi-use deep learning approach for cell annotation, protein prediction and embedding in single-cell RNA-seq data. Brief Bioinform 2024; 25:bbad523. [PMID: 38300515 PMCID: PMC10833085 DOI: 10.1093/bib/bbad523] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 12/19/2023] [Indexed: 02/02/2024]
Abstract
Accurate cell type annotation in single-cell RNA-sequencing data is essential for advancing biological and medical research, particularly in understanding disease progression and tumor microenvironments. However, existing methods are constrained by single feature extraction approaches, lack of adaptability to immune cell types with similar molecular profiles but distinct functions and a failure to account for the impact of cell label noise on model accuracy, all of which compromise the precision of annotation. To address these challenges, we developed a supervised approach called scMMT. We proposed a novel feature extraction technique to uncover more valuable information. Additionally, we constructed a multi-task learning framework based on the GradNorm method to enhance the recognition of challenging immune cells and reduce the impact of label noise by facilitating mutual reinforcement between cell type annotation and protein prediction tasks. Furthermore, we introduced logarithmic weighting and label smoothing mechanisms to enhance the recognition ability of rare cell types and prevent model overconfidence. Through comprehensive evaluations on multiple public datasets, scMMT has demonstrated state-of-the-art performance in various aspects including cell type annotation, rare cell identification, dropout and label noise resistance, protein expression prediction and low-dimensional embedding representation.
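The scMMT abstract names label smoothing and logarithmic weighting for rare cell types but gives no formulas. As a minimal sketch, assuming the common label-smoothing definition (every class receives ε/K and the true class additionally receives 1 − ε) and a hypothetical weighting w_c = log(N / n_c) + 1, a weighted, smoothed cross-entropy for one cell might look like this. The function names, the toy labels and ε = 0.1 are illustrative assumptions, not scMMT's code.

```python
import math
from collections import Counter


def log_class_weights(labels):
    """Hypothetical logarithmic weighting (not scMMT's published formula):
    w_c = log(N / n_c) + 1, so rarer cell types receive larger weights."""
    counts = Counter(labels)
    n = len(labels)
    return {c: math.log(n / k) + 1.0 for c, k in counts.items()}


def smoothed_targets(label_idx, num_classes, eps=0.1):
    """Standard label smoothing: eps / K on every class, plus (1 - eps) on the true class."""
    targets = [eps / num_classes] * num_classes
    targets[label_idx] += 1.0 - eps
    return targets


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_smoothed_ce(logits, label_idx, weight, eps=0.1):
    """Cross-entropy between softmax(logits) and smoothed targets, scaled by a class weight."""
    probs = softmax(logits)
    targets = smoothed_targets(label_idx, len(logits), eps)
    return -weight * sum(t * math.log(p) for t, p in zip(targets, probs))


# Toy example: three cell types, with "NK" the rarest.
labels = ["T", "T", "T", "B", "B", "NK"]
classes = ["T", "B", "NK"]
weights = log_class_weights(labels)
loss = weighted_smoothed_ce([2.0, 0.5, -1.0], classes.index("NK"), weights["NK"])
print(weights, loss)
```

Using log(N / n_c) instead of N / n_c damps the weights so that a very rare type does not dominate the loss; whether scMMT uses exactly this form is not stated in the abstract.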
Affiliation(s)
- Songqi Zhou
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Yang Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Chongqing Research Institute of Big Data, Peking University, Chongqing, China
- Wenyuan Wu
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
- Li Li
- Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China
- Chongqing School, University of Chinese Academy of Sciences, Chongqing, China
11
Xu C, Prete M, Webb S, Jardine L, Stewart BJ, Hoo R, He P, Meyer KB, Teichmann SA. Automatic cell-type harmonization and integration across Human Cell Atlas datasets. Cell 2023; 186:5876-5891.e20. [PMID: 38134877 DOI: 10.1016/j.cell.2023.11.026] [Received: 05/01/2023] [Revised: 08/24/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023]
Abstract
Harmonizing cell types across the single-cell community and assembling them into a common framework is central to building a standardized Human Cell Atlas. Here, we present CellHint, a predictive clustering tree-based tool to resolve cell-type differences in annotation resolution and technical biases across datasets. CellHint accurately quantifies cell-cell transcriptomic similarities and places cell types into a relationship graph that hierarchically defines shared and unique cell subtypes. Application to multiple immune datasets recapitulates expert-curated annotations. CellHint also reveals underexplored relationships between healthy and diseased lung cell states in eight diseases. Furthermore, we present a workflow for fast cross-dataset integration guided by harmonized cell types and cell hierarchy, which uncovers underappreciated cell types in adult human hippocampus. Finally, we apply CellHint to 12 tissues from 38 datasets, providing a deeply curated cross-tissue database with ∼3.7 million cells and various machine learning models for automatic cell annotation across human tissues.
Affiliation(s)
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Martin Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Simone Webb
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Laura Jardine
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Benjamin J Stewart
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge CB2 0QQ, UK; Cambridge University Hospitals NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge CB2 0QQ, UK
- Regina Hoo
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Peng He
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Kerstin B Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Theory of Condensed Matter Group, Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK
12
Du ZH, Hu WL, Li JQ, Shang X, You ZH, Chen ZZ, Huang YA. scPML: pathway-based multi-view learning for cell type annotation from single-cell RNA-seq data. Commun Biol 2023; 6:1268. [PMID: 38097699 PMCID: PMC10721875 DOI: 10.1038/s42003-023-05634-z] [Received: 05/30/2023] [Accepted: 11/24/2023] [Indexed: 12/17/2023]
Abstract
Recent developments in single-cell technology have enabled the exploration of cellular heterogeneity at an unprecedented level, providing invaluable insights into various fields, including medicine and disease research. Cell type annotation is an essential step in single-cell omics research. The mainstream approach is to use well-annotated single-cell data for supervised cell type annotation of new single-cell data. However, existing methods lack good generalization and robustness in cell annotation tasks, partially due to difficulties in dealing with technical differences between datasets, as well as not considering the heterogeneous associations of genes at the level of regulatory mechanisms. Here, we propose the scPML model, which utilizes various gene signaling pathway data to partition the genetic features of cells, thus characterizing different interaction maps between cells. Extensive experiments demonstrate that scPML performs better in cell type annotation and detection of unknown cell types from different species, platforms, and tissues.
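scPML is described as partitioning the gene features of cells by signaling pathway to obtain multiple views. A minimal sketch of that partitioning step alone, assuming a cell-by-gene matrix and pathways supplied as lists of gene names, is shown below. The pathway names and the placeholder gene `GENE_X` are toy examples rather than entries from a real pathway database, and the downstream multi-view model is not shown.

```python
from typing import Dict, List


def pathway_views(
    genes: List[str],
    matrix: List[List[float]],
    pathways: Dict[str, List[str]],
) -> Dict[str, List[List[float]]]:
    """Split a cell-by-gene matrix into one cell-by-pathway-gene view per pathway.

    Pathway genes that were not measured are dropped, and pathways
    with no measured genes are skipped entirely.
    """
    index = {g: i for i, g in enumerate(genes)}
    views: Dict[str, List[List[float]]] = {}
    for name, members in pathways.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            continue
        views[name] = [[row[c] for c in cols] for row in matrix]
    return views


# Toy data: two cells, four genes, and three hypothetical gene sets.
genes = ["CD3E", "CD19", "MS4A1", "NKG7"]
matrix = [
    [5.0, 0.0, 0.0, 1.0],  # T-like cell
    [0.0, 4.0, 3.0, 0.0],  # B-like cell
]
pathways = {
    "toy_tcr": ["CD3E", "CD247"],  # CD247 is not measured, so it is dropped
    "toy_bcr": ["CD19", "MS4A1"],
    "toy_missing": ["GENE_X"],     # no measured genes, so no view is built
}
views = pathway_views(genes, matrix, pathways)
print(views)
```

Each view keeps every cell but only the columns of one gene set, so a multi-view model can learn a separate representation per pathway before combining them.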
Affiliation(s)
- Zhi-Hua Du
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
- Wei-Lin Hu
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
- Jian-Qiang Li
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Zhuang-Zhuang Chen
- College of Computer Science and Software Engineering, Shenzhen University, 3688 Nanhai Avenue, Shenzhen, China
- Yu-An Huang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
13
Gonzalez-Ferrer J, Lehrer J, O'Farrell A, Paten B, Teodorescu M, Haussler D, Jonsson VD, Mostajo-Radji MA. Unraveling Neuronal Identities Using SIMS: A Deep Learning Label Transfer Tool for Single-Cell RNA Sequencing Analysis. bioRxiv 2023:2023.02.28.529615. [PMID: 36909548 PMCID: PMC10002667 DOI: 10.1101/2023.02.28.529615] [Indexed: 03/06/2023]
Abstract
Large single-cell RNA datasets have contributed to unprecedented biological insight. Often, these take the form of cell atlases and serve as a reference for automating cell labeling of newly sequenced samples. Yet, classification algorithms have lacked the capacity to accurately annotate cells, particularly in complex datasets. Here we present SIMS (Scalable, Interpretable Machine Learning for Single-Cell), an end-to-end data-efficient machine learning pipeline for discrete classification of single-cell data that can be applied to new datasets with minimal coding. We benchmarked SIMS against common single-cell label transfer tools and demonstrated that it performs as well as or better than state-of-the-art algorithms. We then use SIMS to classify cells in one of the most complex tissues: the brain. We show that SIMS classifies cells of the adult cerebral cortex and hippocampus at a remarkably high accuracy. This accuracy is maintained in trans-sample label transfers of the adult human cerebral cortex. We then apply SIMS to classify cells in the developing brain and demonstrate a high level of accuracy at predicting neuronal subtypes, even in periods of fate refinement, shedding light on genetic changes affecting specific cell types across development. Finally, we apply SIMS to single-cell datasets of cortical organoids to predict cell identities and unveil genetic variations between cell lines. SIMS identifies cell-line differences and misannotated cell lineages in human cortical organoids derived from different pluripotent stem cell lines. When cell types are obscured by stress signals, label transfer from primary tissue improves the accuracy of cortical organoid annotations, serving as a reliable ground truth. Altogether, we show that SIMS is a versatile and robust tool for cell-type classification from single-cell datasets.
Affiliation(s)
- Jesus Gonzalez-Ferrer
- These authors contributed equally to this work
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Julian Lehrer
- These authors contributed equally to this work
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Ash O’Farrell
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Mircea Teodorescu
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Electrical and Computer Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Vanessa D. Jonsson
- Department of Applied Mathematics, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Co-senior authors
- Mohammed A. Mostajo-Radji
- Genomics Institute, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Live Cell Biotechnology Discovery Lab, University of California Santa Cruz, Santa Cruz, 95060, CA, USA
- Co-senior authors
14
Li W, Xiang B, Yang F, Rong Y, Yin Y, Yao J, Zhang H. scMHNN: a novel hypergraph neural network for integrative analysis of single-cell epigenomic, transcriptomic and proteomic data. Brief Bioinform 2023; 24:bbad391. [PMID: 37930028 DOI: 10.1093/bib/bbad391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 09/09/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023]
Abstract
Technological advances have now made it possible to simultaneously profile epigenomic, transcriptomic and proteomic changes at the single-cell level, allowing a more unified view of cellular phenotypes and heterogeneity. However, current computational tools for single-cell multi-omics data integration are mainly tailored for bi-modality data, so new tools are urgently needed to integrate tri-modality data with complex associations. To this end, we develop scMHNN to integrate single-cell multi-omics data based on a hypergraph neural network. After modeling the complex data associations among various modalities, scMHNN performs message passing on the multi-omics hypergraph, which can capture high-order data relationships and integrate the multiple heterogeneous features. Subsequently, scMHNN learns discriminative cell representations via a dual-contrastive loss in a self-supervised manner. Based on the pretrained hypergraph encoder, we further introduce a pre-training and fine-tuning paradigm, which allows more accurate cell-type annotation with only a small number of labeled cells as reference. Benchmarking results on real and simulated single-cell tri-modality datasets indicate that scMHNN outperforms other competing methods on both cell clustering and cell-type annotation tasks. In addition, we demonstrate that scMHNN facilitates various downstream tasks, such as cell marker detection and enrichment analysis.
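The two-stage message passing on a hypergraph (nodes to hyperedges, then hyperedges back to nodes) can be sketched in plain Python. This is a simplified, unweighted mean-aggregation variant assumed for illustration only; scMHNN's hypergraph construction, normalization, and learnable weights are not described in the abstract and are omitted. Each hyperedge here is a hypothetical group of cells, for example cells that are neighbours in one modality.

```python
from typing import List

Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def hypergraph_message_pass(features: List[Vector], hyperedges: List[List[int]]) -> List[Vector]:
    """One round of unweighted node -> hyperedge -> node mean aggregation."""
    # Stage 1: every hyperedge summarizes the cells (nodes) it connects.
    edge_messages = [mean([features[i] for i in edge]) for edge in hyperedges]
    # Stage 2: every cell averages the messages of the hyperedges containing it.
    updated = []
    for node, feat in enumerate(features):
        incident = [edge_messages[e] for e, edge in enumerate(hyperedges) if node in edge]
        # Cells that belong to no hyperedge keep their original features.
        updated.append(mean(incident) if incident else list(feat))
    return updated

features = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]  # three cells, two features each
hyperedges = [[0, 1], [1, 2]]                    # hypothetical groupings of cells
print(hypergraph_message_pass(features, hyperedges))  # [[2.0, 0.0], [1.75, 1.0], [1.5, 2.0]]
```

Because a hyperedge can join any number of cells, one message can carry information shared by a whole group at once, which is what distinguishes this from pairwise graph message passing.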
Affiliation(s)
- Wei Li
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350 Tianjin, China
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
- Bin Xiang
- CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Yueyang Road, 200031 Shanghai, China
- Fan Yang
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
- Yu Rong
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
- Yanbin Yin
- Department of Food Science and Technology, University of Nebraska - Lincoln, 1400 R Street, 68588 Nebraska, USA
- Jianhua Yao
- AI Lab, Tencent, Gaoxin 9th South Road, 518000 Shenzhen, China
- Han Zhang
- College of Artificial Intelligence, Nankai University, Tongyan Road, 300350 Tianjin, China
15
Lyu P, Zhai Y, Li T, Qian J. CellAnn: a comprehensive, super-fast, and user-friendly single-cell annotation web server. Bioinformatics 2023; 39:btad521. [PMID: 37610325 PMCID: PMC10477937 DOI: 10.1093/bioinformatics/btad521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 07/17/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023]
Abstract
MOTIVATION Single-cell sequencing technology has become a routine tool for studying many biological problems. A core step in analyzing single-cell data is the assignment of cell clusters to specific cell types. Reference-based methods have been proposed for predicting cell types for single-cell clusters. However, limited scalability and the lack of preprocessed reference datasets make them impractical and hard to use. RESULTS Here, we introduce a reference-based cell annotation web server, CellAnn, which is super-fast and easy to use. CellAnn contains a comprehensive reference database with 204 human and 191 mouse single-cell datasets covering 32 organs. Furthermore, we developed a cluster-to-cluster alignment method to transfer cell labels from the reference to the query datasets, which outperforms existing methods in both accuracy and scalability. Finally, CellAnn is an online tool that integrates all the procedures in cell annotation, including reference searching, transferring cell labels, visualizing results, and harmonizing cell annotation labels. Through the user-friendly interface, users can identify the best annotation by cross-validating with multiple reference datasets. We believe that CellAnn can greatly facilitate single-cell sequencing data analysis. AVAILABILITY AND IMPLEMENTATION The web server is available at www.cellann.io, and the source code is available at https://github.com/Pinlyu3/CellAnn_shinyapp.
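As an illustration of cluster-to-cluster label transfer in general, and not of CellAnn's specific alignment method (which the abstract does not spell out), the sketch below summarizes each cluster by its mean expression profile and gives each query cluster the reference label whose centroid has the highest Pearson correlation with it. All data and labels are hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows are cells, columns are genes

def centroid(cells: Matrix) -> List[float]:
    """Mean expression profile of a cluster."""
    return [sum(col) / len(cells) for col in zip(*cells)]

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0

def align_clusters(query: Dict[str, Matrix], reference: Dict[str, Matrix]) -> Dict[str, str]:
    """Label each query cluster with its best-correlated reference cell type."""
    ref_centroids = {label: centroid(cells) for label, cells in reference.items()}
    labels = {}
    for cluster, cells in query.items():
        q = centroid(cells)
        labels[cluster] = max(ref_centroids, key=lambda lab: pearson(q, ref_centroids[lab]))
    return labels

# Hypothetical three-gene toy data.
reference = {
    "T cell": [[5.0, 0.0, 1.0], [7.0, 0.0, 1.0]],
    "B cell": [[0.0, 6.0, 1.0], [0.0, 8.0, 1.0]],
}
query = {
    "cluster_0": [[0.0, 5.0, 0.0], [1.0, 7.0, 1.0]],
    "cluster_1": [[4.0, 1.0, 0.0]],
}
print(align_clusters(query, reference))  # {'cluster_0': 'B cell', 'cluster_1': 'T cell'}
```

Comparing cluster centroids rather than individual cells keeps the cost proportional to the number of clusters, which is one plausible reason cluster-level alignment scales well.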
Affiliation(s)
- Pin Lyu
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
- Yijie Zhai
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
- Taibo Li
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD 21218, United States
- Jiang Qian
- Department of Ophthalmology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, United States
16
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. CELL REPORTS METHODS 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
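Principal component analysis is among the simplest of the statistical representation-learning approaches mentioned above. The minimal pure-Python sketch below assumes a small dense cells-by-genes matrix with at least two cells and projects each cell onto the first principal component, found by power iteration. It is an illustration only; real pipelines would use an optimized numerical library.

```python
import math
from typing import List

Matrix = List[List[float]]

def center(X: Matrix) -> Matrix:
    """Subtract each gene's (column's) mean so the data are centred at zero."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance of centred data (requires at least two cells)."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(C: Matrix, iters: int = 200) -> List[float]:
    """Leading eigenvector of C by power iteration (its sign is arbitrary)."""
    d = len(C)
    v = [1.0 / (i + 1) for i in range(d)]  # non-uniform start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance along v; stop early
            break
        v = [x / norm for x in w]
    return v

def embed_1d(X: Matrix) -> List[float]:
    """One-dimensional PCA embedding: each cell's score on the first component."""
    Xc = center(X)
    pc = first_component(covariance(Xc))
    return [sum(a * b for a, b in zip(row, pc)) for row in Xc]

# Toy data: four cells whose two genes vary together along the diagonal.
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(embed_1d(X))
```

Here the two correlated genes collapse onto one coordinate that preserves the spacing between cells, which is the basic trade-off every representation-learning method makes at larger scale.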
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
- John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
17
Yan X, Zheng R, Chen J, Li M. scNCL: transferring labels from scRNA-seq to scATAC-seq data with neighborhood contrastive regularization. Bioinformatics 2023; 39:btad505. [PMID: 37584660 PMCID: PMC10457667 DOI: 10.1093/bioinformatics/btad505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 07/17/2023] [Accepted: 08/12/2023] [Indexed: 08/17/2023]
Abstract
MOTIVATION scATAC-seq has enabled chromatin accessibility landscape profiling at the single-cell level, providing opportunities for determining cell-type-specific regulatory codes. However, the high dimensionality, extreme sparsity, and large scale of scATAC-seq data pose great challenges to cell-type identification. Thus, there has been growing interest in leveraging well-annotated scRNA-seq data to help annotate scATAC-seq data. However, substantial computational obstacles remain in transferring information from scRNA-seq to scATAC-seq, especially given their heterogeneous features. RESULTS We propose a new transfer learning method, scNCL, which utilizes prior knowledge and contrastive learning to tackle the problem of heterogeneous features. Briefly, scNCL transforms scATAC-seq features into a gene activity matrix based on prior knowledge. Since feature transformation can cause information loss, scNCL introduces neighborhood contrastive learning to preserve the neighborhood structure of scATAC-seq cells in the raw feature space. To learn transferable latent features, scNCL uses a feature projection loss and an alignment loss to harmonize embeddings between scRNA-seq and scATAC-seq. Experiments on various datasets demonstrated that scNCL not only achieves accurate and robust label transfer for common types but also reliably detects novel types. scNCL is also computationally efficient and scalable to million-scale datasets. Moreover, we show that scNCL can help refine cell-type annotations in existing scATAC-seq atlases. AVAILABILITY AND IMPLEMENTATION The source code and data used in this paper can be found at https://github.com/CSUBioGroup/scNCL-release.
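scNCL's first step converts peak-level scATAC-seq features into a gene activity matrix using prior knowledge. A common heuristic for this, assumed here rather than taken from scNCL, sums the counts of peaks that overlap each gene body extended by a fixed promoter window upstream of the transcription start site. The coordinates, the 2 kb window, and the function name `gene_activity` are hypothetical.

```python
from typing import Dict, Tuple

Peak = Tuple[str, int, int]       # (chromosome, start, end), half-open
Gene = Tuple[str, int, int, str]  # (chromosome, start, end, strand)

def gene_activity(peak_counts: Dict[Peak, int], genes: Dict[str, Gene],
                  upstream: int = 2000) -> Dict[str, int]:
    """Sum counts of peaks overlapping each gene body plus an upstream promoter window."""
    activity: Dict[str, int] = {}
    for name, (chrom, start, end, strand) in genes.items():
        # The TSS is at `start` on the + strand and at `end` on the - strand.
        if strand == "+":
            lo, hi = start - upstream, end
        else:
            lo, hi = start, end + upstream
        activity[name] = sum(
            count
            for (p_chrom, p_start, p_end), count in peak_counts.items()
            if p_chrom == chrom and p_start < hi and p_end > lo  # interval overlap
        )
    return activity

# Hypothetical coordinates and counts for one cell.
genes = {"GENE_X": ("chr1", 10_000, 15_000, "+"), "GENE_Y": ("chr1", 30_000, 32_000, "-")}
peaks = {
    ("chr1", 8_500, 9_000): 3,    # promoter of GENE_X
    ("chr1", 14_000, 14_500): 2,  # body of GENE_X
    ("chr1", 33_000, 33_500): 4,  # promoter of GENE_Y (minus strand)
    ("chr2", 10_000, 10_500): 7,  # different chromosome, ignored
}
print(gene_activity(peaks, genes))  # {'GENE_X': 5, 'GENE_Y': 4}
```

Peaks far from any gene (distal enhancers, for instance) contribute nothing to this matrix, which illustrates the information loss that the abstract says scNCL's neighborhood contrastive learning is meant to compensate for.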
Affiliation(s)
- Xuhua Yan
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Ruiqing Zheng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jinmiao Chen
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
- Immunology Translational Research Program, Department of Microbiology and Immunology, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore 117545, Singapore
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
18
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
19
Kanemaru K, Cranley J, Muraro D, Miranda AMA, Ho SY, Wilbrey-Clark A, Patrick Pett J, Polanski K, Richardson L, Litvinukova M, Kumasaka N, Qin Y, Jablonska Z, Semprich CI, Mach L, Dabrowska M, Richoz N, Bolt L, Mamanova L, Kapuge R, Barnett SN, Perera S, Talavera-López C, Mulas I, Mahbubani KT, Tuck L, Wang L, Huang MM, Prete M, Pritchard S, Dark J, Saeb-Parsy K, Patel M, Clatworthy MR, Hübner N, Chowdhury RA, Noseda M, Teichmann SA. Spatially resolved multiomics of human cardiac niches. Nature 2023; 619:801-810. [PMID: 37438528 PMCID: PMC10371870 DOI: 10.1038/s41586-023-06311-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 30.0] [Received: 08/11/2022] [Accepted: 06/12/2023] [Indexed: 07/14/2023]
Abstract
The function of a cell is defined by its intrinsic characteristics and its niche: the tissue microenvironment in which it dwells. Here we combine single-cell and spatial transcriptomics data to discover cellular niches within eight regions of the human heart. We map cells to microanatomical locations and integrate knowledge-based and unsupervised structural annotations. We also profile the cells of the human cardiac conduction system. The results revealed their distinctive repertoire of ion channels, G-protein-coupled receptors (GPCRs) and regulatory networks, and implicated FOXP2 in the pacemaker phenotype. We show that the sinoatrial node is compartmentalized, with a core of pacemaker cells, fibroblasts and glial cells supporting glutamatergic signalling. Using a custom CellPhoneDB.org module, we identify trans-synaptic pacemaker cell interactions with glia. We introduce a druggable target prediction tool, drug2cell, which leverages single-cell profiles and drug-target interactions to provide mechanistic insights into the chronotropic effects of drugs, including GLP-1 analogues. In the epicardium, we show enrichment of both IgG+ and IgA+ plasma cells forming immune niches that may contribute to infection defence. Overall, we provide new clarity to cardiac electro-anatomy and immunology, and our suite of computational approaches can be applied to other tissues and organs.
Affiliation(s)
- Kazumasa Kanemaru
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- James Cranley
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Daniele Muraro
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Siew Yen Ho
- Cardiac Morphology Unit, Royal Brompton Hospital and Imperial College London, London, UK
- Anna Wilbrey-Clark
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Jan Patrick Pett
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Krzysztof Polanski
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Laura Richardson
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Monika Litvinukova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Natsuhiko Kumasaka
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Yue Qin
- National Heart and Lung Institute, Imperial College London, London, UK
- Zuzanna Jablonska
- National Heart and Lung Institute, Imperial College London, London, UK
- Claudia I Semprich
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Lukas Mach
- National Heart and Lung Institute, Imperial College London, London, UK
- Royal Brompton Hospital, London, UK
- Monika Dabrowska
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Nathan Richoz
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Cambridge, UK
- Liam Bolt
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Lira Mamanova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Rakeshlal Kapuge
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sam N Barnett
- National Heart and Lung Institute, Imperial College London, London, UK
- Shani Perera
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Carlos Talavera-López
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Würzburg Institute for Systems Immunology, Max Planck Research Group, Julius-Maximilian-Universität, Würzburg, Germany
- Ilaria Mulas
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Krishnaa T Mahbubani
- Department of Surgery, University of Cambridge, and Cambridge Biorepository for Translational Medicine, NIHR Cambridge Biomedical Centre, Cambridge, UK
- Liz Tuck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Lu Wang
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Margaret M Huang
- Department of Surgery, University of Cambridge, and Cambridge Biorepository for Translational Medicine, NIHR Cambridge Biomedical Centre, Cambridge, UK
- Martin Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sophie Pritchard
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- John Dark
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, UK
- Kourosh Saeb-Parsy
- Department of Surgery, University of Cambridge, and Cambridge Biorepository for Translational Medicine, NIHR Cambridge Biomedical Centre, Cambridge, UK
- Minal Patel
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Menna R Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Cambridge, UK
- Norbert Hübner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin, Germany
- Charité-Universitätsmedizin, Berlin, Germany
- German Centre for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
- Michela Noseda
- National Heart and Lung Institute, Imperial College London, London, UK
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
20
Palani NP, Horvath C, Timshel PN, Folkertsma P, Grønning AGB, Henriksen TI, Peijs L, Jensen VH, Sun W, Jespersen NZ, Wolfrum C, Pers TH, Nielsen S, Scheele C. Adipogenic and SWAT cells separate from a common progenitor in human brown and white adipose depots. Nat Metab 2023; 5:996-1013. [PMID: 37337126 PMCID: PMC10290958 DOI: 10.1038/s42255-023-00820-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 05/11/2023] [Indexed: 06/21/2023]
Abstract
Adipocyte function is a major determinant of metabolic disease, warranting investigation of its regulatory mechanisms. We show at single-cell resolution that progenitor cells from four human brown and white adipose depots separate into two main cell fates, an adipogenic and a structural branch, developing from a common progenitor. The adipogenic gene signature contains mitochondrial activity genes and is associated with genome-wide association study traits for fat distribution. Based on an extracellular matrix and developmental gene signature, we name the structural branch of cells structural Wnt-regulated adipose tissue-resident (SWAT) cells. When stripped from adipogenic cells, SWAT cells display a multipotent phenotype by reverting towards a progenitor state or differentiating into new adipogenic cells, depending on the medium. Label transfer algorithms recapitulate the cell types in human adipose tissue datasets. In conclusion, we provide a differentiation map of human adipocytes and define the multipotent SWAT cell, providing a new perspective on adipose tissue regulation.
Affiliation(s)
- Nagendra P Palani
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Carla Horvath
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Pascal N Timshel
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- ZS Associates, Copenhagen, Denmark
- Pytrik Folkertsma
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Alexander G B Grønning
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Tora I Henriksen
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Lone Peijs
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Verena H Jensen
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Wenfei Sun
- Institute of Food, Nutrition and Health, ETH Zurich, Zurich, Switzerland
- Naja Z Jespersen
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- Christian Wolfrum
- Institute of Food, Nutrition and Health, ETH Zurich, Zurich, Switzerland
- Tune H Pers
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Søren Nielsen
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Camilla Scheele
- Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- The Center of Inflammation and Metabolism and the Center for Physical Activity Research, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
21
Davalos OA, Heydari AA, Fertig EJ, Sindi SS, Hoyer KK. Boosting Single-Cell RNA Sequencing Analysis with Simple Neural Attention. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.29.542760. [PMID: 37398136 PMCID: PMC10312486 DOI: 10.1101/2023.05.29.542760] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
A limitation of current deep learning (DL) approaches for single-cell RNA sequencing (scRNAseq) analysis is the lack of interpretability. Moreover, existing pipelines are designed and trained for specific tasks and are used disjointly at different stages of analysis. We present scANNA, a novel interpretable DL model for scRNAseq studies that leverages neural attention to learn gene associations. After training, the learned gene importance (interpretability) is used to perform downstream analyses (e.g., global marker selection and cell-type classification) without retraining. ScANNA's performance is comparable to or better than that of state-of-the-art methods designed and trained for specific standard scRNAseq analyses, even though scANNA was not explicitly trained for these tasks. ScANNA enables researchers to discover meaningful results without extensive prior knowledge or training separate task-specific models, saving time and enhancing scRNAseq analyses.
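To make the idea of attention-derived gene importance concrete, the sketch below turns per-gene scores into a softmax attention distribution over genes and ranks genes by attention as candidate markers. It is a toy single-layer illustration assumed for exposition, not scANNA's architecture; the per-gene weights are hypothetical stand-ins for parameters a real model would learn during training.

```python
import math
from typing import List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gene_attention(expression: Sequence[float],
                   weights: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (attention over genes, attention-weighted expression) for one cell."""
    scores = [w * x for w, x in zip(weights, expression)]
    attn = softmax(scores)
    return attn, [a * x for a, x in zip(attn, expression)]

def top_genes(genes: Sequence[str], attn: Sequence[float], k: int) -> List[str]:
    """Genes with the highest attention, used here as candidate markers."""
    ranked = sorted(zip(genes, attn), key=lambda pair: pair[1], reverse=True)
    return [g for g, _ in ranked[:k]]

genes = ["G1", "G2", "G3"]
expression = [1.0, 3.0, 0.5]
weights = [0.2, 1.0, 0.1]  # hypothetical "learned" per-gene weights
attn, attended = gene_attention(expression, weights)
print(top_genes(genes, attn, k=2))  # ['G2', 'G1']
```

Because the attention vector is computed once per cell, the same trained weights can feed several downstream uses (ranking markers, or passing `attended` to a classifier), which mirrors the abstract's point about reusing learned importance without retraining.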
Affiliation(s)
- Oscar A. Davalos
- Quantitative and Systems Biology Graduate Program, University of California, Merced, CA, USA
- A. Ali Heydari
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Elana J. Fertig
- Department of Oncology, Division of Biostatistics and Bioinformatics, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Suzanne S. Sindi
- Department of Applied Mathematics, University of California, Merced, CA, USA
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Katrina K. Hoyer
- Health Sciences Research Institute, University of California, Merced, CA, USA
- Department of Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, CA, USA
22
Miranda AMA, Janbandhu V, Maatz H, Kanemaru K, Cranley J, Teichmann SA, Hübner N, Schneider MD, Harvey RP, Noseda M. Single-cell transcriptomics for the assessment of cardiac disease. Nat Rev Cardiol 2023; 20:289-308. [PMID: 36539452 DOI: 10.1038/s41569-022-00805-7] [Citation(s) in RCA: 23] [Impact Index Per Article: 23.0] [Accepted: 11/03/2022] [Indexed: 12/24/2022]
Abstract
Cardiovascular disease is the leading cause of death globally. An advanced understanding of cardiovascular disease mechanisms is required to improve therapeutic strategies and patient risk stratification. State-of-the-art, large-scale, single-cell and single-nucleus transcriptomics facilitate the exploration of the cardiac cellular landscape at an unprecedented level, beyond its descriptive features, and can further our understanding of the mechanisms of disease and guide functional studies. In this Review, we provide an overview of the technical challenges in the experimental design of single-cell and single-nucleus transcriptomics studies, as well as a discussion of the type of inferences that can be made from the data derived from these studies. Furthermore, we describe novel findings derived from transcriptomics studies for each major cardiac cell type in both health and disease, and from development to adulthood. This Review also provides a guide to interpreting the exhaustive list of newly identified cardiac cell types and states, and highlights the consensus and discordances in annotation, indicating an urgent need for standardization. We describe advanced applications such as integration of single-cell data with spatial transcriptomics to map genes and cells on tissue and define cellular microenvironments that regulate homeostasis and disease progression. Finally, we discuss current and future translational and clinical implications of novel transcriptomics approaches, and provide an outlook of how these technologies will change the way we diagnose and treat heart disease.
Affiliation(s)
- Vaibhao Janbandhu
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- Henrike Maatz
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Kazumasa Kanemaru
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- James Cranley
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Sarah A Teichmann
- Cellular Genetics Programme, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Physics, Cavendish Laboratory, University of Cambridge, Cambridge, UK
- Norbert Hübner
- Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
- Richard P Harvey
- Victor Chang Cardiac Research Institute, Sydney, NSW, Australia
- School of Clinical Medicine, Faculty of Medicine, UNSW Sydney, Sydney, NSW, Australia
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, NSW, Australia
- Michela Noseda
- National Heart and Lung Institute, Imperial College London, London, UK
23
Xu Y, Kramann R, McCord RP, Hayat S. MASI enables fast model-free standardization and integration of single-cell transcriptomics data. Commun Biol 2023; 6:465. [PMID: 37117305 PMCID: PMC10144903 DOI: 10.1038/s42003-023-04820-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 04/06/2023] [Indexed: 04/30/2023]
Abstract
Single-cell transcriptomics datasets from the same anatomical sites generated by different research labs are becoming increasingly common. However, fast and computationally inexpensive tools for standardizing cell-type annotation and integrating data are still needed to increase research inclusivity. To standardize cell-type annotation and integrate single-cell transcriptomics datasets, we have built a fast model-free integration method, named MASI (Marker-Assisted Standardization and Integration). We benchmark MASI against other well-established methods and demonstrate that MASI outperforms them in terms of integration, annotation, and speed. To harness knowledge from single-cell atlases, we present three case studies that cover integration across biological conditions, surveyed participants, and research groups, respectively. Finally, we show that MASI can annotate approximately one million cells on a personal laptop, making large-scale single-cell data integration more accessible. We envision that MASI can serve as a cheap computational alternative for the single-cell research community.
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, 37996, USA
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Rafael Kramann
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany
- Rachel Patton McCord
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, 37996, USA
- Sikander Hayat
- Institute of Experimental Medicine and Systems Biology, RWTH Aachen University, Aachen, Germany
24
Latyshev P, Pavlov F, Herbert A, Poptsova M. Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals. Front Big Data 2023; 6:1140663. [PMID: 37063486 PMCID: PMC10101332 DOI: 10.3389/fdata.2023.1140663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Accepted: 03/14/2023] [Indexed: 04/03/2023]
Abstract
Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods for analyzing the available data, but the focus is often only on the species studied. Here we take advantage of progress in transfer learning, specifically Unsupervised Domain Adaptation (UDA), and test nine UDA methods for predicting regulatory code signals in the genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species, then refining the model using the genome sequence of the target species for which we wanted to make predictions. Among the nine tested domain adaptation architectures, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others. Conditional Domain Adversarial Network (CDAN) was the third-best architecture. Here we provide an empirical assessment of each approach using real-world data. The different approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks on the human and mouse genomes, but the approach is generalizable to any cross-species transfer of interest. We tested the efficiency of each method using species for which experimental data were available in both. The results allow us to assess how well each implementation will work for species for which only limited experimental data are available and will inform the design of future experiments in these understudied organisms. Overall, our results demonstrate the validity of UDA methods for generating missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy and susceptible to analytic bias.
Affiliation(s)
- Pavel Latyshev
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Fedor Pavlov
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Alan Herbert
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- InsideOutBio, Charlestown, MA, United States
- Maria Poptsova
- Laboratory of Bioinformatics, Faculty of Computer Science, HSE University, Moscow, Russia
- Corresponding author: Maria Poptsova
25
Adversarial confound regression and uncertainty measurements to classify heterogeneous clinical MRI in Mass General Brigham. PLoS One 2023; 18:e0277572. [PMID: 36862751 PMCID: PMC9980829 DOI: 10.1371/journal.pone.0277572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Accepted: 10/29/2022] [Indexed: 03/03/2023]
Abstract
In this work, we introduce a novel deep learning architecture, MUCRAN (Multi-Confound Regression Adversarial Network), to train a deep learning model on clinical brain MRI while regressing out demographic and technical confounding factors. We trained MUCRAN using 17,076 clinical T1 axial brain MRIs collected at Massachusetts General Hospital (MGH) before 2019 and demonstrated that MUCRAN could successfully regress out major confounding factors in this large clinical dataset. We also applied a method for quantifying uncertainty across an ensemble of these models to automatically exclude out-of-distribution data in Alzheimer's disease (AD) detection. By combining MUCRAN and the uncertainty quantification method, we showed consistent and significant increases in AD detection accuracy for newly collected MGH data (post-2019; 84.6% with MUCRAN vs. 72.5% without MUCRAN) and for data from other hospitals (90.3% for Brigham and Women's Hospital and 81.0% for other hospitals). MUCRAN offers a generalizable approach for deep-learning-based disease detection in heterogeneous clinical data.
26
Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun 2022; 13:7640. [PMID: 36496406 PMCID: PMC9741613 DOI: 10.1038/s41467-022-35288-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 06/17/2022] [Accepted: 11/25/2022] [Indexed: 12/13/2022]
Abstract
Spatially resolved transcriptomics provides the opportunity to investigate the gene expression profiles and the spatial context of cells in their native state, but at low transcript detection sensitivity or with limited gene throughput. Comprehensive annotation of cell types in spatially resolved transcriptomics to understand biological processes at the single-cell level remains challenging. Here we propose Spatial-ID, a supervision-based cell typing method that combines the existing knowledge of reference single-cell RNA-seq data with the spatial information of spatially resolved transcriptomics data. We present a series of benchmarking analyses on publicly available spatially resolved transcriptomics datasets that demonstrate the superiority of Spatial-ID compared with state-of-the-art methods. In addition, we apply Spatial-ID to a self-collected mouse brain hemisphere dataset measured by Stereo-seq, which shows the scalability of Spatial-ID to three-dimensional large-field tissues with subcellular spatial resolution.
27
Brbić M, Cao K, Hickey JW, Tan Y, Snyder MP, Nolan GP, Leskovec J. Annotation of spatially resolved single-cell data with STELLAR. Nat Methods 2022; 19:1411-1418. [PMID: 36280720 DOI: 10.1038/s41592-022-01651-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 11/24/2021] [Accepted: 09/14/2022] [Indexed: 11/09/2022]
Abstract
Accurate cell-type annotation from spatially resolved single cells is crucial to understand functional spatial biology that is the basis of tissue organization. However, current computational methods for annotating spatially resolved single-cell data are typically based on techniques established for dissociated single-cell technologies and thus do not take spatial organization into account. Here we present STELLAR, a geometric deep learning method for cell-type discovery and identification in spatially resolved single-cell datasets. STELLAR automatically assigns cells to cell types present in the annotated reference dataset and discovers novel cell types and cell states. STELLAR transfers annotations across different dissection regions, different tissues and different donors, and learns cell representations that capture higher-order tissue structures. We successfully applied STELLAR to CODEX multiplexed fluorescent microscopy data and multiplexed RNA imaging datasets. Within the Human BioMolecular Atlas Program, STELLAR has annotated 2.6 million spatially resolved single cells with dramatic time savings.
Affiliation(s)
- Maria Brbić
- Department of Computer Science, Stanford University, Stanford, CA, USA
- School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Kaidi Cao
- Department of Computer Science, Stanford University, Stanford, CA, USA
- John W Hickey
- Baxter Laboratories, Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Yuqi Tan
- Baxter Laboratories, Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Michael P Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Garry P Nolan
- Baxter Laboratories, Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University School of Medicine, Stanford University, Stanford, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
28
Brendel M, Su C, Bai Z, Zhang H, Elemento O, Wang F. Application of Deep Learning on Single-cell RNA Sequencing Data Analysis: A Review. Genomics Proteomics Bioinformatics 2022; 20:814-835. [PMID: 36528240 PMCID: PMC10025684 DOI: 10.1016/j.gpb.2022.11.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/23/2022] [Revised: 08/17/2022] [Accepted: 11/24/2022] [Indexed: 12/23/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a routinely used technique to quantify the gene expression profile of thousands of single cells simultaneously. Analysis of scRNA-seq data plays an important role in the study of cell states and phenotypes, and has helped elucidate biological processes, such as those occurring during the development of complex organisms, and improved our understanding of disease states, such as cancer, diabetes, and coronavirus disease 2019 (COVID-19). Deep learning, a recent advance of artificial intelligence that has been used to address many problems involving large datasets, has also emerged as a promising tool for scRNA-seq data analysis, as it has a capacity to extract informative and compact features from noisy, heterogeneous, and high-dimensional scRNA-seq data to improve downstream analysis. The present review aims at surveying recently developed deep learning techniques in scRNA-seq data analysis, identifying key steps within the scRNA-seq data analysis pipeline that have been advanced by deep learning, and explaining the benefits of deep learning over more conventional analytic tools. Finally, we summarize the challenges in current deep learning approaches faced within scRNA-seq data and discuss potential directions for improvements in deep learning algorithms for scRNA-seq data analysis.
Affiliation(s)
- Matthew Brendel
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Chang Su
- Department of Health Service Administration and Policy, Temple University, Philadelphia, PA 19122, USA
- Zilong Bai
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Hao Zhang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Olivier Elemento
- Institute for Computational Biomedicine, Caryl and Israel Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, Cornell University, New York, NY 10065, USA
29
Yang F, Wang W, Wang F, Fang Y, Tang D, Huang J, Lu H, Yao J. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell 2022. [DOI: 10.1038/s42256-022-00534-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
30
Roux AE, Zhang C, Paw J, Zavala-Solorio J, Malahias E, Vijay T, Kolumam G, Kenyon C, Kimmel JC. Diverse partial reprogramming strategies restore youthful gene expression and transiently suppress cell identity. Cell Syst 2022; 13:574-587.e11. [PMID: 35690067 DOI: 10.1016/j.cels.2022.05.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 06/06/2021] [Revised: 02/15/2022] [Accepted: 05/10/2022] [Indexed: 01/25/2023]
Abstract
Partial pluripotent reprogramming can reverse features of aging in mammalian cells, but the impact on somatic identity and the necessity of individual reprogramming factors remain unknown. Here, we used single-cell genomics to map the identity trajectory induced by partial reprogramming in multiple murine cell types and dissected the influence of each factor by screening all Yamanaka Factor subsets with pooled single-cell screens. We found that partial reprogramming restored youthful expression in adipogenic and mesenchymal stem cells but also temporarily suppressed somatic identity programs. Our pooled screens revealed that many subsets of the Yamanaka Factors both restore youthful expression and suppress somatic identity, but these effects were not tightly entangled. We also found that a multipotent reprogramming strategy inspired by amphibian regeneration restored youthful expression in myogenic cells. Our results suggest that various sets of reprogramming factors can restore youthful expression with varying degrees of somatic identity suppression. A record of this paper's Transparent Peer Review process is included in the supplemental information.
Affiliation(s)
- Antoine E Roux
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Chunlian Zhang
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Jonathan Paw
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- José Zavala-Solorio
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Evangelia Malahias
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Twaritha Vijay
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Ganesh Kolumam
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Cynthia Kenyon
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
- Jacob C Kimmel
- Calico Life Sciences, LLC, 1170 Veterans Blvd, South San Francisco, CA 94080, USA
31
Elmentaite R, Domínguez Conde C, Yang L, Teichmann SA. Single-cell atlases: shared and tissue-specific cell types across human organs. Nat Rev Genet 2022; 23:395-410. [PMID: 35217821 DOI: 10.1038/s41576-022-00449-w] [Citation(s) in RCA: 48] [Impact Index Per Article: 24.0] [Accepted: 01/17/2022] [Indexed: 12/12/2022]
Abstract
The development of single-cell and spatial transcriptomics methods was instrumental in the conception of the Human Cell Atlas initiative, which aims to generate an integrated map of all cells across the human body. These technology advances are bringing increasing depth and resolution to maps of human organs and tissues, as well as to our understanding of individual human cell types. Commonalities as well as tissue-specific features of primary and supportive cell types across human organs are beginning to emerge from these human tissue maps. In this Review, we highlight key biological insights obtained from cross-tissue studies into epithelial, fibroblast, vascular and immune cells based on single-cell gene expression data in humans and contrast them with mechanisms reported in mice.
Affiliation(s)
- Rasa Elmentaite
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Lu Yang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, UK
32
Dohmen J, Baranovskii A, Ronen J, Uyar B, Franke V, Akalin A. Identifying tumor cells at the single-cell level using machine learning. Genome Biol 2022; 23:123. [PMID: 35637521 PMCID: PMC9150321 DOI: 10.1186/s13059-022-02683-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 11/23/2021] [Accepted: 05/06/2022] [Indexed: 12/15/2022]
Abstract
Tumors are complex tissues of cancerous cells surrounded by a heterogeneous cellular microenvironment with which they interact. Single-cell sequencing enables molecular characterization of single cells within the tumor. However, cell annotation—the assignment of cell type or cell state to each sequenced cell—is a challenge, especially identifying tumor cells within single-cell or spatial sequencing experiments. Here, we propose ikarus, a machine learning pipeline aimed at distinguishing tumor cells from normal cells at the single-cell level. We test ikarus on multiple single-cell datasets, showing that it achieves high sensitivity and specificity in multiple experimental contexts.
Affiliation(s)
- Jan Dohmen
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Str. 28, 10115, Berlin, Germany
- Artem Baranovskii
- Non-coding RNAs and Mechanisms of Cytoplasmic Gene Regulation Lab, Berlin Institute for Medical Systems Biology, Hannoversche Str. 28, 10115, Berlin, Germany
- Free University Berlin, Kaiserswerther Str. 16-18, 14195, Berlin, Germany
- Jonathan Ronen
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Str. 28, 10115, Berlin, Germany
- Bora Uyar
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Str. 28, 10115, Berlin, Germany
- Vedran Franke
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Str. 28, 10115, Berlin, Germany
- Altuna Akalin
- Bioinformatics and Omics Data Science Platform, Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Hannoversche Str. 28, 10115, Berlin, Germany
33
Domínguez Conde C, Xu C, Jarvis LB, Rainbow DB, Wells SB, Gomes T, Howlett SK, Suchanek O, Polanski K, King HW, Mamanova L, Huang N, Szabo PA, Richardson L, Bolt L, Fasouli ES, Mahbubani KT, Prete M, Tuck L, Richoz N, Tuong ZK, Campos L, Mousa HS, Needham EJ, Pritchard S, Li T, Elmentaite R, Park J, Rahmani E, Chen D, Menon DK, Bayraktar OA, James LK, Meyer KB, Yosef N, Clatworthy MR, Sims PA, Farber DL, Saeb-Parsy K, Jones JL, Teichmann SA. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 2022; 376:eabl5197. [PMID: 35549406 PMCID: PMC7612735 DOI: 10.1126/science.abl5197] [Citation(s) in RCA: 244] [Impact Index Per Article: 122.0] [Indexed: 02/02/2023]
Abstract
Despite their crucial role in health and disease, our knowledge of immune cells within human tissues remains limited. We surveyed the immune compartment of 16 tissues from 12 adult donors by single-cell RNA sequencing and VDJ sequencing generating a dataset of ~360,000 cells. To systematically resolve immune cell heterogeneity across tissues, we developed CellTypist, a machine learning tool for rapid and precise cell type annotation. Using this approach, combined with detailed curation, we determined the tissue distribution of finely phenotyped immune cell types, revealing hitherto unappreciated tissue-specific features and clonal architecture of T and B cells. Our multitissue approach lays the foundation for identifying highly resolved immune cell types by leveraging a common reference dataset, tissue-integrated expression analysis, and antigen receptor sequencing.
Affiliation(s)
- C Domínguez Conde
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- C Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- LB Jarvis
- Department of Clinical Neurosciences, University of Cambridge
- DB Rainbow
- Department of Clinical Neurosciences, University of Cambridge
- SB Wells
- Department of Systems Biology, Columbia University Irving Medical Center
- T Gomes
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- SK Howlett
- Department of Clinical Neurosciences, University of Cambridge
- O Suchanek
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK
- K Polanski
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- HW King
- Centre for Immunobiology, Blizard Institute, Queen Mary University of London, London, UK
- L Mamanova
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- N Huang
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- PA Szabo
- Department of Microbiology and Immunology, Columbia University Irving Medical Center
- L Richardson
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- L Bolt
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- ES Fasouli
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- KT Mahbubani
- Department of Surgery, University of Cambridge and NIHR Cambridge Biomedical Research Centre, Cambridge, UK
- M Prete
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- L Tuck
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- N Richoz
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK
- ZK Tuong
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK
- L Campos
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- West Suffolk Hospital NHS Trust, Bury Saint Edmunds, UK
- HS Mousa
- Department of Clinical Neurosciences, University of Cambridge
- EJ Needham
- Department of Clinical Neurosciences, University of Cambridge
- S Pritchard
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- T Li
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- R Elmentaite
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- J Park
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- E Rahmani
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- D Chen
- Department of Systems Biology, Columbia University Irving Medical Center
- DK Menon
- Department of Anaesthesia, University of Cambridge, Cambridge, UK
- OA Bayraktar
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- LK James
- Centre for Immunobiology, Blizard Institute, Queen Mary University of London, London, UK
- KB Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- N Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- MR Clatworthy
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Molecular Immunity Unit, Department of Medicine, University of Cambridge, Cambridge, UK
- PA Sims
- Department of Systems Biology, Columbia University Irving Medical Center
- DL Farber
- Department of Microbiology and Immunology, Columbia University Irving Medical Center
- K Saeb-Parsy
- Department of Surgery, University of Cambridge and NIHR Cambridge Biomedical Research Centre, Cambridge, UK
- JL Jones
- Department of Clinical Neurosciences, University of Cambridge
- SA Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Theory of Condensed Matter, Cavendish Laboratory, Department of Physics, University of Cambridge, JJ Thomson Ave, Cambridge CB3 0HE, UK
34
Ha Y, Du Z, Tian J. Fine-grained interactive attention learning for semi-supervised white blood cell classification. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103611] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/10/2023]
35
Li K, Yan C, Li C, Chen L, Zhao J, Zhang Z, Bao S, Sun J, Zhou M. Computational elucidation of spatial gene expression variation from spatially resolved transcriptomics data. Mol Ther Nucleic Acids 2022; 27:404-411. [PMID: 35036053 PMCID: PMC8728308 DOI: 10.1016/j.omtn.2021.12.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Recent advances in spatially resolved transcriptomics (SRT) have revolutionized biological and medical research and enabled unprecedented insight into the functional organization and cell communication of tissues and organs in situ. Identifying and characterizing spatial variation in gene expression (SE analysis) is fundamental to elucidating the SRT landscape. Alongside technological breakthroughs and large-scale data generation, there is an urgent need for public repositories and computational techniques for SE analysis of SRT data. Increasing efforts have been made to apply in silico techniques to SE analysis. However, these attempts are scattered across a large number of studies that are not easily accessible or comprehensible to medical and life scientists. This study provides a survey and summary of public resources for SE analysis in SRT studies. An updated, systematic overview of the state-of-the-art computational approaches and tools currently available for SE analysis is presented, emphasizing recent advances. Finally, the study explores future perspectives and challenges of in silico techniques in SE analysis. This study guides medical and life scientists toward dedicated resources and more competent tools for characterizing spatial patterns of gene expression.
Affiliation(s)
- Ke Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Congcong Yan
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Chenghao Li
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Lu Chen
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jingting Zhao
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Zicheng Zhang
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Siqi Bao
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Jie Sun
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Corresponding author: Jie Sun
- Meng Zhou
- School of Biomedical Engineering, School of Ophthalmology & Optometry and Eye Hospital, Wenzhou Medical University, Wenzhou 325027, P. R. China
- Corresponding author: Meng Zhou
36
Cuperus JT. Single-cell genomics in plants: current state, future directions, and hurdles to overcome. PLANT PHYSIOLOGY 2022; 188:749-755. [PMID: 34662424 PMCID: PMC8825463 DOI: 10.1093/plphys/kiab478] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2021] [Accepted: 09/21/2021] [Indexed: 05/26/2023]
Abstract
Single-cell genomics has the potential to revolutionize the study of plant development and tissue-specific responses to environmental stimuli by revealing heretofore unknown players and gene regulatory processes. Here, I focus on the current state of single-cell genomics in plants, emerging technologies and applications, in addition to outlining possible future directions for experiments. I describe approaches to enable cheaper and larger experiments and technologies to measure multiple types of molecules to better model and understand cell types and their different states and trajectories throughout development. Lastly, I discuss the inherent limitations of single-cell studies and the technological hurdles that need to be overcome to widely apply single-cell genomics in crops to generate the greatest possible knowledge gain.
Affiliation(s)
- Josh T Cuperus
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
37
Zeng Y, Wei Z, Pan Z, Lu Y, Yang Y. A robust and scalable graph neural network for accurate single-cell classification. Brief Bioinform 2022; 23:6501353. [PMID: 35018408 DOI: 10.1093/bib/bbab570] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2021] [Revised: 12/01/2021] [Accepted: 12/11/2021] [Indexed: 12/25/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) techniques provide high-resolution data on cellular heterogeneity in diverse tissues, and a critical step in the data analysis is cell type identification. Traditional methods usually cluster the cells and manually identify cell clusters through marker genes, which is time-consuming and subjective. With the launch of several large-scale single-cell projects, millions of sequenced cells have been annotated, making it promising to transfer labels from the annotated datasets to newly generated datasets. One powerful way to perform this transfer is to learn cell relations through a graph neural network (GNN), but traditional GNNs struggle to process millions of cells because the message-passing procedure is expensive at each training epoch. Here, we have developed a robust and scalable GNN-based method for accurate single-cell classification (GraphCS), in which the graph is constructed to connect similar cells within and between labelled and unlabelled scRNA-seq datasets so that shared information can propagate. To overcome the slow information propagation of GNNs at each training epoch, the diffused information is pre-calculated via the approximate Generalized PageRank algorithm, enabling sublinear complexity in the number of cells. Compared with existing methods, GraphCS demonstrates better performance on simulated, cross-platform, cross-species and cross-omics scRNA-seq datasets. More importantly, our model offers high speed and scalability on large datasets and can achieve superior performance for 1 million cells within 50 min.
Affiliation(s)
- Yuansong Zeng
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zhuoyi Wei
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Zixiang Pan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yutong Lu
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China; Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou 510000, China
38
Mahin KF, Robiuddin M, Islam M, Ashraf S, Yeasmin F, Shatabda S. PanClassif: Improving pan cancer classification of single cell RNA-seq gene expression data using machine learning. Genomics 2022; 114:110264. [PMID: 34998929 DOI: 10.1016/j.ygeno.2022.01.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/28/2021] [Revised: 12/23/2021] [Accepted: 01/03/2022] [Indexed: 11/04/2022]
Abstract
Cancer is one of the major causes of human death each year. In recent years, cancer identification and classification using machine learning have gained momentum due to the availability of high-throughput sequencing data. With RNA-seq, cancer research is advancing rapidly, and new insights into cancer and related treatments are coming to light. In this paper, we propose PanClassif, a method that requires only a small number of effective genes to detect cancer from RNA-seq data and improves the performance of a wide range of machine learning classifiers. We took samples of 22 cancer types from The Cancer Genome Atlas (TCGA), comprising 8287 cancer samples and 680 normal samples. First, PanClassif applies k-Nearest Neighbour (k-NN) smoothing to the samples to handle noise in the data. Effective genes are then selected with an ANOVA-based test. To balance the training data, PanClassif applies an oversampling method, SMOTE. We have performed comprehensive experiments on the datasets using several classification algorithms. Experimental results show that PanClassif outperforms existing state-of-the-art methods and performs consistently on two single-cell RNA-seq datasets taken from Gene Expression Omnibus (GEO). PanClassif improves the performance of a wide variety of classifiers for both binary cancer prediction and multi-class cancer classification. PanClassif is available as a Python package (https://pypi.org/project/panclassif/). All the source code and materials of PanClassif are available at https://github.com/Zwei-inc/panclassif.
Affiliation(s)
- Kazi Ferdous Mahin
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
- Md Robiuddin
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
- Mujahidul Islam
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
- Shayed Ashraf
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
- Farjana Yeasmin
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Plot-2, United City, Madani Avenue, Satarkul, Badda, Dhaka 1212, Bangladesh
39
Xu Y, Das P, McCord RP. SMILE: mutual information learning for integration of single-cell omics data. Bioinformatics 2022; 38:476-486. [PMID: 34623402 DOI: 10.1093/bioinformatics/btab706] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 03/10/2021] [Revised: 09/15/2021] [Accepted: 10/06/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types and features of data, the challenges of integrating single-cell omics data are rising. Here, we present an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information, SMILE (Single-cell Mutual Information Learning). RESULTS Using a unique cell-pairing design, SMILE successfully integrates multisource single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into a shared space. SMILE can also integrate data from two or more modalities, such as joint-profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint-profiling technologies can then be used as a framework for comparing independent single-source data. AVAILABILITY AND IMPLEMENTATION The source code of SMILE, including analyses of key results in the study, can be found at https://github.com/rpmccordlab/SMILE, implemented in Python. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yang Xu
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA
- Priyojit Das
- UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN 37996, USA
- Rachel Patton McCord
- Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996, USA
40
James KR, Elmentaite R, Teichmann SA, Hold GL. Redefining intestinal immunity with single-cell transcriptomics. Mucosal Immunol 2022; 15:531-541. [PMID: 34848830 PMCID: PMC8630196 DOI: 10.1038/s41385-021-00470-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/15/2021] [Revised: 10/27/2021] [Accepted: 11/03/2021] [Indexed: 02/04/2023]
Abstract
The intestinal immune system represents the largest collection of immune cells in the body and is continually exposed to antigens from food and the microbiota. Here we discuss the contribution of single-cell transcriptomics in shaping our understanding of this complex system. We consider the impact on resolving early intestine development, engagement with the neighbouring microbiota, diversity of intestinal immune cells, compartmentalisation within the intestines and interactions with non-immune cells. Finally, we offer a perspective on open questions about gut immunity that evolving single-cell technologies are well placed to address.
Affiliation(s)
- Kylie Renee James
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW 2010 Australia; School of Medical Sciences, University of New South Wales, Sydney, NSW 2006 Australia
- Rasa Elmentaite
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA UK
- Sarah Amalia Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA UK; Theory of Condensed Matter Group, Cavendish Laboratory/Department of Physics, University of Cambridge, Cambridge, CB3 0HE UK
- Georgina Louise Hold
- University of New South Wales Microbiome Research Centre, Sydney, NSW 2217 Australia
41
Elemento O, Leslie C, Lundin J, Tourassi G. Artificial intelligence in cancer research, diagnosis and therapy. Nat Rev Cancer 2021; 21:747-752. [PMID: 34535775 DOI: 10.1038/s41568-021-00399-1] [Citation(s) in RCA: 71] [Impact Index Per Article: 23.7] [Accepted: 08/10/2021] [Indexed: 11/09/2022]
Abstract
Artificial intelligence and machine learning techniques are breaking into biomedical research and health care, including, importantly, cancer research and oncology, where the potential applications are vast. These include detection and diagnosis of cancer, subtype classification, optimization of cancer treatment and identification of new therapeutic targets in drug discovery. While big data used to train machine learning models may already exist, leveraging this opportunity to realize the full promise of artificial intelligence in both the cancer research space and the clinical space will first require significant obstacles to be surmounted. In this Viewpoint article, we asked four experts for their opinions on how we can begin to implement artificial intelligence while ensuring that standards are maintained, so as to transform cancer diagnosis and the prognosis and treatment of patients with cancer and to drive biological discovery.
Affiliation(s)
- Olivier Elemento
- Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Christina Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Johan Lundin
- Department of Global Public Health, Karolinska Institutet, Stockholm, Sweden
- Institute for Molecular Medicine Finland - FIMM, University of Helsinki, Helsinki, Finland
- iCAN Digital Precision Cancer Medicine Flagship, University of Helsinki, Helsinki, Finland
- Georgia Tourassi
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, TN, USA
42
Osumi-Sutherland D, Xu C, Keays M, Levine AP, Kharchenko PV, Regev A, Lein E, Teichmann SA. Cell type ontologies of the Human Cell Atlas. Nat Cell Biol 2021; 23:1129-1135. [PMID: 34750578 DOI: 10.1038/s41556-021-00787-7] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/18/2021] [Accepted: 09/28/2021] [Indexed: 12/24/2022]
Abstract
Massive single-cell profiling efforts have accelerated our discovery of the cellular composition of the human body while at the same time raising the need to formalize this new knowledge. Here, we discuss current efforts to harmonize and integrate different sources of annotations of cell types and states into a reference cell ontology. We illustrate with examples how a unified ontology can consolidate and advance our understanding of cell types across scientific communities and biological domains.
Affiliation(s)
- Chuan Xu
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Maria Keays
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Adam P Levine
- Research Department of Pathology, University College London, London, UK
- Peter V Kharchenko
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Aviv Regev
- Genentech, South San Francisco, CA, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ed Lein
- Allen Institute for Brain Science, Seattle, WA, USA
- Sarah A Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK; Cavendish Laboratory, University of Cambridge, Cambridge, UK
43
44
Kimmel JC, Yi N, Roy M, Hendrickson DG, Kelley DR. Differentiation reveals latent features of aging and an energy barrier in murine myogenesis. Cell Rep 2021; 35:109046. [PMID: 33910007 DOI: 10.1016/j.celrep.2021.109046] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/05/2020] [Revised: 06/23/2020] [Accepted: 04/07/2021] [Indexed: 12/14/2022]
Abstract
Skeletal muscle experiences a decline in lean mass and regenerative potential with age, in part due to intrinsic changes in progenitor cells. However, it remains unclear how age-related changes in progenitors manifest across a differentiation trajectory. Here, we perform single-cell RNA sequencing (RNA-seq) on muscle mononuclear cells from young and aged mice and profile muscle stem cells (MuSCs) and fibro-adipose progenitors (FAPs) after differentiation. Differentiation increases the magnitude of age-related change in MuSCs and FAPs, but it also masks a subset of age-related changes present in progenitors. Using a dynamical systems approach and RNA velocity, we find that aged MuSCs follow the same differentiation trajectory as young cells but stall in differentiation near a commitment decision. Our results suggest that differentiation reveals latent features of aging and that fate commitment decisions are delayed in aged myogenic cells in vitro.
Affiliation(s)
- Jacob C Kimmel
- Calico Life Sciences, 1170 Veterans Blvd., South San Francisco, CA 94080, USA
- Nelda Yi
- Calico Life Sciences, 1170 Veterans Blvd., South San Francisco, CA 94080, USA
- Margaret Roy
- Calico Life Sciences, 1170 Veterans Blvd., South San Francisco, CA 94080, USA
- David G Hendrickson
- Calico Life Sciences, 1170 Veterans Blvd., South San Francisco, CA 94080, USA
- David R Kelley
- Calico Life Sciences, 1170 Veterans Blvd., South San Francisco, CA 94080, USA