1
Yan Y, Zhu S, Jia M, Chen X, Qi W, Gu F, Valencak TG, Liu JX, Sun HZ. Advances in single-cell transcriptomics in animal research. J Anim Sci Biotechnol 2024; 15:102. [PMID: 39090689 PMCID: PMC11295521 DOI: 10.1186/s40104-024-01063-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2024] [Accepted: 06/12/2024] [Indexed: 08/04/2024] Open
Abstract
Understanding biological mechanisms is fundamental for improving animal production and health to meet the growing demand for high-quality protein. As an emerging biotechnology, single-cell transcriptomics has been gradually applied in diverse aspects of animal research, offering an effective method to study the gene expression of high-throughput single cells of different tissues/organs in animals. In an unprecedented manner, researchers have identified cell types/subtypes and their marker genes, inferred cellular fate trajectories, and revealed cell‒cell interactions in animals using single-cell transcriptomics. In this paper, we introduce the development of single-cell technology and review the processes, advancements, and applications of single-cell transcriptomics in animal research. We summarize recent efforts using single-cell transcriptomics to obtain a more profound understanding of animal nutrition and health, reproductive performance, genetics, and disease models in different livestock species. Moreover, the practical experience accumulated based on a large number of cases is highlighted to provide a reference for determining key factors (e.g., sample size, cell clustering, and cell type annotation) in single-cell transcriptomics analysis. We also discuss the limitations and outlook of single-cell transcriptomics in the current stage. This paper describes the comprehensive progress of single-cell transcriptomics in animal research, offering novel insights and sustainable advancements in agricultural productivity and animal health.
Affiliation(s)
- Yunan Yan
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Senlin Zhu
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Minghui Jia
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Xinyi Chen
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Wenlingli Qi
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Fengfei Gu
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, Zhejiang University, Hangzhou, 310058, China
- Teresa G Valencak
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Agency for Health and Food Safety Austria, 1220, Vienna, Austria
- Jian-Xin Liu
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China
- Hui-Zeng Sun
- Institute of Dairy Science, Ministry of Education Key Laboratory of Molecular Animal Nutrition, College of Animal Sciences, Zhejiang University, Hangzhou, 310058, China.
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, Zhejiang University, Hangzhou, 310058, China.
2
Li J, Shyr Y, Liu Q. aKNNO: single-cell and spatial transcriptomics clustering with an optimized adaptive k-nearest neighbor graph. Genome Biol 2024; 25:203. [PMID: 39090647 PMCID: PMC11293182 DOI: 10.1186/s13059-024-03339-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 07/16/2024] [Indexed: 08/04/2024] Open
Abstract
Typical clustering methods for single-cell and spatial transcriptomics struggle to identify rare cell types, while approaches tailored to detect rare cell types gain this ability at the cost of poorer performance for grouping abundant ones. Here, we develop aKNNO to simultaneously identify abundant and rare cell types based on an adaptive k-nearest neighbor graph with optimization. Benchmarking on 38 simulated and 20 single-cell and spatial transcriptomics datasets demonstrates that aKNNO identifies both abundant and rare cell types more accurately than general and specialized methods. Using only gene expression, aKNNO maps abundant and rare cells more precisely than integrative approaches.
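To illustrate the general idea of an adaptive k-nearest neighbor graph mentioned in this abstract, here is a minimal sketch. It is not the aKNNO algorithm: the function name `adaptive_knn`, the parameters `k_max` and `ratio`, and the pruning rule itself are assumptions made purely for illustration. The sketch only shows how a per-point neighborhood size can shrink for small, isolated groups of points and grow inside dense, abundant ones.

```python
import numpy as np

def adaptive_knn(X, k_max=10, ratio=2.0):
    """Return one array of neighbor indices per row of X.

    Hypothetical rule (NOT taken from the aKNNO paper): each point keeps
    those of its k_max nearest neighbors that lie within `ratio` times
    the distance to its single nearest neighbor.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Dense pairwise Euclidean distances; only suitable for small n.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    neighbors = []
    for i in range(n):
        # Candidate neighbors: the k_max closest points to point i.
        order = np.argsort(dist[i])[: min(k_max, n - 1)]
        d = dist[i, order]
        # Adaptive pruning relative to the nearest-neighbor distance.
        keep = d <= ratio * d[0]
        neighbors.append(order[keep])
    return neighbors

# Toy data: four evenly spaced points plus a distant pair.
points = np.array([[0.0], [1.0], [2.0], [3.0], [100.0], [101.0]])
graph = adaptive_knn(points, k_max=5, ratio=2.0)
```

In this toy input, each point of the distant pair keeps only its partner as a neighbor, whereas a fixed k would connect the pair to the larger group.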
Affiliation(s)
- Jia Li
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
- Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37203, USA.
3
Ramirez A, Orcutt-Jahns BT, Pascoe S, Abraham A, Remigio B, Thomas N, Meyer AS. Integrative, high-resolution analysis of single cells across experimental conditions with PARAFAC2. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.07.29.605698. [PMID: 39131377 PMCID: PMC11312543 DOI: 10.1101/2024.07.29.605698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Effective tools for exploration and analysis are needed to extract insights from large-scale single-cell measurement data. However, current techniques for handling single-cell studies performed across experimental conditions (e.g., samples, perturbations, or patients) require restrictive assumptions, lack flexibility, or do not adequately deconvolute condition-to-condition variation from cell-to-cell variation. Here, we report that the tensor decomposition method PARAFAC2 (Pf2) enables the dimensionality reduction of single-cell data across conditions. We demonstrate these benefits across two distinct contexts of single-cell RNA-sequencing (scRNA-seq) experiments of peripheral immune cells: pharmacologic drug perturbations and systemic lupus erythematosus (SLE) patient samples. By isolating relevant gene modules across cells and conditions, Pf2 enables straightforward associations of gene variation patterns across specific patients or perturbations while connecting each coordinated change to certain cells without pre-defining cell types. The theoretical grounding of Pf2 suggests a unified framework for many modeling tasks associated with single-cell data. Thus, Pf2 provides an intuitive universal dimensionality reduction approach for multi-sample single-cell studies across diverse biological contexts.
Highlights
- PARAFAC2 enables tensor-based analysis of single-cell experiments across conditions.
- PARAFAC2 separates condition-specific effects from cell-to-cell variation.
- PARAFAC2 provides intuitive isolation of patterns into condition-, cell-, and gene-specific patterns.
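For orientation, the PARAFAC2 model named in this abstract is commonly written as below. This is the standard textbook form of the decomposition, not an equation quoted from the preprint; the symbols ($X_k$, $P_k$, $F$, $D_k$, $A$, $R$, $K$) are notation assumed here, with experimental conditions indexed by $k$.

```latex
\[
  X_k \approx P_k \, F \, D_k \, A^{\top},
  \qquad P_k^{\top} P_k = I_R,
  \qquad k = 1, \dots, K
\]
```

Under this reading, $X_k$ would be the cells-by-genes matrix for condition $k$ (the number of cells may differ between conditions), $P_k$ a per-condition projection with orthonormal columns, $F$ an $R \times R$ matrix shared across conditions, $D_k$ a diagonal matrix of condition-specific weights, and $A$ the genes-by-components factor matrix.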
4
Sun F, Li H, Sun D, Fu S, Gu L, Shao X, Wang Q, Dong X, Duan B, Xing F, Wu J, Xiao M, Zhao F, Han JDJ, Liu Q, Fan X, Li C, Wang C, Shi T. Single-cell omics: experimental workflow, data analyses and applications. SCIENCE CHINA. LIFE SCIENCES 2024:10.1007/s11427-023-2561-0. [PMID: 39060615 DOI: 10.1007/s11427-023-2561-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 04/18/2024] [Indexed: 07/28/2024]
Abstract
Cells are the fundamental units of biological systems and exhibit unique development trajectories and molecular features. Our exploration of how the genomes orchestrate the formation and maintenance of each cell, and control the cellular phenotypes of various organisms, is both captivating and intricate. Since the inception of the first single-cell RNA technology, technologies related to single-cell sequencing have experienced rapid advancements in recent years. These technologies have expanded horizontally to include single-cell genome, epigenome, proteome, and metabolome, while vertically, they have progressed to integrate multiple omics data and incorporate additional information such as spatial scRNA-seq and CRISPR screening. Single-cell omics represent a groundbreaking advancement in the biomedical field, offering profound insights into the understanding of complex diseases, including cancers. Here, we comprehensively summarize recent advances in single-cell omics technologies, with a specific focus on the methodology section. This overview aims to guide researchers in selecting appropriate methods for single-cell sequencing and related data analysis.
Affiliation(s)
- Fengying Sun
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China
- Haoyan Li
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- Dongqing Sun
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Shaliu Fu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Lei Gu
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Shao
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China
- Qinqin Wang
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Xin Dong
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Bin Duan
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China
- Feiyang Xing
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jun Wu
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China
- Minmin Xiao
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Fangqing Zhao
- Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, 100101, China.
- Jing-Dong J Han
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Center for Quantitative Biology (CQB), Peking University, Beijing, 100871, China.
- Qi Liu
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Research Institute of Intelligent Computing, Zhejiang Lab, Hangzhou, 311121, China.
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai, 201210, China.
- Xiaohui Fan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China.
- National Key Laboratory of Chinese Medicine Modernization, Innovation Center of Yangtze River Delta, Zhejiang University, Jiaxing, 314103, China.
- Zhejiang Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, China.
- Chen Li
- Center for Single-cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Chenfei Wang
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department, Tongji Hospital, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai, 200082, China.
- Frontier Science Center for Stem Cells, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China.
- Tieliu Shi
- Department of Clinical Laboratory, the Affiliated Wuhu Hospital of East China Normal University (The Second People's Hospital of Wuhu City), Wuhu, 241000, China.
- Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, 200241, China.
- Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, 200062, China.
5
de Winter N, Ji J, Sintou A, Forte E, Lee M, Noseda M, Li A, Koenig AL, Lavine KJ, Hayat S, Rosenthal N, Emanueli C, Srivastava PK, Sattler S. Persistent transcriptional changes in cardiac adaptive immune cells following myocardial infarction: New evidence from the re-analysis of publicly available single cell and nuclei RNA-sequencing data sets. J Mol Cell Cardiol 2024; 192:48-64. [PMID: 38734060 DOI: 10.1016/j.yjmcc.2024.04.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 03/17/2024] [Accepted: 04/29/2024] [Indexed: 05/13/2024]
Abstract
INTRODUCTION Chronic immunopathology contributes to the development of heart failure after a myocardial infarction (MI). Both T and B cells of the adaptive immune system are present in the myocardium and have been suggested to be involved in post-MI immunopathology. METHODS We analyzed the B and T cell populations isolated from previously published single cell RNA-sequencing data sets (PMID: 32130914, PMID: 35948637, PMID: 32971526 and PMID: 35926050) of the mouse and human heart, using differential expression analysis, functional enrichment analysis, gene regulatory inferences, and integration with autoimmune and cardiovascular GWAS. RESULTS Already at baseline, mature effector B and T cells are present in the human and mouse heart, having increased activity in transcription factors maintaining tolerance (e.g. DEAF1, JDP2, SPI-B). Following MI, T cells upregulate pro-inflammatory transcript levels (e.g. Cd11, Gzmk, Prf1), while B cells upregulate activation markers (e.g. Il6, Il1rn, Ccl6) and collagen (e.g. Col5a2, Col4a1, Col1a2). Importantly, pro-inflammatory and fibrotic transcription factors (e.g. NFKB1, CREM, REL) remain active in T cells, while B cells maintain elevated activity in transcription factors related to immunoglobulin production (e.g. ERG, REL) in both mouse and human post-MI hearts. Notably, genes differentially expressed in post-MI T and B cells are associated with cardiovascular and autoimmune disease. CONCLUSION These findings highlight the varied and time-dependent dynamic roles of post-MI T and B cells. They appear ready-to-go and are activated immediately after MI, thus participating in the acute wound healing response. However, they subsequently remain in a state of pro-inflammatory activation, contributing to persistent immunopathology.
Affiliation(s)
- Natasha de Winter
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Jiahui Ji
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Amalia Sintou
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Elvira Forte
- The Jackson Laboratory, Bar Harbor, United States
- Michael Lee
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Michela Noseda
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
- Aoxue Li
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Medicine Solna, Division of Cardiovascular Medicine, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Andrew L Koenig
- Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
- Kory J Lavine
- Center for Cardiovascular Research, Department of Medicine, Cardiovascular Division, Washington University School of Medicine, St. Louis, MO, United States
- Nadia Rosenthal
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; The Jackson Laboratory, Bar Harbor, United States
- Costanza Emanueli
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; British Heart Foundation Centre For Research Excellence, Imperial College London, United Kingdom
- Prashant K Srivastava
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom
- Susanne Sattler
- National Heart and Lung Institute, Faculty of Medicine, Imperial College London, United Kingdom; Department of Cardiology, Medical University of Graz, Austria; Division of Pharmacology, Otto Loewi Research Center, Medical University of Graz, Austria.
6
Huynh KLA, Tyc KM, Matuck BF, Easter QT, Pratapa A, Kumar NV, Pérez P, Kulchar RJ, Pranzatelli TJ, de Souza D, Weaver TM, Qu X, Soares Junior LAV, Dolhnokoff M, Kleiner DE, Hewitt SM, Ferraz da Silva LF, Rocha VG, Warner BM, Byrd KM, Liu J. Spatial Deconvolution of Cell Types and Cell States at Scale Utilizing TACIT. RESEARCH SQUARE 2024:rs.3.rs-4536158. [PMID: 38978567 PMCID: PMC11230516 DOI: 10.21203/rs.3.rs-4536158/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Identifying cell types and states remains a time-consuming, error-prone challenge for spatial biology. While deep learning is increasingly used, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we developed TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data. TACIT uses unbiased thresholding to distinguish positive cells from background, focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000-cells; 51-cell types) from three niches (brain, intestine, gland), TACIT outperformed existing unsupervised methods in accuracy and scalability. Integrating TACIT-identified cell types with a novel Shiny app revealed new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discovered under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.
Affiliation(s)
- Khoa L. A. Huynh
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Katarzyna M. Tyc
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
- Bruno F. Matuck
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Quinn T. Easter
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Aditya Pratapa
- Department of Cell Biology, Duke University, Durham, NC, USA
- Nikhil V. Kumar
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Paola Pérez
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Rachel J. Kulchar
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Thomas J.F. Pranzatelli
- Adeno-Associated Virus Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Deiziane de Souza
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
- Theresa M. Weaver
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Xufeng Qu
- Massey Cancer Center, Richmond VA, USA
- Marisa Dolhnokoff
- Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
- David E. Kleiner
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen M. Hewitt
- Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Vanderson Geraldo Rocha
- Department of Hematology, Transfusion and Cell Therapy Service, University of Sao Paulo, Sao Paulo, Brazil
- Blake M. Warner
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Kevin M. Byrd
- Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jinze Liu
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Massey Cancer Center, Richmond VA, USA
7
Jia Y, Ma P, Yao Q. CellMarkerPipe: cell marker identification and evaluation pipeline in single cell transcriptomes. Sci Rep 2024; 14:13151. [PMID: 38849445 PMCID: PMC11161599 DOI: 10.1038/s41598-024-63492-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Accepted: 05/29/2024] [Indexed: 06/09/2024] Open
Abstract
Assessing marker genes from all cell clusters can be time-consuming and often lacks a systematic strategy. Streamlining this process through a unified computational platform that automates identification and benchmarking will greatly enhance efficiency and ensure a fair evaluation. We therefore developed a novel computational platform, cellMarkerPipe ( https://github.com/yao-laboratory/cellMarkerPipe ), for automated cell-type specific marker gene identification from scRNA-seq data, coupled with a comprehensive evaluation schema. CellMarkerPipe adaptively wraps around a collection of commonly used and state-of-the-art tools, including Seurat, COSG, SC3, SCMarker, COMET, and scGeneFit. From rigorous testing across diverse samples, we ascertain SCMarker's overall reliable performance in single marker gene selection, with COSG showing commendable speed and comparable efficacy. Furthermore, we demonstrate the pivotal role of our approach in real-world medical datasets. This general and open-source pipeline stands as a significant advancement in streamlining cell marker gene identification and evaluation, fitting broad applications in the field of cellular biology and medical research.
Affiliation(s)
- Yinglu Jia
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Department of Chemistry, University of Nebraska Lincoln, Hamilton Hall, Lincoln, NE, 68588, USA
- Pengchong Ma
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA
- Qiuming Yao
- School of Computing, University of Nebraska Lincoln, 256 Avery Hall, Lincoln, NE, 68588, USA.
- Nebraska Center for the Prevention of Obesity Diseases, 316C Leverton Hall, Lincoln, NE, 68583, USA.
- Nebraska Center for Virology, University of Nebraska, 4240 Fair St., Lincoln, NE, 68583, USA.
8
Duo H, Li Y, Lan Y, Tao J, Yang Q, Xiao Y, Sun J, Li L, Nie X, Zhang X, Liang G, Liu M, Hao Y, Li B. Systematic evaluation with practical guidelines for single-cell and spatially resolved transcriptomics data simulation under multiple scenarios. Genome Biol 2024; 25:145. [PMID: 38831386 PMCID: PMC11149245 DOI: 10.1186/s13059-024-03290-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 05/28/2024] [Indexed: 06/05/2024] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) have led to groundbreaking advancements in life sciences. To develop bioinformatics tools for scRNA-seq and SRT data and perform unbiased benchmarks, data simulation has been widely adopted by providing explicit ground truth and generating customized datasets. However, the performance of simulation methods under multiple scenarios has not been comprehensively assessed, making it challenging to choose suitable methods without practical guidelines. RESULTS We systematically evaluated 49 simulation methods developed for scRNA-seq and/or SRT data in terms of accuracy, functionality, scalability, and usability using 152 reference datasets derived from 24 platforms. SRTsim, scDesign3, ZINB-WaVE, and scDesign2 have the best accuracy performance across various platforms. Unexpectedly, some methods tailored to scRNA-seq data have potential compatibility for simulating SRT data. Lun, SPARSim, and scDesign3-tree outperform other methods under corresponding simulation scenarios. Phenopath, Lun, Simple, and MFA yield high scalability scores but they cannot generate realistic simulated data. Users should consider the trade-offs between method accuracy and scalability (or functionality) when making decisions. Additionally, execution errors are mainly caused by failed parameter estimations and appearance of missing or infinite values in calculations. We provide practical guidelines for method selection, a standard pipeline Simpipe ( https://github.com/duohongrui/simpipe ; https://doi.org/10.5281/zenodo.11178409 ), and an online tool Simsite ( https://www.ciblab.net/software/simshiny/ ) for data simulation. CONCLUSIONS No method performs best on all criteria, thus a good-yet-not-the-best method is recommended if it solves problems effectively and reasonably. Our comprehensive work provides crucial insights for developers on modeling gene expression data and fosters the simulation process for users.
Affiliation(s)
- Hongrui Duo
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Yinghong Li
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, People's Republic of China
- Yang Lan
- Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Army Medical University, Chongqing, 400038, People's Republic of China
- Jingxin Tao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Qingxia Yang
- Zhejiang Provincial Key Laboratory of Precision Diagnosis and Therapy for Major Gynecological Diseases, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, People's Republic of China
- Yingxue Xiao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
- Jing Sun
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Lei Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Xiner Nie
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
| | - Xiaoxi Zhang
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China
| | - Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, People's Republic of China
| | - Mingwei Liu
- Key Laboratory of Clinical Laboratory Diagnostics, College of Laboratory Medicine, Chongqing Medical University, Chongqing, 400016, People's Republic of China
| | - Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
| | - Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 401331, People's Republic of China.
|
9
|
Huynh KLA, Tyc KM, Matuck BF, Easter QT, Pratapa A, Kumar NV, Pérez P, Kulchar R, Pranzatelli T, de Souza D, Weaver TM, Qu X, Valente Soares LA, Dolhnokoff M, Kleiner DE, Hewitt SM, da Silva LFF, Rocha VG, Warner BM, Byrd KM, Liu J. Spatial Deconvolution of Cell Types and Cell States at Scale Utilizing TACIT. bioRxiv 2024:2024.05.31.596861. [PMID: 38895230 PMCID: PMC11185514 DOI: 10.1101/2024.05.31.596861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Identifying cell types and states remains a time-consuming and error-prone challenge for spatial biology. While deep learning is increasingly used, it is difficult to generalize due to variability at the level of cells, neighborhoods, and niches in health and disease. To address this, we developed TACIT, an unsupervised algorithm for cell annotation using predefined signatures that operates without training data, using unbiased thresholding to distinguish positive cells from background and focusing on relevant markers to identify ambiguous cells in multiomic assays. Using five datasets (5,000,000 cells; 51 cell types) from three niches (brain, intestine, gland), TACIT outperformed existing unsupervised methods in accuracy and scalability. Integration of TACIT-identified cell types with a novel Shiny app revealed new phenotypes in two inflammatory gland diseases. Finally, using combined spatial transcriptomics and proteomics, we discover under- and overrepresented immune cell types and states in regions of interest, suggesting multimodality is essential for translating spatial biology to clinical applications.
Affiliation(s)
- Khoa L. A. Huynh
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- Katarzyna M. Tyc
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Cancer Center, Richmond, VA, USA
- Bruno F. Matuck
  - Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Quinn T. Easter
  - Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Aditya Pratapa
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Nikhil V. Kumar
  - Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Paola Pérez
  - Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Rachel Kulchar
  - Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Thomas Pranzatelli
  - Adeno-Associated Virus Biology Section, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Deiziane de Souza
  - Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
- Theresa M. Weaver
  - Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
- Xufeng Qu
  - Massey Cancer Center, Richmond, VA, USA
- Marisa Dolhnokoff
  - Department of Pathology, Medicine School of University of Sao Paulo, SP, BR
- David E. Kleiner
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Stephen M. Hewitt
  - Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Vanderson Geraldo Rocha
  - Department of Hematology, Transfusion and Cell Therapy Service, University of Sao Paulo, Sao Paulo, Brazil
- Blake M. Warner
  - Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
- Kevin M. Byrd
  - Lab of Oral & Craniofacial Innovation (LOCI), Department of Innovation & Technology Research, ADA Science & Research Institute, Gaithersburg, MD, USA
  - Salivary Disorders Unit, National Institute of Dental and Craniofacial Research, National Institutes of Health, Bethesda, MD, USA
  - Division of Oral and Craniofacial Health Sciences, Adams School of Dentistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Jinze Liu
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Cancer Center, Richmond, VA, USA
|
10
|
Cai L, Anastassiou D. CASCC: a co-expression-assisted single-cell RNA-seq data clustering method. Bioinformatics 2024; 40:btae283. [PMID: 38662553 DOI: 10.1093/bioinformatics/btae283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/28/2024] [Accepted: 04/23/2024] [Indexed: 05/15/2024]
Abstract
SUMMARY: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.
Affiliation(s)
- Lingyi Cai
  - Department of Systems Biology, Columbia University, New York, NY 10032, United States
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
- Dimitris Anastassiou
  - Department of Systems Biology, Columbia University, New York, NY 10032, United States
  - Department of Electrical Engineering, Columbia University, New York, NY 10027, United States
  - Irving Comprehensive Cancer Center, Columbia University, New York, NY 10032, United States
|
11
|
Tadi AA, Alhadidi D, Rueda L. PPPCT: Privacy-Preserving framework for Parallel Clustering Transcriptomics data. Comput Biol Med 2024; 173:108351. [PMID: 38520921 DOI: 10.1016/j.compbiomed.2024.108351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 03/18/2024] [Accepted: 03/18/2024] [Indexed: 03/25/2024]
Abstract
Single-cell transcriptomics data provides crucial insights into patients' health, yet poses significant privacy concerns. Genomic data privacy attacks can have deep implications, compromising not only the patients' health information but also that of their families. Moreover, the permanence of leaked data exacerbates the challenges, making retraction impossible. While extensive efforts have been directed towards clustering single-cell transcriptomics data, addressing critical challenges, especially in the realm of privacy, remains pivotal. This paper introduces an efficient, fast, privacy-preserving approach for clustering single-cell RNA-sequencing (scRNA-seq) datasets. The key contributions include ensuring data privacy, achieving high-quality clustering, accommodating the high dimensionality inherent in the datasets, and maintaining reasonable computation time for large-scale datasets. Our proposed approach utilizes the map-reduce scheme to parallelize clustering, addressing intensive calculation challenges. Intel Software Guard eXtension (SGX) processors are used to ensure the security of sensitive code and data during processing. Additionally, the approach incorporates a logarithm transformation as a preprocessing step, employs non-negative matrix factorization for dimensionality reduction, and utilizes parallel k-means for clustering. The approach fully leverages the computing capabilities of all processing resources within a secure private cloud environment. Experimental results demonstrate the efficacy of our approach in preserving patient privacy while surpassing state-of-the-art methods in both clustering quality and computation time. Our method consistently achieves a minimum of 7% higher Adjusted Rand Index (ARI) than existing approaches, contingent on dataset size. Additionally, due to parallel computations and dimensionality reduction, our approach exhibits efficiency, converging to very good results in less than 10 seconds for a scRNA-seq dataset with 5000 genes and 6000 cells when prioritizing privacy and under two seconds without privacy considerations. Availability and implementation: code and datasets are available at https://github.com/University-of-Windsor/PPPCT.
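The clustering steps named in this abstract (a logarithm transformation followed by k-means parallelized in a map-reduce style) can be sketched in a few lines. This is an illustrative sketch only, not the PPPCT implementation: it leaves out the SGX enclave and the non-negative matrix factorization step, runs the map step sequentially, and every function name and the toy data are assumptions introduced here.

```python
import math
from functools import reduce


def log_transform(matrix):
    """Apply log(1 + x) to every count (hypothetical preprocessing helper)."""
    return [[math.log1p(v) for v in row] for row in matrix]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def map_chunk(chunk, centroids):
    """Map step: per-cluster coordinate sums and counts for one chunk of cells."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        k = nearest(point, centroids)
        counts[k] += 1
        for j, v in enumerate(point):
            sums[k][j] += v
    return sums, counts


def reduce_partials(a, b):
    """Reduce step: merge the partial sums and counts of two chunks."""
    sums = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_mapreduce(chunks, centroids, iterations=10):
    """Run a fixed number of k-means updates over pre-split chunks."""
    for _ in range(iterations):
        # Each chunk is independent, so this list could be built by parallel workers.
        partials = [map_chunk(chunk, centroids) for chunk in chunks]
        sums, counts = reduce(reduce_partials, partials)
        # Keep the old centroid if a cluster received no cells.
        centroids = [
            [s / n for s in row] if n else old
            for row, n, old in zip(sums, counts, centroids)
        ]
    return centroids


# Toy data: four cells, two genes, split into two chunks (made-up counts).
cells = log_transform([[0, 0], [1, 0], [100, 100], [120, 90]])
chunks = [cells[:2], cells[2:]]
centroids = kmeans_mapreduce(chunks, [cells[0], cells[2]])
labels = [nearest(cell, centroids) for cell in cells]
```

Because each chunk returns only per-cluster sums and counts, the map step can be handed to separate workers without those workers exchanging raw expression values.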
Affiliation(s)
- Ali Abbasi Tadi
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
- Dima Alhadidi
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
- Luis Rueda
  - University of Windsor, 401 Sunset Ave, Windsor, N9B 3P4, Ontario, Canada
|
12
|
An S, Shi J, Liu R, Chen Y, Wang J, Hu S, Xia X, Dong G, Bo X, He Z, Ying X. scDAC: deep adaptive clustering of single-cell transcriptomic data with coupled autoencoder and Dirichlet process mixture model. Bioinformatics 2024; 40:btae198. [PMID: 38603616 PMCID: PMC11256937 DOI: 10.1093/bioinformatics/btae198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 03/20/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024] Open
Abstract
MOTIVATION: Clustering analysis for single-cell RNA sequencing (scRNA-seq) data is an important step in revealing cellular heterogeneity. Many clustering methods have been proposed to discover heterogeneous cell types from scRNA-seq data. However, adaptive clustering with an accurate cluster number reflecting the intrinsic biological nature of large-scale scRNA-seq data remains quite challenging. RESULTS: Here, we propose a single-cell Deep Adaptive Clustering (scDAC) model by coupling the Autoencoder (AE) and the Dirichlet Process Mixture Model (DPMM). By jointly optimizing the model parameters of AE and DPMM, scDAC achieves adaptive clustering with accurate cluster numbers on scRNA-seq data. We verify the performance of scDAC on five subsampled datasets with different numbers of cell types and compare it with 15 widely used clustering methods across nine scRNA-seq datasets. Our results demonstrate that scDAC can adaptively find accurate numbers of cell types or subtypes and outperforms other methods. Moreover, the performance of scDAC is robust to hyperparameter changes. AVAILABILITY AND IMPLEMENTATION: scDAC is implemented in Python. The source code is available at https://github.com/labomics/scDAC.
Affiliation(s)
- Sijing An
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Jinhui Shi
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Runyan Liu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Yaowen Chen
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Jing Wang
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Shuofeng Hu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Xinyu Xia
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Guohua Dong
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Xiaochen Bo
  - Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China
- Zhen He
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
- Xiaomin Ying
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing 100850, China
|
13
|
Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers (Basel) 2024; 16:1350. [PMID: 38611028 PMCID: PMC11011054 DOI: 10.3390/cancers16071350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 03/25/2024] [Accepted: 03/28/2024] [Indexed: 04/14/2024] Open
Abstract
Topic modeling is a popular technique in machine learning and natural language processing, where a corpus of text documents is classified into themes or topics using word frequency analysis. This approach has proven successful in various biological data analysis applications, such as predicting cancer subtypes with high accuracy and identifying genes, enhancers, and stable cell types simultaneously from sparse single-cell epigenomics data. The advantage of using a topic model is that it not only serves as a clustering algorithm, but it can also explain clustering results by providing word probability distributions over topics. Our study proposes a novel topic modeling approach for clustering single cells and detecting topics (gene signatures) in single-cell datasets that measure multiple omics simultaneously. We applied this approach to examine the transcriptional heterogeneity of luminal and triple-negative breast cancer cells using patient-derived xenograft models with acquired resistance to chemotherapy and targeted therapy. Through this approach, we identified protein-coding genes and long non-coding RNAs (lncRNAs) that group thousands of cells into biologically similar clusters, accurately distinguishing drug-sensitive and -resistant breast cancer types. In comparison to standard state-of-the-art clustering analyses, our approach offers an optimal partitioning of genes into topics and cells into clusters simultaneously, producing easily interpretable clustering outcomes. Additionally, we demonstrate that an integrative clustering approach, which combines the information from mRNAs and lncRNAs treated as disjoint omics layers, enhances the accuracy of cell classification.
Affiliation(s)
- Gabriele Malagoli
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Filippo Valle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Emmanuel Barillot
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
- Michele Caselle
  - Physics Department, University of Turin and INFN, 10125 Turin, Italy
- Loredana Martignetti
  - Institut Curie, Inserm U900, Mines ParisTech, PSL Research University, 75248 Paris, France
|
14
|
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. Brief Bioinform 2024; 25:bbae216. [PMID: 38725155 PMCID: PMC11082074 DOI: 10.1093/bib/bbae216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 03/01/2024] [Accepted: 04/25/2024] [Indexed: 05/13/2024] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Jack R Leary
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Chuanhao Yang
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
- Maigan A Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Todd M Brusko
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
  - Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, United States
  - Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, United States
- Rhonda Bacher
  - Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, United States
  - Diabetes Institute, University of Florida, Gainesville, FL 32610, United States
|
15
|
Lodi MK, Lodi M, Osei K, Ranganathan V, Hwang P, Ghosh P. CHAI: Consensus Clustering Through Similarity Matrix Integration for Cell-Type Identification. bioRxiv 2024:2024.03.19.585758. [PMID: 38562750 PMCID: PMC10983883 DOI: 10.1101/2024.03.19.585758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Several methods have been developed to computationally predict cell types for single cell RNA sequencing (scRNAseq) data. As methods are developed, a common problem for investigators has been identifying the best method they should apply to their specific use-case. To address this challenge, we present CHAI (consensus Clustering tHrough similArIty matrix integratIon for single cell type identification), a wisdom of crowds approach for scRNAseq clustering. CHAI presents two competing methods which aggregate the clustering results from seven state-of-the-art clustering methods: CHAI-AvgSim and CHAI-SNF. Both methods demonstrate improved performance on a diverse selection of benchmarking datasets and also outperform a previous consensus clustering method. We demonstrate CHAI's practical use case by identifying a leader tumor cell cluster enriched with CDH3. CHAI provides a platform for multiomic integration, and we demonstrate CHAI-SNF to have improved performance when including spatial transcriptomics data. CHAI is intuitive and easily customizable; it provides a way for users to add their own clustering methods to the pipeline, or down-select just the ones they want to use for the clustering aggregation. CHAI is available as an open-source R package on GitHub: https://github.com/lodimk2/chai.
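The idea of aggregating several clusterings through similarity matrices can be illustrated with a small sketch. The abstract does not say how CHAI defines its similarity matrices or derives final clusters, so the choices below are assumptions made for illustration: similarity is binary co-membership, the matrices are averaged element-wise, and consensus clusters are the connected components above a threshold. The function names are hypothetical and do not correspond to the CHAI R package.

```python
def co_membership(labels):
    """Binary cell-by-cell similarity: 1.0 if two cells share a cluster label."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def average_similarity(clusterings):
    """Element-wise mean of the co-membership matrices of several clusterings."""
    n = len(clusterings[0])
    matrices = [co_membership(labels) for labels in clusterings]
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
        for i in range(n)
    ]


def consensus_labels(similarity, threshold=0.5):
    """Assumed final step: connected components of the graph that links
    two cells whenever their averaged similarity exceeds `threshold`."""
    n = len(similarity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and similarity[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three made-up clusterings of five cells; label names need not match across methods.
clusterings = [
    [0, 0, 1, 1, 1],
    ["x", "x", "x", "y", "y"],
    [1, 1, 2, 2, 2],
]
similarity = average_similarity(clusterings)
consensus = consensus_labels(similarity)
```

Working on co-membership rather than raw labels sidesteps the problem that different methods number their clusters differently.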
Affiliation(s)
- Musaddiq K Lodi
  - Integrative Life Sciences, Virginia Commonwealth University, Richmond, VA 23284
- Muzammil Lodi
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
- Kezie Osei
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284
- Priscilla Hwang
  - Department of Biomedical Engineering, Virginia Commonwealth University, Richmond, VA 23284
- Preetam Ghosh
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284
|
16
|
Tan CL, Lindner K, Boschert T, Meng Z, Rodriguez Ehrenfried A, De Roia A, Haltenhof G, Faenza A, Imperatore F, Bunse L, Lindner JM, Harbottle RP, Ratliff M, Offringa R, Poschke I, Platten M, Green EW. Prediction of tumor-reactive T cell receptors from scRNA-seq data for personalized T cell therapy. Nat Biotechnol 2024:10.1038/s41587-024-02161-y. [PMID: 38454173 DOI: 10.1038/s41587-024-02161-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Accepted: 02/01/2024] [Indexed: 03/09/2024]
Abstract
The identification of patient-derived, tumor-reactive T cell receptors (TCRs) as a basis for personalized transgenic T cell therapies remains a time- and cost-intensive endeavor. Current approaches to identify tumor-reactive TCRs analyze tumor mutations to predict T cell activating (neo)antigens and use these to either enrich tumor infiltrating lymphocyte (TIL) cultures or validate individual TCRs for transgenic autologous therapies. Here we combined high-throughput TCR cloning and reactivity validation to train predicTCR, a machine learning classifier that identifies individual tumor-reactive TILs in an antigen-agnostic manner based on single-TIL RNA sequencing. PredicTCR identifies tumor-reactive TCRs in TILs from diverse cancers better than previous gene set enrichment-based approaches, increasing specificity and sensitivity (geometric mean) from 0.38 to 0.74. By predicting tumor-reactive TCRs in a matter of days, TCR clonotypes can be prioritized to accelerate the manufacture of personalized T cell therapies.
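The abstract reports performance as the geometric mean of specificity and sensitivity. As a reading aid, that metric can be computed from confusion-matrix counts as in the sketch below; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean_score(tp, fp, tn, fn):
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Made-up counts for a classifier that labels TCRs as tumor-reactive or not.
score = geometric_mean_score(tp=8, fp=1, tn=9, fn=2)
```

The geometric mean drops toward zero whenever either component is poor, so a classifier cannot score well by favoring sensitivity at the expense of specificity or the reverse.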
Affiliation(s)
- C L Tan
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- K Lindner
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
  - Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
- T Boschert
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
  - Helmholtz Institute for Translational Oncology, Mainz, Germany
- Z Meng
  - Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
  - Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
  - Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- A Rodriguez Ehrenfried
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
  - Helmholtz Institute for Translational Oncology, Mainz, Germany
  - Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
- A De Roia
  - Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
  - DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
- G Haltenhof
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- L Bunse
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
- R P Harbottle
  - DNA Vector Laboratory, German Cancer Research Center, Heidelberg, Germany
- M Ratliff
  - Department of Neurosurgery, University Hospital Mannheim, Mannheim, Germany
- R Offringa
  - Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
  - Division of Molecular Oncology of Gastrointestinal Tumors, German Cancer Research Center, Heidelberg, Germany
  - Sino-German Laboratory of Personalized Medicine for Pancreatic Cancer, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- I Poschke
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
- M Platten
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
  - Immune Monitoring Unit, National Center for Tumor Diseases, Heidelberg, Germany
  - Helmholtz Institute for Translational Oncology, Mainz, Germany
  - German Cancer Research Center-Hector Cancer Institute at the Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- E W Green
  - CCU Neuroimmunology and Brain Tumor Immunology, German Cancer Research Center, Heidelberg, Germany
  - German Cancer Consortium, Core Center Heidelberg, Heidelberg, Germany
  - Department of Neurology, Medical Faculty Mannheim, Mannheim Center for Translational Neuroscience, Heidelberg University, Mannheim, Germany
|
17
|
Nwizu C, Hughes M, Ramseier ML, Navia AW, Shalek AK, Fusi N, Raghavan S, Winter PS, Amini AP, Crawford L. Scalable nonparametric clustering with unified marker gene selection for single-cell RNA-seq data. bioRxiv 2024:2024.02.11.579839. [PMID: 38405697 PMCID: PMC10888887 DOI: 10.1101/2024.02.11.579839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Clustering is commonly used in single-cell RNA-sequencing (scRNA-seq) pipelines to characterize cellular heterogeneity. However, current methods face two main limitations. First, they require user-specified heuristics which add time and complexity to bioinformatic workflows; second, they rely on post-selective differential expression analyses to identify marker genes driving cluster differences, which has been shown to be subject to inflated false discovery rates. We address these challenges by introducing nonparametric clustering of single-cell populations (NCLUSION): an infinite mixture model that leverages Bayesian sparse priors to identify marker genes while simultaneously performing clustering on single-cell expression data. NCLUSION uses a scalable variational inference algorithm to perform these analyses on datasets with up to millions of cells. By analyzing publicly available scRNA-seq studies, we demonstrate that NCLUSION (i) matches the performance of other state-of-the-art clustering techniques with significantly reduced runtime and (ii) provides statistically robust and biologically relevant transcriptomic signatures for each of the clusters it identifies. Overall, NCLUSION represents a reliable hypothesis-generating tool for understanding patterns of expression variation present in single-cell populations.
Affiliation(s)
- Chibuikem Nwizu
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
  - Warren Alpert Medical School of Brown University, Providence, RI, USA
- Michelle L. Ramseier
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrew W. Navia
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Alex K. Shalek
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Srivatsan Raghavan
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Peter S. Winter
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI, USA
  - Microsoft Research, Cambridge, MA, USA
  - Department of Biostatistics, Brown University, Providence, RI, USA
| |
|
18
|
He D, Mount SM, Patro R. scCensus: Off-target scRNA-seq reads reveal meaningful biology. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.29.577807. [PMID: 38352549 PMCID: PMC10862729 DOI: 10.1101/2024.01.29.577807] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Single-cell RNA-sequencing (scRNA-seq) provides unprecedented insights into cellular heterogeneity. Although scRNA-seq reads from most prevalent and popular tagged-end protocols are expected to arise from the 3' end of polyadenylated RNAs, recent studies have shown that "off-target" reads can constitute a substantial portion of the read population. In this work, we introduced scCensus, a comprehensive analysis workflow for systematically evaluating and categorizing off-target reads in scRNA-seq. We applied scCensus to seven scRNA-seq datasets. Our analysis of intergenic reads shows that these off-target reads contain information about chromatin structure and can be used to identify similar cells across modalities. Our analysis of antisense reads suggests that these reads can be used to improve gene detection and capture interesting transcriptional activities like antisense transcription. Furthermore, using splice-aware quantification, we find that spliced and unspliced reads provide distinct information about cell clusters and biomarkers, suggesting the utility of integrating signals from reads with different splicing statuses. Overall, our results suggest that off-target scRNA-seq reads contain underappreciated information about various transcriptional activities. These observations about yet-unexploited information in existing scRNA-seq data will help guide and motivate the community to improve current algorithms and analysis methods, and to develop novel approaches that utilize off-target reads to extend the reach and accuracy of single-cell data analysis pipelines.
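The read categories named in the abstract above (intergenic and antisense, as opposed to on-target sense reads) can be sketched with a simple interval and strand check. This is a minimal illustration and not the scCensus workflow: the `Gene` type, the `categorize_read` function, the single-chromosome half-open coordinates, and the omission of splicing status are all assumptions made here.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int    # inclusive start coordinate
    end: int      # exclusive end coordinate (half-open interval)
    strand: str   # "+" or "-"


def categorize_read(read_start, read_end, read_strand, genes):
    """Label a read as 'sense', 'antisense', or 'intergenic'.

    A read is intergenic if it overlaps no annotated gene, sense if it
    overlaps at least one gene on its own strand, and antisense otherwise.
    """
    overlapping = [g for g in genes if read_start < g.end and g.start < read_end]
    if not overlapping:
        return "intergenic"
    if any(g.strand == read_strand for g in overlapping):
        return "sense"
    return "antisense"
```

A real pipeline would also need chromosome names, an interval index for speed, and exon/intron structure to separate spliced from unspliced reads.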
Affiliation(s)
- Dongze He
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, USA
| | - Stephen M. Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
| | - Rob Patro
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
- Department of Computer Science, University of Maryland, College Park, MD 20742, USA
| |
|
19
|
Dong X, Leary JR, Yang C, Brusko MA, Brusko TM, Bacher R. Data-driven selection of analysis decisions in single-cell RNA-seq trajectory inference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.18.572214. [PMID: 38187768 PMCID: PMC10769271 DOI: 10.1101/2023.12.18.572214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainties in selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods, such as feature selection and dimension reduction, the performance of trajectory methods is highly dataset-specific. To address these challenges, we developed Escort, a framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort navigates single-cell trajectory analysis through data-driven assessments, reducing uncertainty and much of the decision burden associated with trajectory inference. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.
Affiliation(s)
- Xiaoru Dong
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
| | - Jack R. Leary
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
| | - Chuanhao Yang
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
| | - Maigan A. Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, USA
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Todd M. Brusko
- Diabetes Institute, University of Florida, Gainesville, FL 32610, USA
- Department of Pathology, Immunology, and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL 32610, USA
- Department of Pediatrics, College of Medicine, University of Florida, Gainesville, FL 32610, USA
| | - Rhonda Bacher
- Department of Biostatistics, College of Public Health and Health Professions, University of Florida, Gainesville, FL 32610, USA
- Diabetes Institute, University of Florida, Gainesville, FL 32610, USA
| |
|
20
|
Karakurt HU, Pir P. SUMA: a lightweight machine learning model-powered shared nearest neighbour-based clustering application interface for scRNA-Seq data. Turk J Biol 2023; 47:413-422. [PMID: 38681777 PMCID: PMC11045205 DOI: 10.55730/1300-0152.2675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 12/28/2023] [Accepted: 12/18/2023] [Indexed: 05/01/2024] Open
Abstract
Background/aim: Single-cell transcriptomics (scRNA-Seq) explores cellular diversity at the gene expression level. Due to the inherent sparsity and noise in scRNA-Seq data and the uncertainty about the types of sequenced cells, effective clustering and cell type annotation are essential. The graph-based clustering of scRNA-Seq data is a simple yet powerful approach that represents data as a "shared nearest neighbour" graph and clusters the cells using graph clustering algorithms. These algorithms depend on several user-defined parameters. Here we present SUMA, a lightweight tool that uses a random forest model to predict the number of neighbours that yields the best clustering results. Moreover, we integrated our method with other commonly used methods in an RShiny application. SUMA can be used in a local environment (https://github.com/hkarakurt8742/SUMA) or as a browser tool (https://hkarakurt.shinyapps.io/suma/). Materials and methods: Publicly available scRNA-Seq datasets and 3 different graph-based clustering algorithms were used to develop SUMA, and a wide range of values for the number of neighbours and variant genes was considered. The quality of clustering was assessed using the adjusted Rand index (ARI) and the true labels of each dataset. The data were split into training and test datasets, and the model was built and optimised using the Scikit-learn (Python) and randomForest (R) libraries. Results: The accuracy of our machine learning model was 0.96, while the AUC of the ROC curve was 0.98. The model indicated that the number of cells in scRNA-Seq data is the most important feature when deciding the number of neighbours. Conclusion: We developed and evaluated the SUMA model and implemented the method in the SUMAShiny app, which integrates SUMA with different clustering methods and enables nonbioinformatician users to cluster and visualise their scRNA-Seq data easily. The SUMAShiny app is available for both desktop and browser use.
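To make the role of the "number of neighbours" parameter concrete, the following sketch builds a shared nearest neighbour graph from a k-nearest-neighbour search. It is not SUMA's code: the function names `knn_sets` and `snn_graph`, the brute-force Euclidean search, and the use of Jaccard overlap as the edge weight (one common convention) are assumptions made for illustration.

```python
from math import dist


def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbours (self excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        # Brute-force search: rank every other point by Euclidean distance.
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: dist(p, points[j]))
        neighbours.append(set(order[:k]))
    return neighbours


def snn_graph(points, k):
    """Weight each pair of cells by the Jaccard overlap of their kNN sets (self included)."""
    nn = [s | {i} for i, s in enumerate(knn_sets(points, k))]
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nn[i] & nn[j])
            if shared:  # keep only pairs that share at least one neighbour
                edges[(i, j)] = shared / len(nn[i] | nn[j])
    return edges
```

Changing `k` alters which cells share neighbours and therefore which edges exist, which is why the choice of `k` affects the clusters a graph algorithm later finds.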
Affiliation(s)
- Hamza Umut Karakurt
- Department of Bioengineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkiye
- Idea Technology Solutions R&D Center, İstanbul, Turkiye
| | - Pınar Pir
- Department of Bioengineering, Faculty of Engineering, Gebze Technical University, Kocaeli, Turkiye
- Idea Technology Solutions R&D Center, İstanbul, Turkiye
| |
|
21
|
Li J, Shyr Y, Liu Q. Single-cell and Spatial Transcriptomics Clustering with an Optimized Adaptive K-Nearest Neighbor Graph. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.10.13.562261. [PMID: 37905097 PMCID: PMC10614787 DOI: 10.1101/2023.10.13.562261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
Single-cell and spatial transcriptomics have been widely used to characterize the cellular landscape in complex tissues. To understand cellular heterogeneity, one essential step is to define cell types through unsupervised clustering. While typical clustering methods have difficulty identifying rare cell types, approaches specifically tailored to detect rare cell types gain this ability at the cost of poorer performance in grouping abundant ones. Here, we developed aKNNO, a method to identify abundant and rare cell types simultaneously based on an adaptive k-nearest neighbor graph with optimization. Benchmarked on 38 simulated and 20 single-cell and spatial transcriptomics datasets, aKNNO identified both abundant and rare cell types accurately. Without sacrificing performance for clustering abundant cell types, aKNNO discovered known and novel rare cell types that typical methods, and even specifically tailored methods, failed to detect. aKNNO, using the transcriptome alone, stereotyped fine-grained anatomical structures more precisely than integrative approaches that combine expression with spatial locations and histology images.
Affiliation(s)
- Jia Li
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37203, USA
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| |
|
22
|
Meng R, Yin S, Sun J, Hu H, Zhao Q. scAAGA: Single cell data analysis framework using asymmetric autoencoder with gene attention. Comput Biol Med 2023; 165:107414. [PMID: 37660567 DOI: 10.1016/j.compbiomed.2023.107414] [Citation(s) in RCA: 50] [Impact Index Per Article: 50.0] [Received: 07/02/2023] [Revised: 08/02/2023] [Accepted: 08/28/2023] [Indexed: 09/05/2023]
Abstract
In recent years, single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique for investigating cellular heterogeneity and structure. However, analyzing scRNA-seq data remains challenging, especially in the context of COVID-19 research. Single-cell clustering is a key step in analyzing scRNA-seq data, and deep learning methods have shown great potential in this area. In this work, we propose a novel scRNA-seq analysis framework called scAAGA. Specifically, we utilize an asymmetric autoencoder with a gene attention module to learn important gene features adaptively from scRNA-seq data, with the aim of improving clustering performance. We apply scAAGA to COVID-19 peripheral blood mononuclear cell (PBMC) scRNA-seq data and compare its performance with state-of-the-art methods. Our results consistently demonstrate that scAAGA outperforms existing methods in terms of adjusted Rand index (ARI), normalized mutual information (NMI), and adjusted mutual information (AMI) scores, achieving improvements ranging from 2.8% to 27.8% in NMI scores. Additionally, we discuss a data augmentation technique to expand the datasets and improve the accuracy of scAAGA. Overall, scAAGA presents a robust tool for scRNA-seq data analysis, enhancing the accuracy and reliability of clustering results in COVID-19 research.
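Of the clustering scores named in the abstract above, the adjusted Rand index is the simplest to compute by hand. The sketch below is a plain standard-library implementation of the usual pair-counting formula, written for illustration; the function name `adjusted_rand_index` is an assumption, it is not code from scAAGA, and NMI and AMI are not covered.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    # Contingency counts: number of cells with each (true, predicted) label pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs placed together in both, in truth, and in prediction.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    expected = sum_true * sum_pred / total if total else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_pairs - expected) / (max_index - expected)
```

The score is 1 when two partitions agree up to a renaming of labels, near 0 for chance-level agreement, and can be negative when agreement is worse than chance.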
Affiliation(s)
- Rui Meng
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Shuaidong Yin
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi, 276000, China
| | - Huan Hu
- Institute of Applied Genomics, Fuzhou University, Fuzhou, 350108, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| |
|
23
|
Zhang C, Duan ZW, Xu YP, Liu J, Li HD. FEED: a feature selection method based on gene expression decomposition for single cell clustering. Brief Bioinform 2023; 24:bbad389. [PMID: 37935617 DOI: 10.1093/bib/bbad389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 08/31/2023] [Accepted: 09/22/2023] [Indexed: 11/09/2023] Open
Abstract
Single-cell clustering is a critical step in biological downstream analysis. The clustering performance could be effectively improved by extracting cell-type-specific genes. The state-of-the-art feature selection methods usually calculate the importance of a single gene without considering the information contained in the gene expression distribution. Moreover, these methods ignore the intrinsic expression patterns of genes and heterogeneity within groups of different mean expression levels. In this work, we present a Feature sElection method based on gene Expression Decomposition (FEED) of scRNA-seq data, which selects informative genes to enhance clustering performance. First, the expression levels of genes are decomposed into multiple Gaussian components. Then, a novel gene correlation calculation method is proposed to measure the relationship between genes from the perspective of distribution. Finally, a permutation-based approach is proposed to determine the threshold of gene importance to obtain marker gene subsets. Compared with state-of-the-art feature selection methods, applying FEED on various scRNA-seq datasets including large datasets followed by different common clustering algorithms results in significant improvements in the accuracy of cell-type identification. The source codes for FEED are freely available at https://github.com/genemine/FEED.
Affiliation(s)
- Chao Zhang
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Zhi-Wei Duan
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Yun-Pei Xu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Jin Liu
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- School of Computer Science and Engineering, Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
| |
|
24
|
Lei T, Chen R, Zhang S, Chen Y. Self-supervised deep clustering of single-cell RNA-seq data to hierarchically detect rare cell populations. Brief Bioinform 2023; 24:bbad335. [PMID: 37769630 PMCID: PMC10539043 DOI: 10.1093/bib/bbad335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/05/2023] [Accepted: 09/06/2023] [Indexed: 10/02/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a widely used technique for characterizing individual cells and studying gene expression at the single-cell level. Clustering plays a vital role in grouping similar cells together for various downstream analyses. However, the high sparsity and dimensionality of large scRNA-seq data pose challenges to clustering performance. Although several deep learning-based clustering algorithms have been proposed, most existing clustering methods have limitations in capturing the precise distribution types of the data or in fully utilizing the relationships between cells, leaving considerable scope for improving clustering performance, particularly in detecting rare cell populations from large scRNA-seq data. We introduce DeepScena, a novel single-cell hierarchical clustering tool that incorporates nonlinear dimension reduction, a negative binomial-based convolutional autoencoder for data fitting, and a self-supervision model for cell similarity enhancement. In a comprehensive evaluation using multiple large-scale scRNA-seq datasets, DeepScena consistently outperformed seven popular clustering tools in terms of accuracy. Notably, DeepScena exhibits high proficiency in identifying rare cell populations within large datasets that contain large numbers of clusters. When applied to scRNA-seq data of multiple myeloma cells, DeepScena successfully identified not only previously labeled large cell types but also subpopulations in CD14 monocytes, T cells and natural killer cells.
Affiliation(s)
- Tianyuan Lei
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Ruoyu Chen
- Moorestown High School, Moorestown, NJ 08057, USA
| | - Shaoqiang Zhang
- College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
| | - Yong Chen
- Department of Biological and Biomedical Sciences, Rowan University, NJ 08028, USA
| |
|
25
|
He X, Qian K, Wang Z, Zeng S, Li H, Li WV. scAce: an adaptive embedding and clustering method for single-cell gene expression data. Bioinformatics 2023; 39:btad546. [PMID: 37672035 PMCID: PMC10500084 DOI: 10.1093/bioinformatics/btad546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 08/01/2023] [Accepted: 09/05/2023] [Indexed: 09/07/2023] Open
Abstract
MOTIVATION: Since the development of single-cell RNA sequencing (scRNA-seq) technologies, clustering analysis of single-cell gene expression data has been an essential tool for distinguishing cell types and identifying novel cell types. Even though many methods are available for scRNA-seq clustering analysis, the majority of them are constrained by the requirement for a predetermined number of clusters or by their dependence on a selected initial cluster assignment. RESULTS: In this article, we propose an adaptive embedding and clustering method named scAce, which constructs a variational autoencoder to simultaneously learn cell embeddings and cluster assignments. In the scAce method, we develop an adaptive cluster merging approach which achieves improved clustering results without the need to estimate the number of clusters in advance. In addition, scAce provides an option to perform clustering enhancement, which can update and enhance cluster assignments based on previous clustering results from other methods. Based on computational analysis of both simulated and real datasets, we demonstrate that scAce outperforms state-of-the-art clustering methods for scRNA-seq data, and achieves better clustering accuracy and robustness. AVAILABILITY AND IMPLEMENTATION: The scAce package is implemented in Python 3.8 and is freely available from https://github.com/sldyns/scAce.
Affiliation(s)
- Xinwei He
- School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
| | - Kun Qian
- School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
| | - Ziqian Wang
- School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
| | - Shirou Zeng
- School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
| | - Hongwei Li
- School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
| | - Wei Vivian Li
- Department of Statistics, University of California, Riverside, Riverside 92521, United States
| |
|
26
|
Wong M, Wei Y, Ho YC. Single-cell multiomic understanding of HIV-1 reservoir at epigenetic, transcriptional, and protein levels. Curr Opin HIV AIDS 2023; 18:246-256. [PMID: 37535039 PMCID: PMC10442869 DOI: 10.1097/coh.0000000000000809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023]
Abstract
PURPOSE OF REVIEW The success of HIV-1 eradication strategies relies on in-depth understanding of HIV-1-infected cells. However, HIV-1-infected cells are extremely heterogeneous and rare. Single-cell multiomic approaches resolve the heterogeneity and rarity of HIV-1-infected cells. RECENT FINDINGS Advancement in single-cell multiomic approaches enabled HIV-1 reservoir profiling across the epigenetic (ATAC-seq), transcriptional (RNA-seq), and protein levels (CITE-seq). Using HIV-1 RNA as a surrogate, ECCITE-seq identified enrichment of HIV-1-infected cells in clonally expanded cytotoxic CD4+ T cells. Using HIV-1 DNA PCR-activated microfluidic sorting, FIND-seq captured the bulk transcriptome of HIV-1 DNA+ cells. Using targeted HIV-1 DNA amplification, PheP-seq identified surface protein expression of intact versus defective HIV-1-infected cells. Using ATAC-seq to identify HIV-1 DNA, ASAP-seq captured transcription factor activity and surface protein expression of HIV-1 DNA+ cells. Combining HIV-1 mapping by ATAC-seq and HIV-1 RNA mapping by RNA-seq, DOGMA-seq captured the epigenetic, transcriptional, and surface protein expression of latent and transcriptionally active HIV-1-infected cells. To identify reproducible biological insights and authentic HIV-1-infected cells and avoid false-positive discovery of artifacts, we reviewed current practices of single-cell multiomic experimental design and bioinformatic analysis. SUMMARY Single-cell multiomic approaches may identify innovative mechanisms of HIV-1 persistence, nominate therapeutic strategies, and accelerate discoveries.
Affiliation(s)
- Michelle Wong
- Department of Microbial Pathogenesis, Yale University School of Medicine, New Haven, Connecticut, USA
| | | | | |
|
27
|
Gunawan I, Vafaee F, Meijering E, Lock JG. An introduction to representation learning for single-cell data analysis. CELL REPORTS METHODS 2023; 3:100547. [PMID: 37671013 PMCID: PMC10475795 DOI: 10.1016/j.crmeth.2023.100547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2023]
Abstract
Single-cell-resolved systems biology methods, including omics- and imaging-based measurement modalities, generate a wealth of high-dimensional data characterizing the heterogeneity of cell populations. Representation learning methods are routinely used to analyze these complex, high-dimensional data by projecting them into lower-dimensional embeddings. This facilitates the interpretation and interrogation of the structures, dynamics, and regulation of cell heterogeneity. Reflecting their central role in analyzing diverse single-cell data types, a myriad of representation learning methods exist, with new approaches continually emerging. Here, we contrast general features of representation learning methods spanning statistical, manifold learning, and neural network approaches. We consider key steps involved in representation learning with single-cell data, including data pre-processing, hyperparameter optimization, downstream analysis, and biological validation. Interdependencies and contingencies linking these steps are also highlighted. This overview is intended to guide researchers in the selection, application, and optimization of representation learning strategies for current and future single-cell research applications.
Affiliation(s)
- Ihuan Gunawan
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
| | - Fatemeh Vafaee
- School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
| | - Erik Meijering
- School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, Australia
| | - John George Lock
- School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW, Australia
- UNSW Data Science Hub, University of New South Wales, Sydney, NSW, Australia
- Ingham Institute for Applied Medical Research, Liverpool, NSW, Australia
| |
|
28
|
Hegarty C, Neto N, Cahill P, Floudas A. Computational approaches in rheumatic diseases - Deciphering complex spatio-temporal cell interactions. Comput Struct Biotechnol J 2023; 21:4009-4020. [PMID: 37649712 PMCID: PMC10462794 DOI: 10.1016/j.csbj.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 08/04/2023] [Accepted: 08/04/2023] [Indexed: 09/01/2023] Open
Abstract
Inflammatory arthritides, including rheumatoid arthritis (RA) and psoriatic arthritis (PsA), are clinically and immunologically heterogeneous diseases with no identified cure. Chronic inflammation of the synovial tissue leads to loss of joint function that severely impacts the patient's quality of life, eventually leading to disability and life-threatening comorbidities. The pathogenesis of synovial inflammation is the consequence of compounded immune and stromal cell interactions influenced by genetic and environmental factors. Deciphering the complexity of the synovial cellular landscape has accelerated, primarily due to the utilisation of bulk and single-cell RNA sequencing. In particular, the capacity to generate cell-cell interaction networks could reveal evidence of previously unappreciated processes leading to disease. However, there is currently no universal nomenclature, a result of varied experimental and technological approaches, and this complicates the study of synovial inflammation. While spatial transcriptomic analysis that combines anatomical information with transcriptomic data of synovial tissue biopsies promises to provide more insights into disease pathogenesis, in vitro functional assays with single-cell resolution will be required to validate current bioinformatic applications. In order to provide a comprehensive approach and translate experimental data to clinical practice, a combination of clinical and molecular data with machine learning has the potential to enhance patient stratification and identify individuals at risk of arthritis who would benefit from early therapeutic intervention. This review aims to provide a comprehensive understanding of the effect of computational approaches in deciphering synovial inflammation pathogenesis and to discuss the impact that further experimental and novel computational tools may have on therapeutic target identification and drug development.
Affiliation(s)
- Ciara Hegarty
- Translational Immunology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
| | - Nuno Neto
- Trinity Centre for Biomedical Engineering, Trinity College Dublin, Ireland
| | - Paul Cahill
- Vascular Biology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
| | - Achilleas Floudas
- Translational Immunology lab, School of Biotechnology, Dublin City University, Dublin, Ireland
| |
|
29
|
Cheng C, Chen W, Jin H, Chen X. A Review of Single-Cell RNA-Seq Annotation, Integration, and Cell-Cell Communication. Cells 2023; 12:1970. [PMID: 37566049 PMCID: PMC10417635 DOI: 10.3390/cells12151970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/10/2023] [Accepted: 07/21/2023] [Indexed: 08/12/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful tool for investigating cellular biology at an unprecedented resolution, enabling the characterization of cellular heterogeneity, identification of rare but significant cell types, and exploration of cell-cell communications and interactions. Its broad applications span both basic and clinical research domains. In this comprehensive review, we survey the current landscape of scRNA-seq analysis methods and tools, focusing on count modeling, cell-type annotation, data integration, including spatial transcriptomics, and the inference of cell-cell communication. We review the challenges encountered in scRNA-seq analysis, including issues of sparsity or low expression, reliability of cell annotation, and assumptions in data integration, and discuss the potential impact of suboptimal clustering and differential expression analysis tools on downstream analyses, particularly in identifying cell subpopulations. Finally, we discuss recent advancements and future directions for enhancing scRNA-seq analysis. Specifically, we highlight the development of novel tools for annotating single-cell data, integrating and interpreting multimodal datasets covering transcriptomics, epigenomics, and proteomics, and inferring cellular communication networks. By elucidating the latest progress and innovation, we provide a comprehensive overview of the rapidly advancing field of scRNA-seq analysis.
Affiliation(s)
- Changde Cheng
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongjian Jin
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xiang Chen
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
30
Samur MK, Szalat R, Munshi NC. Single-cell profiling in multiple myeloma: insights, problems, and promises. Blood 2023; 142:313-324. [PMID: 37196627 PMCID: PMC10485379 DOI: 10.1182/blood.2022017145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Revised: 04/05/2023] [Accepted: 05/11/2023] [Indexed: 05/19/2023] Open
Abstract
In a short time, single-cell platforms have become the norm in many fields of research, including multiple myeloma (MM). In fact, the large amount of cellular heterogeneity in MM makes single-cell platforms particularly attractive because bulk assessments can miss valuable information about cellular subpopulations and cell-to-cell interactions. The decreasing cost and increasing accessibility of single-cell platforms, combined with breakthroughs in obtaining multiomics data for the same cell and innovative computational programs for analyzing data, have allowed single-cell studies to yield important insights into MM pathogenesis; yet, there is still much to be done. In this review, we will first focus on the types of single-cell profiling and the considerations for designing a single-cell profiling experiment. Then, we will discuss what we have learned from single-cell profiling about myeloma clonal evolution, transcriptional reprogramming, and drug resistance, and about the MM microenvironment during precursor and advanced disease.
Affiliation(s)
- Mehmet Kemal Samur
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Raphael Szalat
- Department of Hematology and Medical Oncology, Boston University Medical Center, Boston, MA
- Nikhil C. Munshi
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- VA Boston Healthcare System, Boston, MA
31
Zhang J, Li J, Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum Vaccin Immunother 2023:2234792. [PMID: 37485833 PMCID: PMC10373621 DOI: 10.1080/21645515.2023.2234792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 06/30/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023] Open
Abstract
Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships and potentially developing personalized intervention strategies. Single-cell data have emerged as a promising source for identifying immune response biomarkers. In this review, we discuss the current state-of-the-art methods for immunoprofiling, including those for reducing the dimensionality of high-dimensional single-cell data and methods for clustering, classification, and prediction. We also draw attention to recent developments in data integration.
Affiliation(s)
- Jingxuan Zhang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jia Li
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Lin Lin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
32
Arts JA, Laberthonnière C, Lima Cunha D, Zhou H. Single-Cell RNA Sequencing: Opportunities and Challenges for Studies on Corneal Biology in Health and Disease. Cells 2023; 12:1808. [PMID: 37443842 PMCID: PMC10340756 DOI: 10.3390/cells12131808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/15/2023] Open
Abstract
The structure and major cell types of the multi-layer human cornea have been extensively studied. However, various cell states in specific cell types and key genes that define the cell states are not fully understood, hindering our comprehension of corneal homeostasis, related diseases, and therapeutic discovery. Single-cell RNA sequencing is a revolutionary and powerful tool for identifying cell states within tissues such as the cornea. This review provides an overview of current single-cell RNA sequencing studies on the human cornea, highlighting similarities and differences between them, and summarizing the key genes that define corneal cell states reported in these studies. In addition, this review discusses the opportunities and challenges of using single-cell RNA sequencing to study corneal biology in health and disease.
Affiliation(s)
- Julian A. Arts
- Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, The Netherlands
- Camille Laberthonnière
- Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, The Netherlands
- Dulce Lima Cunha
- Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, The Netherlands
- Huiqing Zhou
- Molecular Developmental Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, The Netherlands
- Department of Human Genetics, Radboud University Medical Center, 6500 HB Nijmegen, The Netherlands
33
Yu L, Liu C, Yang JYH, Yang P. Ensemble deep learning of embeddings for clustering multimodal single-cell omics data. Bioinformatics 2023; 39:btad382. [PMID: 37314966 PMCID: PMC10287920 DOI: 10.1093/bioinformatics/btad382] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/13/2023] [Revised: 04/16/2023] [Accepted: 06/12/2023] [Indexed: 06/16/2023] Open
Abstract
MOTIVATION: Recent advances in multimodal single-cell omics technologies enable multiple modalities of molecular attributes, such as gene expression, chromatin accessibility, and protein abundance, to be profiled simultaneously at a global level in individual cells. While the increasing availability of multiple data modalities is expected to provide a more accurate clustering and characterization of cells, the development of computational methods that are capable of extracting information embedded across data modalities is still in its infancy. RESULTS: We propose SnapCCESS for clustering cells by integrating data modalities in multimodal single-cell omics data using an unsupervised ensemble deep learning framework. By creating snapshots of embeddings of multimodality using variational autoencoders, SnapCCESS can be coupled with various clustering algorithms for generating consensus clustering of cells. We applied SnapCCESS with several clustering algorithms to various datasets generated from popular multimodal single-cell omics technologies. Our results demonstrate that SnapCCESS is effective and more efficient than conventional ensemble deep learning-based clustering methods and outperforms other state-of-the-art multimodal embedding generation methods in integrating data modalities for clustering cells. The improved clustering of cells from SnapCCESS will pave the way for more accurate characterization of cell identity and types, an essential step for various downstream analyses of multimodal single-cell omics data. AVAILABILITY AND IMPLEMENTATION: SnapCCESS is implemented as a Python package and is freely available from https://github.com/PYangLab/SnapCCESS under the open-source license of GPL-3. The data used in this study are publicly available (see section 'Data availability').
Affiliation(s)
- Lijia Yu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Chunlei Liu
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Jean Yee Hwa Yang
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
- Pengyi Yang
- Computational Systems Biology Group, Children’s Medical Research Institute, Faculty of Medicine and Health, The University of Sydney, Westmead, NSW 2145, Australia
- School of Mathematics and Statistics, Faculty of Science, University of Sydney, NSW 2006, Australia
- Sydney Precision Data Science Centre, University of Sydney, NSW 2006, Australia
- Charles Perkins Centre, The University of Sydney, Sydney, NSW 2006, Australia
- Laboratory of Data Discovery for Health Limited (D4H), Hong Kong Science Park, Hong Kong SAR, China
34
Chen D, Jin J, Ke ZT. Subject clustering by IF-PCA and several recent methods. Front Genet 2023; 14:1166404. [PMID: 37287536 PMCID: PMC10242062 DOI: 10.3389/fgene.2023.1166404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Accepted: 05/03/2023] [Indexed: 06/09/2023] Open
Abstract
Subject clustering (i.e., the use of measured features to cluster subjects, such as patients or cells, into multiple groups) is a problem of significant interest. In recent years, many approaches have been proposed, among which unsupervised deep learning (UDL) has received much attention. Two interesting questions are 1) how to combine the strengths of UDL and other approaches and 2) how these approaches compare to each other. We combine the variational auto-encoder (VAE), a popular UDL approach, with the recent idea of influential feature-principal component analysis (IF-PCA) and propose IF-VAE as a new method for subject clustering. We study IF-VAE and compare it with several other methods (including IF-PCA, VAE, Seurat, and SC3) on 10 gene microarray data sets and eight single-cell RNA-seq data sets. We find that IF-VAE shows significant improvement over VAE, but still underperforms compared to IF-PCA. We also find that IF-PCA is quite competitive, slightly outperforming Seurat and SC3 over the eight single-cell data sets. IF-PCA is conceptually simple and permits delicate analysis. We demonstrate that IF-PCA is capable of achieving phase transition in a rare/weak model. Comparatively, Seurat and SC3 are more complex and theoretically difficult to analyze (for these reasons, their optimality remains unclear).
Affiliation(s)
- Dieyi Chen
- Department of Statistics, Harvard University, Cambridge, MA, United States
- Jiashun Jin
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, United States
- Zheng Tracy Ke
- Department of Statistics, Harvard University, Cambridge, MA, United States
35
Zhang Z, Wei X. Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies, together with the vast amount of data obtainable from high-throughput sequencing, has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology characterized by precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets, and open-source AI software and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. We also highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
Affiliation(s)
- Zhe Zhang
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, and Collaborative Innovation Center for Biotherapy, Chengdu 610041, PR China
- Xiawei Wei
- Laboratory of Aging Research and Cancer Drug Target, State Key Laboratory of Biotherapy and Cancer Center, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
36
Nie X, Qin D, Zhou X, Duo H, Hao Y, Li B, Liang G. Clustering ensemble in scRNA-seq data analysis: Methods, applications and challenges. Comput Biol Med 2023; 159:106939. [PMID: 37075602 DOI: 10.1016/j.compbiomed.2023.106939] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/20/2023] [Revised: 03/31/2023] [Accepted: 04/14/2023] [Indexed: 04/21/2023]
Abstract
With the rapid development of single-cell RNA-sequencing techniques, various computational methods and tools have been proposed to analyze these high-throughput data, accelerating the discovery of potential biological information. As one of the core steps of single-cell transcriptome data analysis, clustering plays a crucial role in identifying cell types and interpreting cellular heterogeneity. However, the results generated by different clustering methods can differ markedly, and such unstable partitions can affect the accuracy of the analysis to a certain extent. To overcome this challenge and obtain more accurate results, clustering ensembles are now frequently applied to cluster analysis of single-cell transcriptome datasets, and the results generated by clustering ensembles are generally more reliable than those from most single clustering partitions. In this review, we summarize applications and challenges of the clustering ensemble method in single-cell transcriptome data analysis, and provide constructive thoughts and references for researchers in this field.
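As an illustration of the clustering ensemble idea summarized in this abstract, the sketch below combines several partitions of the same cells through a co-association matrix and a simple threshold. It is a minimal example under stated assumptions, not a method taken from the review: the function names `co_association` and `consensus_clusters`, the 0.5 threshold, and the toy partitions are all hypothetical, and the connected-components step stands in for whichever consensus function a real ensemble method would use.

```python
def co_association(partitions):
    """Return the fraction of partitions in which each pair of cells shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Group cells that are linked by a co-association above the threshold.

    Assumption: the consensus is taken as the connected components of the
    thresholded co-association graph, one of several possible choices.
    """
    coassoc = co_association(partitions)
    n = len(coassoc)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and coassoc[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical clusterings of six cells that disagree on cells 2 and 5.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
]
print(consensus_clusters(partitions))
```

Cluster label values differ between the input partitions, which is why the sketch compares only whether two cells share a label within each partition rather than comparing the labels themselves.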
Affiliation(s)
- Xiner Nie
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Dan Qin
- Department of Biology, College of Science, Northeastern University, Boston, MA, 02115, USA
- Xinyi Zhou
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Hongrui Duo
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Youjin Hao
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Bo Li
- College of Life Sciences, Chongqing Normal University, Chongqing, 400044, PR China
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, Bioengineering College, Chongqing University, Chongqing, 400044, China
37
Naydenov DD, Vashukova ES, Barbitoff YA, Nasykhova YA, Glotov AS. Current Status and Prospects of the Single-Cell Sequencing Technologies for Revealing the Pathogenesis of Pregnancy-Associated Disorders. Genes (Basel) 2023; 14:756. [PMID: 36981026 PMCID: PMC10048492 DOI: 10.3390/genes14030756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Revised: 03/12/2023] [Accepted: 03/16/2023] [Indexed: 03/30/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) is a method for analyzing gene expression profiles in individual cells. This method has been successfully applied to answer challenging questions about the pathogenesis of multifactorial diseases and has opened up new possibilities in the prognosis and prevention of reproductive diseases. In this article, we review the application of scRNA-seq to the analysis of the various cell types and their gene expression changes in normal pregnancy and pregnancy complications. The main principles, advantages, and limitations of single-cell technologies and data analysis methods are described. We discuss the possibilities of using scRNA-seq to solve fundamental and applied tasks related to various pregnancy-associated disorders. Finally, we provide an overview of the scRNA-seq findings for common pregnancy-associated conditions, such as hyperglycemia in pregnancy, recurrent pregnancy loss, preterm labor, polycystic ovary syndrome, and pre-eclampsia.
Affiliation(s)
- Dmitry D. Naydenov
- Faculty of Biology, St. Petersburg State University, 199034 Saint-Petersburg, Russia
- Elena S. Vashukova
- D. O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, 199034 Saint-Petersburg, Russia
- Yury A. Barbitoff
- Faculty of Biology, St. Petersburg State University, 199034 Saint-Petersburg, Russia
- D. O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, 199034 Saint-Petersburg, Russia
- Yulia A. Nasykhova
- D. O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, 199034 Saint-Petersburg, Russia
- Andrey S. Glotov
- Faculty of Biology, St. Petersburg State University, 199034 Saint-Petersburg, Russia
- D. O. Ott Research Institute of Obstetrics, Gynaecology and Reproductology, 199034 Saint-Petersburg, Russia
38
Alvarez M, Benhammou JN, Rao S, Mishra L, Pisegna JR, Pajukanta P. Isolation of Nuclei from Human Snap-frozen Liver Tissue for Single-nucleus RNA Sequencing. Bio Protoc 2023; 13:e4601. [PMID: 36874905 PMCID: PMC9976782 DOI: 10.21769/bioprotoc.4601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Revised: 11/03/2022] [Accepted: 01/05/2023] [Indexed: 02/04/2023] Open
Abstract
Single-nucleus RNA sequencing (snRNA-seq) provides a powerful tool for studying cell type composition in heterogeneous tissues. The liver is a vital organ composed of a diverse set of cell types; thus, single-cell technologies could greatly facilitate the deconvolution of liver tissue composition and various downstream omics analyses at the cell-type level. Applying single-cell technologies to fresh liver biopsies can, however, be very challenging, and snRNA-seq of snap-frozen liver biopsies requires some optimization given the high nucleic acid content of the solid liver tissue. Therefore, an optimized snRNA-seq protocol specifically targeted at frozen liver samples is needed to improve our understanding of human liver gene expression at cell-type resolution. We present a protocol for performing nuclei isolation from snap-frozen liver tissues, as well as guidance on the application of snRNA-seq. We also provide guidance on optimizing the protocol for different tissue and sample types.
Affiliation(s)
- Marcus Alvarez
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Jihane N Benhammou
- Vatche and Tamar Manoukian Division of Digestive Diseases, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Division of Gastroenterology, Hepatology and Parenteral Nutrition, Department of Medicine, VA Greater Los Angeles Healthcare System, CA, USA
- Shuyun Rao
- Center for Translational Medicine, Department of Surgery, George Washington University, Washington DC, USA
- Lopa Mishra
- Center for Translational Medicine, Department of Surgery, George Washington University, Washington DC, USA
- Institute for Bioelectronic Medicine, Feinstein Institutes for Medical Research; Divisions of Gastroenterology and Hepatology, Department of Medicine, Northwell Health, Manhasset, NY, USA
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Joseph R Pisegna
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Division of Gastroenterology, Hepatology and Parenteral Nutrition, Department of Medicine, VA Greater Los Angeles Healthcare System, CA, USA
- Päivi Pajukanta
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
39
Ding L, Shi H, Qian C, Burdyshaw C, Veloso JP, Khatamian A, Pan Q, Dhungana Y, Xie Z, Risch I, Yang X, Huang X, Yan L, Rusch M, Brewer M, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for identifying hidden drivers from single-cell omics data. bioRxiv 2023:2023.01.26.523391. [PMID: 36747870 PMCID: PMC9901187 DOI: 10.1101/2023.01.26.523391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023]
Abstract
The sparse nature of single-cell omics data makes it challenging to dissect the wiring and rewiring of the transcriptional and signaling drivers that regulate cellular states. Many of the drivers, referred to as "hidden drivers", are difficult to identify via conventional expression analysis due to low expression and inconsistency between RNA and protein activity caused by post-translational and other modifications. To address this issue, we developed scMINER, a mutual information (MI)-based computational framework for unsupervised clustering analysis and cell-type specific inference of intracellular networks, hidden drivers and network rewiring from single-cell RNA-seq data. We designed scMINER to capture nonlinear cell-cell and gene-gene relationships and infer driver activities. Systematic benchmarking showed that scMINER outperforms popular single-cell clustering algorithms, especially in distinguishing similar cell types. With respect to network inference, scMINER does not rely on binding motifs, which are available for only a limited set of transcription factors; therefore, scMINER can provide quantitative activity assessment for more than 6,000 transcription and signaling drivers from an scRNA-seq experiment. As demonstrations, we used scMINER to expose hidden transcription and signaling drivers and dissect their regulon rewiring in immune cell heterogeneity, lineage differentiation, and tissue specification. Overall, activity-based scMINER is a widely applicable, highly accurate, reproducible and scalable method for inferring cellular transcriptional and signaling networks in each cell state from scRNA-seq data. The scMINER software is publicly accessible via: https://github.com/jyyulab/scMINER.
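To make the mutual information (MI) quantity named in this abstract concrete, the sketch below computes MI between two already discretized expression vectors using the plug-in (empirical frequency) estimate. This is only an illustration of the general quantity: it is not scMINER's API or its estimator, the function name `mutual_information` and the toy vectors are hypothetical, and the binning of expression values into discrete levels is assumed to have been done beforehand.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of mutual information (in bits) between two discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Hypothetical binned expression levels of two genes across four cells.
gene_a = [0, 0, 1, 1]
gene_b = [1, 1, 0, 0]   # fully determined by gene_a
gene_c = [0, 1, 0, 1]   # statistically independent of gene_a in this toy sample

print(mutual_information(gene_a, gene_b))
print(mutual_information(gene_a, gene_c))
```

Because MI depends only on the joint distribution of the two variables, it is sensitive to any form of dependence, linear or not, which is the property the abstract appeals to when it mentions nonlinear relationships.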
Affiliation(s)
- Liang Ding
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- These authors contributed equally
- Hao Shi
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- These authors contributed equally
- Chenxi Qian
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Chad Burdyshaw
- Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Joao Pedro Veloso
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Alireza Khatamian
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Qingfei Pan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yogesh Dhungana
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Zhen Xie
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Isabel Risch
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xu Yang
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xin Huang
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Lei Yan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Michael Rusch
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Michael Brewer
- Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongbo Chi
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Jiyang Yu
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
40
Ding L, Shi H, Qian C, Burdyshaw C, Veloso JP, Khatamian A, Pan Q, Dhungana Y, Xie Z, Risch I, Yang X, Huang X, Yan L, Rusch M, Brewer M, Yan KK, Chi H, Yu J. scMINER: a mutual information-based framework for identifying hidden drivers from single-cell omics data. Research Square 2023:rs.3.rs-2476875. [PMID: 36747874 PMCID: PMC9901036 DOI: 10.21203/rs.3.rs-2476875/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
The sparse nature of single-cell omics data makes it challenging to dissect the wiring and rewiring of the transcriptional and signaling drivers that regulate cellular states. Many of the drivers, referred to as "hidden drivers", are difficult to identify via conventional expression analysis due to low expression and inconsistency between RNA and protein activity caused by post-translational and other modifications. To address this issue, we developed scMINER, a mutual information (MI)-based computational framework for unsupervised clustering analysis and cell-type specific inference of intracellular networks, hidden drivers and network rewiring from single-cell RNA-seq data. We designed scMINER to capture nonlinear cell-cell and gene-gene relationships and infer driver activities. Systematic benchmarking showed that scMINER outperforms popular single-cell clustering algorithms, especially in distinguishing similar cell types. With respect to network inference, scMINER does not rely on binding motifs, which are available for only a limited set of transcription factors; therefore, scMINER can provide quantitative activity assessment for more than 6,000 transcription and signaling drivers from an scRNA-seq experiment. As demonstrations, we used scMINER to expose hidden transcription and signaling drivers and dissect their regulon rewiring in immune cell heterogeneity, lineage differentiation, and tissue specification. Overall, activity-based scMINER is a widely applicable, highly accurate, reproducible and scalable method for inferring cellular transcriptional and signaling networks in each cell state from scRNA-seq data. The scMINER software is publicly accessible via: https://github.com/jyyulab/scMINER.
Affiliation(s)
- Liang Ding
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hao Shi
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Chenxi Qian
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Chad Burdyshaw
- Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Joao Pedro Veloso
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Alireza Khatamian
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Qingfei Pan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yogesh Dhungana
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Graduate School of Biomedical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Zhen Xie
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Physiology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Isabel Risch
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xu Yang
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Xin Huang
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Lei Yan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Michael Rusch
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Michael Brewer
- Department of Information Services, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Koon-Kiu Yan
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hongbo Chi
- Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Jiyang Yu
- Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
41
Matsushima A, Pineda SS, Crittenden JR, Lee H, Galani K, Mantero J, Tombaugh G, Kellis M, Heiman M, Graybiel AM. Transcriptional vulnerabilities of striatal neurons in human and rodent models of Huntington's disease. Nat Commun 2023; 14:282. [PMID: 36650127 PMCID: PMC9845362 DOI: 10.1038/s41467-022-35752-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 05/17/2022] [Accepted: 12/23/2022] [Indexed: 01/19/2023] Open
Abstract
Striatal projection neurons (SPNs), which progressively degenerate in human patients with Huntington's disease (HD), are classified along two axes: the canonical direct-indirect pathway division and the striosome-matrix compartmentation. It is well established that the indirect-pathway SPNs are susceptible to neurodegeneration and transcriptomic disturbances, but less is known about how the striosome-matrix axis is compromised in HD in relation to the canonical axis. Here we show, using single-nucleus RNA-sequencing data from male Grade 1 HD patient post-mortem brain samples and male zQ175 and R6/2 mouse models, that the two axes are multiplexed and differentially compromised in HD. In human HD, striosomal indirect-pathway SPNs are the most depleted SPN population. In mouse HD models, the transcriptomic distinctiveness of striosome-matrix SPNs is diminished more than that of direct-indirect pathway SPNs. Furthermore, the loss of striosome-matrix distinction is more prominent within indirect-pathway SPNs. These results open the possibility that the canonical direct-indirect pathway and striosome-matrix compartments are differentially compromised in late and early stages of disease progression, respectively, differentially contributing to the symptoms, thus calling for distinct therapeutic strategies.
Affiliation(s)
- Ayano Matsushima
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sergio Sebastian Pineda
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Jill R Crittenden
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Hyeseung Lee
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kyriakitsa Galani
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Julio Mantero
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Manolis Kellis
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
  - MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
- Myriam Heiman
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Ann M Graybiel
  - McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
|
42
|
Chen S, Wang R, Long W, Jiang R. ASTER: accurately estimating the number of cell types in single-cell chromatin accessibility data. Bioinformatics 2023; 39:6969102. [PMID: 36610708 PMCID: PMC9825259 DOI: 10.1093/bioinformatics/btac842] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/28/2022] [Revised: 12/04/2022] [Accepted: 12/26/2022] [Indexed: 12/28/2022] Open
Abstract
SUMMARY: Recent innovations in single-cell chromatin accessibility sequencing (scCAS) have revolutionized the characterization of epigenomic heterogeneity. Estimation of the number of cell types is a crucial step for downstream analyses and biological implications. However, efforts to perform estimation specifically for scCAS data are limited. Here, we propose ASTER, an ensemble learning-based tool for accurately estimating the number of cell types in scCAS data. ASTER outperformed baseline methods in systematic evaluation on 27 datasets of various protocols, sizes, numbers of cell types, degrees of cell-type imbalance, cell states and qualities, providing valuable guidance for scCAS data analysis. AVAILABILITY AND IMPLEMENTATION: ASTER along with detailed documentation is freely accessible at https://aster.readthedocs.io/ under the MIT License. It can be seamlessly integrated into existing scCAS analysis workflows. The source code is available at https://github.com/biox-nku/aster. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rongxiang Wang
  - Ministry of Education Key Laboratory of Bioinformatics, Research Department of Bioinformatics at the Beijing National Research Center for Information Science and Technology, Center for Synthetic and Systems Biology, Department of Automation, Tsinghua University, Beijing 100084, China
- Wenxin Long
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China
- Rui Jiang
  - To whom correspondence should be addressed.
|
43
|
Huang Y, Chang H, Chen X, Meng J, Han M, Huang T, Yuan L, Zhang G. A cell marker-based clustering strategy (cmCluster) for precise cell type identification of scRNA-seq data. Quantitative Biology 2023. [DOI: 10.15302/j-qb-022-0311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2023]
|
44
|
Ratnasiri K, Wilk AJ, Lee MJ, Khatri P, Blish CA. Single-cell RNA-seq methods to interrogate virus-host interactions. Semin Immunopathol 2023; 45:71-89. [PMID: 36414692 PMCID: PMC9684776 DOI: 10.1007/s00281-022-00972-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/16/2022] [Accepted: 10/31/2022] [Indexed: 11/23/2022]
Abstract
The twenty-first century has seen the emergence of many epidemic and pandemic viruses, with the most recent being the SARS-CoV-2-driven COVID-19 pandemic. As obligate intracellular parasites, viruses rely on host cells to replicate and produce progeny, resulting in complex virus and host dynamics during an infection. Single-cell RNA sequencing (scRNA-seq), by enabling broad and simultaneous profiling of both host and virus transcripts, represents a powerful technology to unravel the delicate balance between host and virus. In this review, we summarize technological and methodological advances in scRNA-seq and their applications to antiviral immunity. We highlight key scRNA-seq applications that have enabled the understanding of viral genomic and host response heterogeneity, differential responses of infected versus bystander cells, and intercellular communication networks. We expect further development of scRNA-seq technologies and analytical methods, combined with measurements of additional multi-omic modalities and increased availability of publicly accessible scRNA-seq datasets, to enable a better understanding of viral pathogenesis and enhance the development of antiviral therapeutic strategies.
Affiliation(s)
- Kalani Ratnasiri
  - Stanford Immunology Program, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Aaron J. Wilk
  - Stanford Immunology Program, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA 94305, USA
- Madeline J. Lee
  - Stanford Immunology Program, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
- Purvesh Khatri
  - Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Center for Biomedical Informatics Research, Stanford, CA, USA
  - Inflammatix, Inc., Sunnyvale, CA 94085, USA
- Catherine A. Blish
  - Stanford Immunology Program, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Department of Medicine, Division of Infectious Diseases and Geographic Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Medical Scientist Training Program, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
|
45
|
Wang J, Zhang N, Yuan S, Shang J, Dai L, Li F, Liu J. Non-negative low-rank representation based on dictionary learning for single-cell RNA-sequencing data analysis. BMC Genomics 2022; 23:851. [PMID: 36564711 PMCID: PMC9789616 DOI: 10.1186/s12864-022-09027-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/21/2022] [Indexed: 12/24/2022] Open
Abstract
In the analysis of single-cell RNA-sequencing (scRNA-seq) data, effectively and accurately identifying cell clusters from a large mixture of cells remains a challenge. The low-rank representation (LRR) method has achieved excellent results in subspace clustering. However, most previous LRR-based methods use the original data matrix as the dictionary. In addition, LRR-based methods usually rely on a spectral clustering algorithm to complete cell clustering, which creates a matching problem between the spectral clustering method and the affinity matrix and makes it difficult to ensure optimal clustering. Considering these two points, we propose the DLNLRR method to better identify cell types. First, DLNLRR updates the dictionary during the optimization process instead of using a predefined fixed dictionary, so it performs dictionary learning and LRR learning at the same time. Second, DLNLRR performs subspace clustering without relying on a spectral clustering algorithm; that is, clustering is carried out directly on the low-rank matrix. Finally, extensive experiments on real single-cell datasets show that DLNLRR is superior to other scRNA-seq data analysis algorithms in cell type identification.
Affiliation(s)
- Juan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Nana Zhang
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Shasha Yuan
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Lingyun Dai
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Feng Li
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Jinxing Liu
  - School of Computer Science, Qufu Normal University, Rizhao, China
|
46
|
Identifying Gene Markers Associated with Cell Subpopulations. Methods Mol Biol 2022; 2584:251-268. [PMID: 36495455 DOI: 10.1007/978-1-0716-2756-3_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/13/2022]
Abstract
An important step in the analysis of a single-cell RNA experiment is the identification of the key elements, i.e., genes, characterizing each cell subpopulation cluster. In this chapter, we describe the use of a sparsely connected autoencoder as a tool to convert single-cell clusters into pseudo-RNAseq experiments that serve as input for differential expression analysis, and the use of COMET as a tool to depict cluster-specific gene markers.
|
47
|
Zandavi SM, Liu D, Chung V, Anaissi A, Vafaee F. Fotomics: fourier transform-based omics imagification for deep learning-based cell-identity mapping using single-cell omics profiles. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10357-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/14/2022]
|
48
|
Watson ER, Mora A, Taherian Fard A, Mar JC. How does the structure of data impact cell-cell similarity? Evaluating how structural properties influence the performance of proximity metrics in single cell RNA-seq data. Brief Bioinform 2022; 23:6712300. [PMID: 36151725 PMCID: PMC9677483 DOI: 10.1093/bib/bbac387] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/09/2022] [Revised: 07/26/2022] [Accepted: 08/11/2022] [Indexed: 12/14/2022] Open
Abstract
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest performing metric. However, the 'best-performing' metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
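As a toy illustration of why the choice of proximity metric matters, the sketch below compares Euclidean and Manhattan distance (both named in the abstract) with cosine distance, which is used here only as an assumed example of an alternative and is not claimed to be one of the paper's recommended metrics. The three count vectors are hypothetical: one candidate cell shares the query's expression profile at ten times the sequencing depth, the other has a different profile at similar depth.

```python
import numpy as np

def euclidean(u, v):
    return float(np.sqrt(np.sum((u - v) ** 2)))

def manhattan(u, v):
    return float(np.sum(np.abs(u - v)))

def cosine_distance(u, v):
    # 1 - cosine similarity; insensitive to overall scaling of either vector.
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbor(query, candidates, metric):
    """Return the name of the candidate closest to `query` under `metric`."""
    return min(candidates, key=lambda name: metric(query, candidates[name]))

# Hypothetical raw counts for three genes in three cells.
query = np.array([10.0, 0.0, 2.0])
candidates = {
    "same_profile_deeper": np.array([100.0, 0.0, 20.0]),  # same proportions, 10x depth
    "different_profile": np.array([9.0, 3.0, 2.0]),       # similar depth, extra gene expressed
}

nn_by_metric = {
    metric.__name__: nearest_neighbor(query, candidates, metric)
    for metric in (euclidean, manhattan, cosine_distance)
}
print(nn_by_metric)
```

On these unnormalized counts, Euclidean and Manhattan distance pick the cell with similar depth but a different profile, while cosine distance picks the cell with the same profile. This is only a constructed example of metric sensitivity to one structural property (depth), not a reproduction of the benchmark.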
Affiliation(s)
- Ebony Rose Watson
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia
- Ariane Mora
  - School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD, Australia
- Atefeh Taherian Fard
  - Corresponding author. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia. Tel.: +61 7 3346 3894
- Jessica Cara Mar
  - Corresponding author. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, QLD, Australia. Tel.: +614 90 733 703
|
49
|
Cao Y, Fu L, Wu J, Peng Q, Nie Q, Zhang J, Xie X. Integrated analysis of multimodal single-cell data with structural similarity. Nucleic Acids Res 2022; 50:e121. [PMID: 36130281 PMCID: PMC9757079 DOI: 10.1093/nar/gkac781] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 04/07/2022] [Revised: 08/15/2022] [Accepted: 09/02/2022] [Indexed: 12/24/2022] Open
Abstract
Multimodal single-cell sequencing technologies provide unprecedented information on cellular heterogeneity from multiple layers of genomic readouts. However, joint analysis of two modalities without properly handling the noise often leads to overfitting of one modality by the other and worse clustering results than vanilla single-modality analysis. How to efficiently utilize the extra information from single-cell multi-omics to delineate cell states and identify meaningful signals remains a significant computational challenge. In this work, we propose a deep learning framework, named SAILERX, for efficient, robust, and flexible analysis of multimodal single-cell data. SAILERX consists of a variational autoencoder with invariant representation learning to correct technical noise from the sequencing process, and a multimodal data alignment mechanism to integrate information from different modalities. Instead of performing hard alignment by projecting both modalities to a shared latent space, SAILERX encourages the local structures of the two modalities, measured by pairwise similarities, to be similar. This strategy is more robust against overfitting to noise, which facilitates various downstream analyses such as clustering, imputation, and marker gene detection. Furthermore, the invariant representation learning part enables SAILERX to perform integrative analysis on both multi- and single-modal datasets, making it an applicable and scalable tool for more general scenarios.
Affiliation(s)
- Jie Wu
  - Department of Biological Chemistry, University of California, Irvine, CA 92697, USA
- Qinke Peng
  - Systems Engineering Institute, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
- Qing Nie
  - Department of Mathematics, University of California, Irvine, CA 92697, USA
  - Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA
  - NSF-Simons Center for Multiscale Cell Fate Research, University of California, Irvine, CA 92697, USA
- Jing Zhang
  - To whom correspondence should be addressed. Tel: +1 949 824 9979
- Xiaohui Xie
  - Correspondence may also be addressed to Xiaohui Xie. Tel: +1 949 824 9289
|
50
|
Ke M, Elshenawy B, Sheldon H, Arora A, Buffa FM. Single cell RNA-sequencing: A powerful yet still challenging technology to study cellular heterogeneity. Bioessays 2022; 44:e2200084. [PMID: 36068142 DOI: 10.1002/bies.202200084] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 04/29/2022] [Revised: 08/18/2022] [Accepted: 08/19/2022] [Indexed: 11/11/2022]
Abstract
Almost all biomedical research to date has relied upon mean measurements from cell populations; however, it is well established that what is observed at this macroscopic level can be the result of many interactions between several different single cells. Thus, the observable macroscopic 'average' cannot outright be used as representative of the 'average cell'. Rather, it is the emergent behaviour resulting from the actions and interactions of many different cells. Single-cell RNA sequencing (scRNA-Seq) enables the comparison of the transcriptomes of individual cells. This provides high-resolution maps of the dynamic cellular programmes, allowing us to answer fundamental biological questions on their function and evolution. It also allows us to address medical questions such as the role of rare cell populations contributing to disease progression and therapeutic resistance. Furthermore, it provides an understanding of context-specific dependencies, namely the behaviour and function that a cell has in a specific context, which can be crucial to understanding some complex diseases, such as diabetes, cardiovascular disease and cancer. Here, we provide an overview of scRNA-Seq, including a comparative review of emerging technologies and computational pipelines. We discuss the current and emerging applications and focus on tumour heterogeneity, a clear example of how scRNA-Seq can provide new understanding of a complex disease. Additionally, we review the limitations and highlight the need for powerful computational pipelines and reproducible protocols for the broader acceptance of this technique in basic and clinical research.
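The statement that a population 'average' need not represent any individual cell can be made concrete with a small worked example. The expression values below are hypothetical: half of the cells do not express a gene and half express it at a fixed level.

```python
import numpy as np

# Hypothetical expression of one gene across 100 cells:
# 50 non-expressing cells and 50 cells expressing at 10 units.
expression = np.array([0.0] * 50 + [10.0] * 50)

bulk_mean = float(expression.mean())  # what a bulk measurement would report
# Distance from the bulk mean to the closest individual cell.
closest_gap = float(np.min(np.abs(expression - bulk_mean)))

print(f"bulk mean = {bulk_mean}, gap to nearest single cell = {closest_gap}")
```

In this constructed case the bulk mean sits halfway between the two subpopulations, so no single cell has an expression level near the population average.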
Affiliation(s)
- May Ke
  - Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK
- Badran Elshenawy
  - Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK
- Helen Sheldon
  - Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK
- Anjali Arora
  - Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK
- Francesca M Buffa
  - Department of Oncology, Medical Sciences Division, University of Oxford, Oxford, UK
  - Department of Computing Sciences, Bocconi University, Milano, Italy
  - Institute for Data Science and Analytics, Bocconi University, Milano, Italy
|