1
Bilous M, Hérault L, Gabriel AA, Teleman M, Gfeller D. Building and analyzing metacells in single-cell genomics data. Mol Syst Biol 2024; 20:744-766. [PMID: 38811801 PMCID: PMC11220014 DOI: 10.1038/s44320-024-00045-6] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024] Open
Abstract
The advent of high-throughput single-cell genomics technologies has fundamentally transformed biological sciences. Currently, millions of cells from complex biological tissues can be phenotypically profiled across multiple modalities. The scaling of computational methods to analyze and visualize such data is a constant challenge, and tools need to be regularly updated, if not redesigned, to cope with ever-growing numbers of cells. Over the last few years, metacells have been introduced to reduce the size and complexity of single-cell genomics data while preserving biologically relevant information and improving interpretability. Here, we review recent studies that capitalize on the concept of metacells, and the many variants in nomenclature that have been used. We further outline how and when metacells should (or should not) be used to analyze single-cell genomics data and what should be considered when analyzing such data at the metacell level. To facilitate the exploration of metacells, we provide a comprehensive tutorial on the construction and analysis of metacells from single-cell RNA-seq data (https://github.com/GfellerLab/MetacellAnalysisTutorial) as well as a fully integrated pipeline to rapidly build, visualize and evaluate metacells with different methods (https://github.com/GfellerLab/MetacellAnalysisToolkit).
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Léonard Hérault
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Aurélie Ag Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- Matei Teleman
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland
- David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, 1011, Lausanne, Switzerland.
- Agora Cancer Research Centre, 1011, Lausanne, Switzerland.
- Swiss Cancer Center Leman (SCCL), Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), 1015, Lausanne, Switzerland.
2
Xie J, Ruan S, Tu M, Yuan Z, Hu J, Li H, Li S. Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding. Oncogene 2024:10.1038/s41388-024-03074-5. [PMID: 38834657 DOI: 10.1038/s41388-024-03074-5] [Received: 01/21/2024] [Revised: 05/22/2024] [Accepted: 05/28/2024] [Indexed: 06/06/2024]
Abstract
Single-cell transcriptome sequencing (scRNA-seq) is a high-throughput technique used to study gene expression at the single-cell level. Clustering analysis is a commonly used method in scRNA-seq data analysis, helping researchers identify cell types and uncover interactions between cells. However, the choice of a robust similarity metric in the clustering procedure is still an open challenge due to the complex underlying structures of the data and the inherent noise in data acquisition. Here, we propose a deep clustering method for scRNA-seq data called scRISE (scRNA-seq Iterative Smoothing and self-supervised discriminative Embedding model) to resolve this challenge. The model consists of two main modules: an iterative smoothing module based on graph autoencoders designed to denoise the data and refine the pairwise similarity in turn to gradually incorporate cell structural features and enrich the data information; and a self-supervised discriminative embedding module with adaptive similarity threshold for partitioning samples into correct clusters. Our approach has shown improved quality of data representation and clustering on seventeen scRNA-seq datasets against a number of state-of-the-art deep learning clustering methods. Furthermore, utilizing the scRISE method in biological analysis against the HNSCC dataset has unveiled 62 informative genes, highlighting their potential roles as therapeutic targets and biomarkers.
Affiliation(s)
- Jinxin Xie
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Shanshan Ruan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Mingyan Tu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zhen Yuan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Jianguo Hu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Honglin Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China.
- Lingang Laboratory, Shanghai, 200031, China.
- Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Innovation Center for AI and Drug Discovery, School of Pharmacy, East China Normal University, Shanghai, 200062, China.
3
Chen H, You R, Guo J, Zhou W, Chew G, Devapragash N, Loh JZ, Gesualdo L, Li Y, Jiang Y, Tan ELS, Chen S, Pontrelli P, Pesce F, Behmoaras J, Zhang A, Petretto E. WWP2 Regulates Renal Fibrosis and the Metabolic Reprogramming of Profibrotic Myofibroblasts. J Am Soc Nephrol 2024; 35:696-718. [PMID: 38502123 PMCID: PMC11164121 DOI: 10.1681/asn.0000000000000328] [Received: 10/02/2023] [Accepted: 02/28/2024] [Indexed: 03/20/2024] Open
Abstract
Key Points: WWP2 expression is elevated in the tubulointerstitium of fibrotic kidneys and contributes to CKD pathogenesis and progression. WWP2 uncouples the profibrotic activation and cell proliferation in renal myofibroblasts. WWP2 controls mitochondrial respiration in renal myofibroblasts through the metabolic regulator peroxisome proliferator-activated receptor gamma coactivator 1-alpha.
Background: Renal fibrosis is a common pathologic end point in CKD that is challenging to reverse, and myofibroblasts are responsible for the accumulation of a fibrillar collagen–rich extracellular matrix. Recent studies have unveiled myofibroblasts' diversity in proliferative and fibrotic characteristics, which are linked to different metabolic states. We previously demonstrated the regulation of extracellular matrix genes and tissue fibrosis by WWP2, a multifunctional E3 ubiquitin–protein ligase. Here, we investigate WWP2 in renal fibrosis and in the metabolic reprogramming of myofibroblasts in CKD.
Methods: We used kidney samples from patients with CKD and WWP2-null mouse models of kidney disease and leveraged single-cell RNA sequencing analysis to detail the cell-specific regulation of WWP2 in fibrotic kidneys. Experiments in primary cultured myofibroblasts by bulk-RNA sequencing, chromatin immunoprecipitation sequencing, metabolomics, and cellular metabolism assays were used to study the metabolic regulation of WWP2 and its downstream signaling.
Results: The tubulointerstitial expression of WWP2 was associated with fibrotic progression in patients with CKD and in murine kidney disease models. WWP2 deficiency promoted myofibroblast proliferation and halted profibrotic activation, reducing the severity of renal fibrosis in vivo. In renal myofibroblasts, WWP2 deficiency increased fatty acid oxidation and activated the pentose phosphate pathway, boosting mitochondrial respiration at the expense of glycolysis. WWP2 suppressed the transcription of peroxisome proliferator-activated receptor gamma coactivator 1-alpha (PGC-1α), a metabolic mediator of fibrotic response, and pharmacologic inhibition of PGC-1α partially abrogated the protective effects of WWP2 deficiency on myofibroblasts.
Conclusions: WWP2 regulates the metabolic reprogramming of profibrotic myofibroblasts by a WWP2-PGC-1α axis, and WWP2 deficiency protects against renal fibrosis in CKD.
Affiliation(s)
- Huimei Chen
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Ran You
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Jing Guo
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Wei Zhou
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Gabriel Chew
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Nithya Devapragash
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Jui Zhi Loh
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Loreto Gesualdo
- Nephrology, Dialysis and Transplantation Unit, Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Bari, Italy
- Yanwei Li
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Yuteng Jiang
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Elisabeth Li Sa Tan
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Shuang Chen
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- School of Science, Institute for Big Data and Artificial Intelligence in Medicine, China Pharmaceutical University, Nanjing, China
- Paola Pontrelli
- Nephrology, Dialysis and Transplantation Unit, Department of Precision and Regenerative Medicine and Ionian Area (DiMePRe-J), University of Bari Aldo Moro, Bari, Italy
- Francesco Pesce
- Division of Renal Medicine, Fatebenefratelli Isola Tiberina—Gemelli Isola, Rome, Italy
- Jacques Behmoaras
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- Centre for Inflammatory Disease, Imperial College London, Hammersmith Hospital, London, United Kingdom
- Aihua Zhang
- Department of Nephrology, Children's Hospital of Nanjing Medical University, Nanjing, China
- Enrico Petretto
- Programme in Cardiovascular and Metabolic Disorders (CVMD) and Centre for Computational Biology (CCB), Duke-NUS Medical School, Singapore
- School of Science, Institute for Big Data and Artificial Intelligence in Medicine, China Pharmaceutical University, Nanjing, China
4
Greatbatch CJ, Lu Q, Hung S, Barnett AJ, Wing K, Liang H, Han X, Zhou T, Siggs OM, Mackey DA, Cook AL, Senabouth A, Liu GS, Craig JE, MacGregor S, Powell JE, Hewitt AW. High throughput functional profiling of genes at intraocular pressure loci reveals distinct networks for glaucoma. Hum Mol Genet 2024; 33:739-751. [PMID: 38272457 PMCID: PMC11031357 DOI: 10.1093/hmg/ddae003] [Received: 10/15/2023] [Revised: 12/18/2023] [Accepted: 04/06/2024] [Indexed: 01/27/2024] Open
Abstract
INTRODUCTION: Primary open angle glaucoma (POAG) is a leading cause of blindness globally. Characterized by progressive retinal ganglion cell degeneration, the precise pathogenesis remains unknown. Genome-wide association studies (GWAS) have uncovered many genetic variants associated with elevated intraocular pressure (IOP), one of the key risk factors for POAG. We aimed to identify genetic and morphological variation that can be attributed to trabecular meshwork cell (TMC) dysfunction and raised IOP in POAG.
METHODS: 62 genes across 55 loci were knocked out in a primary human TMC line. Each knockout group, including five non-targeting control groups, underwent single-cell RNA-sequencing (scRNA-seq) for differentially expressed gene (DEG) analysis. Multiplexed fluorescence coupled with CellProfiler image analysis allowed for single-cell morphological profiling.
RESULTS: Many gene knockouts invoked DEGs relating to matrix metalloproteinases and interferon-induced proteins. We have prioritized genes at four loci of interest to identify gene knockouts that may contribute to the pathogenesis of POAG, including ANGPTL2, LMX1B, CAV1, and KREMEN1. Three genetic networks of gene knockouts with similar transcriptomic profiles were identified, suggesting a synergistic function in trabecular meshwork cell physiology. TEK knockout caused significant upregulation of nuclear granularity on morphological analysis, while knockout of TRIOBP, TMCO1 and PLEKHA7 increased granularity and intensity of actin and the cell membrane.
CONCLUSION: High-throughput analysis of cellular structure and function through multiplex fluorescent single-cell analysis and scRNA-seq assays enabled the direct study of genetic perturbations at single-cell resolution. This work provides a framework for investigating the role of genes in the pathogenesis of glaucoma and heterogeneous diseases with a strong genetic basis.
Affiliation(s)
- Connor J Greatbatch
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Qinyi Lu
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Sandy Hung
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye and Ear Hospital, 32 Gisborne St, East Melbourne 3002, Australia
- Alexander J Barnett
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Kristof Wing
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Helena Liang
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye and Ear Hospital, 32 Gisborne St, East Melbourne 3002, Australia
- Xikun Han
- QIMR Berghofer Medical Research Institute, 300 Herston Rd, Herston, Brisbane 4006, Australia
- Tiger Zhou
- Department of Ophthalmology, Flinders University, Flinders Medical Centre, 1 Flinders Dr, Bedford Park, South Australia 5042, Australia
- Owen M Siggs
- Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, Sydney, NSW 2010, Australia
- School of Clinical Medicine, Faculty of Medicine and Health, Short Street, St George Hospital KOGARAH UNSW, Sydney 2217, Australia
- David A Mackey
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Lions Eye Institute, Centre for Vision Sciences, University of Western Australia, 2 Verdun Street Nedlands WA 6009, Australia
- Anthony L Cook
- Wicking Dementia Research and Education Centre, University of Tasmania, 17 Liverpool Street, Hobart, TAS 7000, Australia
- Anne Senabouth
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, Sydney, NSW 2010, Australia
- Guei-Sheung Liu
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Jamie E Craig
- Department of Ophthalmology, Flinders University, Flinders Medical Centre, 1 Flinders Dr, Bedford Park, South Australia 5042, Australia
- Stuart MacGregor
- QIMR Berghofer Medical Research Institute, 300 Herston Rd, Herston, Brisbane 4006, Australia
- Joseph E Powell
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, Sydney, NSW 2010, Australia
- UNSW Cellular Genomics Futures Institute, University of New South Wales, 384 Victoria St, Darlinghurst, Sydney, NSW 2010, Australia
- Alex W Hewitt
- Menzies Institute for Medical Research, University of Tasmania, 17 Liverpool Street, Hobart, Tasmania 7000, Australia
- Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye and Ear Hospital, 32 Gisborne St, East Melbourne 3002, Australia
5
Alghamdi S, Turki T. A novel interpretable deep transfer learning combining diverse learnable parameters for improved T2D prediction based on single-cell gene regulatory networks. Sci Rep 2024; 14:4491. [PMID: 38396138 PMCID: PMC10891129 DOI: 10.1038/s41598-024-54923-y] [Received: 09/12/2023] [Accepted: 02/18/2024] [Indexed: 02/25/2024] Open
Abstract
Accurate deep learning (DL) models to predict type 2 diabetes (T2D) are concerned not only with targeting the discrimination task but also with learning useful feature representations. However, existing DL tools are far from perfect and do not provide appropriate interpretation as a guideline to explain and promote superior performance in the target task. Therefore, we provide an interpretable approach for our presented deep transfer learning (DTL) models to overcome such drawbacks, working as follows. We utilize several pre-trained models, including SEResNet152 and SEResNeXT101. Then, we transfer knowledge from the pre-trained models by keeping the weights in the convolutional base (i.e., the feature extraction part) while modifying the classification part, using the Adam optimizer, to classify healthy controls and T2D based on single-cell gene regulatory network (SCGRN) images. Other DTL models work in a similar manner but keep only the weights of the bottom layers in the feature extraction part unaltered while updating the weights of subsequent layers through training from scratch. Experimental results on all 224 SCGRN images using five-fold cross-validation show that our model (TFeSEResNeXT101) achieved the highest average balanced accuracy (BAC) of 0.97, thereby significantly outperforming the baseline, which resulted in an average BAC of 0.86. Moreover, the simulation study demonstrated that the superiority is attributed to the distributional conformance of model weight parameters obtained with the Adam optimizer when coupled with weights from a pre-trained model.
Affiliation(s)
- Sumaya Alghamdi
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia
- Department of Computer Science, Albaha University, 65799, Albaha, Saudi Arabia
- Turki Turki
- Department of Computer Science, King Abdulaziz University, 21589, Jeddah, Saudi Arabia.
6
Li G, Zhao H, Cheng Z, Liu J, Li G, Guo Y. Single-cell transcriptomic profiling of heart reveals ANGPTL4 linking fibroblasts and angiogenesis in heart failure with preserved ejection fraction. J Adv Res 2024:S2090-1232(24)00068-7. [PMID: 38346487 DOI: 10.1016/j.jare.2024.02.006] [Received: 10/24/2023] [Revised: 02/06/2024] [Accepted: 02/07/2024] [Indexed: 02/19/2024] Open
Abstract
INTRODUCTION: Despite its high morbidity and mortality, effective therapies for heart failure with preserved ejection fraction (HFpEF) are limited because its pathophysiological basis is poorly understood.
OBJECTIVE: This study aimed to characterize the cellular heterogeneity and potential mechanisms of HFpEF at single-cell resolution.
METHODS: An HFpEF mouse model was induced by a high-fat diet with N-nitro-L-arginine methyl ester. Cells from the hearts were subjected to single-cell sequencing. Expression of key proteins was measured with immunohistochemistry and immunofluorescence staining.
RESULTS: In HFpEF hearts, myocardial fibroblasts exhibited higher levels of fibrosis. Furthermore, an increased number of fibroblasts differentiated into high-metabolism and high-fibrosis phenotypes. The expression levels of genes encoding certain pro-angiogenic secreted proteins were decreased in the HFpEF group, as confirmed by bulk RNA sequencing. Additionally, the proportion of the endothelial cell (EC) lineages in the HFpEF group was significantly reduced, with low angiogenesis and high apoptosis phenotypes observed in these EC lineages. Interestingly, the fibroblasts in the HFpEF heart might cross-link with the EC lineages via over-secretion of ANGPTL4, thus displaying an anti-angiogenic function. Immunohistochemistry and immunofluorescence staining then revealed reduced vascular density and upregulation of ANGPTL4 expression in HFpEF hearts. Finally, we predicted ANGPTL4 as a potential druggable target using DrugnomeAI.
CONCLUSION: This study comprehensively characterized the angiogenesis impairment in HFpEF hearts at single-cell resolution and proposed that ANGPTL4 secretion by fibroblasts may be a potential mechanism underlying this angiogenic abnormality.
Affiliation(s)
- Guoxing Li
- Institute of Life Sciences, Chongqing Medical University, 400016, China
- Huilin Zhao
- Institute of Life Sciences, Chongqing Medical University, 400016, China
- Zhe Cheng
- Department of Cardiology, Chongqing University Three Gorges Hospital, Chongqing 404199, China
- Junjin Liu
- Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Gang Li
- Institute of Life Sciences, Chongqing Medical University, 400016, China; Molecular Medicine Diagnostic and Testing Center, Chongqing Medical University, 400016, China.
- Yongzheng Guo
- Department of Cardiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
7
Li Z, Liu X, Wang L, Zhao H, Wang S, Yu G, Wu D, Chu J, Han J. Integrated analysis of single-cell RNA-seq and bulk RNA-seq reveals RNA N6-methyladenosine modification associated with prognosis and drug resistance in acute myeloid leukemia. Front Immunol 2023; 14:1281687. [PMID: 38022588 PMCID: PMC10644381 DOI: 10.3389/fimmu.2023.1281687] [Received: 08/22/2023] [Accepted: 10/19/2023] [Indexed: 12/01/2023] Open
Abstract
Introduction: Acute myeloid leukemia (AML) is a type of blood cancer that is identified by the unrestricted growth of immature myeloid cells within the bone marrow. Despite therapeutic advances, AML prognosis remains highly variable, and there is a lack of biomarkers for customizing treatment. RNA N6-methyladenosine (m6A) modification is a reversible and dynamic process that plays a critical role in cancer progression and drug resistance.
Methods: To investigate the m6A modification patterns in AML and their potential clinical significance, we used the AUCell method to describe the m6A modification activity of cells in AML patients based on 23 m6A modification enzymes and further integrated with bulk RNA-seq data.
Results: We found that m6A modification was more effective in leukemic cells than in immune cells and induced significant changes in gene expression in leukemic cells rather than immune cells. Furthermore, network analysis revealed a correlation between transcription factor activation and the m6A modification status in leukemia cells, while active m6A-modified immune cells exhibited a higher interaction density in their gene regulatory networks. Hierarchical clustering based on m6A-related genes identified three distinct AML subtypes. The immune dysregulation subtype, characterized by RUNX1 mutation and KMT2A copy number variation, was associated with a worse prognosis and exhibited a specific gene expression pattern with high expression level of IGF2BP3 and FMR1, and low expression level of ELAVL1 and YTHDF2. Notably, patients with the immune dysregulation subtype were sensitive to immunotherapy and chemotherapy.
Discussion: Collectively, our findings suggest that m6A modification could be a potential therapeutic target for AML, and the identified subtypes could guide personalized therapy.
Affiliation(s)
- Zhongzheng Li
- State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Institute of Biomedical Science, Henan Normal University, Xinxiang, China
- Xin Liu
- Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Lan Wang
- State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Institute of Biomedical Science, Henan Normal University, Xinxiang, China
- Huabin Zhao
- State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Institute of Biomedical Science, Henan Normal University, Xinxiang, China
- Shenghui Wang
- State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Institute of Biomedical Science, Henan Normal University, Xinxiang, China
- Guoying Yu
- State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Institute of Biomedical Science, Henan Normal University, Xinxiang, China
- Depei Wu
- Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Jianhong Chu
- Institute of Blood and Marrow Transplantation, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
- Jingjing Han
- The First Affiliated Hospital of Soochow University, National Clinical Research Center for Hematologic Diseases, Jiangsu Institute of Hematology, Collaborative Innovation Center of Hematology, Soochow University, Suzhou, China
8
Li Y, Zhang SW, Xie MY, Zhang T. PhenoDriver: interpretable framework for studying personalized phenotype-associated driver genes in breast cancer. Brief Bioinform 2023; 24:bbad291. [PMID: 37738403 DOI: 10.1093/bib/bbad291] [Received: 04/02/2023] [Revised: 07/12/2023] [Accepted: 07/27/2023] [Indexed: 09/24/2023] Open
Abstract
Identifying personalized cancer driver genes and further revealing their oncogenic mechanisms is critical for understanding the mechanisms of cell transformation and aiding clinical diagnosis. Almost all existing methods primarily focus on identifying driver genes at the cohort or individual level but fail to further uncover their underlying oncogenic mechanisms. To fill this gap, we present an interpretable framework, PhenoDriver, to identify personalized cancer driver genes, elucidate their roles in cancer development and uncover the association between driver genes and clinical phenotypic alterations. By analyzing 988 breast cancer patients, we demonstrate the outstanding performance of PhenoDriver in identifying breast cancer driver genes at the cohort level compared to other state-of-the-art methods. In addition, PhenoDriver can effectively identify driver genes with both recurrent and rare mutations in individual patients. We further explore and reveal the oncogenic mechanisms of some known and unknown breast cancer driver genes (e.g. TP53, MAP3K1, HTT) identified by PhenoDriver, and construct their subnetworks for regulating clinical abnormal phenotypes. Notably, most of our findings are consistent with existing biological knowledge. Based on the personalized driver profiles, we discover two existing and one previously unreported breast cancer subtypes and uncover their molecular mechanisms. These results deepen our understanding of breast cancer mechanisms, guide therapeutic decisions and assist in the development of targeted anticancer therapies.
Affiliation(s)
- Yan Li
- School of Automation from Northwestern Polytechnical University, China
- Shao-Wu Zhang
- School of Automation from Northwestern Polytechnical University, China
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, China
- Ming-Yu Xie
- School of Automation from Northwestern Polytechnical University, China
- Tong Zhang
- School of Automation from Northwestern Polytechnical University, China
9
Beppu AK, Zhao J, Yao C, Carraro G, Israely E, Coelho AL, Drake K, Hogaboam CM, Parks WC, Kolls JK, Stripp BR. Epithelial plasticity and innate immune activation promote lung tissue remodeling following respiratory viral infection. Nat Commun 2023; 14:5814. [PMID: 37726288 PMCID: PMC10509177 DOI: 10.1038/s41467-023-41387-3] [Received: 11/08/2022] [Accepted: 09/02/2023] [Indexed: 09/21/2023] Open
Abstract
Epithelial plasticity has been suggested in lungs of mice following genetic depletion of stem cells but is of unknown physiological relevance. Viral infection and chronic lung disease share similar pathological features of stem cell loss in alveoli, basal cell (BC) hyperplasia in small airways, and innate immune activation, that contribute to epithelial remodeling and loss of lung function. We show that a subset of distal airway secretory cells, intralobar serous (IS) cells, are activated to assume BC fates following influenza virus infection. Injury-induced hyperplastic BC (hBC) differ from pre-existing BC by high expression of IL-22Ra1 and undergo IL-22-dependent expansion for colonization of injured alveoli. Resolution of virus-elicited inflammation results in BC to IS re-differentiation in repopulated alveoli, and increased local expression of protective antimicrobial factors, but fails to restore normal alveolar epithelium responsible for gas exchange.
Affiliation(s)
- Andrew K Beppu
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Juanjuan Zhao
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Changfu Yao
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Gianni Carraro
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Edo Israely
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Anna Lucia Coelho
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Katherine Drake
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Cory M Hogaboam
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - William C Parks
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA
| | - Jay K Kolls
- Tulane Center for Translational Research in Infection and Inflammation, School of Medicine, New Orleans, LA, 70112, USA
| | - Barry R Stripp
- Department of Medicine, Women's Guild Lung Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA.
- Division of Pulmonary and Critical Care Medicine, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA.
- Department of Medicine, Regenerative Medicine Institute, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA.
| |
|
10
|
Wang RH, Wang J, Li SC. Probabilistic tensor decomposition extracts better latent embeddings from single-cell multiomic data. Nucleic Acids Res 2023; 51:e81. [PMID: 37403780 PMCID: PMC10450184 DOI: 10.1093/nar/gkad570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2022] [Revised: 06/01/2023] [Accepted: 06/23/2023] [Indexed: 07/06/2023] Open
Abstract
Single-cell sequencing technology enables the simultaneous capture of multiomic data from multiple cells. The captured data can be represented as tensors, i.e. higher-order generalizations of matrices. However, existing analysis tools often treat the data as a collection of matrices (second-order tensors), discarding the correspondences among the features. We therefore propose a probabilistic tensor decomposition framework, SCOIT, to extract embeddings from single-cell multiomic data. SCOIT incorporates various distributions, including Gaussian, Poisson, and negative binomial distributions, to deal with sparse, noisy, and heterogeneous single-cell data. Our framework can decompose a multiomic tensor into a cell embedding matrix, a gene embedding matrix, and an omic embedding matrix, allowing for various downstream analyses. We applied SCOIT to eight single-cell multiomic datasets from different sequencing protocols. With cell embeddings, SCOIT achieves superior performance for cell clustering compared to nine state-of-the-art tools under various metrics, demonstrating its ability to dissect cellular heterogeneity. With the gene embeddings, SCOIT enables cross-omics gene expression analysis and integrative gene regulatory network study. Furthermore, the embeddings allow simultaneous cross-omics imputation, outperforming current imputation methods with the Pearson correlation coefficient increased by 3.38-39.26%; moreover, SCOIT accommodates the scenario in which subsets of the cells have only one omic profile available.
Affiliation(s)
- Ruo Han Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong
| | - Jianping Wang
- Department of Computer Science, City University of Hong Kong, Hong Kong
| | - Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong
| |
|
11
|
Lei Y, Meng Q, Hong F, Zhao M, Gao X. Pan-cancer survey of lncRNA rewiring and functional alternation in tumor-infiltrating T cell by scLNC. Cancer Lett 2023:216319. [PMID: 37468058 DOI: 10.1016/j.canlet.2023.216319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 06/27/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been reported to be involved in diverse biological processes, including tumor immunity. Since lncRNAs are expressed with high cell-type specificity, investigation of lncRNAs at the single-cell level will unveil the cell-type-specific functions of lncRNAs. However, at the single-cell level, a systematic pan-cancer analysis of lncRNA functions in tumor immune microenvironments (TIMEs) is still lacking. Here, we performed pan-cancer single-cell profiling of lncRNA functions in TIMEs and developed a tool, scLNC, tailored for lncRNA functional characterization at the single-cell level. scLNC enabled the comparison of lncRNA function at the levels of lncRNA-mRNA pairs, lncRNA regulatory unit activity and unit function in a cell-type-specific manner. Applying scLNC, our analysis depicted the cross-tumor and tumor-specific lncRNA regulatory profiles in the T cell subtypes and revealed the new regulatory units that lncRNAs established in tumor-infiltrating T cells, particularly in the tumor-enriched T cells. We further characterized the activity and functional alternations of lncRNAs through their regulatory units. Overall, our findings suggested that lncRNAs played an important role in the regulation of cytokine production, cell activation and migration in tumor-enriched T cells and further in immunotherapy.
Affiliation(s)
- Yang Lei
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
| | - Qianqian Meng
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
| | - Fang Hong
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
| | - Mengyu Zhao
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
| | - Xin Gao
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China.
| |
|
12
|
Brunson T, Sanati N, Matthews L, Haw R, Beavers D, Shorser S, Sevilla C, Viteri G, Conley P, Rothfels K, Hermjakob H, Stein L, D’Eustachio P, Wu G. Illuminating Dark Proteins using Reactome Pathways. bioRxiv 2023:2023.06.05.543335. [PMID: 37333417 PMCID: PMC10274615 DOI: 10.1101/2023.06.05.543335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Limited knowledge about a substantial portion of protein coding genes, known as "dark" proteins, hinders our understanding of their functions and potential therapeutic applications. To address this, we leveraged Reactome, the most comprehensive, open source, open-access pathway knowledgebase, to contextualize dark proteins within biological pathways. By integrating multiple resources and employing a random forest classifier trained on 106 protein/gene pairwise features, we predicted functional interactions between dark proteins and Reactome-annotated proteins. We then developed three scores to measure the interactions between dark proteins and Reactome pathways, utilizing enrichment analysis and fuzzy logic simulations. Correlation analysis of these scores with an independent single-cell RNA sequencing dataset provided supporting evidence for this approach. Furthermore, systematic natural language processing (NLP) analysis of over 22 million PubMed abstracts and manual checking of the literature associated with 20 randomly selected dark proteins reinforced the predicted interactions between proteins and pathways. To enhance the visualization and exploration of dark proteins within Reactome pathways, we developed the Reactome IDG portal, deployed at https://idg.reactome.org, a web application featuring tissue-specific protein and gene expression overlay, as well as drug interactions. Our integrated computational approach, together with the user-friendly web platform, offers a valuable resource for uncovering potential biological functions and therapeutic implications of dark proteins.
Affiliation(s)
| | - Nasim Sanati
- Oregon Health & Science University, Portland, OR 97239, USA
| | | | - Robin Haw
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Deidre Beavers
- Oregon Health & Science University, Portland, OR 97239, USA
| | - Solomon Shorser
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Cristoffer Sevilla
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Guilherme Viteri
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Patrick Conley
- Oregon Health & Science University, Portland, OR 97239, USA
| | - Karen Rothfels
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
| | - Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON M5G0A3, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S1A1, Canada
| | | | - Guanming Wu
- Oregon Health & Science University, Portland, OR 97239, USA
| |
|
13
|
Chatterjee D, Deng WM. Standardization of Single-Cell RNA-Sequencing Analysis Workflow to Study Drosophila Ovary. Methods Mol Biol 2023; 2677:151-171. [PMID: 37464241 DOI: 10.1007/978-1-0716-3259-8_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2023]
Abstract
Developments in single-cell technology have considerably changed the way we study biology. Significant efforts have been made over the last few years to build comprehensive cell-type-specific transcriptomic atlases for a wide range of tissues in several model organisms in order to discover cell-type-specific markers and drivers of gene expression. One such tissue is the ovary of the fruit fly Drosophila melanogaster, which is a popular model system with wide-ranging applications in the study of both development and disease. Three independent studies have recently produced comprehensive maps of cell-type-specific gene expression that describe both spatiotemporal regulation of the process of oogenesis and unique transcriptomic profiles of different cell types that constitute the ovary. In this chapter, we outline the wet-lab protocol that was followed in our recent study for sample preparation and reanalyze the resultant dataset to discuss the benchmarks in data analysis, which are fundamental to comprehensive curation of the single-cell dataset representing the fly ovary.
Affiliation(s)
- Deeptiman Chatterjee
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, Tulane Cancer Center, New Orleans, LA, USA.
- Current address: Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
| | - Wu-Min Deng
- Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, Tulane Cancer Center, New Orleans, LA, USA.
| |
|
14
|
Mahalanabis A, Turinsky A, Husic M, Christensen E, Luo P, Naidas A, Brudno M, Pugh T, Ramani A, Shooshtari P. Evaluation of Single-cell RNA-seq Clustering Algorithms on Cancer Tumor Datasets. Comput Struct Biotechnol J 2022; 20:6375-6387. [DOI: 10.1016/j.csbj.2022.10.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 10/19/2022] [Accepted: 10/20/2022] [Indexed: 11/03/2022] Open
|
15
|
Saul D, Kosinsky RL, Atkinson EJ, Doolittle ML, Zhang X, LeBrasseur NK, Pignolo RJ, Robbins PD, Niedernhofer LJ, Ikeno Y, Jurk D, Passos JF, Hickson LJ, Xue A, Monroe DG, Tchkonia T, Kirkland JL, Farr JN, Khosla S. A new gene set identifies senescent cells and predicts senescence-associated pathways across tissues. Nat Commun 2022; 13:4827. [PMID: 35974106 PMCID: PMC9381717 DOI: 10.1038/s41467-022-32552-1] [Citation(s) in RCA: 193] [Impact Index Per Article: 96.5] [Received: 11/23/2021] [Accepted: 08/05/2022] [Indexed: 02/01/2023] Open
Abstract
Although cellular senescence drives multiple age-related co-morbidities through the senescence-associated secretory phenotype, in vivo senescent cell identification remains challenging. Here, we generate a gene set (SenMayo) and validate its enrichment in bone biopsies from two aged human cohorts. We further demonstrate reductions in SenMayo in bone following genetic clearance of senescent cells in mice and in adipose tissue from humans following pharmacological senescent cell clearance. We next use SenMayo to identify senescent hematopoietic or mesenchymal cells at the single cell level from human and murine bone marrow/bone scRNA-seq data. Thus, SenMayo identifies senescent cells across tissues and species with high fidelity. Using this senescence panel, we are able to characterize senescent cells at the single cell level and identify key intercellular signaling pathways. SenMayo also represents a potentially clinically applicable panel for monitoring senescent cell burden with aging and other conditions as well as in studies of senolytic drugs.
Affiliation(s)
- Dominik Saul
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA.
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Trauma, Orthopedics and Reconstructive Surgery, Georg-August-University of Goettingen, Goettingen, Germany.
| | - Robyn Laura Kosinsky
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, 55905, USA
| | | | - Madison L Doolittle
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
| | - Xu Zhang
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - Nathan K LeBrasseur
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - Robert J Pignolo
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - Paul D Robbins
- Institute on the Biology of Aging and Metabolism, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Laura J Niedernhofer
- Institute on the Biology of Aging and Metabolism, Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
| | - Yuji Ikeno
- Department of Pathology, University of Texas Health, San Antonio, TX, USA
| | - Diana Jurk
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - João F Passos
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - LaTonya J Hickson
- Division of Nephrology and Hypertension, Mayo Clinic, Jacksonville, FL, USA
| | - Ailing Xue
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
| | - David G Monroe
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
| | - Tamara Tchkonia
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - James L Kirkland
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA
| | - Joshua N Farr
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA.
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA.
| | - Sundeep Khosla
- Division of Endocrinology, Mayo Clinic, Rochester, MN, 55905, USA.
- Robert and Arlene Kogod Center on Aging, Mayo Clinic, Rochester, MN, 55905, USA.
- Department of Physiology and Biomedical Engineering, Mayo Clinic, Rochester, MN, USA.
| |
|
16
|
Bilous M, Tran L, Cianciaruso C, Gabriel A, Michel H, Carmona SJ, Pittet MJ, Gfeller D. Metacells untangle large and complex single-cell transcriptome networks. BMC Bioinformatics 2022; 23:336. [PMID: 35963997 PMCID: PMC9375201 DOI: 10.1186/s12859-022-04861-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/10/2022] [Accepted: 07/23/2022] [Indexed: 12/13/2022] Open
Abstract
Background: Single-cell RNA sequencing (scRNA-seq) technologies offer unique opportunities for exploring heterogeneous cell populations. However, in-depth single-cell transcriptomic characterization of complex tissues often requires profiling tens to hundreds of thousands of cells. Such large numbers of cells represent an important hurdle for downstream analyses, interpretation and visualization. Results: We develop a framework called SuperCell to merge highly similar cells into metacells and perform standard scRNA-seq data analyses at the metacell level. Our systematic benchmarking demonstrates that metacells not only preserve but often improve the results of downstream analyses including visualization, clustering, differential expression, cell type annotation, gene correlation, imputation, RNA velocity and data integration. By capitalizing on the redundancy inherent to scRNA-seq data, metacells significantly facilitate and accelerate the construction and interpretation of single-cell atlases, as demonstrated by the integration of 1.46 million cells from COVID-19 patients in less than two hours on a standard desktop. Conclusions: SuperCell is a framework to build and analyze metacells in a way that efficiently preserves the results of scRNA-seq data analyses while significantly accelerating and facilitating them.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04861-1.
Affiliation(s)
- Mariia Bilous
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Loc Tran
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Chiara Cianciaruso
- Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
| | - Aurélie Gabriel
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Hugo Michel
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
| | - Santiago J Carmona
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Mikael J Pittet
- Department of Pathology and Immunology, University of Geneva, Geneva, Switzerland
- Department of Oncology, Geneva University Hospitals, Geneva, Switzerland
- Center for Systems Biology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - David Gfeller
- Department of Oncology, Ludwig Institute for Cancer Research, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
| |
|
17
|
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanying software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| |
|
18
|
Zhou P, Wang S, Li T, Nie Q. Dissecting transition cells from single-cell transcriptome data through multiscale stochastic dynamics. Nat Commun 2021; 12:5609. [PMID: 34556644 PMCID: PMC8460805 DOI: 10.1038/s41467-021-25548-w] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 02/27/2021] [Accepted: 08/11/2021] [Indexed: 11/25/2022] Open
Abstract
Advances in single-cell technologies allow scrutiny of heterogeneous cell states; however, detecting cell-state transitions from snapshot single-cell transcriptome data remains challenging. To investigate cells with transient properties or mixed identities, we present MuTrans, a method based on a multiscale reduction technique to identify the underlying stochastic dynamics that prescribe cell-fate transitions. By iteratively unifying transition dynamics across multiple scales, MuTrans constructs the cell-fate dynamical manifold that depicts progression of cell-state transitions, and distinguishes stable and transition cells. In addition, MuTrans quantifies the likelihood of all possible transition trajectories between cell states using coarse-grained transition path theory. Downstream analysis identifies distinct genes that mark the transient states or drive the transitions. The method is consistent with the well-established Langevin equation and transition rate theory. Applying MuTrans to datasets collected from five different single-cell experimental platforms, we show its capability and scalability to robustly unravel complex cell fate dynamics induced by transition cells in systems such as tumor EMT, iPSC differentiation and blood cell differentiation. Overall, our method bridges data-driven and model-based approaches on cell-fate transitions at single-cell resolution.
Affiliation(s)
- Peijie Zhou
- LMAM and School of Mathematical Sciences, Peking University, Beijing, China
- Department of Mathematics, University of California, Irvine, Irvine, CA, USA
| | - Shuxiong Wang
- Department of Mathematics, University of California, Irvine, Irvine, CA, USA
| | - Tiejun Li
- LMAM and School of Mathematical Sciences, Peking University, Beijing, China.
| | - Qing Nie
- Department of Mathematics, University of California, Irvine, Irvine, CA, USA.
- Department of Cell and Developmental Biology, University of California, Irvine, CA, USA.
| |
|
19
|
Chen Y, Zhang Y, Li JYH, Ouyang Z. LISA2: Learning Complex Single-Cell Trajectory and Expression Trends. Front Genet 2021; 12:681206. [PMID: 34512717 PMCID: PMC8428276 DOI: 10.3389/fgene.2021.681206] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Accepted: 06/01/2021] [Indexed: 12/20/2022] Open
Abstract
Single-cell transcriptional and epigenomics profiles have been applied in a variety of tissues and diseases for discovering new cell types, differentiation trajectories, and gene regulatory networks. Many methods such as Monocle 2/3, URD, and STREAM have been developed for tree-based trajectory building. Here, we propose a fast and flexible trajectory learning method, LISA2, for single-cell data analysis. This new method has two distinctive features: (1) LISA2 utilizes specified leaves and root to reduce the complexity for building the developmental trajectory, especially for some special cases such as rare cell populations and adjacent terminal cell states; and (2) LISA2 is applicable for both transcriptomics and epigenomics data. LISA2 visualizes complex trajectories using 3D Landmark ISOmetric feature MAPping (L-ISOMAP). We apply LISA2 to simulation and real datasets in cerebellum, diencephalon, and hematopoietic stem cells including both single-cell transcriptomics data and single-cell assay for transposase-accessible chromatin data. LISA2 is efficient in estimating single-cell trajectory and expression trends for different kinds of molecular state of cells.
Affiliation(s)
- Yang Chen
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
| | - Yuping Zhang
- Department of Statistics, University of Connecticut, Storrs, CT, United States
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
| | - James Y. H. Li
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, United States
- Department of Genetics and Genome Sciences, School of Medicine, University of Connecticut, Farmington, CT, United States
| | - Zhengqing Ouyang
- Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, United States
| |
|
20
|
Cardona-Alberich A, Tourbez M, Pearce SF, Sibley CR. Elucidating the cellular dynamics of the brain with single-cell RNA sequencing. RNA Biol 2021; 18:1063-1084. [PMID: 33499699 PMCID: PMC8216183 DOI: 10.1080/15476286.2020.1870362] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/09/2020] [Revised: 12/17/2020] [Accepted: 12/24/2020] [Indexed: 12/18/2022] Open
Abstract
Single-cell RNA-sequencing (scRNA-seq) has emerged in recent years as a breakthrough technology to understand RNA metabolism at cellular resolution. In addition to allowing new cell types and states to be identified, scRNA-seq can permit cell-type specific differential gene expression changes, pre-mRNA processing events, gene regulatory networks and single-cell developmental trajectories to be uncovered. More recently, a new wave of multi-omic adaptations and complementary spatial transcriptomics workflows have been developed that facilitate the collection of even more holistic information from individual cells. These developments have unprecedented potential to provide penetrating new insights into the basic neural cell dynamics and molecular mechanisms relevant to the nervous system in both health and disease. In this review we discuss this maturation of single-cell RNA-sequencing over the past decade, and review the different adaptations of the technology that can now be applied both at different scales and for different purposes. We conclude by highlighting how these methods have already led to many exciting discoveries across neuroscience that have furthered our cellular understanding of the neurological disease.
Affiliation(s)
- Aida Cardona-Alberich
- Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, Edinburgh University, Edinburgh, UK
| | - Manon Tourbez
- Simons Initiative for the Developing Brain, University of Edinburgh, Edinburgh, UK
| | - Sarah F. Pearce
- Simons Initiative for the Developing Brain, University of Edinburgh, Edinburgh, UK
| | - Christopher R. Sibley
- Institute of Quantitative Biology, Biochemistry and Biotechnology, School of Biological Sciences, Edinburgh University, Edinburgh, UK
- Simons Initiative for the Developing Brain, University of Edinburgh, Edinburgh, UK
- Centre for Discovery Brain Sciences, University of Edinburgh, Edinburgh, UK
- Euan MacDonald Centre for MND Research, University of Edinburgh, Edinburgh, UK
| |
|
21
|
Causeret F, Moreau MX, Pierani A, Blanquie O. The multiple facets of Cajal-Retzius neurons. Development 2021; 148:268379. [PMID: 34047341 DOI: 10.1242/dev.199409] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 02/01/2023]
Abstract
Cajal-Retzius neurons (CRs) are among the first-born neurons in the developing cortex of reptiles, birds and mammals, including humans. The peculiarity of CRs lies in the fact they are initially embedded into the immature neuronal network before being almost completely eliminated by cell death at the end of cortical development. CRs are best known for controlling the migration of glutamatergic neurons and the formation of cortical layers through the secretion of the glycoprotein reelin. However, they have been shown to play numerous additional key roles at many steps of cortical development, spanning from patterning and sizing functional areas to synaptogenesis. The use of genetic lineage tracing has allowed the discovery of their multiple ontogenetic origins, migratory routes, expression of molecular markers and death dynamics. Nowadays, single-cell technologies enable us to appreciate the molecular heterogeneity of CRs with an unprecedented resolution. In this Review, we discuss the morphological, electrophysiological, molecular and genetic criteria allowing the identification of CRs. We further expose the various sources, migration trajectories, developmental functions and death dynamics of CRs. Finally, we demonstrate how the analysis of public transcriptomic datasets allows extraction of the molecular signature of CRs throughout their transient life and consider their heterogeneity within and across species.
Affiliation(s)
- Frédéric Causeret
- Université de Paris, Imagine Institute, Team Genetics and Development of the Cerebral Cortex, F-75015 Paris, France; Université de Paris, Institute of Psychiatry and Neuroscience of Paris, INSERM U1266, F-75014 Paris, France
| | - Matthieu X Moreau
- Université de Paris, Imagine Institute, Team Genetics and Development of the Cerebral Cortex, F-75015 Paris, France; Université de Paris, Institute of Psychiatry and Neuroscience of Paris, INSERM U1266, F-75014 Paris, France
| | - Alessandra Pierani
- Université de Paris, Imagine Institute, Team Genetics and Development of the Cerebral Cortex, F-75015 Paris, France; Université de Paris, Institute of Psychiatry and Neuroscience of Paris, INSERM U1266, F-75014 Paris, France; Groupe Hospitalier Universitaire Paris Psychiatrie et Neurosciences, F-75014 Paris, France
| | - Oriane Blanquie
- Institute of Physiology, University Medical Center of the Johannes Gutenberg University Mainz, D-55128 Mainz, Germany
| |
|
22
|
Single-cell technologies and analyses in hematopoiesis and hematological malignancies. Exp Hematol 2021; 98:1-13. [PMID: 33979683 DOI: 10.1016/j.exphem.2021.05.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/29/2021] [Revised: 04/29/2021] [Accepted: 05/03/2021] [Indexed: 01/03/2023]
Abstract
In recent years, single-cell technologies have emerged as breakthrough techniques that enable the characterization of hematopoietic cell populations of normal and malignant tissue samples and will be combined in the near future with bulk technologies, currently used in clinical practice, to improve diagnosis, prognosis, and the search for novel molecular targets. These single-cell methods have the advantage of not masking cell-to-cell variation features and involve the study of genetic, epigenetic, transcriptional, and proteomic landscapes from a single-cell perspective. Latest advances in this field have enabled the development of novel strategies that significantly increase both sensitivity and high throughput. In this review, we emphasize emerging techniques aimed at assessing individual or multiomic parameters at single-cell resolution and analyze how these technologies have helped us understand hematopoietic variability and identify unknown and/or rare subpopulations. We also summarize the impact of these single-cell profiling strategies on the characterization of cell diversity within the tumor and the clonal evolution of multiple hematological malignancies in samples from untreated and treated patients, which provide valuable information for diagnosis, prognosis, and future treatments and explain why current therapies may fail. However, despite these improvements, new challenges lie ahead.
|
23
|
Kathiriya IS, Rao KS, Iacono G, Devine WP, Blair AP, Hota SK, Lai MH, Garay BI, Thomas R, Gong HZ, Wasson LK, Goyal P, Sukonnik T, Hu KM, Akgun GA, Bernard LD, Akerberg BN, Gu F, Li K, Speir ML, Haeussler M, Pu WT, Stuart JM, Seidman CE, Seidman JG, Heyn H, Bruneau BG. Modeling Human TBX5 Haploinsufficiency Predicts Regulatory Networks for Congenital Heart Disease. Dev Cell 2021; 56:292-309.e9. [PMID: 33321106 PMCID: PMC7878434 DOI: 10.1016/j.devcel.2020.11.020] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 05/12/2020] [Revised: 09/23/2020] [Accepted: 11/18/2020] [Indexed: 01/10/2023]
Abstract
Haploinsufficiency of transcriptional regulators causes human congenital heart disease (CHD); however, the underlying CHD gene regulatory network (GRN) imbalances are unknown. Here, we define transcriptional consequences of reduced dosage of the CHD transcription factor, TBX5, in individual cells during cardiomyocyte differentiation from human induced pluripotent stem cells (iPSCs). We discovered highly sensitive dysregulation of TBX5-dependent pathways (including lineage decisions and genes associated with heart development, cardiomyocyte function, and CHD genetics) in discrete subpopulations of cardiomyocytes. Spatial transcriptomic mapping revealed chamber-restricted expression for many TBX5-sensitive transcripts. GRN analysis indicated that cardiac network stability, including vulnerable CHD-linked nodes, is sensitive to TBX5 dosage. A GRN-predicted genetic interaction between Tbx5 and Mef2c, manifesting as ventricular septation defects, was validated in mice. These results demonstrate exquisite and diverse sensitivity to TBX5 dosage in heterogeneous subsets of iPSC-derived cardiomyocytes and predict candidate GRNs for human CHDs, with implications for quantitative transcriptional regulation in disease.
Affiliation(s)
- Irfan S Kathiriya
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA 94158, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA.
| | - Kavitha S Rao
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA 94158, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Giovanni Iacono
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08028 Barcelona, Spain
| | - W Patrick Devine
- Gladstone Institutes, San Francisco, CA 94158, USA; Department of Pathology, University of California, San Francisco, CA 94158, USA
| | - Andrew P Blair
- Gladstone Institutes, San Francisco, CA 94158, USA; Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Swetansu K Hota
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA; Cardiovascular Research Institute, University of California, San Francisco, CA 94158, USA
| | - Michael H Lai
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Bayardo I Garay
- Department of Anesthesia and Perioperative Care, University of California, San Francisco, CA 94158, USA; Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | | | - Henry Z Gong
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Lauren K Wasson
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Howard Hughes Medical Institute, Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Piyush Goyal
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Tatyana Sukonnik
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Kevin M Hu
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Gunes A Akgun
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Laure D Bernard
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA
| | - Brynn N Akerberg
- Department of Cardiology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Fei Gu
- Department of Cardiology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Kai Li
- Department of Cardiology, Boston Children's Hospital, Boston, MA 02115, USA
| | - Matthew L Speir
- Genomics Institute, University of California, Santa Cruz, CA 95064, USA
| | | | - William T Pu
- Department of Cardiology, Boston Children's Hospital, Boston, MA 02115, USA; Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02115, USA
| | - Joshua M Stuart
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Christine E Seidman
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Howard Hughes Medical Institute, Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - J G Seidman
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
| | - Holger Heyn
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), 08028 Barcelona, Spain; Universitat Pompeu Fabra, 08028 Barcelona, Spain
| | - Benoit G Bruneau
- Gladstone Institutes, San Francisco, CA 94158, USA; Roddenberry Center for Stem Cell Biology and Medicine at Gladstone, San Francisco, CA 94158, USA; Cardiovascular Research Institute, University of California, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, CA 94158, USA.
| |
|
24
|
Turki T, Taguchi YH. Discriminating the single-cell gene regulatory networks of human pancreatic islets: A novel deep learning application. Comput Biol Med 2021; 132:104257. [PMID: 33740535 DOI: 10.1016/j.compbiomed.2021.104257] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/01/2020] [Revised: 02/01/2021] [Accepted: 02/03/2021] [Indexed: 12/24/2022]
Abstract
Analysis of single-cell pancreatic data can play an important role in understanding various metabolic diseases and health conditions. Due to the sparsity and noise present in such single-cell gene expression data, inference of single-cell gene regulatory networks remains a challenge. Since recent studies have reported the reliable inference of single-cell gene regulatory networks (SCGRNs), the current study focused on discriminating the SCGRNs of T2D patients from those of healthy controls. By accurately distinguishing SCGRNs of healthy pancreas from those of T2D pancreas, it would be possible to annotate, organize, visualize, and identify common patterns of SCGRNs in metabolic diseases. Such annotated SCGRNs could play an important role in accelerating the process of building large data repositories. This study aimed to contribute to the development of a novel deep learning (DL) application. First, we generated a dataset consisting of 224 SCGRNs belonging to both T2D and healthy pancreas and made it freely available. Next, we chose seven DL architectures, including VGG16, VGG19, Xception, ResNet50, ResNet101, DenseNet121, and DenseNet169, trained each of them on the dataset, and checked their prediction based on a test set. Of note, we evaluated the DL architectures on a single NVIDIA GeForce RTX 2080Ti GPU. Experimental results on the whole dataset, using several performance measures, demonstrated the superiority of VGG19 DL model in the automatic classification of SCGRNs, derived from the single-cell pancreatic data.
Affiliation(s)
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
| | - Y-H Taguchi
- Department of Physics, Chuo University, Tokyo, 112-8551, Japan.
| |
|
25
|
Wang HY, Zhao JP, Zheng CH. SUSCC: Secondary Construction of Feature Space based on UMAP for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data. Interdiscip Sci 2021; 13:83-90. [PMID: 33475958 DOI: 10.1007/s12539-020-00411-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2020] [Revised: 12/08/2020] [Accepted: 12/19/2020] [Indexed: 10/22/2022]
Abstract
Clustering is a common method to identify cell types in single cell analysis, but the increasing size of scRNA-seq datasets brings challenges to single cell clustering. Therefore, there is an urgent need to design a faster and more accurate clustering method for large-scale scRNA-seq data. In this paper, we propose a new method for single cell clustering. First, a count matrix is constructed through normalization and gene filtration. Second, the raw gene expression matrix is projected into a feature space built by a secondary construction based on UMAP (Uniform Manifold Approximation and Projection). Third, the low-dimensional matrix in the feature space is randomly divided, in a chosen proportion, into two sub-matrices: one for clustering and one for classification. Finally, one subset is clustered by the k-means algorithm and the other subset is then classified by the k-nearest neighbor algorithm based on the clustering results. Experimental results show that our method can cluster scRNA-seq datasets effectively.
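The last two steps of the abstract above (random split, k-means on one part, k-nearest-neighbor labeling of the rest) can be sketched as follows. This is a minimal illustration and not the SUSCC implementation: it assumes the low-dimensional embedding has already been computed, and the function names, the `cluster_fraction` and `n_neighbors` parameters, and the farthest-first centroid initialization are all hypothetical choices made here to keep the example short and deterministic.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Assumed initialization: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def knn_classify(train_points, train_labels, query, n_neighbors=5):
    """Majority vote among the n_neighbors closest labeled points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )[:n_neighbors]
    votes = {}
    for i in nearest:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    return max(votes, key=votes.get)


def cluster_then_classify(embedding, k, cluster_fraction=0.5, n_neighbors=5, seed=0):
    """Cluster a random subset with k-means, then label the remaining
    cells by k-nearest neighbors against the clustered subset."""
    rng = random.Random(seed)
    indices = list(range(len(embedding)))
    rng.shuffle(indices)
    n_cluster = min(len(indices), max(k, int(len(indices) * cluster_fraction)))
    cluster_idx, classify_idx = indices[:n_cluster], indices[n_cluster:]
    cluster_points = [embedding[i] for i in cluster_idx]
    cluster_labels = kmeans(cluster_points, k)
    labels = [None] * len(embedding)
    for i, lab in zip(cluster_idx, cluster_labels):
        labels[i] = lab
    for i in classify_idx:
        labels[i] = knn_classify(cluster_points, cluster_labels, embedding[i], n_neighbors)
    return labels
```

Only the subset passed to `kmeans` pays the iterative clustering cost; every other cell needs a single nearest-neighbor lookup, which is presumably where the speed-up on large datasets comes from. In practice the `embedding` argument would be the output of a UMAP implementation rather than hand-built tuples.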
Affiliation(s)
- Hai-Yun Wang
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
| | - Jian-Ping Zhao
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; Institute of Mathematics and Physics, Xinjiang University, Urumqi, China
| | - Chun-Hou Zheng
- College of Mathematics and System Sciences, Xinjiang University, Urumqi, China; College of Computer Science and Technology, Anhui University, Hefei, China
| |
|
26
|
Feature Selection for Topological Proximity Prediction of Single-Cell Transcriptomic Profiles in Drosophila Embryo Using Genetic Algorithm. Genes (Basel) 2020; 12:genes12010028. [PMID: 33379262 PMCID: PMC7824175 DOI: 10.3390/genes12010028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Revised: 12/16/2020] [Accepted: 12/22/2020] [Indexed: 12/02/2022] Open
Abstract
Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help in recovering the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition known as the DREAM Single Cell Transcriptomics Challenge (SCTC) to predict the masked locations of single cells from sets of 60, 40 and 20 genes out of 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA) to predict the most important genes that carry positional and proximity information of the single-cell origins, in combination with the base distance mapping algorithm DistMap. The resulting gene selection was found to perform well and was ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it to the main challenge publication, due to an intricate aggregation ranking. In this work, we discuss the detailed implementation of GA and its post-challenge parameterization, with a view to identifying potential areas where GA-based approaches of gene-set selection for topological association prediction may be improved to be more effective. We believe this work provides additional insights into the feature-selection strategies and their relevance to single-cell similarity prediction and will form a strong addendum to the recently published work from the consortium.
|
27
|
Venkatasubramanian M, Chetal K, Schnell DJ, Atluri G, Salomonis N. Resolving single-cell heterogeneity from hundreds of thousands of cells through sequential hybrid clustering and NMF. Bioinformatics 2020; 36:3773-3780. [PMID: 32207533 PMCID: PMC7320606 DOI: 10.1093/bioinformatics/btaa201] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 06/17/2019] [Revised: 02/20/2020] [Accepted: 03/19/2020] [Indexed: 12/13/2022] Open
Abstract
Motivation: The rapid proliferation of single-cell RNA-sequencing (scRNA-Seq) technologies has spurred the development of diverse computational approaches to detect transcriptionally coherent populations. While the complexity of the algorithms for detecting heterogeneity has increased, most require significant user-tuning, are heavily reliant on dimension reduction techniques and are not scalable to ultra-large datasets. We previously described a multi-step algorithm, Iterative Clustering and Guide-gene Selection (ICGS), which applies intra-gene correlation and hybrid clustering to uniquely resolve novel transcriptionally coherent cell populations from an intuitive graphical user interface. Results: We describe a new iteration of ICGS that outperforms state-of-the-art scRNA-Seq detection workflows when applied to well-established benchmarks. This approach combines multiple complementary subtype detection methods (HOPACH, sparse non-negative matrix factorization, cluster ‘fitness’, support vector machine) to resolve rare and common cell-states, while minimizing differences due to donor or batch effects. Using data from multiple cell atlases, we show that the PageRank algorithm effectively downsamples ultra-large scRNA-Seq datasets, without losing extremely rare or transcriptionally similar yet distinct cell types and while recovering novel transcriptionally distinct cell populations. We believe this new approach holds tremendous promise in reproducibly resolving hidden cell populations in complex datasets. Availability and implementation: ICGS2 is implemented in Python. The source code and documentation are available at http://altanalyze.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Meenakshi Venkatasubramanian
- Department of Electrical Engineering and Computer Science, University of Cincinnati; Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center
| | - Kashish Chetal
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center
| | - Daniel J Schnell
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center
| | - Gowtham Atluri
- Department of Electrical Engineering and Computer Science, University of Cincinnati
| | - Nathan Salomonis
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center; Department of Biomedical Informatics, University of Cincinnati, Cincinnati, OH 45267, USA
| |
|
28
|
Zhang Z, Cui F, Wang C, Zhao L, Zou Q. Goals and approaches for each processing step for single-cell RNA sequencing data. Brief Bioinform 2020; 22:6034054. [PMID: 33316046 DOI: 10.1093/bib/bbaa314] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 08/15/2020] [Revised: 10/10/2020] [Accepted: 10/16/2020] [Indexed: 12/12/2022] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at the cellular level. However, due to the extremely low levels of transcripts in a single cell and technical losses during reverse transcription, gene expression at a single-cell resolution is usually noisy and highly dimensional; thus, statistical analyses of single-cell data are a challenge. Although many scRNA-seq data analysis tools are currently available, a gold standard pipeline is not available for all datasets. Therefore, a general understanding of bioinformatics and associated computational issues would facilitate the selection of appropriate tools for a given set of data. In this review, we provide an overview of the goals and most popular computational analysis tools for the quality control, normalization, imputation, feature selection and dimension reduction of scRNA-seq data.
Affiliation(s)
- Zilong Zhang
- University of Electronic Science and Technology of China
| | | | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology
| | - Lingling Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
| |
|
29
|
Singh R. Single-Cell Sequencing in Human Genital Infections. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2020; 1255:203-220. [PMID: 32949402 DOI: 10.1007/978-981-15-4494-1_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
Human genital infections are one of the most concerning issues worldwide and can be categorized into sexually transmitted, urinary tract and vaginal infections. These infections, if left untreated, can disseminate to other parts of the body and cause more complicated illnesses such as pelvic inflammatory disease, urethritis, and anogenital cancers. Effective treatment of these infections is further complicated by the emergence of antimicrobial resistance in the pathogens that cause them. Furthermore, the development and application of single-cell sequencing technologies have opened new possibilities to study drug-resistant clones and cell-to-cell variation, to discover acquired drug resistance mutations and the transcriptional diversity of a pathogen across different infection stages, to identify rare cell types and investigate different cellular states of genital infection-causing pathogens, and to develop novel therapeutic strategies. In this chapter, I will provide a complete review of the applications of single-cell sequencing in human genital infections before discussing their limitations and challenges.
Affiliation(s)
- Reema Singh
- Department of Biochemistry, Microbiology and Immunology, College of Medicine, University of Saskatchewan, Saskatoon, SK, Canada; Vaccine and Infectious Disease Organization-International Vaccine Centre, Saskatoon, SK, Canada
| |
|
30
|
Hie B, Peters J, Nyquist SK, Shalek AK, Berger B, Bryson BD. Computational Methods for Single-Cell RNA Sequencing. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012220-100601] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 11/09/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell omics advance, we propose coupled development of robust analytical pipelines suited for the challenges that new data present and principled selection of analytical methods that are suited for the biological questions to be addressed.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Joshua Peters
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Sarah K. Nyquist
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Alex K. Shalek
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Institute for Medical Engineering & Science (IMES), and Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Bryan D. Bryson
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
|
31
|
Turki T, Taguchi YH. SCGRNs: Novel supervised inference of single-cell gene regulatory networks of complex diseases. Comput Biol Med 2020; 118:103656. [PMID: 32174324 DOI: 10.1016/j.compbiomed.2020.103656] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/06/2020] [Revised: 02/06/2020] [Accepted: 02/07/2020] [Indexed: 12/19/2022]
|
32
|
Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589 PMCID: PMC7007675 DOI: 10.1186/s13059-020-1926-6] [Citation(s) in RCA: 554] [Impact Index Per Article: 138.5] [Received: 08/02/2019] [Accepted: 01/02/2020] [Indexed: 02/08/2023] Open
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
- Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
- Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
- Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
- Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
- Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
- Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
- Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
- Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
- Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
- John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Amir Niknejad
- Computation molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
- Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
- Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
- Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
|
33
|
Elyanow R, Dumitrascu B, Engelhardt BE, Raphael BJ. netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis. Genome Res 2020; 30:195-204. [PMID: 31992614 PMCID: PMC7050525 DOI: 10.1101/gr.251603.119] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 04/18/2019] [Accepted: 11/19/2019] [Indexed: 02/06/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene–gene interactions, encouraging pairs of genes with known interactions to be nearby each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene–gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
Affiliation(s)
- Rebecca Elyanow
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Bianca Dumitrascu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA
- Barbara E Engelhardt
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey 08540, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
|
34
|
Wan S, Kim J, Won KJ. SHARP: hyperfast and accurate processing of single-cell RNA-seq data via ensemble random projection. Genome Res 2020; 30:205-213. [PMID: 31992615 PMCID: PMC7050522 DOI: 10.1101/gr.254557.119] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 07/10/2019] [Accepted: 01/23/2020] [Indexed: 01/01/2023]
Abstract
To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that is scalable to clustering 10 million cells. Comprehensive benchmarking tests on 17 public scRNA-seq data sets show that SHARP outperforms existing methods in terms of speed and accuracy. Particularly, for large-size data sets (more than 40,000 cells), SHARP runs faster than other competitors while maintaining high clustering accuracy and robustness. To the best of our knowledge, SHARP is the only R-based tool that is scalable to clustering scRNA-seq data with 10 million cells.
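As an illustration of the general idea behind ensemble random projection, the sketch below projects a few toy expression vectors onto random Gaussian directions several times and checks that pairwise distances are roughly preserved. It is a minimal sketch under stated assumptions, not the SHARP implementation: the function names (`random_projection`, `ensemble_projections`), the Gaussian projection matrix, and the toy data are all hypothetical, and the clustering and merging steps of an ensemble method are not shown.

```python
import math
import random


def random_projection(rows, target_dim, seed=0):
    """Project each row (a list of floats) onto target_dim random Gaussian directions."""
    rng = random.Random(seed)
    source_dim = len(rows[0])
    # Scaling by 1/sqrt(target_dim) keeps expected squared distances unchanged.
    scale = 1.0 / math.sqrt(target_dim)
    directions = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
        for _ in range(target_dim)
    ]
    return [
        [sum(d * x for d, x in zip(direction, row)) for direction in directions]
        for row in rows
    ]


def ensemble_projections(rows, target_dim, n_runs):
    """Repeat the projection with a different seed per run (hypothetical helper).

    An ensemble method would cluster each run and merge the results;
    that part is deliberately left out of this sketch.
    """
    return [random_projection(rows, target_dim, seed=run) for run in range(n_runs)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy data (assumption): 3 "cells" measured on 500 "genes".
data_rng = random.Random(42)
cells = [[data_rng.random() for _ in range(500)] for _ in range(3)]

runs = ensemble_projections(cells, target_dim=64, n_runs=5)
original = euclidean(cells[0], cells[1])
# Ratio of projected to original distance for each run; values near 1 mean low distortion.
ratios = [euclidean(run[0], run[1]) / original for run in runs]
```

Each run reduces 500 dimensions to 64 while keeping the distance ratio close to 1, which is the property that makes random projection attractive for very large data sets.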
Affiliation(s)
- Shibiao Wan
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Center for Applied Bioinformatics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Junil Kim
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, 2200 Copenhagen North, Denmark
- Novo Nordisk Foundation Center for Stem Cell Biology, DanStem, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen North, Denmark
- Kyoung Jae Won
- Institute for Diabetes, Obesity and Metabolism, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Biotech Research and Innovation Centre (BRIC), University of Copenhagen, 2200 Copenhagen North, Denmark
- Novo Nordisk Foundation Center for Stem Cell Biology, DanStem, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen North, Denmark
|
35
|
Tsuyuzaki K, Sato H, Sato K, Nikaido I. Benchmarking principal component analysis for large-scale single-cell RNA-sequencing. Genome Biol 2020; 21:9. [PMID: 31955711 PMCID: PMC6970290 DOI: 10.1186/s13059-019-1900-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 05/18/2019] [Accepted: 11/26/2019] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, computation time is long and consumes large amounts of memory. RESULTS: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms. CONCLUSION: We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
Affiliation(s)
- Koki Tsuyuzaki
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Japan Science and Technology Agency, PRESTO, 5-3, Yonbancho, Chiyoda-ku, Tokyo, 102-8666 Japan
- Hiroyuki Sato
- Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Yoshida-honmachi, Sakyo-ku, Kyoto, 606-8501 Japan
- Kenta Sato
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Bunkyo-ku, Tokyo, 113-8657 Japan
- Itoshi Nikaido
- Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Wako, Saitama, 351-0198 Japan
- Bioinformatics Course, Master’s/Doctoral Program in Life Science Innovation (T-LSI), School of Integrative and Global Majors (SIGMA), University of Tsukuba, 1-1-1, Tennodai, Tsukuba, Ibaraki, 305-8577 Japan
|
36
|
Mou T, Deng W, Gu F, Pawitan Y, Vu TN. Reproducibility of Methods to Detect Differentially Expressed Genes from Single-Cell RNA Sequencing. Front Genet 2020; 10:1331. [PMID: 32010190 PMCID: PMC6979262 DOI: 10.3389/fgene.2019.01331] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 08/06/2019] [Accepted: 12/05/2019] [Indexed: 12/31/2022] Open
Abstract
Detection of differentially expressed genes is a common task in single-cell RNA-seq (scRNA-seq) studies. Various methods based on both bulk-cell and single-cell approaches are in current use. Due to the unique distributional characteristics of single-cell data, it is important to compare these methods with rigorous statistical assessments. In this study, we assess the reproducibility of 9 tools for differential expression analysis in scRNA-seq data. These tools include four methods originally designed for scRNA-seq data, three popular methods that were originally developed for bulk-cell RNA-seq data but have been applied in scRNA-seq analysis, and two general statistical tests. Instead of comparing the performance across all genes, we compare the methods in terms of the rediscovery rates (RDRs) of top-ranked genes, separately for highly and lowly expressed genes. Three real and one simulated scRNA-seq data sets are used for the comparisons. The results indicate that some widely used methods, such as edgeR and monocle, have worse RDR performances compared to the other methods, especially for the top-ranked genes. For highly expressed genes, many bulk-cell–based methods can perform similarly to the methods designed for scRNA-seq data. But for the lowly expressed genes, performance varies substantially; edgeR and monocle are too liberal and have poor control of false positives, while DESeq2 is too conservative and consequently loses sensitivity compared to the other methods. BPSC, Limma, DEsingle, MAST, t-test and Wilcoxon have similar performances in the real data sets. Overall, the scRNA-seq based method BPSC performs well against the other methods, particularly when there is a sufficient number of cells.
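To make the notion of a rediscovery rate for top-ranked genes concrete, the sketch below computes one simplified variant: the fraction of the top-k genes from a training split that are also significant in a validation split. The abstract does not give the exact definition, so this formula, the function names, the `alpha` threshold and the toy p-values are all assumptions for illustration only.

```python
def top_ranked(p_values, k):
    """Return the k gene names with the smallest p-values."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    return [gene for gene, _ in ranked[:k]]


def rediscovery_rate(train_p, valid_p, k, alpha=0.05):
    """Fraction of the top-k training genes that are significant in validation.

    Simplified, assumed definition: a gene counts as rediscovered when its
    validation p-value is below alpha. Genes missing from valid_p count as
    not rediscovered.
    """
    top = top_ranked(train_p, k)
    rediscovered = [gene for gene in top if valid_p.get(gene, 1.0) < alpha]
    return len(rediscovered) / len(top)


# Toy p-values (made up for illustration) from two halves of a data set.
train = {"geneA": 0.001, "geneB": 0.004, "geneC": 0.02, "geneD": 0.30}
valid = {"geneA": 0.010, "geneB": 0.200, "geneC": 0.03, "geneD": 0.60}

rdr_top2 = rediscovery_rate(train, valid, k=2)  # geneA rediscovered, geneB not
rdr_top3 = rediscovery_rate(train, valid, k=3)  # geneA and geneC rediscovered
```

Computing the rate separately for highly and lowly expressed genes, as the abstract describes, would amount to calling `rediscovery_rate` on each subset of genes.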
Affiliation(s)
- Tian Mou
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Wenjiang Deng
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Fengyun Gu
- School of Mathematical Sciences, University College Cork, Cork, Ireland
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Trung Nghia Vu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|
37
|
Abstract
Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
Affiliation(s)
- Dmitry Kobak
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.
- Philipp Berens
- Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.
- Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany.
- Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany.
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
|
38
|
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM. Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities. Information Fusion 2019; 50:71-91. [PMID: 30467459 PMCID: PMC6242341 DOI: 10.1016/j.inffus.2018.09.012] [Citation(s) in RCA: 215] [Impact Index Per Article: 43.0] [Indexed: 05/10/2023]
Abstract
New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.
Affiliation(s)
- Marinka Zitnik
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Francis Nguyen
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Bo Wang
- Hikvision Research Institute, Santa Clara, CA, USA
- Jure Leskovec
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anna Goldenberg
- Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
- Michael M. Hoffman
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute, Toronto, ON, Canada
|
39
|
Blencowe M, Arneson D, Ding J, Chen YW, Saleem Z, Yang X. Network modeling of single-cell omics data: challenges, opportunities, and progresses. Emerg Top Life Sci 2019; 3:379-398. [PMID: 32270049 PMCID: PMC7141415 DOI: 10.1042/etls20180176] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 04/16/2019] [Revised: 06/07/2019] [Accepted: 06/24/2019] [Indexed: 01/07/2023]
Abstract
Single-cell multi-omics technologies are rapidly evolving, prompting both methodological advances and biological discoveries at an unprecedented speed. Gene regulatory network modeling has been used as a powerful approach to elucidate the complex molecular interactions underlying biological processes and systems, yet its application in single-cell omics data modeling has been met with unique challenges and opportunities. In this review, we discuss these challenges and opportunities, and offer an overview of the recent development of network modeling approaches designed to capture dynamic networks, within-cell networks, and cell-cell interaction or communication networks. Finally, we outline the remaining gaps in single-cell gene network modeling and the outlooks of the field moving forward.
Affiliation(s)
- Montgomery Blencowe
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Jessica Ding
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Yen-Wei Chen
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Zara Saleem
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Molecular Toxicology Program, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, U.S.A
|
40
|
Zeng T, Dai H. Single-Cell RNA Sequencing-Based Computational Analysis to Describe Disease Heterogeneity. Front Genet 2019; 10:629. [PMID: 31354786 PMCID: PMC6640157 DOI: 10.3389/fgene.2019.00629] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/17/2019] [Accepted: 06/17/2019] [Indexed: 12/25/2022] Open
Abstract
The trillions of cells in the human body can be viewed as elementary but essential biological units that achieve different body states, but the low resolution of previous cell isolation and measurement approaches limits our understanding of the cell-specific molecular profiles. The recent establishment and rapid growth of single-cell sequencing technology has facilitated the identification of molecular profiles of heterogeneous cells, especially on the transcription level of single cells [single-cell RNA sequencing (scRNA-seq)]. As a novel method, the robustness of scRNA-seq under changing conditions will determine its practical potential in major research programs and clinical applications. In this review, we first briefly presented the scRNA-seq-related methods from the point of view of experiments and computation. Then, we compared several state-of-the-art scRNA-seq analysis frameworks mainly by analyzing their performance robustness on independent scRNA-seq datasets for the same complex disease. Finally, we elaborated on our hypothesis on consensus scRNA-seq analysis and summarized the potential indicative and predictive roles of individual cells in understanding disease heterogeneity by single-cell technologies.
Affiliation(s)
- Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
|
41
|
Hie B, Cho H, DeMeo B, Bryson B, Berger B. Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape. Cell Syst 2019; 8:483-493.e7. [PMID: 31176620 PMCID: PMC6597305 DOI: 10.1016/j.cels.2019.05.003] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 01/31/2019] [Revised: 02/12/2019] [Accepted: 05/07/2019] [Indexed: 12/21/2022]
Abstract
Large-scale single-cell RNA sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a dataset using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks, such as scRNA-seq data integration, while maintaining accuracy. We anticipate our algorithm will become an increasingly essential step when sharing and analyzing the rapidly growing volume of scRNA-seq data and help enable the democratization of single-cell omics.
Affiliation(s)
- Brian Hie
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
- Benjamin DeMeo
- Department of Mathematics, MIT, Cambridge, MA 02139, USA; Department of Biomedical Informatics, Harvard University, Cambridge, MA 02138, USA
- Bryan Bryson
- Department of Biological Engineering, MIT, Cambridge, MA 02139, USA
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA.
|
42
|
Ren X, Zheng L, Zhang Z. SSCC: A Novel Computational Framework for Rapid and Accurate Clustering Large-scale Single Cell RNA-seq Data. Genomics Proteomics Bioinformatics 2019; 17:201-210. [PMID: 31202000 PMCID: PMC6624216 DOI: 10.1016/j.gpb.2018.10.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/21/2018] [Revised: 09/27/2018] [Accepted: 10/18/2018] [Indexed: 11/03/2022]
Abstract
Clustering is a prevalent analytical means to analyze single-cell RNA sequencing (scRNA-seq) data, but the rapidly expanding data volume can make this process computationally challenging. New methods for both accurate and efficient clustering are urgently needed. Here we proposed Spearman subsampling-clustering-classification (SSCC), a new clustering framework based on random projection and feature construction, for large-scale scRNA-seq data. SSCC greatly improves clustering accuracy, robustness, and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset with 68,578 human blood cells, SSCC achieved a 20% improvement in clustering accuracy and a 50-fold acceleration while consuming only 66% of the memory, compared to the widely used software package SC3. Compared to k-means, the accuracy improvement of SSCC can reach 3-fold. An R implementation of SSCC is available at https://github.com/Japrin/sscClust.
Affiliation(s)
- Xianwen Ren
- BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China.
- Liangtao Zheng
- BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China
- Zemin Zhang
- BIOPIC, Beijing Advanced Innovation Center for Genomics, and School of Life Sciences, Peking University, Beijing 100871, China.
|
43
|
Iacono G, Massoni-Badosa R, Heyn H. Single-cell transcriptomics unveils gene regulatory network plasticity. Genome Biol 2019; 20:110. [PMID: 31159854 PMCID: PMC6547541 DOI: 10.1186/s13059-019-1713-4] [Citation(s) in RCA: 123] [Impact Index Per Article: 24.6] [Received: 11/14/2018] [Accepted: 05/08/2019] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) plays a pivotal role in our understanding of cellular heterogeneity. Current analytical workflows are driven by categorizing principles that consider cells as individual entities and classify them into complex taxonomies. RESULTS We devise a conceptually different computational framework based on a holistic view, where single-cell datasets are used to infer global, large-scale regulatory networks. We develop correlation metrics that are specifically tailored to single-cell data, and then generate, validate, and interpret single-cell-derived regulatory networks from organs and perturbed systems, such as diabetes and Alzheimer's disease. Using tools from graph theory, we compute an unbiased quantification of a gene's biological relevance and accurately pinpoint key players in organ function and drivers of diseases. CONCLUSIONS Our approach detects multiple latent regulatory changes that are invisible to single-cell workflows based on clustering or differential expression analysis, significantly broadening the biological insights that can be obtained with this leading technology.
Affiliation(s)
- Giovanni Iacono
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain.
- Ramon Massoni-Badosa
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain
- Holger Heyn
- CNAG-CRG, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, 08028, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), Barcelona, Spain.
|
44
|
Meyer G, González-Arnay E, Moll U, Nemajerova A, Tissir F, González-Gómez M. Cajal-Retzius neurons are required for the development of the human hippocampal fissure. J Anat 2019; 235:569-589. [PMID: 30861578 DOI: 10.1111/joa.12947] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Accepted: 01/10/2019] [Indexed: 01/14/2023] Open
Abstract
Cajal-Retzius neurons (CRN) are the main source of Reelin in the marginal zone of the developing neocortex and hippocampus (HC). They also express the transcription factor p73 and are complemented by later-appearing GABAergic Reelin+ interneurons. The human dorsal HC forms at gestational week 10 (GW10), when it develops a rudimentary Ammonic plate and incipient dentate migration, although the dorsal hippocampal fissure (HF) remains shallow and contains few CRN. The dorsal HC transforms into the indusium griseum (IG), concurrently with the rostro-caudal appearance of the corpus callosum, by GW14-17. Dorsal and ventral HC merge at the site of the former caudal hem, which is located at the level of the future atrium of the lateral ventricle and closely connected with the choroid plexus. The ventral HC forms at GW11 in the temporal lobe. The ventral HF is wide open at GW14-16 and densely populated by large numbers of CRNs. These are in intimate contact with the meninges and meningeal blood vessels, suggesting signalling through diverse pathways. At GW17, the fissure deepens and begins to fuse, although it is still marked by p73/Reelin+ CRNs. The p73KO mouse illustrates the importance of p73 in CRN for HF formation. In the mutant, Tbr1/Reelin+ CRNs are born in the hem but do not leave it and subsequently disappear, so that the mutant cortex and HC lack CRN from the onset of corticogenesis. The HF is absent, which leads to profound architectonic alterations of the HC. To determine which p73 isoform is important for HF formation, isoform-specific TAp73- and DeltaNp73-deficient embryonic and early postnatal mice were examined. In both mutants, the number of CRNs was reduced, but each of their phenotypes was much milder than in the global p73KO mutant missing both isoforms. In the TAp73KO mice, the HF of the dorsal HC failed to form, but was present in the ventral HC. In the DeltaNp73KO mice, the HC had a mild patterning defect along with a shorter HF. 
Complex interactions between both isoforms in CRNs may contribute to their crucial activity in the developing brain.
Affiliation(s)
- Gundela Meyer
- Department of Basic Medical Sciences, University La Laguna, La Laguna, Spain
- Ute Moll
- Department of Pathology, Stony Brook University, Stony Brook, NY, USA
- Alice Nemajerova
- Department of Pathology, Stony Brook University, Stony Brook, NY, USA
- Fadel Tissir
- Developmental Neurobiology Group, Institute of NeuroScience, UCL Louvain, Brussels, Belgium
|
45
|
Abstract
Cellular heterogeneity within and across tumors has been a major obstacle in understanding and treating cancer, and this complex heterogeneity is masked if bulk tumor tissues are used for analysis. Rapidly developing single-cell sequencing technologies, which include methods for single-cell genome, epigenome, transcriptome, and multi-omics sequencing, have been applied to cancer research and have led to exciting new findings in the fields of cancer evolution, metastasis, resistance to therapy, and the tumor microenvironment. In this review, we discuss recent advances and limitations of these new technologies and their potential applications in cancer studies.
Affiliation(s)
- Xianwen Ren
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China.
- Boxi Kang
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China
- Zemin Zhang
- Beijing Advanced Innovation Centre for Genomics, Peking-Tsinghua Centre for Life Sciences, Biomedical Pioneering Innovation Center (BIOPIC), School of Life Sciences, Peking University, Beijing, 100871, China.
|
46
|
AlJanahi AA, Danielsen M, Dunbar CE. An Introduction to the Analysis of Single-Cell RNA-Sequencing Data. Mol Ther Methods Clin Dev 2018; 10:189-196. [PMID: 30094294 PMCID: PMC6072887 DOI: 10.1016/j.omtm.2018.07.003] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Indexed: 12/20/2022]
Abstract
The recent development of single-cell RNA sequencing has deepened our understanding of the cell as a functional unit, providing new insights based on gene expression profiles of hundreds to hundreds of thousands of individual cells, and revealing new populations of cells with distinct gene expression profiles previously hidden within analyses of gene expression performed on bulk cell populations. However, appropriate analysis and utilization of the massive amounts of data generated from single-cell RNA sequencing experiments are challenging and require an understanding of the experimental and computational pathways taken between preparation of input cells and output of interpretable data. In this review, we will discuss the basic principles of these new technologies, focusing on concepts important in the analysis of single-cell RNA-sequencing data. Specifically, we summarize approaches to quality-control measures for determination of which single cells to include for further examination, methods of data normalization and scaling to overcome the relatively inefficient capture rate of mRNA from each cell, and clustering and visualization algorithms used for dimensional reduction of the data to a two-dimensional plot.
Affiliation(s)
- Aisha A. AlJanahi
- Translational Stem Cell Biology Branch, NHLBI, NIH, Bethesda, MD, USA
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, USA
- Mark Danielsen
- Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC, USA
- Cynthia E. Dunbar
- Translational Stem Cell Biology Branch, NHLBI, NIH, Bethesda, MD, USA
|