1. Xi NM, Li JJ. Exploring the optimization of autoencoder design for imputing single-cell RNA sequencing data. Comput Struct Biotechnol J 2023; 21:4079-4095. [PMID: 37671239; PMCID: PMC10475479; DOI: 10.1016/j.csbj.2023.07.041] [Citations in RCA: 0; Impact Index Per Article: 0] [Received: 02/10/2023; Revised: 07/22/2023; Accepted: 07/31/2023; Indexed: 09/07/2023]
Abstract
Autoencoders are the backbone of many imputation methods that aim to alleviate the sparsity of single-cell RNA sequencing (scRNA-seq) data. The imputation performance of an autoencoder depends on both its neural network architecture and its hyperparameter choices. So far, the single-cell literature lacks a formal discussion of how to design the neural network and choose the hyperparameters. Here, we conducted an empirical study to address this gap. Our study used many real and simulated scRNA-seq datasets to examine the impacts of the neural network architecture, the activation function, and the regularization strategy on imputation accuracy and downstream analyses. Our results show that (i) deeper and narrower autoencoders generally lead to better imputation performance; (ii) the sigmoid and tanh activation functions consistently outperform other commonly used functions, including ReLU; and (iii) regularization improves the accuracy of imputation and of downstream cell clustering and differentially expressed (DE) gene analyses. Notably, our results differ from common practices in the computer vision field regarding the activation function and the regularization strategy. Overall, our study offers practical guidance on how to optimize autoencoder design for scRNA-seq data imputation.
Affiliation(s)
- Nan Miles Xi
- Department of Mathematics and Statistics, Loyola University Chicago, Chicago, IL 60660, USA
- Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, CA 90095-1554, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095-7088, USA
- Department of Computational Medicine, University of California, Los Angeles, CA 90095-1766, USA
- Department of Biostatistics, University of California, Los Angeles, CA 90095-1772, USA
2. Chen Y, Zhang XF, Ou-Yang L. Inferring cancer common and specific gene networks via multi-layer joint graphical model. Comput Struct Biotechnol J 2023; 21:974-990. [PMID: 36733706; PMCID: PMC9873583; DOI: 10.1016/j.csbj.2023.01.017] [Citations in RCA: 0; Impact Index Per Article: 0] [Received: 05/17/2022; Revised: 01/08/2023; Accepted: 01/14/2023; Indexed: 01/19/2023]
Abstract
Cancer is a complex disease caused primarily by genetic variants. Reconstructing gene networks within tumors is essential for understanding the functional regulatory mechanisms of carcinogenesis. Advances in high-throughput sequencing technologies have provided tremendous opportunities for inferring gene networks via computational approaches. However, owing to the heterogeneity within a single cancer type and the similarities between different cancer types, it remains challenging to systematically investigate the commonalities and specificities among the gene networks of different cancer types, a crucial step towards precision cancer diagnosis and treatment. In this study, we propose a new sparse regularized multi-layer decomposition graphical model to jointly estimate the gene networks of multiple cancer types. Our model can handle various types of gene expression data and decomposes each cancer-type-specific network into three components: globally shared, partially shared, and cancer-type-unique. By identifying the globally and partially shared components, our model can explore the heterogeneous similarities between different cancer types, and the identified cancer-type-unique components can help reveal the regulatory mechanisms unique to each cancer type. Extensive experiments on synthetic data illustrate the effectiveness of our model in jointly estimating multiple gene networks. We also apply our model to two real datasets to infer the gene networks of multiple cancer subtypes or cell lines. By analyzing the estimated globally shared, partially shared, and cancer-type-unique components, we identified a number of important genes associated with common and specific regulatory mechanisms across different cancer types.
Affiliation(s)
- Yuanxiao Chen
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
- Xiao-Fei Zhang
- School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, China
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen, China
- Corresponding author.
3. Walle T, Kraske JA, Liao B, Lenoir B, Timke C, von Bohlen und Halbach E, Tran F, Griebel P, Albrecht D, Ahmed A, Suarez-Carmona M, Jiménez-Sánchez A, Beikert T, Tietz-Dahlfuß A, Menevse AN, Schmidt G, Brom M, Pahl JHW, Antonopoulos W, Miller M, Perez RL, Bestvater F, Giese NA, Beckhove P, Rosenstiel P, Jäger D, Strobel O, Pe’er D, Halama N, Debus J, Cerwenka A, Huber PE. Radiotherapy orchestrates natural killer cell dependent antitumor immune responses through CXCL8. Sci Adv 2022; 8:eabh4050. [PMID: 35319989; PMCID: PMC8942354; DOI: 10.1126/sciadv.abh4050] [Citations in RCA: 60; Impact Index Per Article: 30.0] [Received: 03/06/2021; Accepted: 01/31/2022; Indexed: 05/17/2023]
Abstract
Radiotherapy is a mainstay of cancer therapy whose antitumor effects partially depend on T cell responses. However, the role of natural killer (NK) cells in radiotherapy remains unclear. Here, using a reverse translational approach, we show a central role of NK cells in the radiation-induced immune response involving a CXCL8/IL-8-dependent mechanism. In a randomized controlled pancreatic cancer trial, CXCL8 increased under radiotherapy, and NK cells positively correlated with prolonged overall survival. Accordingly, NK cells preferentially infiltrated irradiated pancreatic tumors and exhibited CD56dim-like cytotoxic transcriptomic states. In experimental models, NF-κB and mTOR orchestrated radiation-induced CXCL8 secretion from tumor cells with senescence features, causing directional migration of CD56dim NK cells and thus linking senescence-associated CXCL8 release to innate immune surveillance of human tumors. Moreover, combined high-dose radiotherapy and adoptive NK cell transfer improved tumor control over either monotherapy in xenografted mice, suggesting NK cells combined with radiotherapy as a rational cancer treatment strategy.
Affiliation(s)
- Thomas Walle
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Department of Medical Oncology, University Hospital Heidelberg, Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Department of Immunobiochemistry and MI3, Mannheim Institute for Innate Immunoscience, Heidelberg University, Medical Faculty Mannheim, Mannheim, Germany
- Corresponding authors: T.W. and P.E.H.
- Joscha A. Kraske
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Boyu Liao
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Bénédicte Lenoir
- Clinical Cooperation Unit Applied Tumor Immunity, German Cancer Research Center, Heidelberg, Germany
- Department of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Carmen Timke
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiation Oncology, St. Franziskus Hospital, Flensburg, Germany
- Emilia von Bohlen und Halbach
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Department of Medical Oncology, University Hospital Heidelberg, Heidelberg, Germany
- Clinical Cooperation Unit Applied Tumor Immunity, German Cancer Research Center, Heidelberg, Germany
- Department of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Florian Tran
- Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
- Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Kiel, Germany
- Paul Griebel
- Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
- Dorothee Albrecht
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Azaz Ahmed
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Clinical Cooperation Unit Applied Tumor Immunity, German Cancer Research Center, Heidelberg, Germany
- Department of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Meggy Suarez-Carmona
- Clinical Cooperation Unit Applied Tumor Immunity, German Cancer Research Center, Heidelberg, Germany
- Department of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alejandro Jiménez-Sánchez
- Program for Computational and Systems Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tizian Beikert
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Alexandra Tietz-Dahlfuß
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Ayse Nur Menevse
- Leibniz Institute for Immunotherapy, Division of Interventional Immunology, Regensburg, Germany
- Gabriele Schmidt
- Core Facility Light Microscopy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Manuela Brom
- Core Facility Light Microscopy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Jens H. W. Pahl
- Department of Immunobiochemistry and MI3, Mannheim Institute for Innate Immunoscience, Heidelberg University, Medical Faculty Mannheim, Mannheim, Germany
- Matthias Miller
- Department of Immunobiochemistry and MI3, Mannheim Institute for Innate Immunoscience, Heidelberg University, Medical Faculty Mannheim, Mannheim, Germany
- Ramon Lopez Perez
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Felix Bestvater
- Core Facility Light Microscopy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Nathalia A. Giese
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Philipp Beckhove
- Leibniz Institute for Immunotherapy, Division of Interventional Immunology, Regensburg, Germany
- Philip Rosenstiel
- Institute of Clinical Molecular Biology, Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
- Dirk Jäger
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Department of Medical Oncology, University Hospital Heidelberg, Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Clinical Cooperation Unit Applied Tumor Immunity, German Cancer Research Center, Heidelberg, Germany
- Oliver Strobel
- Department of General, Visceral and Transplantation Surgery, University Hospital Heidelberg, Heidelberg, Germany
- Dana Pe’er
- Parker Institute for Cancer Immunotherapy, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Program for Computational and Systems Biology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Niels Halama
- Department of Medical Oncology, National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Department of Medical Oncology, University Hospital Heidelberg, Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Department of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Helmholtz Center for Translational Oncology (HITRON), Mainz, Germany
- Institute of Immunology, Heidelberg University Hospital, Heidelberg, Germany
- Jürgen Debus
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Heidelberg Ion Therapy Center (HIT), Heidelberg, Germany
- Heidelberg Institute for Radiation Oncology (HIRO), Heidelberg, Germany
- Adelheid Cerwenka
- Department of Immunobiochemistry and MI3, Mannheim Institute for Innate Immunoscience, Heidelberg University, Medical Faculty Mannheim, Mannheim, Germany
- Peter E. Huber
- Department of Molecular and Radiooncology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Radiooncology and Radiotherapy, University Hospital Heidelberg, Heidelberg, Germany
- Heidelberg Institute for Radiation Oncology (HIRO), Heidelberg, Germany
- Corresponding authors: T.W. and P.E.H.
4. A novel method for single-cell data imputation using subspace regression. Sci Rep 2022; 12:2697. [PMID: 35177662; PMCID: PMC8854597; DOI: 10.1038/s41598-022-06500-4] [Citations in RCA: 6; Impact Index Per Article: 3.0] [Received: 07/15/2021; Accepted: 01/27/2022; Indexed: 12/13/2022]
Abstract
Recent advances in biochemistry and single-cell RNA sequencing (scRNA-seq) have allowed us to monitor biological systems at single-cell resolution. However, the low capture of mRNA material within individual cells often leads to inaccurate quantification of genetic material. Consequently, a substantial number of expression values are reported as missing; these are often referred to as dropouts. To overcome this challenge, we develop a novel imputation method, named single-cell Imputation via Subspace Regression (scISR), that can reliably recover the dropout values of scRNA-seq data. The scISR method first uses a hypothesis-testing technique to identify zero-valued entries that are most likely affected by dropout events and then estimates the dropout values using a subspace regression model. Our comprehensive evaluation against five state-of-the-art methods, using 25 publicly available scRNA-seq datasets and various simulation scenarios, demonstrates that scISR outperforms the other imputation methods in recovering scRNA-seq expression profiles. scISR consistently improves the quality of cluster analysis regardless of dropout rates, normalization techniques, and quantification schemes. The source code of scISR is available on GitHub at https://github.com/duct317/scISR.
5. Patruno L, Maspero D, Craighero F, Angaroni F, Antoniotti M, Graudenzi A. A review of computational strategies for denoising and imputation of single-cell transcriptomic data. Brief Bioinform 2021; 22:bbaa222. [PMID: 33003202; DOI: 10.1093/bib/bbaa222] [Citations in RCA: 16; Impact Index Per Article: 5.3] [Received: 05/26/2020; Revised: 08/07/2020; Accepted: 08/19/2020; Indexed: 12/18/2022]
Abstract
MOTIVATION: The advancement of single-cell sequencing methods has paved the way for the characterization of cellular states at unprecedented resolution, revolutionizing the investigation of complex biological systems. Yet, single-cell sequencing experiments are hindered by several technical issues that make the output data noisy, impacting the reliability of downstream analyses. Therefore, a growing number of data science methods have been proposed to recover lost or corrupted information from single-cell sequencing data. To date, however, no quantitative benchmarks have been proposed to evaluate such methods.
RESULTS: We present a comprehensive analysis of the state-of-the-art computational approaches for denoising and imputation of single-cell transcriptomic data, comparing their performance in different experimental scenarios. In detail, we compared 19 denoising and imputation methods on both simulated and real-world datasets with respect to several performance metrics related to imputation of dropout events, recovery of true expression profiles, characterization of cell similarity, identification of differentially expressed genes, and computation time. The effectiveness and scalability of all methods were assessed with regard to distinct sequencing protocols, sample sizes, and different levels of biological variability and technical noise. As a result, we identify a subset of versatile approaches exhibiting solid performance on most tests and show that certain algorithmic families prove effective on specific tasks but inefficient on others. Finally, most methods appear to benefit from the introduction of appropriate assumptions on the noise distribution of biological processes.
Affiliation(s)
- Lucrezia Patruno
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Davide Maspero
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Francesco Craighero
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Fabrizio Angaroni
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Marco Antoniotti
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
- Alex Graudenzi
- Department of Informatics, Systems and Communication of the University of Milan-Bicocca
6. scLink: Inferring Sparse Gene Co-expression Networks from Single-cell Expression Data. Genomics Proteomics Bioinformatics 2021; 19:475-492. [PMID: 34252628; PMCID: PMC8896229; DOI: 10.1016/j.gpb.2020.11.006] [Citations in RCA: 16; Impact Index Per Article: 5.3] [Received: 08/28/2020; Revised: 10/23/2020; Accepted: 12/26/2020; Indexed: 11/23/2022]
Abstract
A system-level understanding of the regulation and coordination mechanisms of gene expression is essential for studying the complexity of biological processes in health and disease. With the rapid development of single-cell RNA sequencing technologies, it is now possible to investigate gene interactions in a cell type-specific manner. Here we propose the scLink method, which uses statistical network modeling to understand the co-expression relationships among genes and construct sparse gene co-expression networks from single-cell gene expression data. We use both simulation and real data studies to demonstrate the advantages of scLink and its ability to improve single-cell gene network analysis. The scLink R package is available at https://github.com/Vivianstats/scLink.
7. Liu J, Fan Z, Zhao W, Zhou X. Machine Intelligence in Single-Cell Data Analysis: Advances and New Challenges. Front Genet 2021; 12:655536. [PMID: 34135939; PMCID: PMC8203333; DOI: 10.3389/fgene.2021.655536] [Citations in RCA: 9; Impact Index Per Article: 3.0] [Received: 01/19/2021; Accepted: 04/26/2021; Indexed: 12/18/2022]
Abstract
The rapid development of single-cell technologies allows for dissecting cellular heterogeneity at different omics layers with unprecedented resolution. In-depth analysis of cellular heterogeneity will boost our understanding of complex biological systems or processes, including cancer, the immune system, and chronic diseases, thereby providing valuable insights for clinical and translational research. In this review, we will focus on the application of machine learning methods in single-cell multi-omics data analysis. We will start with the pre-processing of single-cell RNA sequencing (scRNA-seq) data, including data imputation, cross-platform batch effect removal, and cell cycle and cell-type identification. Next, we will introduce advanced data analysis tools and methods used for copy number variation estimation, single-cell pseudo-time trajectory analysis, phylogenetic tree inference, cell-cell interaction, regulatory network inference, and integrated analysis of scRNA-seq and spatial transcriptome data. Finally, we will present the latest analytical challenges, such as multi-omics integration and integrated analysis of scRNA-seq data.
Affiliation(s)
- Jiajia Liu
- College of Electronic and Information Engineering, Tongji University, Shanghai, China
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
- Zhiwei Fan
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
- West China School of Public Health, West China Fourth Hospital, Sichuan University, Chengdu, China
- Weiling Zhao
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
- Xiaobo Zhou
- School of Biomedical Informatics, The University of Texas Health Science Centre at Houston, Houston, TX, United States
8. Gayoso A, Steier Z, Lopez R, Regier J, Nazor KL, Streets A, Yosef N. Joint probabilistic modeling of single-cell multi-omic data with totalVI. Nat Methods 2021; 18:272-282. [PMID: 33589839; PMCID: PMC7954949; DOI: 10.1038/s41592-020-01050-x] [Citations in RCA: 172; Impact Index Per Article: 57.3] [Received: 05/18/2020; Revised: 12/07/2020; Accepted: 12/18/2020; Indexed: 01/30/2023]
Abstract
The paired measurement of RNA and surface proteins in single cells with cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) is a promising approach to connect transcriptional variation with cell phenotypes and functions. However, combining these paired views into a unified representation of cell state is made challenging by the unique technical characteristics of each measurement. Here we present Total Variational Inference (totalVI; https://scvi-tools.org), a framework for end-to-end joint analysis of CITE-seq data that probabilistically represents the data as a composite of biological and technical factors, including protein background and batch effects. To evaluate totalVI's performance, we profiled immune cells from murine spleen and lymph nodes with CITE-seq, measuring over 100 surface proteins. We demonstrate that totalVI provides a cohesive solution for common analysis tasks such as dimensionality reduction, the integration of datasets with different measured proteins, estimation of correlations between molecules, and differential expression testing.
Affiliation(s)
- Adam Gayoso
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Zoë Steier
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Romain Lopez
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Jeffrey Regier
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Aaron Streets
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Nir Yosef
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
9. Yu X, Abbas-Aghababazadeh F, Chen YA, Fridley BL. Statistical and Bioinformatics Analysis of Data from Bulk and Single-Cell RNA Sequencing Experiments. Methods Mol Biol 2021; 2194:143-175. [PMID: 32926366; PMCID: PMC7771369; DOI: 10.1007/978-1-0716-0849-4_9] [Citations in RCA: 7; Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
High-throughput sequencing (HTS) has revolutionized researchers' ability to study the human transcriptome, particularly as it relates to cancer. Recently, HTS technology has advanced to the point where one can now sequence individual cells (i.e., "single-cell sequencing"). Before single-cell sequencing technology, HTS was performed on RNA extracted from a tissue sample consisting of multiple cell types (i.e., "bulk sequencing"). In this chapter, we review the various bioinformatics and statistical methods used in the processing, quality control, and analysis of bulk and single-cell RNA sequencing data. Additionally, we discuss how these methods are being used to study tumor heterogeneity.
Affiliation(s)
- Xiaoqing Yu
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Farnoosh Abbas-Aghababazadeh
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Y Ann Chen
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Brooke L Fridley
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
10. Putative cell type discovery from single-cell gene expression data. Nat Methods 2020; 17:621-628. [DOI: 10.1038/s41592-020-0825-9] [Citations in RCA: 46; Impact Index Per Article: 11.5] [Received: 07/15/2019; Accepted: 04/02/2020; Indexed: 12/15/2022]
11. Schwartz GW, Zhou Y, Petrovic J, Fasolino M, Xu L, Shaffer SM, Pear WS, Vahedi G, Faryabi RB. TooManyCells identifies and visualizes relationships of single-cell clades. Nat Methods 2020; 17:405-413. [PMID: 32123397; PMCID: PMC7439807; DOI: 10.1038/s41592-020-0748-5] [Citations in RCA: 37; Impact Index Per Article: 9.3] [Received: 10/25/2019; Accepted: 01/15/2020; Indexed: 01/24/2023]
Abstract
Identifying and visualizing transcriptionally similar cells is instrumental for accurate exploration of the cellular diversity revealed by single-cell transcriptomics. However, widely used clustering and visualization algorithms produce a fixed number of cell clusters. A fixed clustering 'resolution' hampers our ability to identify and visualize echelons of cell states. We developed TooManyCells, a suite of graph-based algorithms for efficient and unbiased identification and visualization of cell clades. TooManyCells introduces a visualization model built on a concept intentionally orthogonal to dimensionality-reduction methods. TooManyCells is also equipped with an efficient matrix-free divisive hierarchical spectral clustering method, which differs from prevalent single-resolution clustering methods. TooManyCells enables multiresolution and multifaceted exploration of single-cell clades. An advantage of this paradigm is the immediate detection of rare and common populations, in which it outperforms popular clustering and visualization algorithms, as demonstrated using existing single-cell transcriptomic datasets and new data modeling drug-resistance acquisition in leukemic T cells.
Affiliation(s)
- Gregory W Schwartz
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yeqiao Zhou
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Jelena Petrovic
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Maria Fasolino
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Lanwei Xu
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Sydney M Shaffer
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Warren S Pear
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Golnaz Vahedi
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Robert B Faryabi
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Abramson Family Cancer Research Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
12. Lähnemann D, Köster J, Szczurek E, McCarthy DJ, Hicks SC, Robinson MD, Vallejos CA, Campbell KR, Beerenwinkel N, Mahfouz A, Pinello L, Skums P, Stamatakis A, Attolini CSO, Aparicio S, Baaijens J, Balvert M, Barbanson BD, Cappuccio A, Corleone G, Dutilh BE, Florescu M, Guryev V, Holmer R, Jahn K, Lobo TJ, Keizer EM, Khatri I, Kielbasa SM, Korbel JO, Kozlov AM, Kuo TH, Lelieveldt BP, Mandoiu II, Marioni JC, Marschall T, Mölder F, Niknejad A, Rączkowska A, Reinders M, Ridder JD, Saliba AE, Somarakis A, Stegle O, Theis FJ, Yang H, Zelikovsky A, McHardy AC, Raphael BJ, Shah SP, Schönhuth A. Eleven grand challenges in single-cell data science. Genome Biol 2020; 21:31. [PMID: 32033589; PMCID: PMC7007675; DOI: 10.1186/s13059-020-1926-6] [Citations in RCA: 554; Impact Index Per Article: 138.5] [Received: 08/02/2019; Accepted: 01/02/2020; Indexed: 02/08/2023]
Abstract
The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Affiliation(s)
- David Lähnemann
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf, Germany
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Johannes Köster
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA
- Ewa Szczurek
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Davis J. McCarthy
- Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia
- Melbourne Integrative Genomics, School of BioSciences–School of Mathematics & Statistics, Faculty of Science, University of Melbourne, Melbourne, Australia
- Stephanie C. Hicks
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
- Mark D. Robinson
- Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zürich, Zürich, Switzerland
- Catalina A. Vallejos
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, UK
- The Alan Turing Institute, British Library, London, UK
- Kieran R. Campbell
- Department of Statistics, University of British Columbia, Vancouver, Canada
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Data Science Institute, University of British Columbia, Vancouver, Canada
- Niko Beerenwinkel
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmed Mahfouz
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Luca Pinello
- Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA
- Department of Pathology, Harvard Medical School, Boston, USA
- Broad Institute of Harvard and MIT, Cambridge, MA USA
- Pavel Skums
- Department of Computer Science, Georgia State University, Atlanta, USA
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Samuel Aparicio
- Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
- Jasmijn Baaijens
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Marleen Balvert
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Buys de Barbanson
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Antonio Cappuccio
- Institute for Advanced Study, University of Amsterdam, Amsterdam, The Netherlands
- Giacomo Corleone
- Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, London, UK
- Bas E. Dutilh
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
- Maria Florescu
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands
- Victor Guryev
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Rens Holmer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Katharina Jahn
- Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Thamar Jessurun Lobo
- European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Emma M. Keizer
- Biometris, Wageningen University & Research, Wageningen, The Netherlands
- Indu Khatri
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
- Szymon M. Kielbasa
- Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
- Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Alexey M. Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Tzu-Hao Kuo
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Boudewijn P.F. Lelieveldt
- PRB lab, Delft University of Technology, Delft, The Netherlands
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Ion I. Mandoiu
- Computer Science & Engineering Department, University of Connecticut, Storrs, USA
- John C. Marioni
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Felix Mölder
- Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Essen, Germany
- Amir Niknejad
- Computational molecular design, Zuse Institute Berlin, Berlin, Germany
- Mathematics Department, Mount Saint Vincent, New York, USA
- Alicja Rączkowska
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warszawa, Poland
- Marcel Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands
- Jeroen de Ridder
- Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
- Antoine-Emmanuel Saliba
- Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany
- Antonios Somarakis
- Division of Image Processing, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
- Oliver Stegle
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center–DKFZ, Heidelberg, Germany
- Fabian J. Theis
- Institute of Computational Biology, Helmholtz Zentrum München–German Research Center for Environmental Health, Neuherberg, Germany
- Huan Yang
- Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research–LACDR–Leiden University, Leiden, The Netherlands
- Alex Zelikovsky
- Department of Computer Science, Georgia State University, Atlanta, USA
- The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
- Alice C. McHardy
- Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Sohrab P. Shah
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
- Alexander Schönhuth
- Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Utrecht, The Netherlands

13
Elyanow R, Dumitrascu B, Engelhardt BE, Raphael BJ. netNMF-sc: leveraging gene-gene interactions for imputation and dimensionality reduction in single-cell expression analysis. Genome Res 2020; 30:195-204. [PMID: 31992614 PMCID: PMC7050525 DOI: 10.1101/gr.251603.119] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 04/18/2019] [Accepted: 11/19/2019] [Indexed: 02/06/2023]
Abstract
Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in single cells. However, because of technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells in a lower-dimensional space, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc learns a low-dimensional representation of scRNA-seq transcript counts using network-regularized non-negative matrix factorization. The network regularization takes advantage of prior knowledge of gene–gene interactions, encouraging pairs of genes with known interactions to lie near each other in the low-dimensional representation. The resulting matrix factorization imputes gene abundance for both zero and nonzero counts and can be used to cluster cells into meaningful subpopulations. We show that netNMF-sc outperforms existing methods at clustering cells and estimating gene–gene covariance using both simulated and real scRNA-seq data, with increasing advantages at higher dropout rates (e.g., >60%). We also show that the results from netNMF-sc are robust to variation in the input network, with more representative networks leading to greater performance gains.
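The network-regularization idea described above can be illustrated with a generic graph-regularized NMF. The sketch below assumes the objective ||X - WH||_F^2 + lam * tr(W^T L W), where X is a nonnegative genes-by-cells matrix, A is a symmetric 0/1 gene-gene adjacency matrix, and L = D - A is its graph Laplacian, so the penalty grows when linked genes have dissimilar rows in W. It uses standard multiplicative updates for this kind of objective; it is not netNMF-sc's own objective or solver, which may differ (for example in how zero counts are handled), and the function name and parameters are hypothetical.

```python
import numpy as np


def graph_regularized_nmf(X, A, k, lam=1.0, n_iter=300, seed=0, eps=1e-10):
    """Hypothetical sketch: factor X (genes x cells) as W @ H with a gene-network penalty.

    Minimizes ||X - W H||_F^2 + lam * tr(W^T L W), where L = D - A is the
    Laplacian of the gene-gene adjacency matrix A (genes x genes).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    W = rng.random((n_genes, k)) + eps  # gene loadings
    H = rng.random((k, n_cells)) + eps  # cell factors
    D = np.diag(A.sum(axis=1))  # degree matrix of the gene network

    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative because every
        # term in the numerators and denominators is nonnegative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T + lam * (A @ W)) / (W @ H @ H.T + lam * (D @ W) + eps)
    return W, H


# Usage on toy data (not real scRNA-seq): W @ H gives a dense low-rank
# estimate for every entry, including zeros.
rng = np.random.default_rng(1)
X = rng.random((30, 3)) @ rng.random((3, 40))
A = np.triu((rng.random((30, 30)) < 0.1).astype(float), 1)
A = A + A.T  # symmetric adjacency without self-loops
W, H = graph_regularized_nmf(X, A, k=3, lam=0.1)
X_hat = W @ H
```

Setting lam=0 reduces this to plain NMF; larger values trade reconstruction accuracy for more similar loadings across genes that are linked in the network.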
Affiliation(s)
- Rebecca Elyanow
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912, USA
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Bianca Dumitrascu
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA
- Barbara E Engelhardt
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Center for Statistics and Machine Learning, Princeton University, Princeton, New Jersey 08540, USA
- Benjamin J Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA

14

15
Davis-Marcisak EF, Sherman TD, Orugunta P, Stein-O'Brien GL, Puram SV, Roussos Torres ET, Hopkins AC, Jaffee EM, Favorov AV, Afsari B, Goff LA, Fertig EJ. Differential Variation Analysis Enables Detection of Tumor Heterogeneity Using Single-Cell RNA-Sequencing Data. Cancer Res 2019; 79:5102-5112. [PMID: 31337651 PMCID: PMC6844448 DOI: 10.1158/0008-5472.can-18-3882] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/10/2018] [Revised: 05/13/2019] [Accepted: 07/19/2019] [Indexed: 12/20/2022]
Abstract
Tumor heterogeneity provides a complex challenge to cancer treatment and is a critical component of therapeutic response, disease recurrence, and patient survival. Single-cell RNA-sequencing (scRNA-seq) technologies have revealed the prevalence of intratumor and intertumor heterogeneity. Computational techniques are essential to quantify the differences in variation of these profiles between distinct cell types, tumor subtypes, and patients to fully characterize intratumor and intertumor molecular heterogeneity. In this study, we adapted our algorithm for pathway dysregulation, Expression Variation Analysis (EVA), to perform multivariate statistical analyses of differential variation of expression in gene sets for scRNA-seq. EVA has high sensitivity and specificity to detect pathways with true differential heterogeneity in simulated data. EVA was applied to several public domain scRNA-seq tumor datasets to quantify the landscape of tumor heterogeneity in several key applications in cancer genomics such as immunogenicity, metastasis, and cancer subtypes. Immune pathway heterogeneity of hematopoietic cell populations in breast tumors corresponded to the amount of diversity present in the T-cell repertoire of each individual. Cells from head and neck squamous cell carcinoma (HNSCC) primary tumors had significantly more heterogeneity across pathways than cells from metastases, consistent with a model of clonal outgrowth. Moreover, there were dramatic differences in pathway dysregulation across HNSCC basal primary tumors. Within the basal primary tumors, there was increased immune dysregulation in individuals with a high proportion of fibroblasts present in the tumor microenvironment. These results demonstrate the broad utility of EVA to quantify intertumor and intratumor heterogeneity from scRNA-seq data without reliance on low-dimensional visualization. 
SIGNIFICANCE: This study presents a robust statistical algorithm for evaluating gene expression heterogeneity within pathways or gene sets in single-cell RNA-seq data.
Affiliation(s)
- Emily F Davis-Marcisak
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Thomas D Sherman
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Pranay Orugunta
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Genevieve L Stein-O'Brien
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, Maryland
- Sidharth V Puram
- Department of Otolaryngology-Head and Neck Surgery, Washington University School of Medicine, St. Louis, Missouri
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri
- Evanthia T Roussos Torres
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Alexander C Hopkins
- Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, Michigan
- Elizabeth M Jaffee
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Alexander V Favorov
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russia
- Bahman Afsari
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Loyal A Goff
- McKusick-Nathans Institute of the Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, Maryland
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, Maryland
- Department of Applied Mathematics and Statistics, Johns Hopkins University Whiting School of Engineering, Baltimore, Maryland
- Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, Maryland

16
Abstract
Precision medicine is emerging as a cornerstone of future cancer care with the objective of providing targeted therapies based on the molecular phenotype of each individual patient. Traditional bulk-level molecular phenotyping of tumours leads to significant information loss, as the molecular profile represents an average phenotype over large numbers of cells, while cancer is a disease with inherent intra-tumour heterogeneity at the cellular level caused by several factors, including clonal evolution, tissue hierarchies, rare cells and dynamic cell states. Single-cell sequencing provides a means to characterize heterogeneity in a large population of cells and opens up the opportunity to determine key molecular properties that influence clinical outcomes, including prognosis and probability of treatment response. Single-cell sequencing methods are now reliable enough to be used in many research laboratories, and we are starting to see applications of these technologies for characterization of human primary cancer cells. In this review, we provide an overview of studies that have applied single-cell sequencing to characterize human cancers at the single-cell level, and we discuss some of the current challenges in the field.
Affiliation(s)
- Mattias Rantalainen
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels Vag 12A, Stockholm, Sweden

17
Single-cell RNA-seq denoising using a deep count autoencoder. Nat Commun 2019; 10:390. [PMID: 30674886 PMCID: PMC6344535 DOI: 10.1038/s41467-018-07931-2] [Citation(s) in RCA: 437] [Impact Index Per Article: 87.4] [Received: 04/13/2018] [Accepted: 11/22/2018] [Indexed: 11/16/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has enabled researchers to study gene expression at a cellular resolution. However, noise due to amplification and dropout may obstruct analyses, so scalable denoising methods for increasingly large but sparse scRNA-seq data are needed. We propose a deep count autoencoder network (DCA) to denoise scRNA-seq datasets. DCA takes the count distribution, overdispersion and sparsity of the data into account using a negative binomial noise model with or without zero-inflation, and it captures nonlinear gene-gene dependencies. Our method scales linearly with the number of cells and can, therefore, be applied to datasets of millions of cells. We demonstrate that DCA denoising improves a diverse set of typical scRNA-seq data analyses using simulated and real datasets. DCA outperforms existing methods for data imputation in quality and speed, enhancing biological discovery.

Single-cell RNA sequencing is a powerful method to study gene expression, but noise in the data can obstruct analysis. Here the authors develop a denoising method based on a deep count autoencoder network that scales linearly with the number of cells, and therefore is compatible with large data sets.
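The negative binomial noise model with optional zero-inflation mentioned above can be made concrete as a per-entry negative log-likelihood. The sketch below assumes the common mean/dispersion parameterization NB(x; mu, theta) = Gamma(x + theta) / (Gamma(theta) Gamma(x + 1)) * (theta / (theta + mu))^theta * (mu / (theta + mu))^x, plus a dropout probability pi that adds extra probability mass at zero. In an autoencoder, mu, theta and pi would be produced by output layers; here they are plain NumPy inputs, and the function names are hypothetical, not DCA's API.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma)  # elementwise log-gamma without SciPy


def nb_log_pmf(x, mu, theta, eps=1e-10):
    """Log-probability of counts x under NB with mean mu and dispersion theta."""
    x = np.asarray(x, dtype=float)
    log_theta_mu = np.log(theta + mu + eps)
    return (
        _lgamma(x + theta) - _lgamma(theta) - _lgamma(x + 1.0)
        + theta * (np.log(theta + eps) - log_theta_mu)
        + x * (np.log(mu + eps) - log_theta_mu)
    )


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Mean negative log-likelihood of counts x under a zero-inflated NB.

    P(x = 0) = pi + (1 - pi) * NB(0) and P(x > 0) = (1 - pi) * NB(x).
    With pi = 0 this reduces to the plain NB loss.
    """
    x = np.asarray(x, dtype=float)
    log_nb = nb_log_pmf(x, mu, theta, eps)
    log_zero = np.log(pi + (1.0 - pi) * np.exp(log_nb) + eps)
    log_nonzero = np.log(1.0 - pi + eps) + log_nb
    return -np.mean(np.where(x == 0, log_zero, log_nonzero))


# Usage on a toy count vector with shared (scalar) parameters
counts = np.array([0, 0, 3, 7, 1, 0])
loss_nb = zinb_nll(counts, mu=2.5, theta=1.5, pi=0.0)
loss_zinb = zinb_nll(counts, mu=2.5, theta=1.5, pi=0.3)
```

Including pi or not corresponds to the "with or without zero-inflation" choice; minimizing a loss of this form over predicted per-entry parameters is one way a count autoencoder can distinguish excess zeros from low expression.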

18
Crow M, Gillis J. Co-expression in Single-Cell Analysis: Saving Grace or Original Sin? Trends Genet 2018; 34:823-831. [PMID: 30146183 PMCID: PMC6195469 DOI: 10.1016/j.tig.2018.07.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/14/2018] [Revised: 07/05/2018] [Accepted: 07/25/2018] [Indexed: 01/04/2023]
Abstract
As a fundamental unit of life, the cell has rightfully been the subject of intense investigation throughout the history of biology. Technical innovations now make it possible to assay cellular features at genomic scale, yielding breakthroughs in our understanding of the molecular organization of tissues, and even whole organisms. As these data accumulate we will soon be faced with a new challenge: making sense of the plethora of results. Early investigations into the replicability of cell type profiles inferred from single-cell RNA sequencing data have indicated that this is likely to be surprisingly straightforward due to consistent gene co-expression. In this opinion article we discuss the evidence for this claim and its implications for interpreting cell type-specific gene expression.
Affiliation(s)
- Megan Crow
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Jesse Gillis
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA

19
Liu Q, Herring CA, Sheng Q, Ping J, Simmons AJ, Chen B, Banerjee A, Li W, Gu G, Coffey RJ, Shyr Y, Lau KS. Quantitative assessment of cell population diversity in single-cell landscapes. PLoS Biol 2018; 16:e2006687. [PMID: 30346945 PMCID: PMC6211764 DOI: 10.1371/journal.pbio.2006687] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 05/16/2018] [Revised: 11/01/2018] [Accepted: 10/01/2018] [Indexed: 12/11/2022]
Abstract
Single-cell RNA sequencing (scRNA-seq) has become a powerful tool for the systematic investigation of cellular diversity. As a number of computational tools have been developed to identify and visualize cell populations within a single scRNA-seq dataset, there is a need for methods to quantitatively and statistically define proportional shifts in cell population structures across datasets, such as expansion or shrinkage or emergence or disappearance of cell populations. Here we present sc-UniFrac, a framework to statistically quantify compositional diversity in cell populations between single-cell transcriptome landscapes. sc-UniFrac enables sensitive and robust quantification in simulated and experimental datasets in terms of both population identity and quantity. We have demonstrated the utility of sc-UniFrac in multiple applications, including assessment of biological and technical replicates, classification of tissue phenotypes and regional specification, identification and definition of altered cell infiltrates in tumorigenesis, and benchmarking batch-correction tools. sc-UniFrac provides a framework for quantifying diversity or alterations in cell populations across conditions and has broad utility for gaining insight into tissue-level perturbations at the single-cell resolution.
Affiliation(s)
- Qi Liu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Charles A. Herring
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Quanhu Sheng
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Jie Ping
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Alan J. Simmons
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Bob Chen
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Amrita Banerjee
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Wei Li
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Guoqiang Gu
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Robert J. Coffey
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Veterans Affairs Medical Center, Tennessee Valley Healthcare System, Nashville, Tennessee, United States of America
- Yu Shyr
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Ken S. Lau
- Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America

20
Hon CC, Shin JW, Carninci P, Stubbington MJT. The Human Cell Atlas: Technical approaches and challenges. Brief Funct Genomics 2018; 17:283-294. [PMID: 29092000 PMCID: PMC6063304 DOI: 10.1093/bfgp/elx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 12/23/2022]
Abstract
The Human Cell Atlas is a large, international consortium that aims to identify and describe every cell type in the human body. The comprehensive cellular maps that arise from this ambitious effort have the potential to transform many aspects of fundamental biology and clinical practice. Here, we discuss the technical approaches that could be used today to generate such a resource and also the technical challenges that will be encountered.
Affiliation(s)
- Chung-Chau Hon
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Jay W Shin
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan
- Piero Carninci
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa, Japan

21
Abstract
Single-cell RNA sequencing (scRNA-seq) is currently transforming our understanding of biology, as it is a powerful tool to resolve cellular heterogeneity and molecular networks. Over 50 protocols have been developed in recent years, and data processing and analysis tools are also evolving fast. Here, we review the basic principles underlying the different experimental protocols and how to benchmark them. We also review and compare the essential methods to process scRNA-seq data, from mapping, filtering, normalization and batch correction to basic differential expression analysis. We hope that this helps to choose appropriate experimental and computational methods for the research question at hand.
Affiliation(s)
- Christoph Ziegenhain
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Beate Vieth
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Swati Parekh
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Ines Hellmann
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany
- Wolfgang Enard
- Anthropology and Human Genomics, Department of Biology II, Ludwig-Maximilians University, Großhaderner Str. 2, Martinsried, Germany

22
Chen S, Mar JC. Evaluating methods of inferring gene regulatory networks highlights their lack of performance for single cell gene expression data. BMC Bioinformatics 2018; 19:232. [PMID: 29914350 PMCID: PMC6006753 DOI: 10.1186/s12859-018-2217-z] [Citation(s) in RCA: 119] [Impact Index Per Article: 19.8] [Received: 10/28/2017] [Accepted: 05/24/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND A fundamental fact in biology is that genes do not operate in isolation, yet methods that infer regulatory networks from single cell gene expression data have been slow to emerge. With single cell sequencing methods now becoming accessible, general network inference algorithms that were initially developed for data collected from bulk samples may not be suitable for single cells. Meanwhile, although methods specific to single cell data are now emerging, whether they perform better than general methods is unknown. In this study, we evaluate the applicability of five general methods and three single cell methods for inferring gene regulatory networks from both experimental single cell gene expression data and in silico simulated data. RESULTS Standard evaluation metrics based on ROC curves and precision-recall curves, computed against reference sets sourced from the literature, showed that most of the methods performed poorly on both experimental and simulated single cell data, demonstrating their lack of performance for this task. Using default settings, network methods were applied to the same datasets. Comparisons of the learned networks showed that each method predicts some edges unique to it. That different methods infer substantially different networks reflects the underlying mathematical rationale and assumptions that distinguish the methods from one another. CONCLUSIONS This study provides a comprehensive evaluation of network modeling algorithms applied to experimental single cell gene expression data and to in silico simulated datasets where the network structure is known. Comparisons demonstrate that most of the assessed network methods cannot accurately predict network structures from single cell expression data, even when they were specifically developed for single cell data.
Also, single cell methods, which usually depend on more elaborate algorithms, generally have less similarity to one another in the sets of edges they detect. The results from this study emphasize the importance of developing more accurate, optimized network modeling methods that are compatible with single cell data. Newly developed single cell methods may uniquely capture particular features of potential gene-gene relationships, and caution should be taken when interpreting these results.
Affiliation(s)
- Shuonan Chen
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Jessica C Mar
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA; Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA; Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
23
Gong W, Kwak IY, Pota P, Koyano-Nakagawa N, Garry DJ. DrImpute: imputing dropout events in single cell RNA sequencing data. BMC Bioinformatics 2018; 19:220. [PMID: 29884114 PMCID: PMC5994079 DOI: 10.1186/s12859-018-2226-y] [Citation(s) in RCA: 167] [Impact Index Per Article: 27.8] [Received: 11/14/2017] [Accepted: 05/30/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND The single cell RNA sequencing (scRNA-seq) technique began a new era by allowing the observation of gene expression at the single cell level. However, scRNA-seq data also contain a large amount of technical and biological noise. Because of the small number of RNA transcripts in each cell and the stochastic nature of gene expression, there is a high chance that nonzero entries are recorded as zeros; these are called dropout events. RESULTS We develop DrImpute to impute dropout events in scRNA-seq data. We show that DrImpute separates dropout zeros from true zeros significantly better than existing imputation algorithms. We also demonstrate that DrImpute can significantly improve the performance of existing tools for clustering, visualization and lineage reconstruction on nine published scRNA-seq datasets. CONCLUSIONS DrImpute can serve as a very useful addition to the currently existing statistical tools for single cell RNA-seq analysis. DrImpute is implemented in R and is available at https://github.com/gongx030/DrImpute .
Affiliation(s)
- Wuming Gong
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Il-Youp Kwak
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Pruthvi Pota
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Naoko Koyano-Nakagawa
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA
- Daniel J. Garry
- Lillehei Heart Institute, University of Minnesota, 2231 6th St S.E, 4-165 CCRB, Minneapolis, MN 55114 USA