1
Hoedjes KM, Grath S, Posnien N, Ritchie MG, Schlötterer C, Abbott JK, Almudi I, Coronado-Zamora M, Durmaz Mitchell E, Flatt T, Fricke C, Glaser-Schmitt A, González J, Holman L, Kankare M, Lenhart B, Orengo DJ, Snook RR, Yılmaz VM, Yusuf L. From whole bodies to single cells: A guide to transcriptomic approaches for ecology and evolutionary biology. Mol Ecol 2024:e17382. [PMID: 38856653 DOI: 10.1111/mec.17382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Revised: 04/09/2024] [Accepted: 04/29/2024] [Indexed: 06/11/2024]
Abstract
RNA sequencing (RNAseq) methodology has experienced a burst of technological developments in the last decade, which has opened up opportunities for studying the mechanisms of adaptation to environmental factors at both the organismal and cellular level. Selecting the most suitable experimental approach for specific research questions and model systems can, however, be a challenge and researchers in ecology and evolution are commonly faced with the choice of whether to study gene expression variation in whole bodies, specific tissues, and/or single cells. A wide range of sometimes polarised opinions exists over which approach is best. Here, we highlight the advantages and disadvantages of each of these approaches to provide a guide to help researchers make informed decisions and maximise the power of their study. Using illustrative examples of various ecological and evolutionary research questions, we guide the readers through the different RNAseq approaches and help them identify the most suitable design for their own projects.
Affiliation(s)
- Katja M Hoedjes
  - Amsterdam Institute for Life and Environment, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sonja Grath
  - Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
- Nico Posnien
  - Department of Developmental Biology, Göttingen Center for Molecular Biosciences (GZMB), University of Göttingen, Göttingen, Germany
- Michael G Ritchie
  - Centre for Biological Diversity, University of St Andrews, St Andrews, UK
- Isabel Almudi
  - Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Esra Durmaz Mitchell
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
  - Functional Genomics and Metabolism Research Unit, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
- Thomas Flatt
  - Department of Biology, University of Fribourg, Fribourg, Switzerland
- Claudia Fricke
  - Institute for Zoology/Animal Ecology, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Josefa González
  - Institute of Evolutionary Biology, CSIC, UPF, Barcelona, Spain
- Luke Holman
  - School of Applied Sciences, Edinburgh Napier University, Edinburgh, UK
- Maaria Kankare
  - Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- Benedict Lenhart
  - Department of Biology, University of Virginia, Charlottesville, Virginia, USA
- Dorcas J Orengo
  - Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
  - Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Rhonda R Snook
  - Department of Zoology, Stockholm University, Stockholm, Sweden
- Vera M Yılmaz
  - Division of Evolutionary Biology, LMU Munich, Planegg-Martinsried, Germany
- Leeban Yusuf
  - Centre for Biological Diversity, University of St Andrews, St Andrews, UK
2
Alonso-Moreda N, Berral-González A, De La Rosa E, González-Velasco O, Sánchez-Santos JM, De Las Rivas J. Comparative Analysis of Cell Mixtures Deconvolution and Gene Signatures Generated for Blood, Immune and Cancer Cells. Int J Mol Sci 2023; 24:10765. [PMID: 37445946 DOI: 10.3390/ijms241310765] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/25/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023]
Abstract
In the last two decades, many detailed full transcriptomic studies on complex biological samples have been published and included in large gene expression repositories. These studies primarily provide a bulk expression signal for each sample, including multiple cell types mixed within the global signal. The cellular heterogeneity in these mixtures does not allow the activity of specific genes in specific cell types to be identified. Therefore, inferring relative cellular composition is a very powerful tool to achieve a more accurate molecular profiling of complex biological samples. In recent decades, computational techniques have been developed to solve this problem by applying deconvolution methods, designed to decompose cell mixtures into their cellular components and calculate the relative proportions of these elements. Some of them only calculate the cell proportions (supervised methods), while other deconvolution algorithms can also identify the gene signatures specific for each cell type (unsupervised methods). In this work, five deconvolution methods (CIBERSORT, FARDEEP, DECONICA, LINSEED and ABIS) were implemented and used to analyze blood and immune cells, and also cancer cells, in complex mixture samples (using three bulk expression datasets). Our study provides three analytical tools (corrplots, cell-signature plots and bar-mixture plots) that allow a thorough comparative analysis of the cell mixture data. The work indicates that CIBERSORT is a robust method optimized for the identification of immune cell types, but is less efficient at identifying cancer cells. We also found that LINSEED is a very powerful unsupervised method that provides precise and specific gene signatures for each of the main immune cell types tested: neutrophils and monocytes (of the myeloid lineage), B-cells, NK cells and T-cells (of the lymphoid lineage), and also for cancer cells.
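The supervised setting described in this abstract, where cell proportions are calculated from known cell-type signatures, can be illustrated with a minimal sketch. It assumes a linear mixing model (bulk expression is approximately the signature matrix times the proportions) and solves a non-negative least-squares problem with multiplicative updates. This is not the algorithm of CIBERSORT or of any other method named above; the function name `estimate_proportions` and the toy signature values are hypothetical.

```python
# Hypothetical toy example of reference-based (supervised) deconvolution.
# Assumed model: bulk ≈ signature × proportions, with proportions >= 0.

def estimate_proportions(signature, bulk, n_iter=500):
    """Non-negative least squares via multiplicative updates.

    signature: list of gene rows, each holding one value per cell type.
    bulk: list of bulk expression values, one per gene.
    Returns the estimated proportions, rescaled to sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # S^T b is constant across iterations, so compute it once.
    stb = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    p = [1.0 / n_types] * n_types  # strictly positive starting point
    for _ in range(n_iter):
        fitted = [sum(signature[g][k] * p[k] for k in range(n_types))
                  for g in range(n_genes)]
        sts_p = [sum(signature[g][k] * fitted[g] for g in range(n_genes))
                 for k in range(n_types)]
        # Multiplicative update keeps every entry non-negative.
        p = [p[k] * stb[k] / (sts_p[k] + 1e-12) for k in range(n_types)]
    total = sum(p)
    return [x / total for x in p]


# Made-up signature matrix: 4 genes (rows) by 3 cell types (columns).
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 5.0],
]
true_p = [0.5, 0.3, 0.2]
# Noise-free synthetic bulk sample built from the assumed model.
bulk = [sum(s * q for s, q in zip(row, true_p)) for row in signature]

estimated = estimate_proportions(signature, bulk)
print([round(x, 3) for x in estimated])
```

With real data the bulk signal is noisy and the signature matrix is itself an estimate, so recovered proportions would only approximate the truth; published methods add regularisation, robust loss functions or marker selection on top of this basic model.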
Affiliation(s)
- Natalia Alonso-Moreda
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Alberto Berral-González
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Enrique De La Rosa
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Oscar González-Velasco
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
  - Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- José Manuel Sánchez-Santos
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
  - Department of Statistics, University of Salamanca (USAL), 37008 Salamanca, Spain
- Javier De Las Rivas
  - Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
3
Heiling HM, Wilson DR, Rashid NU, Sun W, Ibrahim JG. Estimating cell type composition using isoform expression one gene at a time. Biometrics 2023; 79:854-865. [PMID: 34921386 PMCID: PMC11245124 DOI: 10.1111/biom.13614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 12/08/2021] [Indexed: 11/29/2022]
Abstract
Human tissue samples are often mixtures of heterogeneous cell types, which can confound the analyses of gene expression data derived from such tissues. The cell type composition of a tissue sample may itself be of interest and is needed for proper analysis of differential gene expression. A variety of computational methods have been developed to estimate cell type proportions using gene-level expression data. However, RNA isoforms can also be differentially expressed across cell types, and isoform-level expression could be equally or more informative for determining cell type origin than gene-level expression. We propose a new computational method, IsoDeconvMM, which estimates cell type fractions using isoform-level gene expression data. A novel and useful feature of IsoDeconvMM is that it can estimate cell type proportions using only a single gene, though in practice we recommend aggregating estimates of a few dozen genes to obtain more accurate results. We demonstrate the performance of IsoDeconvMM using a unique data set with cell type-specific RNA-seq data across more than 135 individuals. This data set allows us to evaluate different methods given the biological variation of cell type-specific gene expression data across individuals. We further complement this analysis with additional simulations.
Affiliation(s)
- Hillary M Heiling
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Douglas R Wilson
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Naim U Rashid
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
- Wei Sun
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Joseph G Ibrahim
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
  - Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
4
Real age prediction from the transcriptome with RAPToR. Nat Methods 2022; 19:969-975. [PMID: 35817937 DOI: 10.1038/s41592-022-01540-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/08/2021] [Accepted: 05/25/2022] [Indexed: 11/08/2022]
Abstract
Transcriptomic data is often affected by uncontrolled variation among samples that can obscure and confound the effects of interest. This variation is frequently due to unintended differences in developmental stages between samples. The transcriptome itself can be used to estimate developmental progression, but existing methods require many samples and do not estimate a specimen's real age. Here we present real-age prediction from transcriptome staging on reference (RAPToR), a computational method that precisely estimates the real age of a sample from its transcriptome, exploiting existing time-series data as reference. RAPToR works with whole animal, dissected tissue and single-cell data for the most common animal models, humans and even for non-model organisms lacking reference data. We show that RAPToR can be used to remove age as a confounding factor and allow recovery of a signal of interest in differential expression analysis. RAPToR will be especially useful in large-scale single-organism profiling because it eliminates the need for accurate staging or synchronisation before profiling.
5
Karikomi M, Zhou P, Nie Q. DURIAN: an integrative deconvolution and imputation method for robust signaling analysis of single-cell transcriptomics data. Brief Bioinform 2022; 23:6609525. [PMID: 35709795 PMCID: PMC9294432 DOI: 10.1093/bib/bbac223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 04/29/2022] [Accepted: 05/11/2022] [Indexed: 01/31/2023]
Abstract
Single-cell RNA sequencing trades read-depth for dimensionality, often leading to loss of critical signaling gene information that is typically present in bulk data sets. We introduce DURIAN (Deconvolution and mUltitask-Regression-based ImputAtioN), an integrative method for recovery of gene expression in single-cell data. Through systematic benchmarking, we demonstrate the accuracy, robustness and empirical convergence of DURIAN using both synthetic and published data sets. We show that use of DURIAN improves single-cell clustering, low-dimensional embedding, and recovery of intercellular signaling networks. Our study resolves several inconsistent results of cell-cell communication analysis using single-cell or bulk data independently. The method has broad application in biomarker discovery and cell signaling analysis using single-cell transcriptomics data sets.
Affiliation(s)
- Peijie Zhou
  - Corresponding authors: Peijie Zhou, 540P Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993. Qing Nie, 540F Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993.
- Qing Nie
  - Corresponding authors: Peijie Zhou, 540P Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993. Qing Nie, 540F Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993.
6
Lei H, Guo XA, Tao Y, Ding K, Fu X, Oesterreich S, Lee AV, Schwartz R. OUP accepted manuscript. Bioinformatics 2022; 38:i386-i394. [PMID: 35758822 PMCID: PMC9235482 DOI: 10.1093/bioinformatics/btac262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Motivation: Identifying cell types and their abundances and how these evolve during tumor progression is critical to understanding the mechanisms of metastasis and identifying predictors of metastatic potential that can guide the development of new diagnostics or therapeutics. Single-cell RNA sequencing (scRNA-seq) has been especially promising in resolving heterogeneity of expression programs at the single-cell level, but is not always feasible, e.g. for large cohort studies or longitudinal analysis of archived samples. In such cases, clonal subpopulations may still be inferred via genomic deconvolution, but deconvolution methods have limited ability to resolve fine clonal structure and may require reference cell type profiles that are missing or imprecise. Prior methods can eliminate the need for reference profiles but show unstable performance when few bulk samples are available. Results: In this work, we develop a new method using reference scRNA-seq to interpret sample collections for which only bulk RNA-seq is available for some samples, e.g. clonally resolving archived primary tissues using scRNA-seq from metastases. By integrating such information in a Quadratic Programming framework, our method can recover more accurate cell types and corresponding cell type abundances in bulk samples. Application to a breast tumor bone metastases dataset confirms the power of scRNA-seq data to improve cell type inference and quantification in same-patient bulk samples. Availability and implementation: Source code is available on GitHub at https://github.com/CMUSchwartzLab/RADs.
Affiliation(s)
- Yifeng Tao
  - Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Kai Ding
  - Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
- Xuecong Fu
  - Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Steffi Oesterreich
  - Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
- Adrian V Lee
  - Department of Pharmacology and Chemical Biology, UPMC Hillman Cancer Center, Magee-Womens Research Institute, Pittsburgh, PA 15213, USA
7
Doostparast Torshizi A, Duan J, Wang K. A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders. NAR Genom Bioinform 2021; 3:lqab056. [PMID: 34169279 PMCID: PMC8219045 DOI: 10.1093/nargab/lqab056] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/21/2021] [Indexed: 02/06/2023]
Abstract
The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, most gene expression studies are conducted on bulk tissues, without examining cell type-specific expression profiles. Several computational methods are available for cell type deconvolution (i.e. inference of cellular composition) from bulk RNA-Seq data, but few of them impute cell type-specific expression profiles. We hypothesize that with external prior information such as single cell RNA-seq and population-wide expression profiles, it can be computationally tractable to estimate both cellular composition and cell type-specific expression from bulk RNA-Seq data. Here we introduce CellR, which addresses cross-individual gene expression variations to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model while taking into account inter/intra cellular correlations and uses a multi-variate stochastic search algorithm to estimate the cell type-specific expression profiles. Analyses on several complex diseases such as schizophrenia, Alzheimer’s disease, Huntington’s disease and type 2 diabetes validated the efficiency of CellR, while revealing how specific cell types contribute to different diseases. In summary, CellR compares favorably against competing approaches, enabling cell type-specific re-analysis of gene expression data on bulk tissues in complex diseases.
Affiliation(s)
- Abolfazl Doostparast Torshizi
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jubao Duan
  - Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
8
Blasco A, Natoli T, Endres MG, Sergeev RA, Randazzo S, Paik JH, Macaluso NJM, Narayan R, Lu X, Peck D, Lakhani KR, Subramanian A. Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map. Bioinformatics 2021; 37:2889-2895. [PMID: 33824954 PMCID: PMC8479655 DOI: 10.1093/bioinformatics/btab192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Revised: 03/03/2021] [Accepted: 03/19/2021] [Indexed: 02/02/2023]
Abstract
Motivation: Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The issue consists of separating gene expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples. Results: We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications. Availability and implementation: The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge. Supplementary information: Supplementary data are available at Bioinformatics online.
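The Gaussian-mixture idea mentioned in this abstract, separating the signals of two genes that were measured together, can be sketched with a two-component one-dimensional mixture fitted by expectation-maximisation. The sketch below is not the code of any competition entry. The function `fit_two_gaussians`, the simulated intensities and the 2:1 mixing ratio of the toy data are illustrative assumptions.

```python
import math
import random


def fit_two_gaussians(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances), ordered by increasing mean.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # start the two components at the data extremes
    spread = (hi - lo) or 1.0
    variances = [(spread / 4) ** 2, (spread / 4) ** 2]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                weights[k] / math.sqrt(2 * math.pi * variances[k])
                * math.exp(-((x - means[k]) ** 2) / (2 * variances[k]))
                for k in range(2)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, values)) / nk
            variances[k] = max(var, 1e-6)  # guard against collapse
            weights[k] = nk / len(values)
    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [variances[k] for k in order])


# Simulated combined measurement: two genes pooled at an assumed 2:1 ratio.
rng = random.Random(0)
values = ([rng.gauss(2.0, 0.5) for _ in range(200)]
          + [rng.gauss(8.0, 0.5) for _ in range(100)])

weights, means, variances = fit_two_gaussians(values)
print([round(w, 2) for w in weights], [round(m, 2) for m in means])
```

Each fitted component mean serves as the expression estimate for one gene, and the component weights indicate which peak belongs to the more abundant member of the pair. Real measurements can have overlapping or missing peaks, which is where the more elaborate approaches evaluated in the article come in.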
Affiliation(s)
- Andrea Blasco
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
  - To whom correspondence should be addressed.
- Ted Natoli
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Michael G Endres
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
- Rinat A Sergeev
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
- Steven Randazzo
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
- Jin H Paik
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
- Rajiv Narayan
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Xiaodong Lu
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- David Peck
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Karim R Lakhani
  - Harvard Business School, Harvard University, Boston, MA 02163, USA
  - National Bureau of Economic Research (NBER), Cambridge, MA 02138, USA
9
Hunt GJ, Gagnon-Bartsch JA. The role of scale in the estimation of cell-type proportions. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
10
Dong L, Kollipara A, Darville T, Zou F, Zheng X. Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information. Sci Rep 2020; 10:5434. [PMID: 32214192 PMCID: PMC7096458 DOI: 10.1038/s41598-020-62330-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/23/2019] [Accepted: 02/26/2020] [Indexed: 01/03/2023]
Abstract
Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanism of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information available for only a subset of cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM by yielding more accurate cell proportion estimates when markers from some or all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared to the supervised method using signature genes, CIBERSORT, and the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data with bulk expression profiles from six cell types and prior marker information for only three cell types suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
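For readers unfamiliar with the unsupervised baseline named in this abstract, the following is a minimal sketch of non-negative matrix factorization using the classic multiplicative updates for the squared-error objective. It is plain NMF, not CAM or semi-CAM, and it uses no marker information. The helper names and the toy expression values are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Frobenius norm of x - w @ h."""
    wh = matmul(w, h)
    return sum((xi - yi) ** 2
               for xr, yr in zip(x, wh)
               for xi, yi in zip(xr, yr)) ** 0.5


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor non-negative x (genes x samples) as w @ h.

    w (genes x rank) plays the role of cell-type profiles and
    h (rank x samples) the role of unnormalised abundances.
    """
    rng = random.Random(seed)
    n_genes, n_samples = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    for _ in range(n_iter):
        # h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_samples)] for k in range(rank)]
        # w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_genes)]
    return w, h


# Made-up data: 5 genes, 2 cell types, 4 bulk samples.
profiles = [[8.0, 1.0], [1.0, 9.0], [4.0, 4.0], [7.0, 2.0], [2.0, 6.0]]
proportions = [[0.9, 0.6, 0.3, 0.1], [0.1, 0.4, 0.7, 0.9]]
bulk = matmul(profiles, proportions)

w0, h0 = nmf(bulk, rank=2, n_iter=0)  # the random starting point
w, h = nmf(bulk, rank=2)
error_before = frobenius_error(bulk, w0, h0)
error_after = frobenius_error(bulk, w, h)
print(round(error_before, 3), round(error_after, 3))
```

The factors are determined only up to scaling and relabelling of the components, so the rows of `h` cannot be read as proportions without further normalisation and matching to cell types. That ambiguity is one reason partial marker information, as used by semi-CAM, can help.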
Affiliation(s)
- Li Dong
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Avinash Kollipara
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Fei Zou
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
11
Görtler F, Schön M, Simeth J, Solbrig S, Wettig T, Oefner PJ, Spang R, Altenbuchinger M. Loss-Function Learning for Digital Tissue Deconvolution. J Comput Biol 2020; 27:342-355. [DOI: 10.1089/cmb.2019.0462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Affiliation(s)
- Franziska Görtler
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Marian Schön
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Jakob Simeth
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Stefan Solbrig
  - Department of Physics, University of Regensburg, Regensburg, Germany
- Tilo Wettig
  - Department of Physics, University of Regensburg, Regensburg, Germany
- Peter J. Oefner
  - Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Rainer Spang
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Michael Altenbuchinger
  - Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
12
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022]
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single-cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq's complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available on GitHub (MATLAB and Octave code): https://github.com/kkang7/CDSeq.
Author summary
Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single-cell sequencing is a promising technique; however, it is expensive and the analysis of single-cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. Many deconvolution methods have been proposed; however, they often estimate only cell-type proportions using a reference cell-type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportions and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Collapse
Affiliation(s)
- Kai Kang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
- * E-mail: (KK); (LL)
| | - Qian Meng
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Igor Shats
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - David M. Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Melissa Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Xiaoling Li
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
|
13
|
Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2018; 34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 129] [Impact Index Per Article: 25.8] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022] Open
Abstract
Summary: Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. In this review, we highlight the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting. We also explain the various deconvolution scenarios, the mathematical approaches used to solve them and the effect of data processing and different confounding factors on the accuracy of the deconvolution results. Contact: katleen.depreter@ugent.be. Supplementary information: Supplementary data are available at Bioinformatics online.
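As a rough illustration of the simplest deconvolution scenario mentioned above (proportions estimated from a known reference), the sketch below assumes a linear mixing model in which each bulk expression value is a proportion-weighted sum of cell-type signature values. The function name `estimate_proportions`, the toy signature matrix, and the projected-gradient solver are illustrative assumptions, not a method taken from the review.

```python
def estimate_proportions(signature, bulk, n_iter=2000):
    """Estimate cell-type proportions under an assumed linear mixing model.

    signature: one row per gene, one column per cell type (reference profile).
    bulk: one bulk expression value per gene.
    Solves a non-negative least-squares problem by projected gradient
    descent, then rescales the solution so the proportions sum to 1.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The trace of S^T S bounds its largest eigenvalue, so 1/trace is a safe step size.
    trace = sum(v * v for row in signature for v in row)
    step = 1.0 / trace
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        # Residual of the current fit, gene by gene.
        residual = [
            sum(row[k] * p[k] for k in range(n_types)) - b
            for row, b in zip(signature, bulk)
        ]
        # Gradient of 0.5 * ||S p - b||^2 with respect to p.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    if total == 0:
        raise ValueError("all estimated proportions are zero")
    return [v / total for v in p]


# Hypothetical toy data: 4 genes, 3 cell types, noise-free mixture.
signature = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [5, 5, 5]]
true_p = [0.6, 0.3, 0.1]
bulk = [sum(s * q for s, q in zip(row, true_p)) for row in signature]
estimated = estimate_proportions(signature, bulk)
print([round(v, 3) for v in estimated])
```

On real data the signature matrix, normalisation, and noise all affect accuracy, so this is only a sketch of the underlying arithmetic.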
Affiliation(s)
- Francisco Avila Cobos
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Katleen De Preter
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
|
14
|
Predicting Host Immune Cell Dynamics and Key Disease-Associated Genes Using Tissue Transcriptional Profiles. Processes (Basel) 2019. [DOI: 10.3390/pr7050301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Motivation: Immune cell dynamics is a critical factor of disease-associated pathology (immunopathology) that also impacts the levels of mRNAs in diseased tissue. Deconvolution algorithms attempt to infer cell quantities in a tissue/organ sample based on gene expression profiles and are often evaluated using artificial, non-complex samples. Their accuracy in estimating cell counts from temporal tissue gene expression data is not well characterized and has never been characterized for diseased lung. Further, how to remove the effects of cell migration on transcript counts to improve discovery of disease factors is an open question. Results: Four cell count inference (i.e., deconvolution) tools are evaluated using microarray data from influenza-infected lung sampled at several time points post-infection. The analysis finds that inferred cell quantities are accurate only for select cell types and there is a tendency for algorithms to have a good relative fit (R²) but a poor absolute fit (normalized mean squared error; NMSE), which suggests systemic biases exist. Nonetheless, using cell fraction estimates to adjust gene expression data, we show that genes associated with influenza virus replication and increased infection pathology are more likely to be identified as significant than when applying traditional statistical tests.
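The contrast between relative and absolute fit can be made concrete with a small sketch. It assumes R² is taken as the squared Pearson correlation and NMSE as the mean squared error divided by the variance of the true values; the abstract does not give the exact definitions used, and the numbers below are invented for illustration.

```python
from statistics import mean


def r_squared(truth, estimate):
    """Squared Pearson correlation between true and estimated values."""
    mt, me = mean(truth), mean(estimate)
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, estimate))
    var_t = sum((t - mt) ** 2 for t in truth)
    var_e = sum((e - me) ** 2 for e in estimate)
    return cov * cov / (var_t * var_e)


def nmse(truth, estimate):
    """Mean squared error normalised by the variance of the true values."""
    mt = mean(truth)
    mse = mean([(e - t) ** 2 for t, e in zip(truth, estimate)])
    var_t = mean([(t - mt) ** 2 for t in truth])
    return mse / var_t


# Hypothetical cell fractions over four time points.
truth = [0.10, 0.20, 0.30, 0.40]
# A systematically biased estimate: correct trend, wrong scale and offset.
biased = [2 * t + 0.1 for t in truth]

print("R^2 :", r_squared(truth, biased))
print("NMSE:", nmse(truth, biased))
```

Because the biased estimate is a linear transformation of the truth, its R² is essentially 1 while its NMSE is far above 1, which is the pattern the abstract attributes to systemic bias.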
|
15
|
Monaco G, Lee B, Xu W, Mustafah S, Hwang YY, Carré C, Burdin N, Visan L, Ceccarelli M, Poidinger M, Zippelius A, Pedro de Magalhães J, Larbi A. RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types. Cell Rep 2019; 26:1627-1640.e7. [PMID: 30726743 PMCID: PMC6367568 DOI: 10.1016/j.celrep.2019.01.041] [Citation(s) in RCA: 468] [Impact Index Per Article: 93.6] [Received: 08/22/2018] [Revised: 12/03/2018] [Accepted: 01/10/2019] [Indexed: 01/22/2023] Open
Abstract
The molecular characterization of immune subsets is important for designing effective strategies to understand and treat diseases. We characterized 29 immune cell types within the peripheral blood mononuclear cell (PBMC) fraction of healthy donors using RNA-seq (RNA sequencing) and flow cytometry. Our dataset was used, first, to identify sets of genes that are specific, are co-expressed, and have housekeeping roles across the 29 cell types. Then, we examined differences in mRNA heterogeneity and mRNA abundance revealing cell type specificity. Last, we performed absolute deconvolution on a suitable set of immune cell types using transcriptomics signatures normalized by mRNA abundance. Absolute deconvolution is ready to use for PBMC transcriptomic data using our Shiny app (https://github.com/giannimonaco/ABIS). We benchmarked different deconvolution and normalization methods and validated the resources in independent cohorts. Our work has research, clinical, and diagnostic value by making it possible to effectively associate observations in bulk transcriptomics data to specific immune subsets.
Affiliation(s)
- Gianni Monaco
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore; Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool L78TX, UK; Department of Biomedicine, University Hospital and University of Basel, 4031 Basel, Switzerland.
| | - Bernett Lee
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Weili Xu
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Seri Mustafah
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - You Yi Hwang
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | | | | | | | - Michele Ceccarelli
- BIOGEM Research Center, Ariano Irpino, Italy; Department of Science and Technology, University of Sannio, Benevento, Italy
| | - Michael Poidinger
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Alfred Zippelius
- Department of Biomedicine, University Hospital and University of Basel, 4031 Basel, Switzerland
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool L78TX, UK.
| | - Anis Larbi
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore; Department of Biology, Faculty of Sciences, University Tunis El Manar, Tunis, Tunisia; Faculty of Medicine, University of Sherbrooke, Sherbrooke, QC, Canada; Department of Microbiology, Immunology Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
|
16
|
Hunt GJ, Freytag S, Bahlo M, Gagnon-Bartsch JA. dtangle: accurate and robust cell type deconvolution. Bioinformatics 2018; 35:2093-2099. [DOI: 10.1093/bioinformatics/bty926] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 12/21/2017] [Revised: 10/20/2018] [Accepted: 11/06/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation
Cell type composition of tissues is important in many biological processes. To help understand cell type composition using gene expression data, methods of estimating (deconvolving) cell type proportions have been developed. Such estimates are often used to adjust for confounding effects of cell type in differential expression analysis (DEA).
Results
We propose dtangle, a new cell type deconvolution method. dtangle works on a range of DNA microarray and bulk RNA-seq platforms. It estimates cell type proportions using publicly available, often cross-platform, reference data. We evaluate dtangle on 11 benchmark datasets showing that dtangle is competitive with published deconvolution methods, is robust to outliers and selection of tuning parameters, and is fast. As a case study, we investigate the human immune response to Lyme disease. dtangle’s estimates reveal a temporal trend consistent with previous findings and are important covariates for DEA across disease status.
Availability and implementation
dtangle is on CRAN (cran.r-project.org/package=dtangle) or github (dtangle.github.io).
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gregory J Hunt
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
| | - Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | | |
|
17
|
BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol 2018; 19:141. [PMID: 30241486 PMCID: PMC6151042 DOI: 10.1186/s13059-018-1513-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 11/01/2017] [Accepted: 08/20/2018] [Indexed: 11/10/2022] Open
Abstract
We introduce a Bayesian semi-supervised method for estimating cell counts from DNA methylation by leveraging an easily obtainable prior knowledge on the cell-type composition distribution of the studied tissue. We show mathematically and empirically that alternative methods which attempt to infer cell counts without methylation reference only capture linear combinations of cell counts rather than provide one component per cell type. Our approach allows the construction of components such that each component corresponds to a single cell type, and provides a new opportunity to investigate cell compositions in genomic studies of tissues for which it was not possible before.
|
18
|
Chen X, Teichmann SA, Meyer KB. From Tissues to Cell Types and Back: Single-Cell Gene Expression Analysis of Tissue Architecture. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013452] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Indexed: 11/09/2022]
Abstract
With the recent transformative developments in single-cell genomics and, in particular, single-cell gene expression analysis, it is now possible to study tissues at the single-cell level, rather than having to rely on data from bulk measurements. Here we review the rapid developments in single-cell RNA sequencing (scRNA-seq) protocols that have the potential for unbiased identification and profiling of all cell types within a tissue or organism. In addition, novel approaches for spatial profiling of gene expression allow us to map individual cells and cell types back into the three-dimensional context of organs. The combination of in-depth single-cell and spatial gene expression data will reveal tissue architecture in unprecedented detail, generating a wealth of biological knowledge and a better understanding of many diseases.
Affiliation(s)
- Xi Chen
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
| | - Sarah A. Teichmann
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
- European Molecular Biology Laboratory (EMBL)–European Bioinformatics Institute (EBI), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Theory of Condensed Matter Research Group, Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, United Kingdom
| | - Kerstin B. Meyer
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, United Kingdom
|
19
|
Wang N, Chen L, Wang Y. Mathematical Modeling and Deconvolution of Molecular Heterogeneity Identifies Novel Subpopulations in Complex Tissues. Methods Mol Biol 2018; 1751:223-236. [PMID: 29508301 DOI: 10.1007/978-1-4939-7710-9_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised methods to deconvolve tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we develop a novel unsupervised deconvolution method, Convex Analysis of Mixtures (CAM), within a well-grounded mathematical framework, to dissect mixed gene expressions in heterogeneous tissue samples. To facilitate the utility of this method, we implement an R-Java CAM package that provides comprehensive analytic functions and graphic user interface (GUI).
Affiliation(s)
- Niya Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA.
| | - Lulu Chen
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA
| | - Yue Wang
- Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA
|
20
|
Zhang JD, Hatje K, Sturm G, Broger C, Ebeling M, Burtin M, Terzi F, Pomposiello SI, Badi L. Detect tissue heterogeneity in gene expression data with BioQC. BMC Genomics 2017; 18:277. [PMID: 28376718 PMCID: PMC5379536 DOI: 10.1186/s12864-017-3661-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 09/27/2016] [Accepted: 03/25/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Gene expression data can be compromised by cells originating from tissues other than the target tissue of profiling. Failures in detecting such tissue heterogeneity have profound implications on data interpretation and reproducibility. A computational tool explicitly addressing the issue is warranted. RESULTS We introduce BioQC, an R/Bioconductor software package to detect tissue heterogeneity in gene expression data. To this end BioQC implements a computationally efficient Wilcoxon-Mann-Whitney test and provides more than 150 signatures of tissue-enriched genes derived from large-scale transcriptomics studies. Simulation experiments show that BioQC is both fast and sensitive in detecting tissue heterogeneity. In a case study with whole-organ profiling data, BioQC predicted contamination events that are confirmed by quantitative RT-PCR. Applied to transcriptomics data of the Genotype-Tissue Expression (GTEx) project, BioQC reveals clustering of samples and suggests that some samples likely suffer from tissue heterogeneity. CONCLUSIONS Our experience with gene expression data indicates a prevalence of tissue heterogeneity that often goes unnoticed. BioQC addresses the issue by integrating prior knowledge with a scalable algorithm. We propose BioQC as a first-line tool to ensure quality and reproducibility of gene expression data.
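The idea of testing a tissue signature with a Wilcoxon-Mann-Whitney test can be sketched as follows. This is not BioQC's implementation or API: the function names, the toy expression values, and the use of a normal approximation without tie correction are all assumptions made for illustration.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signature_enrichment(expression, signature_genes):
    """One-sided Wilcoxon-Mann-Whitney test of signature vs. background genes.

    expression: dict mapping gene name to expression in one sample.
    signature_genes: set of genes assumed to be enriched in some tissue.
    Returns (U statistic, approximate p-value for signature genes ranking higher).
    """
    genes = list(expression)
    ranks = average_ranks([expression[g] for g in genes])
    in_signature = [g in signature_genes for g in genes]
    n1 = sum(in_signature)
    n2 = len(genes) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need both signature and background genes")
    rank_sum = sum(r for r, s in zip(ranks, in_signature) if s)
    u = rank_sum - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # normal approximation, no tie correction
    p_value = 1 - NormalDist().cdf((u - mu) / sigma)
    return u, p_value


# Hypothetical sample: ten genes with expression values 1..10.
sample = {f"g{i}": float(i) for i in range(1, 11)}
u, p = signature_enrichment(sample, {"g8", "g9", "g10"})
print(u, p)
```

A small p-value for a signature from a tissue other than the one profiled would, under these assumptions, hint at contamination; with so few genes the normal approximation is crude, and an exact test would be preferable in practice.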
Affiliation(s)
- Jitao David Zhang
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
| | - Klas Hatje
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
| | - Gregor Sturm
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
| | - Clemens Broger
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
- Present address: Peter-Rot-Strasse 84, Basel, 4058 Switzerland
| | - Martin Ebeling
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
| | - Martine Burtin
- Inserm U1151, Université Paris Descartes, Institut Necker Enfants Malades, Hôpital Necker Enfants Malades, 149, Rue de Sèvres, Paris, 75015 France
| | - Fabiola Terzi
- Inserm U1151, Université Paris Descartes, Institut Necker Enfants Malades, Hôpital Necker Enfants Malades, 149, Rue de Sèvres, Paris, 75015 France
| | - Silvia Ines Pomposiello
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
| | - Laura Badi
- Roche Pharma Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, 4070 Switzerland
|
21
|
Shannon CP, Balshaw R, Chen V, Hollander Z, Toma M, McManus BM, FitzGerald JM, Sin DD, Ng RT, Tebbutt SJ. Enumerateblood - an R package to estimate the cellular composition of whole blood from Affymetrix Gene ST gene expression profiles. BMC Genomics 2017; 18:43. [PMID: 28061752 PMCID: PMC5219701 DOI: 10.1186/s12864-016-3460-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/17/2016] [Accepted: 12/22/2016] [Indexed: 11/20/2022] Open
Abstract
Background Measuring genome-wide changes in transcript abundance in circulating peripheral whole blood is a useful way to study disease pathobiology and may help elucidate the molecular mechanisms of disease, or discovery of useful disease biomarkers. The sensitivity and interpretability of analyses carried out in this complex tissue, however, are significantly affected by its dynamic cellular heterogeneity. It is therefore desirable to quantify this heterogeneity, either to account for it or to better model interactions that may be present between the abundance of certain transcripts, specific cell types and the indication under study. Accurate enumeration of the many component cell types that make up peripheral whole blood can further complicate the sample collection process, however, and result in additional costs. Many approaches have been developed to infer the composition of a sample from high-dimensional transcriptomic and, more recently, epigenetic data. These approaches rely on the availability of isolated expression profiles for the cell types to be enumerated. These profiles are platform-specific, suitable datasets are rare, and generating them is expensive. No such dataset exists on the Affymetrix Gene ST platform. Results We present ‘Enumerateblood’, a freely-available and open source R package that exposes a multi-response Gaussian model capable of accurately predicting the composition of peripheral whole blood samples from Affymetrix Gene ST expression profiles, outperforming other current methods when applied to Gene ST data. Conclusions ‘Enumerateblood’ significantly improves our ability to study disease pathobiology from whole blood gene expression assayed on the popular Affymetrix Gene ST platform by allowing a more complete study of the various components of this complex tissue without the need for additional data collection. 
Future use of the model may allow for novel insights to be generated from the ~400 Affymetrix Gene ST blood gene expression datasets currently available on the Gene Expression Omnibus (GEO) website.
Affiliation(s)
- Casey P Shannon
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada
| | - Robert Balshaw
- PROOF Centre of Excellence, Vancouver, BC, Canada.,BC Centre for Disease Control, Vancouver, BC, Canada
| | - Virginia Chen
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada
| | - Zsuzsanna Hollander
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada
| | - Mustafa Toma
- Division of Cardiology, University of British Columbia, Vancouver, BC, Canada
| | - Bruce M McManus
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada.,Institute for Heart and Lung Health, Vancouver, BC, Canada
| | - J Mark FitzGerald
- Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada.,Institute for Heart and Lung Health, Vancouver, BC, Canada
| | - Don D Sin
- Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada.,Institute for Heart and Lung Health, Vancouver, BC, Canada
| | - Raymond T Ng
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Department of Computer Science, University of British Columbia, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada.,Institute for Heart and Lung Health, Vancouver, BC, Canada
| | - Scott J Tebbutt
- PROOF Centre of Excellence, Vancouver, BC, Canada.,Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada.,Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC, Canada.,Institute for Heart and Lung Health, Vancouver, BC, Canada
|
22
|
Convex Analysis of Mixtures for Separating Non-negative Well-grounded Sources. Sci Rep 2016; 6:38350. [PMID: 27922124 PMCID: PMC5138607 DOI: 10.1038/srep38350] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/31/2016] [Accepted: 11/08/2016] [Indexed: 11/17/2022] Open
Abstract
Blind Source Separation (BSS) is a powerful tool for analyzing composite data patterns in many areas, such as computational biology. We introduce a novel BSS method, Convex Analysis of Mixtures (CAM), for separating non-negative well-grounded sources, which learns the mixing matrix by identifying the lateral edges of the convex data scatter plot. We propose and prove a sufficient and necessary condition for identifying the mixing matrix through edge detection in the noise-free case, which enables CAM to identify the mixing matrix not only in the exact-determined and over-determined scenarios, but also in the under-determined scenario. We show the optimality of the edge detection strategy, even for cases where source well-groundedness is not strictly satisfied. The CAM algorithm integrates plug-in noise filtering using sector-based clustering, an efficient geometric convex analysis scheme, and stability-based model order selection. The superior performance of CAM against a panel of benchmark BSS techniques is demonstrated on numerically mixed gene expression data of ovarian cancer subtypes. We apply CAM to dissect dynamic contrast-enhanced magnetic resonance imaging data taken from breast tumors and time-course microarray gene expression data derived from in-vivo muscle regeneration in mice, both producing biologically plausible decomposition results.
|
23
|
Regulatory complexity revealed by integrated cytological and RNA-seq analyses of meiotic substages in mouse spermatocytes. BMC Genomics 2016; 17:628. [PMID: 27519264 PMCID: PMC4983049 DOI: 10.1186/s12864-016-2865-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 09/11/2015] [Accepted: 06/28/2016] [Indexed: 01/24/2023] Open
Abstract
Background: The continuous and non-synchronous nature of postnatal male germ-cell development has impeded stage-specific resolution of molecular events of mammalian meiotic prophase in the testis. Here the juvenile onset of spermatogenesis in mice is analyzed by combining cytological and transcriptomic data in a novel computational analysis that allows decomposition of the transcriptional programs of spermatogonia and meiotic prophase substages. Results: Germ cells from testes of individual mice were obtained at two-day intervals from 8 to 18 days post-partum (dpp), prepared as surface-spread chromatin and immunolabeled for meiotic stage-specific protein markers (STRA8, SYCP3, phosphorylated H2AFX, and HISTH1T). Eight stages were discriminated cytologically by combinatorial antibody labeling, and RNA-seq was performed on the same samples. Independent principal component analyses of cytological and transcriptomic data yielded similar patterns for both data types, providing strong evidence for substage-specific gene expression signatures. A novel permutation-based maximum covariance analysis (PMCA) was developed to map co-expressed transcripts to one or more of the eight meiotic prophase substages, thereby linking distinct molecular programs to cytologically defined cell states. Expression of meiosis-specific genes is not substage-limited, suggesting regulation of substage transitions at other levels. Conclusions: This integrated analysis provides a general method for resolving complex cell populations. Here it revealed not only features of meiotic substage-specific gene expression, but also a network of substage-specific transcription factors and relationships to potential target genes.
|
24
|
Newman AM, Alizadeh AA. High-throughput genomic profiling of tumor-infiltrating leukocytes. Curr Opin Immunol 2016; 41:77-84. [PMID: 27372732 DOI: 10.1016/j.coi.2016.06.006] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 05/20/2016] [Accepted: 06/13/2016] [Indexed: 12/21/2022]
Abstract
Tumors are complex ecosystems comprised of diverse cell types including malignant cells, mesenchymal cells, and tumor-infiltrating leukocytes (TILs). While TILs are well known to play important roles in many aspects of cancer biology, recent developments in immuno-oncology have spurred considerable interest in TILs, particularly in relation to their optimal engagement by emerging immunotherapies. Traditionally, the enumeration of TIL phenotypic diversity and composition in solid tumors has relied on resolving single cells by flow cytometry and immunohistochemical methods. However, advances in genome-wide technologies and computational methods are now allowing TILs to be profiled with increasingly high resolution and accuracy directly from RNA mixtures of bulk tumor samples. In this review, we highlight recent progress in the development of in silico tumor dissection methods, and illustrate examples of how these strategies can be applied to characterize TILs in human tumors to facilitate personalized cancer therapy.
Affiliation(s)
- Aaron M Newman
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA; Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA.
| | - Ash A Alizadeh
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA; Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University, Stanford, CA, USA; Division of Hematology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA.
|
25
|
Waite LL, Weaver B, Day K, Li X, Roberts K, Gibson AW, Edberg JC, Kimberly RP, Absher DM, Tiwari HK. Estimation of Cell-Type Composition Including T and B Cell Subtypes for Whole Blood Methylation Microarray Data. Front Genet 2016; 7:23. [PMID: 26925097 PMCID: PMC4757643 DOI: 10.3389/fgene.2016.00023] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/16/2015] [Accepted: 02/03/2016] [Indexed: 01/31/2023] Open
Abstract
DNA methylation levels vary markedly by cell-type makeup of a sample. Understanding these differences and estimating the cell-type makeup of a sample is an important aspect of studying DNA methylation. DNA from leukocytes in whole blood is simple to obtain and pervasive in research. However, leukocytes contain many distinct cell types and subtypes. We propose a two-stage model that estimates the proportions of six main cell types in whole blood (CD4+ T cells, CD8+ T cells, monocytes, B cells, granulocytes, and natural killer cells) as well as subtypes of T and B cells. Unlike previous methods that only estimate overall proportions of CD4+ T cell, CD8+ T cells, and B cells, our model is able to estimate proportions of naïve, memory, and regulatory CD4+ T cells as well as naïve and memory CD8+ T cells and naïve and memory B cells. Using real and simulated data, we are able to demonstrate that our model is able to reliably estimate proportions of these cell types and subtypes. In studies with DNA methylation data from Illumina's HumanMethylation450k arrays, our estimates will be useful both for testing for associations of cell type and subtype composition with phenotypes of interest as well as for adjustment purposes to prevent confounding in epigenetic association studies. Additionally, our method can be easily adapted for use with whole genome bisulfite sequencing (WGBS) data or any other genome-wide methylation data platform.
Affiliation(s)
- Lindsay L Waite
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA; HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | | | - Kenneth Day
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Xinrui Li
- Division of Clinical Immunology and Rheumatology, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Kevin Roberts
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Andrew W Gibson
- Division of Clinical Immunology and Rheumatology, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Jeffrey C Edberg
- Division of Clinical Immunology and Rheumatology, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Robert P Kimberly
- Division of Clinical Immunology and Rheumatology, Department of Medicine, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Devin M Absher
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, USA
| | - Hemant K Tiwari
- Section on Statistical Genetics, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| |
|
26
|
Mathematical modelling of transcriptional heterogeneity identifies novel markers and subpopulations in complex tissues. Sci Rep 2016; 6:18909. [PMID: 26739359 PMCID: PMC4703969 DOI: 10.1038/srep18909] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 09/18/2015] [Accepted: 11/23/2015] [Indexed: 01/18/2023] Open
Abstract
Tissue heterogeneity is both a major confounding factor and an underexploited information source. While a handful of reports have demonstrated the potential of supervised computational methods to deconvolute tissue heterogeneity, these approaches require a priori information on the marker genes or composition of known subpopulations. To address the critical problem of the absence of validated marker genes for many (including novel) subpopulations, we describe convex analysis of mixtures (CAM), a fully unsupervised in silico method, for identifying subpopulation marker genes directly from the original mixed gene expressions in scatter space that can improve molecular analyses in many biological contexts. Validated with predesigned mixtures, CAM on the gene expression data from peripheral leukocytes, brain tissue, and yeast cell cycle, revealed novel marker genes that were otherwise undetectable using existing methods. Importantly, CAM requires no a priori information on the number, identity, or composition of the subpopulations present in mixed samples, and does not require the presence of pure subpopulations in sample space. This advantage is significant in that CAM can achieve all of its goals using only a small number of heterogeneous samples, and is more powerful to distinguish between phenotypically similar subpopulations.
|
27
|
Levine JH, Simonds EF, Bendall SC, Davis KL, Amir EAD, Tadmor MD, Litvin O, Fienberg HG, Jager A, Zunder ER, Finck R, Gedman AL, Radtke I, Downing JR, Pe'er D, Nolan GP. Data-Driven Phenotypic Dissection of AML Reveals Progenitor-like Cells that Correlate with Prognosis. Cell 2015; 162:184-97. [PMID: 26095251 DOI: 10.1016/j.cell.2015.05.047] [Citation(s) in RCA: 1313] [Impact Index Per Article: 145.9] [Received: 01/07/2015] [Revised: 03/16/2015] [Accepted: 05/04/2015] [Indexed: 12/20/2022]
Abstract
Acute myeloid leukemia (AML) manifests as phenotypically and functionally diverse cells, often within the same patient. Intratumor phenotypic and functional heterogeneity have been linked primarily by physical sorting experiments, which assume that functionally distinct subpopulations can be prospectively isolated by surface phenotypes. This assumption has proven problematic, and we therefore developed a data-driven approach. Using mass cytometry, we profiled surface and intracellular signaling proteins simultaneously in millions of healthy and leukemic cells. We developed PhenoGraph, which algorithmically defines phenotypes in high-dimensional single-cell data. PhenoGraph revealed that the surface phenotypes of leukemic blasts do not necessarily reflect their intracellular state. Using hematopoietic progenitors, we defined a signaling-based measure of cellular phenotype, which led to isolation of a gene expression signature that was predictive of survival in independent cohorts. This study presents new methods for large-scale analysis of single-cell heterogeneity and demonstrates their utility, yielding insights into AML pathophysiology.
Affiliation(s)
- Jacob H Levine
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY 10027, USA
| | - Erin F Simonds
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - Sean C Bendall
- Department of Pathology, Stanford University, Stanford, CA 94305, USA
| | - Kara L Davis
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - El-ad D Amir
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY 10027, USA
| | - Michelle D Tadmor
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY 10027, USA
| | - Oren Litvin
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY 10027, USA
| | - Harris G Fienberg
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - Astraea Jager
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - Eli R Zunder
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - Rachel Finck
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA
| | - Amanda L Gedman
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
| | - Ina Radtke
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
| | - James R Downing
- Department of Pathology, St. Jude Children's Research Hospital, 262 Danny Thomas Place, Memphis, TN 38105, USA
| | - Dana Pe'er
- Departments of Biological Sciences and Systems Biology, Columbia University, New York, NY 10027, USA.
| | - Garry P Nolan
- Baxter Laboratory in Stem Cell Biology, Department of Microbiology and Immunology, Stanford University, Stanford, CA 94305, USA.
| |
|
28
|
Anghel CV, Quon G, Haider S, Nguyen F, Deshwar AG, Morris QD, Boutros PC. ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles. BMC Bioinformatics 2015; 16:156. [PMID: 25972088 PMCID: PMC4429941 DOI: 10.1186/s12859-015-0597-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/21/2014] [Accepted: 04/27/2015] [Indexed: 01/23/2023] Open
Abstract
Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model.
Affiliation(s)
- Catalina V Anghel
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada.
| | - Gerald Quon
- Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada.
| | - Syed Haider
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Oncology, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Oxford, OX3 7DQ, United Kingdom.
| | - Francis Nguyen
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada.
| | - Amit G Deshwar
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada.
| | - Quaid D Morris
- Department of Computer Science, University of Toronto, 10 King's College Road, Room 3303, M5S 3G4, Toronto, ON, Canada; Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 10 King's College, Room SFB540, Toronto, M5S 3G4, ON, Canada; Department of Molecular Genetics, University of Toronto, 1 King's College Circle, Room 4396, Toronto, M4S 1A8, ON, Canada; The Donnelly Centre, 160 College Street, Room 230, Toronto, M5S 3E1, ON, Canada.
| | - Paul C Boutros
- Informatics and Biocomputing Program, Ontario Institute for Cancer Research, 661 University Avenue, Toronto, Suite 510, M5G 0A3, ON, Canada; Department of Medical Biophysics, University of Toronto, 101 College Street, Toronto, M5G 1L7, ON, Canada; Department of Pharmacology and Toxicology, University of Toronto, 1 King's College Circle, Toronto, M5S 1A8, ON, Canada.
| |
|
29
|
Sokol ES, Sanduja S, Jin DX, Miller DH, Mathis RA, Gupta PB. Perturbation-expression analysis identifies RUNX1 as a regulator of human mammary stem cell differentiation. PLoS Comput Biol 2015; 11:e1004161. [PMID: 25894653 PMCID: PMC4404314 DOI: 10.1371/journal.pcbi.1004161] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/25/2014] [Accepted: 01/29/2015] [Indexed: 12/18/2022] Open
Abstract
The search for genes that regulate stem cell self-renewal and differentiation has been hindered by a paucity of markers that uniquely label stem cells and early progenitors. To circumvent this difficulty we have developed a method that identifies cell-state regulators without requiring any markers of differentiation, termed Perturbation-Expression Analysis of Cell States (PEACS). We have applied this marker-free approach to screen for transcription factors that regulate mammary stem cell differentiation in a 3D model of tissue morphogenesis and identified RUNX1 as a stem cell regulator. Inhibition of RUNX1 expanded bipotent stem cells and blocked their differentiation into ductal and lobular tissue rudiments. Reactivation of RUNX1 allowed exit from the bipotent state and subsequent differentiation and mammary morphogenesis. Collectively, our findings show that RUNX1 is required for mammary stem cells to exit a bipotent state, and provide a new method for discovering cell-state regulators when markers are not available.
The discovery of stem cell regulators is a major goal of biological research, but progress is often limited by a lack of definitive markers capable of distinguishing stem cells from early progenitors. Even in cases where markers have been identified, they often only enrich for certain cell states and do not uniquely identify states. While useful in some contexts, such enriching markers are ineffective tools for discovering genes that regulate the transition of cells between states. We present a method for identifying these cell state regulatory genes without the need for pre-determined markers, termed Perturbation-Expression Analysis of Cell States (PEACS). PEACS uses a novel computational approach to analyze gene expression data from perturbed cellular populations, and can be applied broadly to identify regulators of stem and progenitor cell self-renewal or differentiation. Application of PEACS to mammary stem cells resulted in the identification of RUNX1 as a key regulator of exit from the bipotent state.
Affiliation(s)
- Ethan S. Sokol
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Sandhya Sanduja
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
| | - Dexter X. Jin
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Daniel H. Miller
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Robert A. Mathis
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Piyush B. Gupta
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, United States of America
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Koch Institute for Integrative Cancer Research at MIT, Cambridge, Massachusetts, United States of America
- Harvard Stem Cell Institute, Cambridge, Massachusetts, United States of America
| |
|
30
|
Robust enumeration of cell subsets from tissue expression profiles. Nat Methods 2015; 12:453-7. [PMID: 25822800 PMCID: PMC4739640 DOI: 10.1038/nmeth.3337] [Citation(s) in RCA: 7396] [Impact Index Per Article: 821.8] [Received: 05/23/2014] [Accepted: 02/02/2015] [Indexed: 12/15/2022]
Abstract
We introduce CIBERSORT, a method for characterizing cell composition of complex tissues from their gene expression profiles. When applied to enumeration of hematopoietic subsets in RNA mixtures from fresh, frozen, and fixed tissues, including solid tumors, CIBERSORT outperformed other methods with respect to noise, unknown mixture content, and closely related cell types. CIBERSORT should enable large-scale analysis of RNA mixtures for cellular biomarkers and therapeutic targets (http://cibersort.stanford.edu).
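The general idea of estimating cell composition from a bulk expression profile can be illustrated with a simple linear mixture model, in which the bulk profile is treated as a weighted sum of reference profiles. The sketch below is not CIBERSORT's own algorithm; it is a minimal illustration that fits non-negative fractions by projected gradient descent. The function name `estimate_fractions`, the signature matrix, and all expression values are hypothetical and chosen only for the example.

```python
def estimate_fractions(signature, mixture, steps=2000):
    """Estimate non-negative cell-type fractions f so that signature * f ~ mixture.

    signature: one row per gene; each row holds that gene's reference
               expression in every cell type (hypothetical values).
    mixture:   bulk expression of each gene, in the same gene order.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so 1 / bound is a safe (conservative) gradient step size.
    bound = sum(v * v for row in signature for v in row)
    step = 1.0 / bound
    f = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual of the current fit, one value per gene.
        residual = [
            sum(row[k] * f[k] for k in range(n_types)) - mixture[g]
            for g, row in enumerate(signature)
        ]
        # Gradient of 0.5 * ||S f - m||^2 with respect to f.
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto f >= 0.
        f = [max(0.0, f[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(f)
    # Rescale so the fractions sum to one.
    return [v / total for v in f] if total > 0 else f


# Hypothetical reference profiles: 4 genes (rows) x 3 cell types (columns).
signature = [
    [10.0, 1.0, 0.0],
    [0.0, 8.0, 1.0],
    [1.0, 0.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_fractions = [0.5, 0.3, 0.2]
# Simulate a noise-free bulk sample from the known fractions.
mixture = [sum(s * t for s, t in zip(row, true_fractions)) for row in signature]

estimated = estimate_fractions(signature, mixture)
print([round(v, 3) for v in estimated])
```

On noise-free data the estimate should match the simulated fractions closely. Real tissue profiles add noise, unknown cell types, and closely related cell types, which are the conditions the abstract above reports testing.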
|
31
|
Francesconi M, Lehner B. Reconstructing and analysing cellular states, space and time from gene expression profiles of many cells and single cells. Mol Biosyst 2015; 11:2690-8. [DOI: 10.1039/c5mb00339c] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/16/2023]
Abstract
Gene expression profiling is a fast, cheap and standardised analysis that provides a high dimensional measurement of the state of a biological sample, including of single cells. Computational methods to reconstruct the composition of samples and spatial and temporal information from expression profiles are described, as well as how they can be used to describe the effects of genetic variation.
Affiliation(s)
- Mirko Francesconi
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
| | - Ben Lehner
- EMBL-CRG Systems Biology Unit, Centre for Genomic Regulation (CRG), 08003 Barcelona, Spain
- Universitat Pompeu Fabra (UPF)
| |
|
32
|
Kuhn A. Correspondence regarding Zhong et al., BMC Bioinformatics 2013 Mar 7;14:89. BMC Bioinformatics 2014; 15:347. [PMID: 25431099 PMCID: PMC4245730 DOI: 10.1186/s12859-014-0347-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/19/2013] [Accepted: 10/07/2014] [Indexed: 12/02/2022] Open
Abstract
Computational expression deconvolution aims to estimate the contribution of individual cell populations to expression profiles measured in samples of heterogeneous composition. Zhong et al. recently proposed Digital Sorting Algorithm (BMC Bioinformatics 2013 Mar 7;14:89) and showed that they could accurately estimate population-specific expression levels and expression differences between two populations. They compared DSA with Population-Specific Expression Analysis (PSEA), a previous deconvolution method that we developed to detect expression changes occurring within the same population between two conditions (e.g. disease versus non-disease). However, Zhong et al. compared PSEA-derived specific expression levels across different cell populations. Specific expression levels obtained with PSEA cannot be directly compared across different populations as they are on a relative scale. They are accurate as we demonstrate by deconvolving the same dataset used by Zhong et al. and, importantly, allow for comparison of population-specific expression across conditions.
Affiliation(s)
- Alexandre Kuhn
- Microfluidics Systems Biology Lab, Institute of Molecular and Cell Biology, Agency for Science, Technology and Research, Proteos Building, Room #03-04, 61 Biopolis Drive, Singapore 138673, Singapore.
| |
|
33
|
Clarke B, Clarke J. Estimating the proportions in a mixed sample using transcriptomics. Stat (Int Stat Inst) 2014. [DOI: 10.1002/sta4.65] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Affiliation(s)
- Bertrand Clarke
- Department of Statistics, University of Nebraska–Lincoln, Lincoln, NE 68583, USA
| | - Jennifer Clarke
- Department of Statistics and the Department of Food Science and Technology, University of Nebraska–Lincoln, Lincoln, NE 68583, USA
| |
|
34
|
Brooks MD, Jackson E, Warrington NM, Luo J, Forys JT, Taylor S, Mao DD, Leonard JR, Kim AH, Piwnica-Worms D, Mitra RD, Rubin JB. PDE7B is a novel, prognostically significant mediator of glioblastoma growth whose expression is regulated by endothelial cells. PLoS One 2014; 9:e107397. [PMID: 25203500 PMCID: PMC4159344 DOI: 10.1371/journal.pone.0107397] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 02/02/2014] [Accepted: 08/15/2014] [Indexed: 11/18/2022] Open
Abstract
Cell-cell interactions between tumor cells and constituents of their microenvironment are critical determinants of tumor tissue biology and therapeutic responses. Interactions between glioblastoma (GBM) cells and endothelial cells (ECs) establish a purported cancer stem cell niche. We hypothesized that genes regulated by these interactions would be important, particularly as therapeutic targets. Using a computational approach, we deconvoluted expression data from a mixed physical co-culture of GBM cells and ECs and identified a previously undescribed upregulation of the cAMP specific phosphodiesterase PDE7B in GBM cells in response to direct contact with ECs. We further found that elevated PDE7B expression occurs in most GBM cases and has a negative effect on survival. PDE7B overexpression resulted in the expansion of a stem-like cell subpopulation in vitro and increased tumor growth and aggressiveness in an in vivo intracranial GBM model. Collectively these studies illustrate a novel approach for studying cell-cell interactions and identifying new therapeutic targets like PDE7B in GBM.
Affiliation(s)
- Michael D. Brooks
- Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Erin Jackson
- BRIGHT Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Molecular Imaging Center, Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Nicole M. Warrington
- Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Jingqin Luo
- Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Jason T. Forys
- Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Sara Taylor
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Diane D. Mao
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Jeffrey R. Leonard
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Albert H. Kim
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - David Piwnica-Worms
- BRIGHT Institute, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Molecular Imaging Center, Mallinckrodt Institute of Radiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Cell Biology & Physiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America
| | - Robi D. Mitra
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America
| | - Joshua B. Rubin
- Department of Pediatrics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Department of Anatomy and Neurobiology, Washington University School of Medicine, St. Louis, Missouri, United States of America
| |
|
35
|
O'Duibhir E, Lijnzaad P, Benschop JJ, Lenstra TL, van Leenen D, Groot Koerkamp MJA, Margaritis T, Brok MO, Kemmeren P, Holstege FCP. Cell cycle population effects in perturbation studies. Mol Syst Biol 2014; 10:732. [PMID: 24952590 PMCID: PMC4265054 DOI: 10.15252/msb.20145172] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Received: 02/01/2014] [Revised: 05/08/2014] [Accepted: 05/12/2014] [Indexed: 12/21/2022] Open
Abstract
Growth condition perturbation or gene function disruption are commonly used strategies to study cellular systems. Although it is widely appreciated that such experiments may involve indirect effects, these frequently remain uncharacterized. Here, analysis of functionally unrelated Saccharomyces cerevisiae deletion strains reveals a common gene expression signature. One property shared by these strains is slower growth, with increased presence of the signature in more slowly growing strains. The slow growth signature is highly similar to the environmental stress response (ESR), an expression response common to diverse environmental perturbations. Both environmental and genetic perturbations result in growth rate changes. These are accompanied by a change in the distribution of cells over different cell cycle phases. Rather than representing a direct expression response in single cells, both the slow growth signature and ESR mainly reflect a redistribution of cells over different cell cycle phases, primarily characterized by an increase in the G1 population. The findings have implications for any study of perturbation that is accompanied by growth rate changes. Strategies to counter these effects are presented and discussed.
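The population effect described in this abstract can be illustrated with a small arithmetic sketch: if a gene's expression differs between cell cycle phases, the bulk measurement changes when the phase distribution shifts, even though no single cell changes its expression. All numbers, phase fractions, and names below are hypothetical and are not taken from the study.

```python
def bulk_expression(phase_fractions, phase_expression):
    """Bulk signal as the fraction-weighted mean of per-phase expression."""
    return sum(phase_fractions[p] * phase_expression[p] for p in phase_fractions)


# Hypothetical per-cell expression of one gene in each phase.
# These values are identical in both populations (no per-cell response).
phase_expression = {"G1": 8.0, "S": 2.0, "G2/M": 2.0}

# Hypothetical phase distributions: slower growth enlarges the G1 fraction.
fast_growing = {"G1": 0.30, "S": 0.35, "G2/M": 0.35}
slow_growing = {"G1": 0.60, "S": 0.20, "G2/M": 0.20}

bulk_fast = bulk_expression(fast_growing, phase_expression)
bulk_slow = bulk_expression(slow_growing, phase_expression)

# The bulk difference comes entirely from the shift in composition.
print(bulk_fast, bulk_slow, bulk_slow - bulk_fast)
```

In this toy case the gene appears upregulated in the slow-growing population purely because more cells are in G1.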
Affiliation(s)
- Eoghan O'Duibhir
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Philip Lijnzaad
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Joris J Benschop
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Tineke L Lenstra
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Dik van Leenen
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | | | - Thanasis Margaritis
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Mariel O Brok
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Patrick Kemmeren
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Frank C P Holstege
- Molecular Cancer Research, University Medical Center Utrecht, Utrecht, the Netherlands
| |
|
36
|
Remien CH, Adler FR, Chesson LA, Valenzuela LO, Ehleringer JR, Cerling TE. Deconvolution of isotope signals from bundles of multiple hairs. Oecologia 2014; 175:781-9. [DOI: 10.1007/s00442-014-2945-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 04/03/2013] [Accepted: 04/09/2014] [Indexed: 10/25/2022]
|
37
|
Shannon CP, Balshaw R, Ng RT, Wilson-McManus JE, Keown P, McMaster R, McManus BM, Landsberg D, Isbel NM, Knoll G, Tebbutt SJ. Two-stage, in silico deconvolution of the lymphocyte compartment of the peripheral whole blood transcriptome in the context of acute kidney allograft rejection. PLoS One 2014; 9:e95224. [PMID: 24733377 PMCID: PMC3986379 DOI: 10.1371/journal.pone.0095224] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 11/12/2013] [Accepted: 03/24/2014] [Indexed: 01/21/2023] Open
Abstract
Acute rejection is a major complication of solid organ transplantation that prevents the long-term assimilation of the allograft. Various populations of lymphocytes are principal mediators of this process, infiltrating graft tissues and driving cell-mediated cytotoxicity. Understanding the lymphocyte-specific biology associated with rejection is therefore critical. Measuring genome-wide changes in transcript abundance in peripheral whole blood cells can deliver a comprehensive view of the status of the immune system. The heterogeneous nature of the tissue significantly affects the sensitivity and interpretability of traditional analyses, however. Experimental separation of cell types is an obvious solution, but is often impractical and, more worrying, may affect expression, leading to spurious results. Statistical deconvolution of the cell type-specific signal is an attractive alternative, but existing approaches still present some challenges, particularly in a clinical research setting. Obtaining time-matched sample composition to biologically interesting, phenotypically homogeneous cell sub-populations is costly and adds significant complexity to study design. We used a two-stage, in silico deconvolution approach that first predicts sample composition to biologically meaningful and homogeneous leukocyte sub-populations, and then performs cell type-specific differential expression analysis in these same sub-populations, from peripheral whole blood expression data. We applied this approach to a peripheral whole blood expression study of kidney allograft rejection. The patterns of differential composition uncovered are consistent with previous studies carried out using flow cytometry and provide a relevant biological context when interpreting cell type-specific differential expression results. We identified cell type-specific differential expression in a variety of leukocyte sub-populations at the time of rejection. 
The tissue-specificity of these differentially expressed probe-set lists is consistent with the originating tissue and their functional enrichment consistent with allograft rejection. Finally, we demonstrate that the strategy described here can be used to derive useful hypotheses by validating a cell type-specific ratio in an independent cohort using the nanoString nCounter assay.
Affiliation(s)
- Casey P. Shannon
- PROOF Centre of Excellence, Vancouver, BC, Canada
- UBC James Hogg Centre for Heart Lung Innovations, Vancouver, BC, Canada
| | - Robert Balshaw
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada
| | - Raymond T. Ng
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
- UBC James Hogg Centre for Heart Lung Innovations, Vancouver, BC, Canada
| | - Janet E. Wilson-McManus
- PROOF Centre of Excellence, Vancouver, BC, Canada
- UBC James Hogg Centre for Heart Lung Innovations, Vancouver, BC, Canada
| | - Paul Keown
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Medicine, Division of Nephrology, University of British Columbia, Vancouver, BC, Canada
| | - Robert McMaster
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Bruce M. McManus
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- UBC James Hogg Centre for Heart Lung Innovations, Vancouver, BC, Canada
| | - David Landsberg
- Division of Nephrology, St. Paul's Hospital, and University of British Columbia, Vancouver, BC, Canada
| | - Nicole M. Isbel
- Department of Nephrology, Princess Alexandra Hospital, and University of Queensland, Brisbane, Australia
| | - Greg Knoll
- Ottawa Hospital Research Institute, Ottawa, On, Canada
| | - Scott J. Tebbutt
- PROOF Centre of Excellence, Vancouver, BC, Canada
- Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada
- UBC James Hogg Centre for Heart Lung Innovations, Vancouver, BC, Canada
|
38
|
Altboum Z, Steuerman Y, David E, Barnett-Itzhaki Z, Valadarsky L, Keren-Shaul H, Meningher T, Mendelson E, Mandelboim M, Gat-Viks I, Amit I. Digital cell quantification identifies global immune cell dynamics during influenza infection. Mol Syst Biol 2014; 10:720. [PMID: 24586061 PMCID: PMC4023392 DOI: 10.1002/msb.134947] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Indexed: 01/24/2023]
Abstract
Hundreds of immune cell types work in coordination to maintain tissue homeostasis. Upon infection, dramatic changes occur with the localization, migration, and proliferation of the immune cells to first alert the body of the danger, confine it to limit spreading, and finally extinguish the threat and bring the tissue back to homeostasis. Since current technologies can follow the dynamics of only a limited number of cell types, we have yet to grasp the full complexity of global in vivo cell dynamics in normal developmental processes and disease. Here, we devise a computational method, digital cell quantification (DCQ), which combines genome‐wide gene expression data with an immune cell compendium to infer in vivo changes in the quantities of 213 immune cell subpopulations. DCQ was applied to study global immune cell dynamics in mice lungs at ten time points during 7 days of flu infection. We find dramatic changes in quantities of 70 immune cell types, including various innate, adaptive, and progenitor immune cells. We focus on the previously unreported dynamics of four immune dendritic cell subtypes and suggest a specific role for CD103+CD11b−DCs in early stages of disease and CD8+pDC in late stages of flu infection.
Affiliation(s)
- Zeev Altboum
- Department of Immunology, Weizmann Institute, Rehovot, Israel
|
39
|
Nieto-Diaz M, Esteban FJ, Reigada D, Muñoz-Galdeano T, Yunta M, Caballero-López M, Navarro-Ruiz R, Del Águila A, Maza RM. MicroRNA dysregulation in spinal cord injury: causes, consequences and therapeutics. Front Cell Neurosci 2014; 8:53. [PMID: 24701199 PMCID: PMC3934005 DOI: 10.3389/fncel.2014.00053] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Received: 12/15/2013] [Accepted: 02/06/2014] [Indexed: 01/18/2023]
Abstract
Trauma to the spinal cord causes permanent disability to more than 180,000 people every year worldwide. The initial mechanical damage triggers a complex set of secondary events involving the neural, vascular, and immune systems that largely determine the functional outcome of the spinal cord injury (SCI). Cellular and biochemical mechanisms responsible for this secondary injury largely depend on activation and inactivation of specific gene programs. Recent studies indicate that microRNAs function as gene expression switches in key processes of the SCI. Microarray data from rodent contusion models reveal that SCI induces changes in the global microRNA expression patterns. Variations in microRNA abundance largely result from alterations in the expression of the cells at the damaged spinal cord. However, microRNA expression levels after SCI are also influenced by the infiltration of immune cells to the injury site and the death and migration of specific neural cells after injury. Evidence on the role of microRNAs in the SCI pathophysiology has come from different sources. Bioinformatic analysis of microarray data has been used to identify specific variations in microRNA expression underlying transcriptional changes in target genes, which are involved in key processes in the SCI. Direct evidence on the role of microRNAs in SCI is scarcer, although recent studies have identified several microRNAs (miR-21, miR-486, miR-20) involved in key mechanisms of the SCI such as cell death or astrogliosis, among others. From a clinical perspective, several lines of evidence make clear that microRNAs can be potent therapeutic tools to manipulate cell state and molecular processes in order to enhance functional recovery. The present article reviews current knowledge on how injury affects microRNA expression and the meaning of these changes in the SCI pathophysiology, and finally explores the clinical potential of microRNAs in the SCI.
Affiliation(s)
- Manuel Nieto-Diaz
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Francisco J Esteban
- Departamento de Biología Experimental, Facultad de Ciencias Experimentales y de la Salud, Universidad de Jaén Jaén, Spain
| | - David Reigada
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Teresa Muñoz-Galdeano
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Mónica Yunta
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain ; Unidad de Patología Mitocondrial, Unidad Funcional de Investigación en Enfermedades Crónicas, Instituto de Salud Carlos III Madrid, Spain
| | - Marcos Caballero-López
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Rosa Navarro-Ruiz
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Angela Del Águila
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
| | - Rodrigo M Maza
- Molecular Neuroprotection Group, Experimental Neurology Unit, Hospital Nacional de Parapléjicos (Servicio de Salud de Castilla-La Mancha) Toledo, Spain
|
40
|
Yadav VK, De S. An assessment of computational methods for estimating purity and clonality using genomic data derived from heterogeneous tumor tissue samples. Brief Bioinform 2014; 16:232-41. [PMID: 24562872 DOI: 10.1093/bib/bbu002] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 12/31/2022]
Abstract
Solid tumor samples typically contain multiple distinct clonal populations of cancer cells, and also stromal and immune cell contamination. A majority of the cancer genomics and transcriptomics studies do not explicitly consider genetic heterogeneity and impurity, and draw inferences based on mixed populations of cells. Deconvolution of genomic data from heterogeneous samples provides a powerful tool to address this limitation. We discuss several computational tools, which enable deconvolution of genomic and transcriptomic data from heterogeneous samples. We also performed a systematic comparative assessment of these tools. If properly used, these tools have potentials to complement single-cell genomics and immunoFISH analyses, and provide novel insights into tumor heterogeneity.
|
41
|
Margolin G, Khil PP, Kim J, Bellani MA, Camerini-Otero RD. Integrated transcriptome analysis of mouse spermatogenesis. BMC Genomics 2014; 15:39. [PMID: 24438502 PMCID: PMC3906902 DOI: 10.1186/1471-2164-15-39] [Citation(s) in RCA: 98] [Impact Index Per Article: 9.8] [Received: 06/25/2013] [Accepted: 01/14/2014] [Indexed: 12/18/2022]
Abstract
BACKGROUND Differentiation of primordial germ cells into mature spermatozoa proceeds through multiple stages, one of the most important of which is meiosis. Meiotic recombination is in turn a key part of meiosis. To achieve the highly specialized and diverse functions necessary for the successful completion of meiosis and the generation of spermatozoa, thousands of genes are coordinately regulated through spermatogenesis. A complete and unbiased characterization of the transcriptome dynamics of spermatogenesis is, however, still lacking. RESULTS In order to characterize gene expression during spermatogenesis we sequenced eight mRNA samples from testes of juvenile mice from 6 to 38 days post partum. Using gene expression clustering we defined over 1,000 novel meiotically-expressed genes. We also developed a computational de-convolution approach and used it to estimate cell type-specific gene expression in pre-meiotic, meiotic and post-meiotic cells. In addition, we detected 13,000 novel alternative splicing events, around 40% of which preserve an open reading frame, and found experimental support for 159 computational gene predictions. A comparison of RNA polymerase II (Pol II) ChIP-Seq signals with RNA-Seq coverage shows that gene expression correlates well with Pol II signals, both at promoters and along the gene body. However, we observe numerous instances of non-canonical promoter usage, as well as intergenic Pol II peaks that potentially delineate unannotated promoters, enhancers or small RNA clusters. CONCLUSIONS Here we provide a comprehensive analysis of gene expression throughout mouse meiosis and spermatogenesis. Importantly, we find over a thousand novel meiotic genes and over 5,000 novel potentially coding isoforms. These data should be a valuable resource for future studies of meiosis and spermatogenesis in mammals.
Affiliation(s)
- Gennady Margolin
- Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Building 5, Room 205A, Bethesda, MD 20892, USA
| | - Pavel P Khil
- Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Building 5, Room 205A, Bethesda, MD 20892, USA
| | - Joongbaek Kim
- Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Building 5, Room 205A, Bethesda, MD 20892, USA
| | - Marina A Bellani
- National Institute of Aging, National Institutes of Health (NIH), Baltimore, MD 21224, USA
| | - R Daniel Camerini-Otero
- Genetics and Biochemistry Branch, National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institutes of Health (NIH), Building 5, Room 205A, Bethesda, MD 20892, USA
|
42
|
Andrade-Navarro MA, Kanji F, Palii CG, Brand M, Atkins H, Perez-Iratxeta C. A method for cell type marker discovery by high-throughput gene expression analysis of mixed cell populations. BMC Biotechnol 2013; 13:80. [PMID: 24090206 PMCID: PMC3853712 DOI: 10.1186/1472-6750-13-80] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2013] [Accepted: 09/25/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Gene transcripts specifically expressed in a particular cell type (cell-type specific gene markers) are useful for its detection and isolation from a tissue or other cell mixtures. However, finding informative marker genes can be problematic when working with a poorly characterized cell type, as markers can only be unequivocally determined once the cell type has been isolated. We propose a method that could identify marker genes of an uncharacterized cell type within a mixed cell population, provided that the proportion of the cell type of interest in the mixture can be estimated by some indirect method, such as a functional assay. RESULTS We show that cell-type specific gene markers can be identified from the global gene expression of several cell mixtures that contain the cell type of interest in a known proportion by their high correlation to the concentration of the corresponding cell type across the mixtures. CONCLUSIONS Genes detected using this high-throughput strategy would be candidate markers that may be useful in detecting or purifying a cell type from a particular biological context. We present an experimental proof-of-concept of this method using cell mixtures of various well-characterized hematopoietic cell types, and we evaluate the performance of the method in a benchmark that explores the requirements and range of validity of the approach.
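The correlation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, expression values, and the helper `rank_candidate_markers` are hypothetical, and a plain Pearson correlation is assumed as the scoring statistic.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no information
    return sxy / math.sqrt(sxx * syy)


def rank_candidate_markers(expression, proportions):
    """Rank genes by correlation with the known cell-type proportion."""
    scores = {gene: pearson(values, proportions)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: proportion of the cell type of interest in five
# mixtures, and raw-scale expression of three genes in those mixtures.
proportions = [0.0, 0.1, 0.25, 0.5, 0.75]
expression = {
    "geneA": [5.0, 15.0, 30.0, 55.0, 80.0],   # rises with the proportion
    "geneB": [40.0, 41.0, 39.0, 40.0, 42.0],  # roughly flat
    "geneC": [90.0, 80.0, 65.0, 40.0, 15.0],  # falls with the proportion
}

ranking = rank_candidate_markers(expression, proportions)
for gene, score in ranking:
    print(f"{gene}: r = {score:.2f}")
```

Genes at the top of the ranking are candidate markers only; as the abstract notes, they would still need confirmation once the cell type can be isolated.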
|
43
|
Liebner DA, Huang K, Parvin JD. MMAD: microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics 2013; 30:682-9. [PMID: 24085566 DOI: 10.1093/bioinformatics/btt566] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/13/2022]
Abstract
BACKGROUND One of the significant obstacles in the development of clinically relevant microarray-derived biomarkers and classifiers is tissue heterogeneity. Physical cell separation techniques, such as cell sorting and laser-capture microdissection, can enrich samples for cell types of interest, but are costly, labor intensive and can limit investigation of important interactions between different cell types. RESULTS We developed a new computational approach, called microarray microdissection with analysis of differences (MMAD), which performs microdissection in silico. Notably, MMAD (i) allows for simultaneous estimation of cell fractions and gene expression profiles of contributing cell types, (ii) adjusts for microarray normalization bias, (iii) uses the corrected Akaike information criterion during model optimization to minimize overfitting and (iv) provides mechanisms for comparing gene expression and cell fractions between samples in different classes. Computational microdissection of simulated and experimental tissue mixture datasets showed tight correlations between predicted and measured gene expression of pure tissues as well as tight correlations between reported and estimated cell fraction for each of the individual cell types. In simulation studies, MMAD showed superior ability to detect differentially expressed genes in mixed tissue samples when compared with standard metrics, including both significance analysis of microarrays and cell type-specific significance analysis of microarrays. CONCLUSIONS We have developed a new computational tool called MMAD, which is capable of performing robust tissue microdissection in silico, and which can improve the detection of differentially expressed genes. MMAD software as implemented in MATLAB is publicly available for download at http://sourceforge.net/projects/mmad/.
Affiliation(s)
- David A Liebner
- Division of Medical Oncology, Department of Internal Medicine, Department of Biomedical Informatics and Comprehensive Cancer Center, Biomedical Informatics Shared Resource, The Ohio State University, Columbus OH 43210, USA
|
44
|
A self-directed method for cell-type identification and separation of gene expression microarrays. PLoS Comput Biol 2013; 9:e1003189. [PMID: 23990767 PMCID: PMC3749952 DOI: 10.1371/journal.pcbi.1003189] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 12/18/2012] [Accepted: 07/07/2013] [Indexed: 11/19/2022]
Abstract
Gene expression analysis is generally performed on heterogeneous tissue samples consisting of multiple cell types. Current methods developed to separate heterogeneous gene expression rely on prior knowledge of the cell-type composition and/or signatures - these are not available in most public datasets. We present a novel method to identify the cell-type composition, signatures and proportions per sample without need for a priori information. The method was successfully tested on controlled and semi-controlled datasets and performed as accurately as current methods that do require additional information. As such, this method enables the analysis of cell-type specific gene expression using existing large pools of publicly available microarray datasets. Gene expression microarrays are widely used to uncover biological insights. Most microarray experiments profile whole tissues containing mixtures of multiple cell-types. As such, gene expression differences between samples may be due to different cellular compositions or biological differences, highly limiting the conclusions derived from the analysis. All current approaches to computationally separate the heterogeneous gene expression to individual cell-types require that the identity, relative amount of the cell-types in the tissue or their individual gene expression are known. Publicly available microarray-based datasets, which include thousands of patient samples, do not usually measure this information, rendering existing separation methods unusable. We developed a novel approach to estimate the number of cell-types, identities, individual gene expression and relative proportions in heterogeneous tissues with no a priori information except for an initial estimate of the cell-types in the tissue analyzed and general reference signatures of these cell-types that may be easily obtained from public databases.
We successfully applied our method to microarray datasets, yielding highly accurate estimations, which often exceed the performance of separation methods that require prior information. Thus, our method can be accurately applied to any heterogeneous dataset, where re-examination and analysis of the individual cell-types in the heterogeneous tissue can aid in discovering new aspects regarding these diseases.
|
45
|
Li S, Nakaya HI, Kazmin DA, Oh JZ, Pulendran B. Systems biological approaches to measure and understand vaccine immunity in humans. Semin Immunol 2013; 25:209-18. [PMID: 23796714 DOI: 10.1016/j.smim.2013.05.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 02/18/2013] [Accepted: 05/09/2013] [Indexed: 02/01/2023]
Abstract
Recent studies have demonstrated the utility of using systems approaches to identify molecular signatures that can be used to predict vaccine immunity in humans. Such approaches are now being used extensively in vaccinology, and are beginning to yield novel insights about the molecular networks driving vaccine immunity. In this review, we present a broad review of the methodologies involved in these studies, and discuss the promise and challenges involved in this emerging field of "systems vaccinology."
Affiliation(s)
- Shuzhao Li
- Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road, Atlanta, GA 30329, USA
|
46
|
Bradford JR, Farren M, Powell SJ, Runswick S, Weston SL, Brown H, Delpuech O, Wappett M, Smith NR, Carr TH, Dry JR, Gibson NJ, Barry ST. RNA-Seq Differentiates Tumour and Host mRNA Expression Changes Induced by Treatment of Human Tumour Xenografts with the VEGFR Tyrosine Kinase Inhibitor Cediranib. PLoS One 2013; 8:e66003. [PMID: 23840389 PMCID: PMC3686868 DOI: 10.1371/journal.pone.0066003] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 03/01/2013] [Accepted: 04/30/2013] [Indexed: 12/30/2022]
Abstract
Pre-clinical models of tumour biology often rely on propagating human tumour cells in a mouse. In order to gain insight into the alignment of these models to human disease segments or investigate the effects of different therapeutics, approaches such as PCR or array based expression profiling are often employed despite suffering from biased transcript coverage, and a requirement for specialist experimental protocols to separate tumour and host signals. Here, we describe a computational strategy to profile transcript expression in both the tumour and host compartments of pre-clinical xenograft models from the same RNA sample using RNA-Seq. Key to this strategy is a species-specific mapping approach that removes the need for manipulation of the RNA population, customised sequencing protocols, or prior knowledge of the species component ratio. The method demonstrates comparable performance to species-specific RT-qPCR and a standard microarray platform, and allowed us to quantify gene expression changes in both the tumour and host tissue following treatment with cediranib, a potent vascular endothelial growth factor receptor tyrosine kinase inhibitor, including the reduction of multiple murine transcripts associated with endothelium or vessels, and an increase in genes associated with the inflammatory response in response to cediranib. In the human compartment, we observed a robust induction of hypoxia genes and a reduction in cell cycle associated transcripts. In conclusion, the study establishes that RNA-Seq can be applied to pre-clinical models to gain deeper understanding of model characteristics and compound mechanism of action, and to identify both tumour and host biomarkers.
Affiliation(s)
- James R. Bradford
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Matthew Farren
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Steve J. Powell
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Sarah Runswick
- Personalised Healthcare and Biomarkers, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Susie L. Weston
- Personalised Healthcare and Biomarkers, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Helen Brown
- Personalised Healthcare and Biomarkers, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Oona Delpuech
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Mark Wappett
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Neil R. Smith
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - T. Hedley Carr
- Personalised Healthcare and Biomarkers, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Jonathan R. Dry
- Oncology, AstraZeneca Pharmaceuticals, Gatehouse Park, Massachusetts, United States of America
| | - Neil J. Gibson
- Personalised Healthcare and Biomarkers, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
| | - Simon T. Barry
- Oncology, AstraZeneca Pharmaceuticals, Alderley Park, Cheshire, United Kingdom
|
47
|
Abstract
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
|
48
|
Ahn J, Yuan Y, Parmigiani G, Suraokar MB, Diao L, Wistuba II, Wang W. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics 2013; 29:1865-71. [PMID: 23712657 DOI: 10.1093/bioinformatics/btt301] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Tissue samples of tumor cells mixed with stromal cells cause underdetection of gene expression signatures associated with cancer prognosis or response to treatment. In silico dissection of mixed cell samples is essential for analyzing expression data generated in cancer studies. Currently, a systematic approach is lacking to address three challenges in computational deconvolution: (i) violation of linear addition of expression levels from multiple tissues when log-transformed microarray data are used; (ii) estimation of both tumor proportion and tumor-specific expression, when neither is known a priori; and (iii) estimation of expression profiles for individual patients. RESULTS We have developed a statistical method for deconvolving mixed cancer transcriptomes, DeMix, which addresses the aforementioned issues in array-based expression data. We demonstrate the performance of our model in synthetic and real, publicly available, datasets. DeMix can be applied to ongoing biomarker-based clinical studies and to the vast expression datasets previously generated from mixed tumor and stromal cell samples. AVAILABILITY All codes are written in C and integrated into an R function, which is available at http://odin.mdacc.tmc.edu/∼wwang7/DeMix.html. CONTACT wwang7@mdanderson.org SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
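Challenge (i) in this abstract, the violation of linear addition under log transformation, can be checked numerically. The sketch below is an illustration only and does not reproduce DeMix; the expression values and the tumor proportion `p` are hypothetical, and a simple two-component mixture on the raw scale is assumed.

```python
import math


def mix_raw(tumor, stroma, p):
    """Linear mixing on the raw (untransformed) scale."""
    return [p * t + (1 - p) * s for t, s in zip(tumor, stroma)]


def mix_log2(tumor, stroma, p):
    """Naive mixing applied to log2-transformed values."""
    return [p * math.log2(t) + (1 - p) * math.log2(s)
            for t, s in zip(tumor, stroma)]


# Hypothetical raw expression of three genes in pure tumor and pure stroma.
tumor = [100.0, 800.0, 50.0]
stroma = [100.0, 50.0, 400.0]
p = 0.6  # assumed tumor proportion

true_log = [math.log2(v) for v in mix_raw(tumor, stroma, p)]
naive_log = mix_log2(tumor, stroma, p)

# Because log is concave, the naive log-scale mix never exceeds the log of
# the raw-scale mix, and the gap grows as the two profiles diverge.
bias = [a - b for a, b in zip(true_log, naive_log)]
for gene_index, gap in enumerate(bias):
    print(f"gene {gene_index}: log2 gap = {gap:.3f}")
```

The first gene, expressed equally in both components, shows no gap, while the differentially expressed genes do. This is why a linear model fitted to log-transformed data would misestimate the mixture.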
Affiliation(s)
- Jaeil Ahn
- Department of Bioinformatics and Computational Biology and Department of Biostatistics, The University of Texas, MD Anderson Cancer Center, Houston, TX 77030, USA
|
49
|
Seo JH, Li Q, Fatima A, Eklund A, Szallasi Z, Polyak K, Richardson AL, Freedman ML. Deconvoluting complex tissues for expression quantitative trait locus-based analyses. Philos Trans R Soc Lond B Biol Sci 2013; 368:20120363. [PMID: 23650637 DOI: 10.1098/rstb.2012.0363] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
Breast cancer genome-wide association studies have pinpointed dozens of variants associated with breast cancer pathogenesis. The majority of risk variants, however, are located outside of known protein-coding regions. Therefore, identifying which genes the risk variants are acting through presents an important challenge. Variants that are associated with mRNA transcript levels are referred to as expression quantitative trait loci (eQTLs). Many studies have demonstrated that eQTL-based strategies provide a direct way to connect a trait-associated locus with its candidate target gene. Performing eQTL-based analyses in human samples is complicated because of the heterogeneous nature of human tissue. We addressed this issue by devising a method to computationally infer the fraction of cell types in normal human breast tissues. We then applied this method to 13 known breast cancer risk loci, which we hypothesized were eQTLs. For each risk locus, we took all known transcripts within a 2 Mb interval and performed an eQTL analysis in 100 reduction mammoplasty cases. A total of 18 significant associations were discovered (eight in the epithelial compartment and 10 in the stromal compartment). This study highlights the ability to perform large-scale eQTL studies in heterogeneous tissues.
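The abstract does not describe how cell-type fractions were inferred, so the following is only a generic sketch of the idea, not the authors' method. It assumes a two-compartment linear mixture (epithelial and stromal) with known reference profiles on the raw scale, and estimates the epithelial fraction by ordinary least squares. The profiles and the helper `estimate_fraction` are hypothetical.

```python
def estimate_fraction(mixture, profile_a, profile_b):
    """Least-squares fraction f of component A in a two-component mixture.

    Model: mixture = f * profile_a + (1 - f) * profile_b, gene by gene.
    The closed-form solution is clipped to the valid range [0, 1].
    """
    numerator = sum((m - b) * (a - b)
                    for m, a, b in zip(mixture, profile_a, profile_b))
    denominator = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    if denominator == 0:
        raise ValueError("reference profiles are identical")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical reference profiles for four genes (raw scale).
epithelial = [120.0, 10.0, 60.0, 5.0]
stromal = [20.0, 90.0, 60.0, 45.0]

# Simulate a noise-free sample with a known epithelial fraction.
true_fraction = 0.3
mixture = [true_fraction * e + (1 - true_fraction) * s
           for e, s in zip(epithelial, stromal)]

estimate = estimate_fraction(mixture, epithelial, stromal)
print(f"estimated epithelial fraction: {estimate:.3f}")
```

With real data the estimate would be noisy, and genes that differ little between compartments (such as the third gene here) contribute nothing to it.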
Affiliation(s)
- Ji-Heui Seo
- Department of Medical Oncology, The Center for Functional Cancer Epigenetics, Dana Farber Cancer Institute, Boston, MA 02215, USA
|
50
|
Lambrou GI, Koultouki E, Adamaki M, Moschovi M. Resolving Sample Traces in Complex Mixtures with Microarray Analyses. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
This chapter reviews microarray technology and deals with most aspects of microarrays. It focuses on today's knowledge of techniques and methodologies for separating complex signals, i.e. samples. Overall, the chapter reviews the current knowledge on the topic of microarrays and presents the analyses and techniques used, which facilitate such approaches. It starts with the theoretical framework on microarray technology; second, the chapter gives a brief review on statistical methods used for microarray analyses, and finally, it contains a detailed review of the methods used for discriminating traces of nucleic acids within a complex mixture of samples.
|