1
Wang C, Lin Y, Li S, Guan J. Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-Seq data. BMC Genomics 2024; 25:875. [PMID: 39294558 PMCID: PMC11409548 DOI: 10.1186/s12864-024-10728-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Accepted: 08/20/2024] [Indexed: 09/20/2024]
Abstract
BACKGROUND: The widely adopted bulk RNA-seq measures the average gene expression of cells, masking cell type heterogeneity, which confounds downstream analyses. Therefore, identifying the cellular composition and cell type-specific gene expression profiles (GEPs) facilitates the study of the underlying mechanisms of various biological processes. Although single-cell RNA-seq focuses on cell type heterogeneity in gene expression, it requires specialized and expensive resources and currently is not practical for a large number of samples or a routine clinical setting. Recently, computational deconvolution methods have been developed, but many of them estimate only cell type composition or only cell type-specific GEPs, requiring the other as input. The development of more accurate deconvolution methods to infer cell type abundance and cell type-specific GEPs is still essential. RESULTS: We propose a new deconvolution algorithm, DSSC, which infers cell type-specific gene expression and cell type proportions of heterogeneous samples simultaneously by leveraging gene-gene and sample-sample similarities in bulk expression and single-cell RNA-seq data. Through comparisons with existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs across simulated pseudo-bulk data (including intra-dataset and inter-dataset simulations) and experimental bulk data (including mixture data and real experimental data). DSSC is robust to changes in marker gene number and sample size and is also cost- and time-efficient. CONCLUSIONS: DSSC provides a practical and promising alternative to experimental techniques for characterizing cellular composition and heterogeneity in the gene expression of heterogeneous samples.
Affiliation(s)
- Chenqi Wang
  - Department of Automation, Xiamen University, Xiamen, China
- Yifan Lin
  - Department of Automation, Xiamen University, Xiamen, China
- Shuchao Li
  - Department of Automation, Xiamen University, Xiamen, China
- Jinting Guan
  - Department of Automation, Xiamen University, Xiamen, China
  - Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
2
Vathrakokoili Pournara A, Miao Z, Beker OY, Nolte N, Brazma A, Papatheodorou I. CATD: a reproducible pipeline for selecting cell-type deconvolution methods across tissues. Bioinformatics Advances 2024; 4:vbae048. [PMID: 38638280 PMCID: PMC11023940 DOI: 10.1093/bioadv/vbae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/21/2024] [Indexed: 04/20/2024]
Abstract
Motivation: Cell-type deconvolution methods aim to infer cell composition from bulk transcriptomic data. The proliferation of methods, coupled with the inconsistent results obtained in many cases, highlights the pressing need for guidance in selecting appropriate methods. Additionally, the growing accessibility of single-cell RNA sequencing datasets, often accompanied by bulk expression data from related samples, enables the benchmarking of existing methods. Results: In this study, we conduct a comprehensive assessment of 31 methods, utilizing single-cell RNA-sequencing data from diverse human and mouse tissues. Employing various simulation scenarios, we reveal the efficacy of regression-based deconvolution methods, highlighting their sensitivity to reference choices. We investigate the impact of bulk-reference differences, incorporating variables such as sample, study and technology. We provide validation using a gold standard dataset from mononuclear cells and suggest a consensus prediction of proportions when ground truth is not available. We validated the consensus method on data from the stomach and studied its spillover effect. Importantly, we propose the use of the critical assessment of transcriptomic deconvolution (CATD) pipeline, which encompasses functionalities for generating references and pseudo-bulks and running implemented deconvolution methods. CATD streamlines simultaneous deconvolution of numerous bulk samples, providing a practical solution for speeding up the evaluation of newly developed methods. Availability and implementation: https://github.com/Papatheodorou-Group/CATD_snakemake.
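The abstract mentions a consensus prediction of proportions for cases without ground truth but does not say how it is computed. As a rough illustration only, the sketch below assumes the simplest possible rule: an unweighted mean of each cell type's proportion across methods, renormalised to sum to 1. The function name, the method labels, and the toy numbers are all hypothetical and are not taken from the CATD pipeline.

```python
def consensus_proportions(predictions):
    """Average per-cell-type proportions across methods and renormalise.

    predictions maps a method name to {cell_type: proportion}.
    Assumption: an unweighted mean; real pipelines may weight or filter methods.
    """
    cell_types = sorted({ct for p in predictions.values() for ct in p})
    n_methods = len(predictions)
    # A cell type missing from one method's output is treated as 0 there.
    mean = {
        ct: sum(p.get(ct, 0.0) for p in predictions.values()) / n_methods
        for ct in cell_types
    }
    total = sum(mean.values())
    return {ct: value / total for ct, value in mean.items()}


# Hypothetical outputs of three deconvolution methods for one bulk sample.
toy_predictions = {
    "method_a": {"B": 0.2, "T": 0.5, "Mono": 0.3},
    "method_b": {"B": 0.3, "T": 0.4, "Mono": 0.3},
    "method_c": {"B": 0.1, "T": 0.6, "Mono": 0.3},
}
consensus = consensus_proportions(toy_predictions)
```

An unweighted mean treats every method as equally reliable; if some methods are known to perform poorly for a tissue, a weighted or median-based rule may be more appropriate.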
Affiliation(s)
- Anna Vathrakokoili Pournara
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Zhichao Miao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
  - GMU-GIBH Joint School of Life Sciences, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, 511436, China
- Ozgur Yilimaz Beker
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  - Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla 34956, Turkey
- Nadja Nolte
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  - Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, 121-1000, Slovenia
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Irene Papatheodorou
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
  - Open Targets, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
  - Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
3
Guo X, Huang Z, Ju F, Zhao C, Yu L. Highly Accurate Estimation of Cell Type Abundance in Bulk Tissues Based on Single-Cell Reference and Domain Adaptive Matching. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2024; 11:e2306329. [PMID: 38072669 PMCID: PMC10870031 DOI: 10.1002/advs.202306329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/27/2023] [Indexed: 02/17/2024]
Abstract
Accurately identifying the cellular composition of complex tissues is critical for understanding disease pathogenesis, early diagnosis, and prevention. However, current methods for deconvoluting bulk RNA sequencing (RNA-seq) typically rely on matched single-cell RNA sequencing (scRNA-seq) as a reference, which can be limiting due to differences in sequencing distribution and the potential for invalid information from single-cell references. Hence, a novel computational method named SCROAM is introduced to address these challenges. SCROAM transforms scRNA-seq and bulk RNA-seq into a shared feature space, effectively eliminating distributional differences in the latent space. Subsequently, cell-type-specific expression matrices are generated from the scRNA-seq data, facilitating the precise identification of cell types within bulk tissues. The performance of SCROAM is assessed through benchmarking against simulated and real datasets, demonstrating its accuracy and robustness. To further validate SCROAM's performance, single-cell and bulk RNA-seq experiments are conducted on mouse spinal cord tissue, with SCROAM applied to identify cell types in bulk tissue. Results indicate that SCROAM is a highly effective tool for identifying similar cell types. An integrated analysis of liver cancer and primary glioblastoma is then performed. Overall, this research offers a novel perspective for delivering precise insights into disease pathogenesis and potential therapeutic strategies.
Affiliation(s)
- Xinyang Guo
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Zhaoyang Huang
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
- Fen Ju
  - Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Chenguang Zhao
  - Department of Rehabilitation Medicine, Xijing Hospital, Fourth Military Medical University, Xi'an 710032, China
- Liang Yu
  - School of Computer Science and Technology, Xidian University, Xi'an 710071, China
4
Zhang H, Yu X, Ye J, Li H, Hu J, Tan Y, Fang Y, Akbay E, Yu F, Weng C, Sankaran VG, Bachoo RM, Maher E, Minna J, Zhang A, Li B. Systematic investigation of mitochondrial transfer between cancer cells and T cells at single-cell resolution. Cancer Cell 2023; 41:1788-1802.e10. [PMID: 37816332 PMCID: PMC10568073 DOI: 10.1016/j.ccell.2023.09.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 01/24/2023] [Revised: 06/27/2023] [Accepted: 09/05/2023] [Indexed: 10/12/2023]
Abstract
Mitochondria (MT) participate in most metabolic activities of mammalian cells. A near-unidirectional mitochondrial transfer from T cells to cancer cells was recently observed to "metabolically empower" cancer cells while "depleting immune cells," providing new insights into tumor-T cell interaction and immune evasion. Here, we leverage single-cell RNA-seq technology and introduce MERCI, a statistical deconvolution method for tracing and quantifying mitochondrial trafficking between cancer and T cells. Through rigorous benchmarking and validation, MERCI accurately predicts the recipient cells and their relative mitochondrial compositions. Application of MERCI to human cancer samples identifies a reproducible MT transfer phenotype, with its signature genes involved in cytoskeleton remodeling, energy production, and TNF-α signaling pathways. Moreover, MT transfer is associated with increased cell cycle activity and poor clinical outcome across different cancer types. In summary, MERCI enables systematic investigation of an understudied aspect of tumor-T cell interactions that may lead to the development of therapeutic opportunities.
Affiliation(s)
- Hongyi Zhang
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Xuexin Yu
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jianfeng Ye
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Huiyu Li
  - Hamon Center for Therapeutic Oncology Research, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Jing Hu
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yuhao Tan
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Yan Fang
  - Department of Molecular Biology, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Esra Akbay
  - Department of Pathology, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Fulong Yu
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Chen Weng
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Vijay G Sankaran
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Robert M Bachoo
  - Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Elizabeth Maher
  - Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- John Minna
  - Hamon Center for Therapeutic Oncology Research, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Anli Zhang
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA
- Bo Li
  - Lyda Hill Department of Bioinformatics, UT Southwestern Medical Center, Dallas, TX 75390, USA; Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
5
Tran KA, Addala V, Johnston RL, Lovell D, Bradley A, Koufariotis LT, Wood S, Wu SZ, Roden D, Al-Eryani G, Swarbrick A, Williams ED, Pearson JV, Kondrashova O, Waddell N. Performance of tumour microenvironment deconvolution methods in breast cancer using single-cell simulated bulk mixtures. Nat Commun 2023; 14:5758. [PMID: 37717006 PMCID: PMC10505141 DOI: 10.1038/s41467-023-41385-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Accepted: 09/01/2023] [Indexed: 09/18/2023]
Abstract
Cells within the tumour microenvironment (TME) can impact tumour development and influence treatment response. Computational approaches have been developed to deconvolve the TME from bulk RNA-seq. Using scRNA-seq profiling from breast tumours we simulate thousands of bulk mixtures, representing tumour purities and cell lineages, to compare the performance of nine TME deconvolution methods (BayesPrism, Scaden, CIBERSORTx, MuSiC, DWLS, hspe, CPM, Bisque, and EPIC). Some methods are more robust in deconvolving mixtures with high tumour purity levels. Most methods tend to mis-predict normal epithelial for cancer epithelial as tumour purity increases, a finding that is validated in two independent datasets. The breast cancer molecular subtype influences this mis-prediction. BayesPrism and DWLS have the lowest combined numbers of false positives and false negatives, and have the best performance when deconvolving granular immune lineages. Our findings highlight the need for more single-cell characterisation of rarer cell types, and suggest that tumour cell compositions should be considered when deconvolving the TME.
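The general idea of simulating a bulk mixture from single-cell data can be sketched as follows: pick target proportions, draw that many cells of each type, and sum their counts so that the true composition is known. This is a minimal illustration under stated assumptions, not the procedure used in the study; the function name, the toy count vectors, and the choice to sample cells with replacement are all hypothetical.

```python
import random


def simulate_pseudo_bulk(cells_by_type, n_cells, rng):
    """Build one pseudo-bulk sample by summing randomly drawn single cells.

    cells_by_type maps a cell-type label to a list of per-cell count vectors
    (all of the same length). Returns (bulk_counts, true_proportions).
    """
    types = sorted(cells_by_type)
    # Draw random mixing weights and normalise them to proportions.
    weights = [rng.random() for _ in types]
    total = sum(weights)
    target = [w / total for w in weights]

    # Convert proportions to integer cell numbers that add up to n_cells;
    # the rounding deficit is given to the most abundant type.
    counts = [int(p * n_cells) for p in target]
    counts[counts.index(max(counts))] += n_cells - sum(counts)

    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0] * n_genes
    for cell_type, k in zip(types, counts):
        # Assumption: sample with replacement so small cell pools still work.
        for cell in rng.choices(cells_by_type[cell_type], k=k):
            for g, value in enumerate(cell):
                bulk[g] += value

    proportions = {t: k / n_cells for t, k in zip(types, counts)}
    return bulk, proportions


# Toy single-cell counts: three cell types, three genes (made-up values).
toy_cells = {
    "cancer_epithelial": [[9, 0, 1], [8, 1, 0]],
    "t_cell": [[0, 7, 2], [1, 6, 3]],
    "fibroblast": [[2, 1, 9], [1, 0, 8]],
}
bulk, props = simulate_pseudo_bulk(toy_cells, n_cells=100, rng=random.Random(0))
```

Because `props` records the composition actually drawn, it can serve as ground truth when scoring a deconvolution method's estimate for `bulk`. To mimic a range of tumour purities, the random weights could be replaced by fixed proportions for the cancer cell type.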
Affiliation(s)
- Khoa A Tran
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
- Venkateswar Addala
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Rebecca L Johnston
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- David Lovell
  - School of Computer Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
  - QUT Centre for Data Science, Brisbane, QLD, 4000, Australia
- Andrew Bradley
  - Faculty of Engineering, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Lambros T Koufariotis
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Scott Wood
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Sunny Z Wu
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Daniel Roden
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Ghamdan Al-Eryani
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Alexander Swarbrick
  - Cancer Ecosystems Program, Garvan Institute of Medical Research, Darlinghurst, NSW, 2010, Australia
  - School of Clinical Medicine, Faculty of Medicine and Health, UNSW Sydney, Kensington, NSW, 2052, Australia
- Elizabeth D Williams
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
  - Australian Prostate Cancer Research Centre - Queensland (APCRC-Q) and Queensland Bladder Cancer Initiative (QBCI), Brisbane, QLD, 4000, Australia
- John V Pearson
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Olga Kondrashova
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
- Nicola Waddell
  - Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia
  - School of Biomedical Sciences, Queensland University of Technology (QUT), Brisbane, QLD, 4000, Australia
6
Chiu Y, Ni C, Huang Y. Deconvolution of bulk gene expression profiles reveals the association between immune cell polarization and the prognosis of hepatocellular carcinoma patients. Cancer Med 2023; 12:15736-15760. [PMID: 37366298 PMCID: PMC10417088 DOI: 10.1002/cam4.6197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Revised: 05/02/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND: Many studies have utilized computational methods, including cell composition deconvolution (CCD), to correlate immune cell polarizations with the survival of cancer patients, including those with hepatocellular carcinoma (HCC). However, currently available cell deconvolution estimated (CDE) tools do not cover the wide range of immune cell changes that are known to influence tumor progression. RESULTS: A new CCD tool, HCCImm, was designed to estimate the abundance of tumor cells and 16 immune cell types in the bulk gene expression profiles of HCC samples. HCCImm was validated using real datasets derived from human peripheral blood mononuclear cells (PBMCs) and HCC tissue samples, demonstrating that HCCImm outperforms other CCD tools. We used HCCImm to analyze the bulk RNA-seq datasets of The Cancer Genome Atlas (TCGA)-liver hepatocellular carcinoma (LIHC) samples. We found that the proportions of memory CD8+ T cells and Tregs were negatively associated with patient overall survival (OS). Furthermore, the proportion of naïve CD8+ T cells was positively associated with patient OS. In addition, the TCGA-LIHC samples with a high tumor mutational burden had a significantly high abundance of nonmacrophage leukocytes. CONCLUSIONS: HCCImm was equipped with a new set of reference gene expression profiles that allowed for a more robust analysis of HCC patient expression data. The source code is provided at https://github.com/holiday01/HCCImm.
Affiliation(s)
- Yen‐Jung Chiu
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Department of Biomedical Engineering, Ming Chuan University, Taoyuan, Taiwan
- Chung‐En Ni
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Yen‐Hua Huang
  - Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
  - Center for Systems and Synthetic Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan
7
Kalatskaya I, Giovannoni G, Leist T, Cerra J, Boschert U, Rolfe PA. Revealing the immune cell subtype reconstitution profile in patients from the CLARITY study using deconvolution algorithms after cladribine tablets treatment. Sci Rep 2023; 13:8067. [PMID: 37202447 DOI: 10.1038/s41598-023-34384-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2022] [Accepted: 04/28/2023] [Indexed: 05/20/2023]
Abstract
Immune cell deconvolution methods, which use gene expression profiling to quantify immune cells in tissues and blood, are an appealing alternative to flow cytometry. Our objective was to investigate the applicability of deconvolution approaches in clinical trial settings to better investigate the mode of action of drugs for autoimmune diseases. Popular deconvolution methods CIBERSORT and xCell were validated using gene expression from the publicly available GSE93777 dataset that has comprehensive matching flow cytometry. As shown in the online tool, ~50% of signatures show strong correlation (r > 0.5), with the remainder showing moderate correlation or, in a few cases, no correlation. Deconvolution methods were then applied to gene expression data from the phase III CLARITY study (NCT00213135) to evaluate the immune cell profile of relapsing multiple sclerosis patients treated with cladribine tablets. At 96 weeks after treatment, deconvolution scores showed the following changes vs placebo: naïve, mature, memory CD4+ and CD8+ T cells, non-class switched and class switched memory B cells, and plasmablasts were significantly reduced, whereas naïve B cells and M2 macrophages were more abundant. Results confirm previously described changes in immune cell composition following cladribine tablets treatment and reveal immune homeostasis of pro- vs anti-inflammatory immune cell subtypes, potentially supporting long-term efficacy.
Affiliation(s)
- Irina Kalatskaya
  - EMD Serono Research & Development Institute, Inc. (an affiliate of Merck KGaA), 45 Middlesex Turnpike, Billerica, MA, 01821, USA
- Gavin Giovannoni
  - Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Thomas Leist
  - Division of Clinical Neuroimmunology, Jefferson University, Comprehensive MS Center, Philadelphia, PA, USA
- Joseph Cerra
  - EMD Serono Research & Development Institute, Inc. (an affiliate of Merck KGaA), 45 Middlesex Turnpike, Billerica, MA, 01821, USA
  - BISC Global, Boston, MA, USA
- Ursula Boschert
  - Ares Trading S.A. (an affiliate of Merck KGaA), Eysins, Switzerland
- P Alexander Rolfe
  - EMD Serono Research & Development Institute, Inc. (an affiliate of Merck KGaA), 45 Middlesex Turnpike, Billerica, MA, 01821, USA
8
Deng W, Li B, Wang J, Jiang W, Yan X, Li N, Vukmirovic M, Kaminski N, Wang J, Zhao H. A novel Bayesian framework for harmonizing information across tissues and studies to increase cell type deconvolution accuracy. Brief Bioinform 2023; 24:bbac616. [PMID: 36631398 PMCID: PMC9851324 DOI: 10.1093/bib/bbac616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 11/28/2022] [Accepted: 12/14/2022] [Indexed: 01/13/2023]
Abstract
Computational cell type deconvolution on bulk transcriptomics data can reveal cell type proportion heterogeneity across samples. One critical factor for accurate deconvolution is the reference signature matrix for different cell types. Compared with inferring reference signature matrices from cell lines, rapidly accumulating single-cell RNA-sequencing (scRNA-seq) data provide a richer and less biased resource. However, deriving cell type signatures from scRNA-seq data is challenging due to high biological and technical noise. In this article, we introduce a novel Bayesian framework, tranSig, to improve signature matrix inference from scRNA-seq by leveraging shared cell type-specific expression patterns across different tissues and studies. Our simulations show that tranSig is robust to the number of signature genes and tissues specified in the model. Applications of tranSig to bulk RNA sequencing data from peripheral blood, bronchoalveolar lavage and aorta demonstrate its accuracy and power to characterize biological heterogeneity across groups. In summary, tranSig offers an accurate and robust approach to defining gene expression signatures of different cell types, facilitating improved in silico cell type deconvolutions.
Affiliation(s)
- Wenxuan Deng
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Bolun Li
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Jiawei Wang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
- Xiting Yan
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Ningshan Li
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Milica Vukmirovic
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
  - Leslie Dan Faculty of Pharmacy, University of Toronto, 144 College St., ON, Canada
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, USA
- Jing Wang
  - State Key Laboratory of Medical Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Department of Pathophysiology, Peking Union Medical College, Beijing, China
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT, USA
9
Teh RQ, Liu GS, Wang JH. Bioinformatics Tools for Bulk Gene Expression Deconvolution in Diabetic Retinopathy. Methods Mol Biol 2023; 2678:107-115. [PMID: 37326707 DOI: 10.1007/978-1-0716-3255-0_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Retinal neovascularization is one of the leading causes of vision loss and a hallmark of proliferative diabetic retinopathy (PDR). The immune system is observed to be involved in the pathogenesis of diabetic retinopathy (DR). The specific immune cell type that contributes to retinal neovascularization can be identified via a bioinformatics analysis of RNA sequencing (RNA-seq) data, known as deconvolution analysis. A previous study used a deconvolution algorithm, CIBERSORTx, to identify the infiltration of macrophages in the retina of rats with hypoxia-induced retinal neovascularization and of patients with PDR. Here, we describe protocols for using CIBERSORTx to perform the deconvolution analysis and downstream analysis of RNA-seq data.
Affiliation(s)
- Ru Qi Teh
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
- Guei-Sheung Liu
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
  - Ophthalmology, Department of Surgery, University of Melbourne, East Melbourne, VIC, Australia
- Jiang-Hui Wang
  - Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, Australia
10
Hasanaj E, Alavi A, Gupta A, Póczos B, Bar-Joseph Z. Multiset multicover methods for discriminative marker selection. Cell Reports Methods 2022; 2:100332. [PMID: 36452867 PMCID: PMC9701606 DOI: 10.1016/j.crmeth.2022.100332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 08/12/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023]
Abstract
Markers are increasingly being used for several high-throughput data analysis and experimental design tasks. Examples include the use of markers for assigning cell types in scRNA-seq studies, for deconvolving bulk gene expression data, and for selecting marker proteins in single-cell spatial proteomics studies. Most marker selection methods focus on differential expression (DE) analysis. Although such methods work well for data with a few non-overlapping marker sets, they are not appropriate for large atlas-size datasets where several cell types and tissues are considered. To address this, we define the phenotype cover (PC) problem for marker selection and present algorithms that can improve the discriminative power of marker sets. Analysis of these sets on several marker-selection tasks suggests that these methods can lead to solutions that accurately distinguish different phenotypes in the data.
Affiliation(s)
- Euxhen Hasanaj
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Amir Alavi
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Anupam Gupta
  - Computer Science Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Barnabás Póczos
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ziv Bar-Joseph
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
11
Anene CA, Taggart E, Harwood CA, Pennington DJ, Wang J. Decosus: An R Framework for Universal Integration of Cell Proportion Estimation Methods. Front Genet 2022; 13:802838. [PMID: 35432466 PMCID: PMC9011041 DOI: 10.3389/fgene.2022.802838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Accepted: 03/04/2022] [Indexed: 12/26/2022]
Abstract
The assessment of the cellular heterogeneity and abundance in bulk tissue samples is essential for characterising cellular and organismal states. Computational approaches to estimate cellular abundance from bulk RNA-Seq datasets have variable performances, often requiring benchmarking matrices to select the best performing methods for individual studies. However, such benchmarking investigations are difficult to perform and assess in typical applications because of the absence of gold standard/ground-truth cellular measurements. Here we describe Decosus, an R package that integrates seven methods and signatures for deconvoluting cell types from gene expression profiles (GEP). Benchmark analysis on a range of datasets with ground-truth measurements revealed that our integrated estimates consistently exhibited stable performances across datasets than individual methods and signatures. We further applied Decosus to characterise the immune compartment of skin samples in different settings, confirming the well-established Th1 and Th2 polarisation in psoriasis and atopic dermatitis, respectively. Secondly, we revealed immune system-related UV-induced changes in sun-exposed skin. Furthermore, a significant motivation in the design of Decosus is flexibility and the ability for the user to include new gene signatures, algorithms, and integration methods at run time.
Affiliation(s)
- Chinedu A. Anene
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, United Kingdom
  - Centre for Cancer Biology and Therapy, School of Applied Science, London South Bank University, London, United Kingdom
- Emma Taggart
  - Centre for Immunobiology, Barts and the London School of Medicine, Blizard Institute, Queen Mary University of London, London, United Kingdom
- Catherine A. Harwood
  - Centre for Cell Biology and Cutaneous Research, Barts and The London School of Medicine and Dentistry, Blizard Institute, Queen Mary University of London, London, United Kingdom
  - Department of Dermatology, The Royal London Hospital, Barts Health NHS Trust, London, United Kingdom
- Daniel J. Pennington
  - Centre for Immunobiology, Barts and the London School of Medicine, Blizard Institute, Queen Mary University of London, London, United Kingdom
- Jun Wang
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London, United Kingdom
|
12
|
Bunis DG, Wang W, Vallvé-Juanico J, Houshdaran S, Sen S, Ben Soltane I, Kosti I, Vo KC, Irwin JC, Giudice LC, Sirota M. Whole-Tissue Deconvolution and scRNAseq Analysis Identify Altered Endometrial Cellular Compositions and Functionality Associated With Endometriosis. Front Immunol 2022; 12:788315. [PMID: 35069565 PMCID: PMC8766492 DOI: 10.3389/fimmu.2021.788315] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/02/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022]
Abstract
The uterine lining (endometrium) exhibits a pro-inflammatory phenotype in women with endometriosis, resulting in pain, infertility, and poor pregnancy outcomes. The full complement of cell types contributing to this phenotype has yet to be identified, as most studies have focused on bulk tissue or select cell populations. Herein, through integrating whole-tissue deconvolution and single-cell RNAseq, we comprehensively characterized immune and nonimmune cell types in the endometrium of women with or without disease and their dynamic changes across the menstrual cycle. We designed metrics to evaluate specificity of deconvolution signatures that resulted in single-cell identification of 13 novel signatures for immune cell subtypes in healthy endometrium. Guided by statistical metrics, we identified contributions of endometrial epithelial, endothelial, plasmacytoid dendritic cells, classical dendritic cells, monocytes, macrophages, and granulocytes to the endometrial pro-inflammatory phenotype, underscoring roles for nonimmune as well as immune cells to the dysfunctionality of this tissue.
Affiliation(s)
- Daniel G. Bunis
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Wanxin Wang
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Júlia Vallvé-Juanico
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Sahar Houshdaran
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Sushmita Sen
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Isam Ben Soltane
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Idit Kosti
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Kim Chi Vo
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Juan C. Irwin
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Linda C. Giudice
  - Center for Reproductive Sciences, University of California, San Francisco, San Francisco, CA, United States
- Marina Sirota
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
  - Department of Pediatrics, Division of Neonatology, University of California, San Francisco, San Francisco, CA, United States
|
13
|
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022]
Affiliation(s)
- Maria K Jaakkola
  - Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland
  - Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
|
14
|
Zhang W, Xu H, Qiao R, Zhong B, Zhang X, Gu J, Zhang X, Wei L, Wang X. ARIC: accurate and robust inference of cell type proportions from bulk gene expression or DNA methylation data. Brief Bioinform 2021; 23:6361035. [PMID: 34472588 DOI: 10.1093/bib/bbab362] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/12/2021] [Revised: 08/13/2021] [Accepted: 08/16/2021] [Indexed: 11/12/2022]
Abstract
Quantifying cell proportions, especially for rare cell types in some scenarios, is of great value in tracking signals associated with certain phenotypes or diseases. Although some methods have been proposed to infer cell proportions from multicomponent bulk data, they are substantially less effective for estimating the proportions of rare cell types which are highly sensitive to feature outliers and collinearity. Here we proposed a new deconvolution algorithm named ARIC to estimate cell type proportions from gene expression or DNA methylation data. ARIC employs a novel two-step marker selection strategy, including collinear feature elimination based on the component-wise condition number and adaptive removal of outlier markers. This strategy can systematically obtain effective markers for weighted $\nu$-support vector regression to ensure a robust and precise rare proportion prediction. We showed that ARIC can accurately estimate fractions in both DNA methylation and gene expression data from different experiments. We further applied ARIC to the survival prediction of ovarian cancer and the condition monitoring of chronic kidney disease, and the results demonstrate the high accuracy and robustness as well as clinical potentials of ARIC. Taken together, ARIC is a promising tool to solve the deconvolution problem of bulk data where rare components are of vital importance.
Affiliation(s)
- Wei Zhang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Hanwen Xu
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Rong Qiao
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Bixi Zhong
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xianglin Zhang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Jin Gu
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Lei Wei
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
- Xiaowo Wang
  - Ministry of Education Key Laboratory of Bioinformatics; Center for Synthetic and Systems Biology; Bioinformatics Division, Beijing National Research Center for Information Science and Technology; Department of Automation, Tsinghua University, Beijing 100084, China
|
15
|
Le T, Aronow RA, Kirshtein A, Shahriyari L. A review of digital cytometry methods: estimating the relative abundance of cell types in a bulk of cells. Brief Bioinform 2021; 22:bbaa219. [PMID: 33003193 PMCID: PMC8293826 DOI: 10.1093/bib/bbaa219] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/27/2020] [Revised: 08/15/2020] [Accepted: 08/17/2020] [Indexed: 01/20/2023]
Abstract
Due to the high cost of flow and mass cytometry, there has been a recent surge in the development of computational methods for estimating the relative distributions of cell types from the gene expression profile of a bulk of cells. Here, we review the five common 'digital cytometry' methods: deconvolution of RNA-Seq, cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT), CIBERSORTx, single sample gene set enrichment analysis and single-sample scoring of molecular phenotypes deconvolution method. The results show that CIBERSORTx B-mode, which uses batch correction to adjust the gene expression profile of the bulk of cells ('mixture data') to eliminate possible cross-platform variations between the mixture data and the gene expression data of single cells ('signature matrix'), outperforms other methods, especially when signature matrix and mixture data come from different platforms. However, in our tests, CIBERSORTx S-mode, which uses batch correction for adjusting the signature matrix instead of mixture data, did not perform better than the original CIBERSORT method, which does not use any batch correction method. This result suggests the need for further investigations into how to utilize batch correction in deconvolution methods.
Affiliation(s)
- Trang Le
  - University of Massachusetts Amherst
- Rachel A Aronow
  - Department of Mathematics and Statistics at the University of Massachusetts Amherst
- Arkadz Kirshtein
  - Department of Mathematics and Statistics at the University of Massachusetts Amherst
|
16
|
He H, Liyanarachchi S, Li W, Comiskey DF, Yan P, Bundschuh R, Turkoglu AM, Brock P, Ringel MD, de la Chapelle A. Transcriptome analysis discloses dysregulated genes in normal appearing tumor-adjacent thyroid tissues from patients with papillary thyroid carcinoma. Sci Rep 2021; 11:14126. [PMID: 34238982 PMCID: PMC8266864 DOI: 10.1038/s41598-021-93526-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/29/2021] [Accepted: 06/22/2021] [Indexed: 01/10/2023]
Abstract
Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. The molecular characteristics of histologically normal appearing tissue adjacent to the tumor (NAT) from PTC patients are not well characterized. The aim of this study was to characterize the global gene expression profile of NAT and compare it with those of normal and tumor thyroid tissues. We performed total RNA sequencing with fresh frozen thyroid tissues from a cohort of three categories of samples including NAT, normal thyroid (N), and PTC tumor (T). Transcriptome analysis shows that NAT presents a unique gene expression profile, which was not associated with sex or the presence of lymphocytic thyroiditis. Among the differentially expressed genes (DEGs) of NAT vs N, 256 coding genes and 5 noncoding genes have been reported as cancer genes involved in cell proliferation, apoptosis, and/or tumorigenesis. Bioinformatics analysis with Ingenuity Pathway Analysis software revealed that “Cancer, Organismal Injury and Abnormalities, Cellular Response to Therapeutics, and Cellular Movement” were major dysregulated pathways in the NAT tissues. This study provides improved insight into the complexity of gene expression changes in the thyroid glands of patients with PTC.
Affiliation(s)
- Huiling He
  - Department of Cancer Biology and Genetics, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Sandya Liyanarachchi
  - Department of Cancer Biology and Genetics, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Wei Li
  - Department of Cancer Biology and Genetics, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Daniel F Comiskey
  - Department of Cancer Biology and Genetics, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Pearlly Yan
  - Department of Internal Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Ralf Bundschuh
  - Department of Internal Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - Department of Physics, The Ohio State University, Columbus, OH, 43210, USA
  - Department of Chemistry and Biochemistry, The Ohio State University, Columbus, OH, 43210, USA
- Altan M Turkoglu
  - Department of Physics, The Ohio State University, Columbus, OH, 43210, USA
- Pamela Brock
  - Department of Internal Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Matthew D Ringel
  - Division of Endocrinology, Diabetes, and Metabolism, Department of Internal Medicine, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
- Albert de la Chapelle
  - Department of Cancer Biology and Genetics, The Ohio State University, Columbus, OH, 43210, USA
  - The Ohio State University Comprehensive Cancer Center, The Ohio State University, McCampbell Hall South Room 565, 1581 Dodd Drive, Columbus, OH, 43210, USA
|
17
|
Doostparast Torshizi A, Duan J, Wang K. A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders. NAR Genom Bioinform 2021; 3:lqab056. [PMID: 34169279 PMCID: PMC8219045 DOI: 10.1093/nargab/lqab056] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/21/2021] [Indexed: 02/06/2023]
Abstract
The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, most gene expression studies are conducted on bulk tissues, without examining cell type-specific expression profiles. Several computational methods are available for cell type deconvolution (i.e. inference of cellular composition) from bulk RNA-Seq data, but few of them impute cell type-specific expression profiles. We hypothesize that with external prior information such as single cell RNA-seq and population-wide expression profiles, it can be computationally tractable to estimate both cellular composition and cell type-specific expression from bulk RNA-Seq data. Here we introduce CellR, which addresses cross-individual gene expression variations to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model while taking into account inter/intra cellular correlations and uses a multi-variate stochastic search algorithm to estimate the cell type-specific expression profiles. Analyses on several complex diseases such as schizophrenia, Alzheimer’s disease, Huntington’s disease and type 2 diabetes validated the efficiency of CellR, while revealing how specific cell types contribute to different diseases. In summary, CellR compares favorably against competing approaches, enabling cell type-specific re-analysis of gene expression data on bulk tissues in complex diseases.
Affiliation(s)
- Abolfazl Doostparast Torshizi
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Jubao Duan
  - Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, IL 60201, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
|
18
|
Amrhein L, Fuchs C. stochprofML: stochastic profiling using maximum likelihood estimation in R. BMC Bioinformatics 2021; 22:123. [PMID: 33722188 PMCID: PMC7958472 DOI: 10.1186/s12859-021-03970-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2020] [Accepted: 01/15/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Tissues are often heterogeneous in their single-cell molecular expression, and this can govern the regulation of cell fate. For the understanding of development and disease, it is important to quantify heterogeneity in a given tissue. RESULTS We present the R package stochprofML which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm's performance in simulation studies and present further application opportunities. CONCLUSION Stochastic profiling outweighs the necessary demixing of mixed samples with a saving in experimental cost and effort and less measurement error. It offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions and detecting differences between cell populations between samples.
Affiliation(s)
- Lisa Amrhein
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
  - Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Christiane Fuchs
  - Institute of Computational Biology, Helmholtz Zentrum München, Ingolstädter Landstrasse 1, 85764 Neuherberg, Germany
  - Department of Mathematics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
  - Faculty of Business Administration and Economics, Bielefeld University, Universitätsstrasse 25, 33615 Bielefeld, Germany
|
19
|
Hunt GJ, Gagnon-Bartsch JA. The role of scale in the estimation of cell-type proportions. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1395] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
|
20
|
Tai AS, Tseng GC, Hsieh WP. BayICE: A Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Affiliation(s)
- An-Shun Tai
  - Institute of Statistics, National Tsing Hua University
|
21
|
Jaakkola MK, Elo LL. Computational deconvolution to estimate cell type-specific gene expression from bulk data. NAR Genom Bioinform 2021; 3:lqaa110. [PMID: 33575652 PMCID: PMC7803005 DOI: 10.1093/nargab/lqaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/24/2022]
Abstract
Computational deconvolution is a time and cost-efficient approach to obtain cell type-specific information from bulk gene expression of heterogeneous tissues like blood. Deconvolution can aim to either estimate cell type proportions or abundances in samples, or estimate how strongly each present cell type expresses different genes, or both tasks simultaneously. Among the two separate goals, the estimation of cell type proportions/abundances is widely studied, but less attention has been paid on defining the cell type-specific expression profiles. Here, we address this gap by introducing a novel method Rodeo and empirically evaluating it and the other available tools from multiple perspectives utilizing diverse datasets.
Affiliation(s)
- Maria K Jaakkola
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
|
22
|
Chen Z, Wu A. Progress and challenge for computational quantification of tissue immune cells. Brief Bioinform 2021; 22:6065002. [PMID: 33401306 DOI: 10.1093/bib/bbaa358] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/28/2020] [Revised: 10/23/2020] [Accepted: 11/07/2020] [Indexed: 12/28/2022]
Abstract
Tissue immune cells have long been recognized as important regulators for the maintenance of balance in the body system. Quantification of the abundance of different immune cells will provide enhanced understanding of the correlation between immune cells and normal or abnormal situations. Currently, computational methods to predict tissue immune cell compositions from bulk transcriptomes have been largely developed. Therefore, summarizing the advantages and disadvantages is appropriate. In addition, an examination of the challenges and possible solutions for these computational models will assist the development of this field. The common hypothesis of these models is that the expression of signature genes for immune cell types might represent the proportion of immune cells that contribute to the tissue transcriptome. In general, we grouped all reported tools into three groups, including reference-free, reference-based scoring and reference-based deconvolution methods. In this review, a summary of all the currently reported computational immune cell quantification tools and their applications, limitations, and perspectives are presented. Furthermore, some critical problems are found that have limited the performance and application of these models, including inadequate immune cell type, the collinearity problem, the impact of the tissue environment on the immune cell expression level, and the deficiency of standard datasets for model validation. To address these issues, tissue specific training datasets that include all known immune cells, a hierarchical computational framework, and benchmark datasets including both tissue expression profiles and the abundances of all the immune cells are proposed to further promote the development of this field.
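Illustrative note: the common hypothesis stated in this abstract, that signature-gene expression reflects cell-type proportions, is often written as a linear mixture in which the bulk profile is approximately the signature matrix times a non-negative proportion vector. The sketch below is a minimal illustration under that assumption only. The function name `estimate_proportions`, the toy signature values, and the multiplicative-update solver are hypothetical choices made for this example and are not taken from any tool covered by the review.

```python
def estimate_proportions(signature, bulk, n_iter=500):
    """Estimate non-negative proportions p with signature * p close to bulk.

    signature: list of rows (one per gene), each holding one positive
               expression value per cell type (hypothetical toy input).
    bulk:      list with one bulk expression value per gene.
    Uses multiplicative updates for non-negative least squares, then
    rescales the result so the proportions sum to one.
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # S^T b does not change between iterations, so compute it once.
    st_b = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
            for k in range(n_types)]
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(n_iter):
        # Current reconstruction of the bulk profile: S p
        fitted = [sum(signature[g][k] * p[k] for k in range(n_types))
                  for g in range(n_genes)]
        # S^T (S p)
        st_s_p = [sum(signature[g][k] * fitted[g] for g in range(n_genes))
                  for k in range(n_types)]
        # The multiplicative update keeps every entry of p positive.
        p = [p[k] * st_b[k] / st_s_p[k] for k in range(n_types)]
    total = sum(p)
    return [value / total for value in p]


# Toy example: 4 genes, 3 cell types, noise-free mixture (assumed values).
signature = [
    [10.0, 1.0, 1.0],
    [1.0, 8.0, 2.0],
    [2.0, 1.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_p = [0.5, 0.3, 0.2]
bulk = [sum(row[k] * true_p[k] for k in range(3)) for row in signature]
estimated = estimate_proportions(signature, bulk)
print([round(value, 4) for value in estimated])
```

In this noise-free toy case the estimate should recover the mixing proportions closely. Real bulk data bring noise, collinear signatures, and cell types missing from the reference, which are among the limitations the abstract lists.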
Affiliation(s)
- Ziyi Chen
  - Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Jiangsu, Suzhou, China
- Aiping Wu
  - Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Jiangsu, Suzhou, China
|
23
|
Qin Y, Zhang W, Sun X, Nan S, Wei N, Wu HJ, Zheng X. Deconvolution of heterogeneous tumor samples using partial reference signals. PLoS Comput Biol 2020; 16:e1008452. [PMID: 33253170 PMCID: PMC7728196 DOI: 10.1371/journal.pcbi.1008452] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/15/2020] [Revised: 12/10/2020] [Accepted: 10/19/2020] [Indexed: 12/16/2022]
Abstract
Deconvolution of heterogeneous bulk tumor samples into distinct cellular populations is an important yet challenging problem, particularly when only partial references are available. A common approach to dealing with this problem is to deconvolve the mixed signals using available references and leverage the remaining signal as a new cell component. However, as indicated in our simulation, such an approach tends to over-estimate the proportions of known cell types and fails to detect novel cell types. Here, we propose PREDE, a partial reference-based deconvolution method using an iterative non-negative matrix factorization algorithm. Our method is verified to be effective in estimating cell proportions and expression profiles of unknown cell types based on simulated datasets at a variety of parameter settings. Applying our method to TCGA tumor samples, we found that proportions of pure cancer cells better indicate different subtypes of tumor samples. We also detected several cell types for each cancer type whose proportions successfully predicted patient survival. Our method makes a significant contribution to deconvolution of heterogeneous tumor samples and could be widely applied to varieties of high throughput bulk data. PREDE is implemented in R and is freely available from GitHub (https://xiaoqizheng.github.io/PREDE).
Tumor tissues are mixtures of different cell types. Identification and quantification of constitutional cell types within tumor tissues are important tasks in cancer research. The problem can be readily solved using regression-based methods if reference signals are available. But in most clinical applications, only partial references are available, which significantly reduces the deconvolution accuracy of the existing regression-based methods. In this paper, we propose a partial-reference based deconvolution model, PREDE, integrating the non-negative matrix factorization framework with an iterative optimization strategy. We conducted comprehensive evaluations for PREDE using both simulation and real data analyses, demonstrating better performance of our method than other existing methods.
Affiliation(s)
- Yufang Qin
  - College of Information Technology, Shanghai Ocean University, Shanghai, China
  - Key Laboratory of Fisheries Information Ministry of Agriculture, Shanghai, China
- Weiwei Zhang
  - School of Science, East China University of Technology, Nanchang, Jiangxi, China
- Xiaoqiang Sun
  - Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Siwei Nan
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Nana Wei
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
- Hua-Jun Wu
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, Boston, Massachusetts, United States of America
- Xiaoqi Zheng
  - Department of Mathematics, Shanghai Normal University, Shanghai, China
|
24
|
Li Z, Guo Z, Cheng Y, Jin P, Wu H. Robust partial reference-free cell composition estimation from tissue expression. Bioinformatics 2020; 36:3431-3438. [PMID: 32167531 DOI: 10.1093/bioinformatics/btaa184] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/22/2019] [Revised: 03/05/2020] [Accepted: 03/10/2020] [Indexed: 12/13/2022]
Abstract
MOTIVATION In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder the cell composition quantification using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. RESULTS We introduce TOols for the Analysis of heterogeneouS Tissues TOAST/-P and TOAST/+P, two partial reference-free algorithms for estimating cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. AVAILABILITY AND IMPLEMENTATION The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. CONTACT ziyi.li@emory.edu or hao.wu@emory.edu. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Zhenxing Guo
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Ying Cheng
- Institute of Biomedical Research, Yunnan University, Kunming, China
- Peng Jin
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA

25
Groth EE, Weber M, Bahmer T, Pedersen F, Kirsten A, Börnigen D, Rabe KF, Watz H, Ammerpohl O, Goldmann T. Exploration of the sputum methylome and omics deconvolution by quadratic programming in molecular profiling of asthma and COPD: the road to sputum omics 2.0. Respir Res 2020; 21:274. [PMID: 33076907 PMCID: PMC7574293 DOI: 10.1186/s12931-020-01544-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/15/2020] [Accepted: 10/11/2020] [Indexed: 12/25/2022]
Abstract
BACKGROUND To date, most studies involving high-throughput analyses of sputum in asthma and COPD have focused on identifying transcriptomic signatures of disease. No whole-genome methylation analysis of sputum cells has been performed yet. In this context, the highly variable cellular composition of sputum has the potential to confound the molecular analyses. METHODS Whole-genome transcription (Agilent Human 4 × 44 k array) and methylation (Illumina 450 k BeadChip) analyses were performed on sputum samples of 9 asthmatics, 10 healthy and 10 COPD subjects. RNA integrity was checked by capillary electrophoresis and used to correct in silico for bias conferred by RNA degradation during biobank sample storage. Estimates of cell type-specific molecular profiles were derived via regression by quadratic programming based on sputum differential cell counts. All analyses were conducted using the open-source R/Bioconductor software framework. RESULTS A linear regression step was found to perform well in removing RNA degradation-related bias among the main principal components of the gene expression data, increasing the number of genes detectable as differentially expressed in asthma and COPD sputa (compared to controls). We observed a strong influence of the cellular composition on the results of mixed-cell sputum analyses. For example, upregulated genes derived from mixed-cell data in asthma were dominated by genes predominantly expressed in eosinophils after deconvolution. The deconvolution, however, allowed differential expression and methylation analyses to be performed at the level of individual cell types and, though we only analyzed a limited number of biological replicates, was found to provide good estimates compared to previously published data about gene expression in lung eosinophils in asthma. Analysis of the sputum methylome indicated the presence of differential methylation in genomic regions of interest, e.g. mapping to a number of human leukocyte antigen (HLA) genes related to both major histocompatibility complex (MHC) class I and II molecules in asthma and COPD macrophages. Furthermore, we found the SMAD3 (SMAD family member 3) gene, among others, to lie within differentially methylated regions, a finding previously reported in the context of asthma. CONCLUSIONS In this methodology-oriented study, we show that methylation profiling can be easily integrated into sputum analysis workflows and exhibits a strong potential to contribute to the profiling and understanding of pulmonary inflammation. Wherever RNA degradation is of concern, in silico correction can be effective in improving both sensitivity and specificity of downstream analyses. We suggest that deconvolution methods should be integrated in sputum omics analysis workflows whenever possible in order to facilitate the unbiased discovery and interpretation of molecular patterns of inflammation.
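The regression step described in the Methods above can be illustrated for a single gene: given known cell proportions per sample and the mixed-sample expression, the cell type-specific expression is estimated by non-negative least squares, which is a small quadratic program. This is a minimal sketch under stated assumptions, not the authors' R/Bioconductor implementation; the projected-gradient solver, the function name `estimate_cell_type_expression`, and the example numbers are all hypothetical.

```python
def estimate_cell_type_expression(proportions, mixed_expression, n_iter=2000):
    """Solve min_x ||P x - y||^2 subject to x >= 0 by projected gradient descent.

    proportions: one row per sample, one column per cell type (P).
    mixed_expression: measured expression of a single gene in each sample (y).
    Returns the estimated expression of that gene in each pure cell type (x).
    """
    n_samples = len(proportions)
    n_types = len(proportions[0])
    # Precompute the Gram matrix P^T P and the vector P^T y.
    gram = [[sum(proportions[s][i] * proportions[s][j] for s in range(n_samples))
             for j in range(n_types)] for i in range(n_types)]
    rhs = [sum(proportions[s][i] * mixed_expression[s] for s in range(n_samples))
           for i in range(n_types)]
    # The trace of P^T P bounds its largest eigenvalue, so 1/trace is a safe step size.
    step = 1.0 / sum(gram[i][i] for i in range(n_types))
    x = [0.0] * n_types
    for _ in range(n_iter):
        gradient = [sum(gram[i][j] * x[j] for j in range(n_types)) - rhs[i]
                    for i in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * gradient[i]) for i in range(n_types)]
    return x


# Hypothetical example: 4 samples, 2 cell types (fractions such as differential cell counts).
cell_fractions = [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.1, 0.9]]
# Mixed expression generated from pure-cell values 10.0 and 2.0, without noise.
mixed = [8.4, 6.0, 4.4, 2.8]
print(estimate_cell_type_expression(cell_fractions, mixed))
```

In practice a dedicated quadratic programming or non-negative least squares solver would replace the hand-written loop; the loop is used here only to keep the sketch free of external dependencies.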
Affiliation(s)
- Espen E Groth
- LungenClinic Grosshansdorf, Großhansdorf, Germany
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Department of Oncology, Hematology and BMT with Section Pneumology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Melanie Weber
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
- Thomas Bahmer
- LungenClinic Grosshansdorf, Großhansdorf, Germany
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
- Frauke Pedersen
- LungenClinic Grosshansdorf, Großhansdorf, Germany
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Anne Kirsten
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Daniela Börnigen
- Bioinformatics Core Unit, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Klaus F Rabe
- LungenClinic Grosshansdorf, Großhansdorf, Germany
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Henrik Watz
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
- Ole Ammerpohl
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Institute of Human Genetics, University Medical Center Ulm, Ulm, Germany
- Torsten Goldmann
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
- Research Center Borstel, Pathology, Borstel, Germany

26
Li Y, He X, Li Q, Lai H, Zhang H, Hu Z, Li Y, Huang S. EV-origin: Enumerating the tissue-cellular origin of circulating extracellular vesicles using exLR profile. Comput Struct Biotechnol J 2020; 18:2851-2859. [PMID: 33133426 PMCID: PMC7588739 DOI: 10.1016/j.csbj.2020.10.002] [Citation(s) in RCA: 75] [Impact Index Per Article: 18.8] [Received: 07/29/2020] [Revised: 09/29/2020] [Accepted: 10/02/2020] [Indexed: 02/07/2023]
Abstract
Extracellular vesicles (EVs) are complex ecosystems that can be derived from all body cells and circulate in the body fluids. Characterizing the tissue-cellular sources contributing to circulating EVs provides biological information about the cell or tissue of origin and their functional states. However, the relative proportions of the tissue-cellular origins of circulating EVs in body fluid have not been thoroughly characterized. Here, we developed an approach for digital EV quantification, called EV-origin, that enables enumeration of the tissue-cellular source contributions of EVs from plasma extracellular vesicle long RNA sequencing (exLR-seq) profiles. EV-origin combines an input matrix of gene expression signatures with a robust deconvolution algorithm, which together are used to separate the relative proportions of each tissue or cell type of interest. EV-origin predicted the relative enrichment of seven types of hematopoietic cells and sixteen solid tissue subsets from exLR-seq profiles. Using the EV-origin approach, we depicted an integrated landscape of the traceability system of plasma EVs for healthy individuals. We also compared the heterogeneous tissue-cellular source components of plasma EV samples across diverse disease statuses. Notably, an aberrant liver fraction could reflect the development and progression of hepatic disease. The liver fraction could also serve as a diagnostic indicator and effectively separate hepatocellular carcinoma (HCC) patients from normal individuals. EV-origin provides an approach to decipher the complex heterogeneity of tissue-cellular origin in circulating EVs. Our approach could inform the development of exLR-based applications for liquid biopsy.
Affiliation(s)
- Yuchen Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Xigan He
- Department of Hepatic Surgery, Fudan University Shanghai Cancer Center, Shanghai Medical College, Fudan University, Shanghai 200032, China
- Qin Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Hongyan Lai
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Hena Zhang
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Zhixiang Hu
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Yan Li
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Shenglin Huang
- Department of Integrative Oncology, Fudan University Shanghai Cancer Center, and the Shanghai Key Laboratory of Medical Epigenetics, the International Co-laboratory of Medical Epigenetics and Metabolism, Ministry of Science and Technology, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China

27
Arneson D, Yang X, Wang K. MethylResolver-a method for deconvoluting bulk DNA methylation profiles into known and unknown cell contents. Commun Biol 2020; 3:422. [PMID: 32747663 PMCID: PMC7400544 DOI: 10.1038/s42003-020-01146-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 10/19/2019] [Accepted: 07/02/2020] [Indexed: 12/14/2022]
Abstract
Bulk tissue DNA methylation profiling has been used to examine epigenetic mechanisms and biomarkers of complex diseases such as cancer. However, heterogeneity of cellular content in tissues complicates result interpretation and utility. In silico deconvolution of cellular fractions from bulk tissue data offers a fast and inexpensive alternative to experimentally measuring such fractions. In this study, we report the design, implementation, and benchmarking of MethylResolver, a Least Trimmed Squares regression-based method for inferring leukocyte subset fractions from methylation profiles of tumor admixtures. Compared to previous approaches MethylResolver is more accurate as unknown cellular content in the mixture increases and is able to resolve tumor purity-scaled immune cell-type fractions without a cancer-specific signature. We also present a pan-cancer deconvolution of TCGA, recapitulating that high eosinophil fraction predicts improved cervical carcinoma survival and identifying elevated B cell fraction as a previously unreported predictor of poor survival for papillary renal cell carcinoma.
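The robustness to unknown cellular content attributed above to Least Trimmed Squares (LTS) regression can be illustrated with the LTS objective itself, which sums only the smallest squared residuals. The sketch below only evaluates that objective for a candidate fraction vector; it does not search over fractions and is not MethylResolver's implementation. The reference values, the function name `trimmed_sse`, and the choice of `keep` are hypothetical.

```python
def trimmed_sse(signature, mixture, fractions, keep):
    """Sum of the `keep` smallest squared residuals of mixture - signature * fractions."""
    squared_residuals = sorted(
        (observed - sum(f * ref for f, ref in zip(fractions, row))) ** 2
        for row, observed in zip(signature, mixture)
    )
    return sum(squared_residuals[:keep])


# Hypothetical reference methylation levels: 5 CpG sites x 2 leukocyte types.
signature = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3], [0.1, 0.6]]
true_fractions = [0.6, 0.4]
# Mixture built from the true fractions; the last site is distorted by unknown content.
mixture = [0.58, 0.44, 0.50, 0.54, 0.80]

ordinary = trimmed_sse(signature, mixture, true_fractions, keep=5)  # all residuals
trimmed = trimmed_sse(signature, mixture, true_fractions, keep=4)   # drop the worst site
print(ordinary, trimmed)
```

With all five sites the true fractions are penalized by the distorted site, whereas the trimmed objective ignores it, which is why an LTS fit can tolerate a minority of sites affected by cell types missing from the reference.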
Affiliation(s)
- Douglas Arneson
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Molecular Biology Institute, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Kai Wang
- Informatics and Predictive Sciences, Bristol-Myers Squibb, San Diego, CA, 92121, USA

28
Li Z, Wu Z, Jin P, Wu H. Dissecting differential signals in high-throughput data from complex tissues. Bioinformatics 2019; 35:3898-3905. [PMID: 30903684 DOI: 10.1093/bioinformatics/btz196] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/17/2018] [Revised: 03/08/2019] [Accepted: 03/20/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION Samples from clinical practices are often mixtures of different cell types. The high-throughput data obtained from these samples are thus mixed signals. The cell mixture brings complications to data analysis, and will lead to biased results if not properly accounted for. RESULTS We develop a method to model the high-throughput data from mixed, heterogeneous samples, and to detect differential signals. Our method allows flexible statistical inference for detecting a variety of cell-type specific changes. Extensive simulation studies and analyses of two real datasets demonstrate the favorable performance of our proposed method compared with existing ones serving similar purpose. AVAILABILITY AND IMPLEMENTATION The proposed method is implemented as an R package and is freely available on GitHub (https://github.com/ziyili20/TOAST). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA
- Zhijin Wu
- Department of Biostatistics, Brown University, Providence, RI, USA
- Peng Jin
- Department of Human Genetics, Emory University, Atlanta, GA, USA
- Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA, USA

29
Li H, Sharma A, Luo K, Qin ZS, Sun X, Liu H. DeconPeaker, a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples. Front Genet 2020; 11:392. [PMID: 32547592 PMCID: PMC7269180 DOI: 10.3389/fgene.2020.00392] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/09/2020] [Accepted: 03/30/2020] [Indexed: 12/26/2022]
Abstract
While our understanding of cellular and molecular processes has grown exponentially, issues related to the cell microenvironment and cellular heterogeneity have sparked a new debate concerning the cell identity. Cell composition (chromatin and nuclear architecture) poses a strong risk for dynamic changes in the diseased condition. Since chromatin accessibility patterns play a major role in human diseases, it is therefore anticipated that a deconvolution tool based on open chromatin data will provide better performance in identifying cell composition. Herein, we have designed the deconvolution tool "DeconPeaker," which can precisely define the uniqueness among subpopulations of cells using open chromatin datasets. Using this tool, we simultaneously evaluated chromatin accessibility and gene expression datasets to estimate cell types and their respective proportions in a mixture of samples. In comparison to other known deconvolution methods, we observed the lowest average root-mean-square error (RMSE = 0.042) and the highest average correlation coefficient (r = 0.919) between the prediction and "true" proportion. As a proof-of-concept, we also tested chromatin accessibility data from acute myeloid leukemia (AML) and successfully obtained unique cell types associated with AML progression. Furthermore, we showed that chromatin accessibility represents more essential characteristics in the identification of cell types than gene expression. Taken together, DeconPeaker as a powerful tool has the potential to combine different datasets (primarily, chromatin accessibility and gene expression) and define different cell types in mixtures. The Python package of DeconPeaker is now available at https://github.com/lihuamei/DeconPeaker.
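The two accuracy measures quoted above, root-mean-square error (RMSE) and the correlation coefficient between predicted and "true" proportions, can be computed as in the following minimal sketch. It assumes the correlation is Pearson's r, which the abstract does not state explicitly; the function names and the example proportions are hypothetical and are not taken from DeconPeaker.

```python
import math


def rmse(predicted, truth):
    """Root-mean-square error between two equal-length sequences of proportions."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, truth)) / len(truth))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical true and predicted proportions of four cell types in one mixture.
true_props = [0.50, 0.30, 0.15, 0.05]
pred_props = [0.46, 0.33, 0.14, 0.07]
print(rmse(pred_props, true_props), pearson_r(pred_props, true_props))
```

A low RMSE together with a high correlation indicates that the predicted proportions are both close to and ordered like the reference proportions; either measure alone can be misleading.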
Affiliation(s)
- Huamei Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Amit Sharma
- Department of Ophthalmology, University Hospital Bonn, Bonn, Germany
- Kun Luo
- Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
- Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
- Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
- Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

30
Dong L, Kollipara A, Darville T, Zou F, Zheng X. Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information. Sci Rep 2020; 10:5434. [PMID: 32214192 PMCID: PMC7096458 DOI: 10.1038/s41598-020-62330-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/23/2019] [Accepted: 02/26/2020] [Indexed: 01/03/2023]
Abstract
Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanism of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information from partial cell types. Analyses of simulated data and two benchmark datasets demonstrate that semi-CAM outperforms CAM by yielding more accurate cell proportion estimates when markers from some or all cell types are available. In addition, when markers from all cell types are available, semi-CAM achieves better or similar accuracy compared to the supervised method using signature genes, CIBERSORT, and the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data with bulk expression profiles from six cell types and prior marker information for only three cell types suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
Affiliation(s)
- Li Dong
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Avinash Kollipara
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Toni Darville
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaojing Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

31
Görtler F, Schön M, Simeth J, Solbrig S, Wettig T, Oefner PJ, Spang R, Altenbuchinger M. Loss-Function Learning for Digital Tissue Deconvolution. J Comput Biol 2020; 27:342-355. [DOI: 10.1089/cmb.2019.0462] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Affiliation(s)
- Franziska Görtler
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Marian Schön
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Jakob Simeth
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Stefan Solbrig
- Department of Physics, University of Regensburg, Regensburg, Germany
- Tilo Wettig
- Department of Physics, University of Regensburg, Regensburg, Germany
- Peter J. Oefner
- Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Rainer Spang
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
- Michael Altenbuchinger
- Department of Statistical Bioinformatics, Institute of Functional Genomics, University of Regensburg, Regensburg, Germany

32
Steen CB, Liu CL, Alizadeh AA, Newman AM. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. Methods Mol Biol 2020; 2117:135-157. [PMID: 31960376 DOI: 10.1007/978-1-0716-0301-7_7] [Citation(s) in RCA: 249] [Impact Index Per Article: 62.3] [Indexed: 12/31/2022]
Abstract
CIBERSORTx is a suite of machine learning tools for the assessment of cellular abundance and cell type-specific gene expression patterns from bulk tissue transcriptome profiles. With this framework, single-cell or bulk-sorted RNA sequencing data can be used to learn molecular signatures of distinct cell types from a small collection of biospecimens. These signatures can then be repeatedly applied to characterize cellular heterogeneity from bulk tissue transcriptomes without physical cell isolation. In this chapter, we provide a detailed primer on CIBERSORTx and demonstrate its capabilities for high-throughput profiling of cell types and cellular states in normal and neoplastic tissues.
Affiliation(s)
- Chloé B Steen
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Chih Long Liu
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Ash A Alizadeh
- Division of Oncology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Center for Cancer Systems Biology, Stanford University, Stanford, CA, USA
- Division of Hematology, Department of Medicine, Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Aaron M Newman
- Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA

33
Chiu YJ, Hsieh YH, Huang YH. Improved cell composition deconvolution method of bulk gene expression profiles to quantify subsets of immune cells. BMC Med Genomics 2019; 12:169. [PMID: 31856824 PMCID: PMC6923925 DOI: 10.1186/s12920-019-0613-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/26/2019] [Accepted: 10/31/2019] [Indexed: 01/07/2023]
Abstract
Background To facilitate the investigation of the pathogenic roles played by various immune cells in complex tissues such as tumors, a few computational methods for deconvoluting bulk gene expression profiles to predict cell composition have been created. However, available methods were usually developed along with a set of reference gene expression profiles consisting of imbalanced replicates across different cell types. Therefore, the objective of this study was to create a new deconvolution method equipped with a new set of reference gene expression profiles that incorporate more microarray replicates of the immune cells that have been frequently implicated in the poor prognosis of cancers, such as T helper cells, regulatory T cells and macrophage M1/M2 cells. Methods Our deconvolution method was developed by choosing ε-support vector regression (ε-SVR) as the core algorithm assigned with a loss function subject to the L1-norm penalty. To construct the reference gene expression signature matrix for regression, a subset of differentially expressed genes were chosen from 148 microarray-based gene expression profiles for 9 types of immune cells by using ANOVA and minimizing condition number. Agreement analyses including mean absolute percentage errors and Bland-Altman plots were carried out to compare the performances of our method and CIBERSORT. Results In silico cell mixtures, simulated bulk tissues, and real human samples with known immune-cell fractions were used as the test datasets for benchmarking. Our method outperformed CIBERSORT in the benchmarks using in silico breast tissue-immune cell mixtures in the proportions of 30:70 and 50:50, and in the benchmark using 164 human PBMC samples. Our results suggest that the performance of our method was at least comparable to that of a state-of-the-art tool, CIBERSORT. 
Conclusions We developed a new cell composition deconvolution method and the implementation was entirely based on the publicly available R and Python packages. In addition, we compiled a new set of reference gene expression profiles, which might allow for a more robust prediction of the immune cell fractions from the expression profiles of cell mixtures. The source code of our method could be downloaded from https://github.com/holiday01/deconvolution-to-estimate-immune-cell-subsets.
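The agreement analyses named in the Methods above, mean absolute percentage error (MAPE) and Bland-Altman statistics, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the example fractions are invented, and the conventional 1.96 standard-deviation limits of agreement are assumed.

```python
import statistics


def mape(estimated, known):
    """Mean absolute percentage error; every known fraction must be non-zero."""
    return 100.0 * sum(abs(e - k) / abs(k) for e, k in zip(estimated, known)) / len(known)


def bland_altman(estimated, known):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    differences = [e - k for e, k in zip(estimated, known)]
    bias = statistics.mean(differences)
    spread = statistics.stdev(differences)  # sample standard deviation, needs >= 2 pairs
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical immune-cell fractions: known (e.g. from cell counting) vs. estimated.
known_fractions = [0.40, 0.25, 0.20, 0.15]
estimated_fractions = [0.44, 0.23, 0.19, 0.14]
print(mape(estimated_fractions, known_fractions))
print(bland_altman(estimated_fractions, known_fractions))
```

MAPE summarizes relative error per cell type, while the Bland-Altman bias and limits show whether one method systematically over- or under-estimates the fractions relative to the reference.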
Affiliation(s)
- Yen-Jung Chiu
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Yi-Hsuan Hsieh
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Yen-Hua Huang
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Centre for Systems and Synthetic Biology, National Yang-Ming University, Taipei, 11221, Taiwan

34
Lightbody G, Haberland V, Browne F, Taggart L, Zheng H, Parkes E, Blayney JK. Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application. Brief Bioinform 2019; 20:1795-1811. [PMID: 30084865 PMCID: PMC6917217 DOI: 10.1093/bib/bby051] [Citation(s) in RCA: 88] [Impact Index Per Article: 17.6] [Received: 01/30/2018] [Revised: 05/01/2018] [Indexed: 12/28/2022]
Abstract
There has been an exponential growth in the performance and output of sequencing technologies (omics data) with full genome sequencing now producing gigabases of reads on a daily basis. These data may hold the promise of personalized medicine, leading to routinely available sequencing tests that can guide patient treatment decisions. In the era of high-throughput sequencing (HTS), computational considerations, data governance and clinical translation are the greatest rate-limiting steps. To ensure that the analysis, management and interpretation of such extensive omics data are exploited to their full potential, key factors, including sample sourcing, technology selection and computational expertise and resources, need to be considered, leading to an integrated set of high-performance tools and systems. This article provides an up-to-date overview of the evolution of HTS and the accompanying tools, infrastructure and data management approaches that are emerging in this space, which, if used within a multidisciplinary context, may ultimately facilitate the development of personalized medicine.
Affiliation(s)
- Gaye Lightbody
- School of Computing, Ulster University, Newtownabbey, UK
- Valeriia Haberland
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Fiona Browne
- School of Computing, Ulster University, Newtownabbey, UK
- Huiru Zheng
- School of Computing, Ulster University, Newtownabbey, UK
- Eileen Parkes
- Centre for Cancer Research & Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University, Belfast, UK
- Jaine K Blayney
- Centre for Cancer Research & Cell Biology, School of Medicine, Dentistry and Biomedical Sciences, Queen's University, Belfast, UK

35
Li Z, Wu H. TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol 2019; 20:190. [PMID: 31484546 PMCID: PMC6727351 DOI: 10.1186/s13059-019-1778-0] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 04/24/2019] [Accepted: 07/30/2019] [Indexed: 02/07/2023]
Abstract
In the analysis of high-throughput data from complex samples, cell composition is an important factor that needs to be accounted for. Except for a limited number of tissues with known pure cell type profiles, a majority of genomics and epigenetics data relies on the "reference-free deconvolution" methods to estimate cell composition. We develop a novel computational method to improve reference-free deconvolution, which iteratively searches for cell type-specific features and performs composition estimation. Simulation studies and applications to six real datasets including both DNA methylation and gene expression data demonstrate favorable performance of the proposed method. TOAST is available at https://bioconductor.org/packages/TOAST .
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA.
| |
|
36
|
Gastric Normal Adjacent Mucosa Versus Healthy and Cancer Tissues: Distinctive Transcriptomic Profiles and Biological Features. Cancers (Basel) 2019; 11:cancers11091248. [PMID: 31454993 PMCID: PMC6769942 DOI: 10.3390/cancers11091248] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/06/2019] [Revised: 08/05/2019] [Accepted: 08/22/2019] [Indexed: 01/25/2023]
Abstract
Gastric cancer (GC) is a leading cause of cancer-related deaths in the world. Molecular heterogeneity is a major determinant of clinical outcomes, and an exhaustive tumor classification is currently missing. Histologically normal tissue adjacent to the tumor (NAT) is commonly used as a control in cancer studies; nevertheless, a recently published paper described the unique characteristics of the NAT in several tumor types. Little is known about the global gene expression profile of gastric NAT (gNAT), which could be an effective tool for a more realistic definition of the GC molecular signature. Here, we integrated data of 512 samples from the Genotype-Tissue Expression project (GTEx) and The Cancer Genome Atlas (TCGA) to analyze the transcriptome of healthy gastric tissues, gNAT, and GC samples. We validated the TCGA-GTEx data mining with an in-house gNAT and GC expression dataset. Differential gene expression together with pathway enrichment analyses, indeed, led to different results when using the gNAT or the healthy tissue as control. Based on our analyses, gNAT showed a peculiar gene signature and biological features, such as activation of the estrogen receptor pathways, suggesting a molecular behavior partially different from both healthy and GC tissues. Therefore, using gNAT as healthy control tissue in the characterization of tumor-associated biological processes and pathways could lead to suboptimal results.
|
37
|
Avila Cobos F, Vandesompele J, Mestdagh P, De Preter K. Computational deconvolution of transcriptomics data from mixed cell populations. Bioinformatics 2019; 34:1969-1979. [PMID: 29351586 DOI: 10.1093/bioinformatics/bty019] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 08/18/2017] [Accepted: 01/10/2018] [Indexed: 12/22/2022]
Abstract
Summary Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. In this review, we highlight the importance and value of computational deconvolution methods to infer the abundance of different cell types and/or cell type-specific expression profiles in heterogeneous samples without performing physical cell sorting. We also explain the various deconvolution scenarios, the mathematical approaches used to solve them and the effect of data processing and different confounding factors on the accuracy of the deconvolution results. Contact katleen.depreter@ugent.be. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Francisco Avila Cobos
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Jo Vandesompele
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Pieter Mestdagh
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| | - Katleen De Preter
- Center for Medical Genetics Ghent (CMGG), Ghent University, 9000 Ghent, Belgium.,Cancer Research Institute Ghent (CRIG), 9000 Ghent, Belgium.,Bioinformatics Institute Ghent from Nucleotides to Networks (BIG N2N), 9000 Ghent, Belgium
| |
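The deconvolution scenario summarized in the Avila Cobos et al. abstract above (inferring cell type abundances from a bulk profile when cell type-specific profiles are known) is commonly posed as a non-negative linear mixture problem. The following is a minimal sketch under stated assumptions, not a method taken from that review: the signature matrix, the bulk profile and the function name `deconvolve` are hypothetical, and the solver is a simple multiplicative-update scheme for non-negative least squares chosen only because it needs no external libraries.

```python
# Minimal sketch: bulk ≈ signature · proportions, with proportions >= 0.
# All names and numbers below are illustrative assumptions.

def deconvolve(signature, bulk, n_iter=500):
    """Estimate non-negative cell-type proportions from one bulk profile.

    signature: list of rows, one per gene, each with one value per cell type.
    bulk: list with one expression value per gene.
    Inputs are assumed to be non-negative (required by multiplicative updates).
    """
    n_genes = len(signature)
    n_types = len(signature[0])
    # Precompute S^T b and S^T S once.
    stb = [sum(signature[g][k] * bulk[g] for g in range(n_genes))
           for k in range(n_types)]
    sts = [[sum(signature[g][j] * signature[g][k] for g in range(n_genes))
            for k in range(n_types)]
           for j in range(n_types)]
    x = [1.0 / n_types] * n_types  # start from a uniform mixture
    for _ in range(n_iter):
        denom = [sum(sts[k][j] * x[j] for j in range(n_types))
                 for k in range(n_types)]
        # Multiplicative update keeps every weight non-negative.
        x = [x[k] * stb[k] / denom[k] if denom[k] > 0 else 0.0
             for k in range(n_types)]
    total = sum(x)
    # Rescale so the weights can be read as proportions.
    return [v / total for v in x] if total > 0 else x


# Hypothetical signature: 4 genes (rows) x 3 cell types (columns).
SIGNATURE = [
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 1.0],
]
TRUE_PROPORTIONS = [0.2, 0.5, 0.3]
# Noise-free pseudo-bulk built from the hypothetical signature.
BULK = [sum(s * p for s, p in zip(row, TRUE_PROPORTIONS)) for row in SIGNATURE]

estimated = deconvolve(SIGNATURE, BULK)
print([round(v, 4) for v in estimated])
```

On real data the bulk profile is noisy and the signature is imperfect, so a dedicated solver and careful normalization would normally replace this toy update rule.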
|
38
|
Accurate estimation of cell-type composition from gene expression data. Nat Commun 2019; 10:2975. [PMID: 31278265 PMCID: PMC6611906 DOI: 10.1038/s41467-019-10802-z] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 08/03/2018] [Accepted: 05/24/2019] [Indexed: 01/20/2023]
Abstract
The rapid development of single-cell transcriptomic technologies has helped uncover the cellular heterogeneity within cell populations. However, bulk RNA-seq continues to be the main workhorse for quantifying gene expression levels due to technical simplicity and low cost. To most effectively extract information from bulk data given the new knowledge gained from single-cell methods, we have developed a novel algorithm to estimate the cell-type composition of bulk data from a single-cell RNA-seq-derived cell-type signature. Comparison with existing methods using various real RNA-seq data sets indicates that our new approach is more accurate and comprehensive than previous methods, especially for the estimation of rare cell types. More importantly, our method can detect cell-type composition changes in response to external perturbations, thereby providing a valuable, cost-effective method for dissecting the cell-type-specific effects of drug treatments or condition changes. As such, our method is applicable to a wide range of biological and clinical investigations. Bulk RNA-seq data harbors valuable information about gene expression levels from different cell types in tissue samples. Here, the authors develop DWLS, a computational method for estimating cell-type composition of bulk data by leveraging single-cell RNA-seq-derived cell-type signatures.
|
39
|
Domanskyi S, Szedlak A, Hawkins NT, Wang J, Paternostro G, Piermarocchi C. Polled Digital Cell Sorter (p-DCS): Automatic identification of hematological cell types from single cell RNA-sequencing clusters. BMC Bioinformatics 2019; 20:369. [PMID: 31262249 PMCID: PMC6604348 DOI: 10.1186/s12859-019-2951-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 01/14/2019] [Accepted: 06/13/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Single cell RNA sequencing (scRNA-seq) brings unprecedented opportunities for mapping the heterogeneity of complex cellular environments such as bone marrow, and provides insight into many cellular processes. Single cell RNA-seq has a far larger fraction of missing data reported as zeros (dropouts) than traditional bulk RNA-seq, and unsupervised clustering combined with Principal Component Analysis (PCA) can be used to overcome this limitation. After clustering, however, one has to interpret the average expression of markers on each cluster to identify the corresponding cell types, and this is normally done by hand by an expert curator. RESULTS We present a computational tool for processing single cell RNA-seq data that uses a voting algorithm to automatically identify cells based on approval votes received by known molecular markers. Using a stochastic procedure that accounts for imbalances in the number of known molecular signatures for different cell types, the method computes the statistical significance of the final approval score and automatically assigns a cell type to clusters without an expert curator. We demonstrate the utility of the tool in the analysis of eight samples of bone marrow from the Human Cell Atlas. The tool provides a systematic identification of cell types in bone marrow based on a list of markers of immune cell types, and incorporates a suite of visualization tools that can be overlaid on a t-SNE representation. The software is freely available as a Python package at https://github.com/sdomanskyi/DigitalCellSorter . CONCLUSIONS This methodology assures that extensive marker to cell type matching information is taken into account in a systematic way when assigning cell clusters to cell types. Moreover, the method allows for a high throughput processing of multiple scRNA-seq datasets, since it does not involve an expert curator, and it can be applied recursively to obtain cell sub-types. 
The software is designed to allow the user to substitute the marker to cell type matching information and apply the methodology to different cellular environments.
Affiliation(s)
- Sergii Domanskyi
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA.
| | - Anthony Szedlak
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
| | - Nathaniel T Hawkins
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
| | | | | | - Carlo Piermarocchi
- Department of Physics and Astronomy, Michigan State University, East Lansing, MI, 48824, USA
| |
|
40
|
Finotello F, Mayer C, Plattner C, Laschober G, Rieder D, Hackl H, Krogsdam A, Loncova Z, Posch W, Wilflingseder D, Sopper S, Ijsselsteijn M, Brouwer TP, Johnson D, Xu Y, Wang Y, Sanders ME, Estrada MV, Ericsson-Gonzalez P, Charoentong P, Balko J, de Miranda NFDCC, Trajanoski Z. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med 2019; 11:34. [PMID: 31126321 PMCID: PMC6534875 DOI: 10.1186/s13073-019-0638-6] [Citation(s) in RCA: 726] [Impact Index Per Article: 145.2] [Received: 10/26/2018] [Accepted: 04/09/2019] [Indexed: 12/26/2022]
Abstract
We introduce quanTIseq, a method to quantify the fractions of ten immune cell types from bulk RNA-sequencing data. quanTIseq was extensively validated in blood and tumor samples using simulated, flow cytometry, and immunohistochemistry data. quanTIseq analysis of 8000 tumor samples revealed that cytotoxic T cell infiltration is more strongly associated with the activation of the CXCR3/CXCL9 axis than with mutational load and that deconvolution-based cell scores have prognostic value in several solid cancers. Finally, we used quanTIseq to show how kinase inhibitors modulate the immune contexture and to reveal immune-cell types that underlie differential patients' responses to checkpoint blockers. Availability: quanTIseq is available at http://icbi.at/quantiseq .
Affiliation(s)
- Francesca Finotello
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Clemens Mayer
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Christina Plattner
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Gerhard Laschober
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Dietmar Rieder
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Hubert Hackl
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Anne Krogsdam
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Zuzana Loncova
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria
| | - Wilfried Posch
- Division of Hygiene and Medical Microbiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Doris Wilflingseder
- Division of Hygiene and Medical Microbiology, Medical University of Innsbruck, Innsbruck, Austria
| | - Sieghart Sopper
- Department of Haematology and Oncology, Medical University of Innsbruck, Innsbruck, Austria
| | - Marieke Ijsselsteijn
- Department of Pathology, Leiden University Medical Centre, Leiden, The Netherlands
| | - Thomas P Brouwer
- Department of Pathology, Leiden University Medical Centre, Leiden, The Netherlands
| | - Douglas Johnson
- Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Yaomin Xu
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Yu Wang
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Melinda E Sanders
- Department Pathology Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Monica V Estrada
- Department Pathology Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Paula Ericsson-Gonzalez
- Department Pathology Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Pornpimol Charoentong
- Department of Medical Oncology and Internal Medicine VI, National Center for Tumor Diseases, University Hospital Heidelberg, Heidelberg, Germany
- Division of Translational Immunotherapy, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Justin Balko
- Vanderbilt University, Nashville, TN, USA
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | | | - Zlatko Trajanoski
- Biocenter, Division of Bioinformatics, Medical University of Innsbruck, Innrain 80, Innsbruck, Austria.
- Austrian Drug Screening Institute, Innrain 66A, Innsbruck, Austria.
| |
|
41
|
Hao Y, Yan M, Heath BR, Lei YL, Xie Y. Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares. PLoS Comput Biol 2019; 15:e1006976. [PMID: 31059559 PMCID: PMC6522071 DOI: 10.1371/journal.pcbi.1006976] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/10/2018] [Revised: 05/16/2019] [Accepted: 03/25/2019] [Indexed: 02/08/2023]
Abstract
Gene-expression deconvolution is used to quantify different types of cells in a mixed population. It provides a highly promising solution to rapidly characterize the tumor-infiltrating immune landscape and identify cold cancers. However, a major challenge is that gene-expression data are frequently contaminated by many outliers that decrease the estimation accuracy. Thus, it is imperative to develop a robust deconvolution method that automatically decontaminates data by reliably detecting and removing outliers. We developed a new machine learning tool, Fast And Robust DEconvolution of Expression Profiles (FARDEEP), to enumerate immune cell subsets from whole tumor tissue samples. To reduce noise in the tumor gene expression datasets, FARDEEP utilizes an adaptive least trimmed square to automatically detect and remove outliers before estimating the cell compositions. We show that FARDEEP is less susceptible to outliers and returns a better estimation of coefficients than the existing methods with both numerical simulations and real datasets. FARDEEP provides an estimate related to the absolute quantity of each immune cell subset in addition to relative percentages. Hence, FARDEEP represents a novel robust algorithm to complement the existing toolkit for the characterization of tissue-infiltrating immune cell landscape. The source code for FARDEEP is implemented in R and available for download at https://github.com/YuningHao/FARDEEP.git. Rapidly emerging evidence suggests that the tumor immune microenvironment not only predisposes cancer patients to diverse treatment outcomes but also represents a promising source of biomarkers for better patient stratification. Different from the immunohistochemistry-based scoring practice, which focuses on a few selected marker proteins, immune deconvolution pipelines inform a previously untapped method to comprehensively reveal the tumor-infiltrating immune landscape. 
Recognizing the numerous strengths of existing immune deconvolution algorithms, here we show that data outliers, which are inevitable in whole tissue sequencing data sets, substantially skew estimation results. Moreover, an estimate related to the absolute amount of each immune subset offers valuable insight into the nature of the host response in addition to percentage information alone. Thus, we engineered a new immune deconvolution pipeline, coined as Fast and Robust Deconvolution of Expression Profiles (FARDEEP), to automatically detect and remove outliers prior to feeding data into the deconvolution algorithm and to provide estimates related to the absolute quantity of each immune subset. Utilizing both synthetic and real data sets, we found that FARDEEP returns superior coefficients and offers a robust tool to reveal the immune landscape of human cancers.
Affiliation(s)
- Yuning Hao
- Department of Statistics and Probability, Michigan State University, East Lansing, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America
| | - Ming Yan
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America
- Department of Mathematics, Michigan State University, East Lansing, United States of America
| | - Blake R. Heath
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry Ann Arbor, United States of America
| | - Yu L. Lei
- Department of Periodontics and Oral Medicine, University of Michigan School of Dentistry Ann Arbor, United States of America
- University of Michigan Rogel Cancer Center, Ann Arbor, United States of America
- * E-mail: (YLL); (YX)
| | - Yuying Xie
- Department of Statistics and Probability, Michigan State University, East Lansing, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, United States of America
- * E-mail: (YLL); (YX)
| |
|
42
|
Klopfenstein Q, Truntzer C, Vincent J, Ghiringhelli F. Cell lines and immune classification of glioblastoma define patient's prognosis. Br J Cancer 2019; 120:806-814. [PMID: 30899088 PMCID: PMC6474266 DOI: 10.1038/s41416-019-0404-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/25/2018] [Revised: 01/11/2019] [Accepted: 01/28/2019] [Indexed: 12/26/2022]
Abstract
Background Prognostic markers for glioblastoma are lacking. Both intrinsic tumour characteristics and the microenvironment could influence cancer prognosis. The aim of our study was to generate a classification of pure glioblastoma cell lines and an immune classification in order to decipher the respective roles of glioblastoma cells and the microenvironment in prognosis. Methods We worked on two large cohorts of patients suffering from glioblastoma (TCGA, n = 481 and Rembrandt, n = 180) for which clinical data, transcriptomic profiles and outcome were recorded. Transcriptomic profiles of 129 pure glioblastoma cell lines were clustered to generate a glioblastoma cell lines classification. Presence of subtypes of glioblastoma cell lines and immune cells was determined using deconvolution. Results Glioblastoma cell lines classification defined three new molecular groups called oncogenic, metabolic and neuronal communication enriched. Neuronal communication-enriched tumours were associated with poor prognosis in both cohorts. Immune cell infiltrate was more frequent in the mesenchymal classical classification subgroup and in metabolic-enriched tumours. A combination of age, glioblastoma cell lines classification and immune classification could be used to determine patient’s outcome in both cohorts. Conclusions Our study shows that glioblastoma-bearing patients can be classified based on their age, glioblastoma cell lines classification and immune classification. Combining this information improves the capacity to address prognosis.
Affiliation(s)
- Quentin Klopfenstein
- Research Platform in Biological Oncology, Dijon, France.,GIMI Genetic and Immunology Medical Institute, Dijon, France
| | - Caroline Truntzer
- Research Platform in Biological Oncology, Dijon, France.,GIMI Genetic and Immunology Medical Institute, Dijon, France
| | - Julie Vincent
- Department of Medical Oncology, Centre GF Leclerc, Dijon, France
| | - Francois Ghiringhelli
- Research Platform in Biological Oncology, Dijon, France.,GIMI Genetic and Immunology Medical Institute, Dijon, France.,Department of Medical Oncology, Centre GF Leclerc, Dijon, France.,INSERM, UMR1231, Dijon, France.
| |
|
43
|
Nachun D, Gao F, Isaacs C, Strawser C, Yang Z, Dokuru D, Van Berlo V, Sears R, Farmer J, Perlman S, Lynch DR, Coppola G. Peripheral blood gene expression reveals an inflammatory transcriptomic signature in Friedreich's ataxia patients. Hum Mol Genet 2019; 27:2965-2977. [PMID: 29790959 PMCID: PMC6097013 DOI: 10.1093/hmg/ddy198] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/18/2017] [Accepted: 05/17/2018] [Indexed: 12/15/2022]
Abstract
Transcriptional changes in Friedreich's ataxia (FRDA), a rare and debilitating recessive Mendelian neurodegenerative disorder, have been studied in affected but inaccessible tissues (such as dorsal root ganglia, sensory neurons and cerebellum) in animal models or small patient series. However, transcriptional changes induced by FRDA in peripheral blood, a readily accessible tissue, have not been characterized in a large sample. We used differential expression, association with disability stage, network analysis and enrichment analysis to characterize the peripheral blood transcriptome and identify genes that were differentially expressed in FRDA patients (n = 418) compared with both heterozygous expansion carriers (n = 228) and controls (n = 93; 739 individuals in total), or were associated with disease progression, resulting in a disease signature for FRDA. We identified a transcriptional signature strongly enriched for an inflammatory innate immune response. Future studies should seek to further characterize the role of peripheral inflammation in FRDA pathology and determine its relevance to overall disease progression.
Affiliation(s)
- Daniel Nachun
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Fuying Gao
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Charles Isaacs
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | | | - Zhongan Yang
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Deepika Dokuru
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Victoria Van Berlo
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | - Renee Sears
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA
| | | | - Susan Perlman
- Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
| | - David R Lynch
- Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Giovanni Coppola
- Department of Psychiatry and Semel Institute, University of California, Los Angeles, Los Angeles, CA, USA.,Department of Neurology, University of California, Los Angeles, Los Angeles, CA, USA
| |
|
44
|
Monaco G, Lee B, Xu W, Mustafah S, Hwang YY, Carré C, Burdin N, Visan L, Ceccarelli M, Poidinger M, Zippelius A, Pedro de Magalhães J, Larbi A. RNA-Seq Signatures Normalized by mRNA Abundance Allow Absolute Deconvolution of Human Immune Cell Types. Cell Rep 2019; 26:1627-1640.e7. [PMID: 30726743 PMCID: PMC6367568 DOI: 10.1016/j.celrep.2019.01.041] [Citation(s) in RCA: 492] [Impact Index Per Article: 98.4] [Received: 08/22/2018] [Revised: 12/03/2018] [Accepted: 01/10/2019] [Indexed: 01/22/2023]
Abstract
The molecular characterization of immune subsets is important for designing effective strategies to understand and treat diseases. We characterized 29 immune cell types within the peripheral blood mononuclear cell (PBMC) fraction of healthy donors using RNA-seq (RNA sequencing) and flow cytometry. Our dataset was used, first, to identify sets of genes that are specific, are co-expressed, and have housekeeping roles across the 29 cell types. Then, we examined differences in mRNA heterogeneity and mRNA abundance revealing cell type specificity. Last, we performed absolute deconvolution on a suitable set of immune cell types using transcriptomics signatures normalized by mRNA abundance. Absolute deconvolution is ready to use for PBMC transcriptomic data using our Shiny app (https://github.com/giannimonaco/ABIS). We benchmarked different deconvolution and normalization methods and validated the resources in independent cohorts. Our work has research, clinical, and diagnostic value by making it possible to effectively associate observations in bulk transcriptomics data to specific immune subsets.
Affiliation(s)
- Gianni Monaco
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore; Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool L78TX, UK; Department of Biomedicine, University Hospital and University of Basel, 4031 Basel, Switzerland.
| | - Bernett Lee
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Weili Xu
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Seri Mustafah
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - You Yi Hwang
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | | | | | | | - Michele Ceccarelli
- BIOGEM Research Center, Ariano Irpino, Italy; Department of Science and Technology, University of Sannio, Benevento, Italy
| | - Michael Poidinger
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore
| | - Alfred Zippelius
- Department of Biomedicine, University Hospital and University of Basel, 4031 Basel, Switzerland
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool L78TX, UK.
| | - Anis Larbi
- Singapore Immunology Network (SIgN), Agency for Science Technology and Research, Biopolis, 8A Biomedical Grove, 138648, Singapore, Singapore; Department of Biology, Faculty of Sciences, University Tunis El Manar, Tunis, Tunisia; Faculty of Medicine, University of Sherbrooke, Sherbrooke, QC, Canada; Department of Microbiology, Immunology Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
|
45
|
Mohammad TA, Tsai YS, Ameer S, Chen HIH, Chiu YC, Chen Y. CeL-ID: cell line identification using RNA-seq data. BMC Genomics 2019; 20:81. [PMID: 30712511 PMCID: PMC6360649 DOI: 10.1186/s12864-018-5371-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022]
Abstract
BACKGROUND Cell lines form the cornerstone of cell-based experimentation studies into understanding the underlying mechanisms of normal and disease biology including cancer. However, it is commonly acknowledged that contamination of cell lines is a prevalent problem affecting biomedical science and available methods for cell line authentication suffer from limited access as well as being too daunting and time-consuming for many researchers. Therefore, a new and cost-effective approach for authentication and quality control of cell lines is needed. RESULTS We have developed a new RNA-seq based approach named CeL-ID for cell line authentication. CeL-ID uses RNA-seq data to identify variants and compare them with variant profiles of other cell lines. RNA-seq data for 934 CCLE cell lines downloaded from NCI GDC were used to generate cell line specific variant profiles and pair-wise correlations were calculated using frequencies and depth of coverage values of all the variants. Comparative analysis of variant profiles revealed that variant profiles differ significantly from cell line to cell line whereas identical, synonymous and derivative cell lines share high variant identity and are highly correlated (ρ > 0.9). Our benchmarking studies revealed that the CeL-ID method can identify a cell line with high accuracy and can be a valuable tool for cell line authentication in biomedical science. Finally, CeL-ID estimates the possible cross contamination using a linear mixture model if no perfect match was detected. CONCLUSIONS In this study, we show the utility of an RNA-seq based approach for cell line authentication. 
Our comparative analysis of variant profiles derived from RNA-seq data revealed that variant profiles of each cell line are distinct and overall share low variant identity with other cell lines whereas identical or synonymous cell lines show significantly high variant identity and hence variant profiles can be used as a discriminatory/identifying feature in a cell authentication model.
Affiliation(s)
- Tabrez A Mohammad
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Yun S Tsai
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Safwa Ameer
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Hung-I Harry Chen
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Yu-Chiao Chiu
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA
| | - Yidong Chen
- Greehey Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA. .,Department of Epidemiology and Biostatistics, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA.
| |
Collapse
|
46
|
Ray M, Ruffalo MM, Bar‐Joseph Z. Construction of integrated microRNA and mRNA immune cell signatures to predict survival of patients with breast and ovarian cancer. Genes Chromosomes Cancer 2018; 58:34-42. [DOI: 10.1002/gcc.22688] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/24/2018] [Revised: 09/26/2018] [Accepted: 09/26/2018] [Indexed: 12/15/2022] Open
Affiliation(s)
- Mondira Ray
  - Medical Research Fellows Program, Howard Hughes Medical Institute, Chevy Chase, Maryland
  - University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania
- Matthew M. Ruffalo
  - Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Ziv Bar‐Joseph
  - Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
  - Department of Machine Learning, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania
|
47
|
Leveraging heterogeneity across multiple datasets increases cell-mixture deconvolution accuracy and reduces biological and technical biases. Nat Commun 2018; 9:4735. [PMID: 30413720 PMCID: PMC6226523 DOI: 10.1038/s41467-018-07242-6] [Citation(s) in RCA: 103] [Impact Index Per Article: 17.2] [Received: 01/18/2018] [Accepted: 10/19/2018] [Indexed: 02/07/2023] Open
Abstract
In silico quantification of cell proportions from mixed-cell transcriptomics data (deconvolution) requires a reference expression matrix, called basis matrix. We hypothesize that matrices created using only healthy samples from a single microarray platform would introduce biological and technical biases in deconvolution. We show presence of such biases in two existing matrices, IRIS and LM22, irrespective of deconvolution method. Here, we present immunoStates, a basis matrix built using 6160 samples with different disease states across 42 microarray platforms. We find that immunoStates significantly reduces biological and technical biases. Importantly, we find that different methods have virtually no or minimal effect once the basis matrix is chosen. We further show that cellular proportion estimates using immunoStates are consistently more correlated with measured proportions than IRIS and LM22, across all methods. Our results demonstrate the need and importance of incorporating biological and technical heterogeneity in a basis matrix for achieving consistently high accuracy.
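As a minimal illustration of what deconvolution against a basis matrix means, the sketch below models a mixed sample as the basis matrix times a vector of non-negative cell proportions and recovers those proportions by projected gradient descent. This is a generic sketch, not immunoStates or any method evaluated in the paper: the function name `deconvolve`, the toy basis values, and the choice of solver are all assumptions.

```python
def deconvolve(basis, mixture, steps=2000):
    """Estimate non-negative proportions p such that mixture ~ basis . p.

    basis[g][k] is the reference expression of gene g in cell type k
    (hypothetical layout). Minimizes squared error by projected gradient
    descent, then rescales the result to sum to 1.
    """
    n_genes, n_types = len(basis), len(basis[0])
    # Step size 1 / ||basis||_F^2 is never larger than 1 / lambda_max(B^T B),
    # so the iteration cannot diverge.
    lr = 1.0 / sum(v * v for row in basis for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        resid = [sum(basis[g][k] * p[k] for k in range(n_types)) - mixture[g]
                 for g in range(n_genes)]
        grad = [sum(basis[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(p)  # assumes at least one proportion stays positive
    return [v / total for v in p]

# Toy basis matrix: 4 genes x 3 cell types (made-up values).
basis = [[10.0, 1.0, 0.0],
         [1.0, 8.0, 1.0],
         [0.0, 2.0, 9.0],
         [3.0, 3.0, 3.0]]
true_p = [0.5, 0.3, 0.2]
# Simulated noise-free bulk sample built from the known proportions.
mixture = [sum(b * t for b, t in zip(row, true_p)) for row in basis]

estimate = deconvolve(basis, mixture)
print([round(v, 4) for v in estimate])
```

Because the estimate depends directly on the columns of `basis`, a basis built from unrepresentative samples or a single platform shifts every recovered proportion, which is consistent with the abstract's emphasis on the choice of basis matrix.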
|
48
|
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both the cell/tissue type proportions and the cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes. RESULTS We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. In case of no prior knowledge, global gene expression variability is used in clustering the mixed data to substitute marker sets with cluster sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA). CONCLUSION Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from https://github.com/kondim1983/Deblender/.
Affiliation(s)
- Konstantina Dimitrakopoulou
  - Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Elisabeth Wik
  - Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway
  - Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Lars A Akslen
  - Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway
  - Department of Pathology, Haukeland University Hospital, Bergen, Norway
- Inge Jonassen
  - Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
49
|
Petitprez F, Sun CM, Lacroix L, Sautès-Fridman C, de Reyniès A, Fridman WH. Quantitative Analyses of the Tumor Microenvironment Composition and Orientation in the Era of Precision Medicine. Front Oncol 2018; 8:390. [PMID: 30319963 PMCID: PMC6167550 DOI: 10.3389/fonc.2018.00390] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 07/16/2018] [Accepted: 08/30/2018] [Indexed: 11/20/2022] Open
Abstract
Tumors are formed by aggregates of cells of various origins, including malignant, stromal and immune cells. The number of therapies targeting the microenvironment is increasing as the tumor microenvironment is increasingly recognized as playing an essential role in tumor control. In the era of precision medicine, it is essential to precisely estimate the composition, organization and functionality of the individual patient's tumor microenvironment and to find ways to therapeutically modulate it. Many tools are now available to quantify the cell populations present in the tumor microenvironment, and the most recent approaches are reviewed herein. We provide an overview of experimental and computational methodologies used to quantify tumor-associated cellular populations, including immunohistochemistry, flow and mass cytometry, and bulk and single-cell transcriptomic approaches. We illustrate their respective contributions to characterizing the microenvironment. We also discuss how these methods can guide therapeutic choices, in relation to the predictive value of some characteristics of the microenvironment.
Affiliation(s)
- Florent Petitprez
  - INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
  - University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
- Cheng-Ming Sun
  - INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
  - University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Laetitia Lacroix
  - INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
  - University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Catherine Sautès-Fridman
  - INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
  - University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Aurélien de Reyniès
  - Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
- Wolf H Fridman
  - INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
  - University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
  - Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
|
50
|
Xie F, Zhou M, Xu Y. BayCount: A Bayesian decomposition method for inferring tumor heterogeneity using RNA-Seq counts. Ann Appl Stat 2018. [DOI: 10.1214/17-aoas1123] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
|