1
Mercadié A, Gravier É, Josse G, Fournier I, Viodé C, Vialaneix N, Brouard C. NMFProfiler: a multi-omics integration method for samples stratified in groups. Bioinformatics 2025; 41:btaf066. [PMID: 39921890 PMCID: PMC11855281 DOI: 10.1093/bioinformatics/btaf066] [Received: 09/13/2024] [Revised: 01/13/2025] [Accepted: 02/05/2025] [Indexed: 02/10/2025] Open
Abstract
MOTIVATION: The development of high-throughput sequencing enabled the massive production of "omics" data for various applications in biology. By simultaneously analyzing paired datasets collected on the same samples, integrative statistical approaches allow researchers to get a global picture of such systems and to highlight existing relationships between various molecular types and levels. Here, we introduce NMFProfiler, an integrative supervised non-negative matrix factorization (NMF) method that accounts for the stratification of samples into groups of biological interest. RESULTS: NMFProfiler was shown to successfully extract signatures characterizing groups with performances comparable to or better than state-of-the-art approaches. In particular, NMFProfiler was used in a clinical study on atopic dermatitis (AD) and to analyze a multi-omic cancer dataset. In the first case, it successfully identified signatures combining known AD protein biomarkers and novel transcriptomic biomarkers. In the second, it extracted signatures significantly associated with cancer survival. AVAILABILITY AND IMPLEMENTATION: NMFProfiler is released as a Python package, NMFProfiler (v0.3.0), available on PyPI.
Affiliation(s)
- Aurélie Mercadié
- Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
- Éléonore Gravier
- Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Gwendal Josse
- Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Isabelle Fournier
- Université de Lille, Inserm, CHU Lille, U1192 PRISM, Lille 59000, France
- Cécile Viodé
- Recherche & Développement, Pierre Fabre Dermo-cosmétique, Toulouse 31300, France
- Nathalie Vialaneix
- Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
- Céline Brouard
- Université de Toulouse, INRAE, UR MIAT, Castanet-Tolosan Cedex 31326, France
2
Kurian NC, Gann PH, Kumar N, McGregor SM, Verma R, Sethi A. Deep Learning Predicts Subtype Heterogeneity and Outcomes in Luminal A Breast Cancer Using Routinely Stained Whole-Slide Images. Cancer Research Communications 2025; 5:157-166. [PMID: 39740059 PMCID: PMC11770635 DOI: 10.1158/2767-9764.crc-24-0397] [Received: 09/23/2024] [Revised: 12/09/2024] [Accepted: 12/20/2024] [Indexed: 01/02/2025]
Abstract
SIGNIFICANCE: A deep learning model, trained using transcriptomic data, inexpensively quantifies and fine-maps intratumor heterogeneity (ITH) due to subtype admixture in routine images of luminal A (LumA) breast cancer, the most favorable subtype. This new approach could facilitate exploration of the mechanisms behind such heterogeneity and its impact on selection of therapy for individual patients.
Affiliation(s)
- Nikhil Cherian Kurian
- Department of Electrical Engineering, Indian Institute of Technology-Bombay, Mumbai, India
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Peter H. Gann
- Department of Pathology and University of Illinois Cancer Center, University of Illinois at Chicago, Chicago, Illinois
- Neeraj Kumar
- Department of Pathology, Warren Alpert Center for Computational Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Stephanie M. McGregor
- Department of Pathology and Laboratory Medicine, University of Wisconsin Carbone Cancer Center, University of Wisconsin, Madison, Wisconsin
- Ruchika Verma
- Windreich Department of Artificial Intelligence and Human Health, Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, New York
- Amit Sethi
- Department of Electrical Engineering, Indian Institute of Technology-Bombay, Mumbai, India
- Department of Pathology and University of Illinois Cancer Center, University of Illinois at Chicago, Chicago, Illinois
3
Feng S, Huang L, Pournara AV, Huang Z, Yang X, Zhang Y, Brazma A, Shi M, Papatheodorou I, Miao Z. Alleviating batch effects in cell type deconvolution with SCCAF-D. Nat Commun 2024; 15:10867. [PMID: 39738054 DOI: 10.1038/s41467-024-55213-x] [Received: 05/07/2024] [Accepted: 12/02/2024] [Indexed: 01/01/2025] Open
Abstract
Cell type deconvolution methods can impute cell proportions from bulk transcriptomics data, revealing changes in disease progression or organ development. But benchmarking studies often use simulated bulk data from the same source as the reference, which limits its application scenarios. This study examines batch effects in deconvolution and introduces SCCAF-D, a computational workflow that ensures a Pearson Correlation Coefficient above 0.75 across simulated and real bulk data for various tissue types. Applied to non-alcoholic fatty liver disease, SCCAF-D unveils meaningful insights into changes in cell proportions during disease progression.
Affiliation(s)
- Shuo Feng
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macao Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, 230027, China
- Liangfeng Huang
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macao Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Translational Research Institute of Brain and Brain-Like Intelligence and Department of Anesthesiology, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- Anna Vathrakokoili Pournara
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Ziliang Huang
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macao Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Xinlu Yang
- Department of Obstetrics and Gynaecology, Harbin Red Cross Central Hospital, Harbin, 150001, China
- Yongjian Zhang
- Harbin Medical University the Sixth Affiliated Hospital, Harbin, 150023, China
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Ming Shi
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Irene Papatheodorou
- Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Medical School, University of East Anglia, Norwich Research Park, Norwich, NR4 7UA, UK
- Zhichao Miao
- GMU-GIBH Joint School of Life Sciences, The Guangdong-Hong Kong-Macao Joint Laboratory for Cell Fate Regulation and Diseases, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, China
- Translational Research Institute of Brain and Brain-Like Intelligence and Department of Anesthesiology, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai, China
- European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Genome Campus, Cambridge, CB10 1SD, UK
4
Wang C, Lin Y, Li S, Guan J. Deconvolution from bulk gene expression by leveraging sample-wise and gene-wise similarities and single-cell RNA-Seq data. BMC Genomics 2024; 25:875. [PMID: 39294558 PMCID: PMC11409548 DOI: 10.1186/s12864-024-10728-x] [Received: 01/30/2024] [Accepted: 08/20/2024] [Indexed: 09/20/2024] Open
Abstract
BACKGROUND: The widely adopted bulk RNA-seq measures the gene expression average of cells, masking cell type heterogeneity, which confounds downstream analyses. Therefore, identifying the cellular composition and cell type-specific gene expression profiles (GEPs) facilitates the study of the underlying mechanisms of various biological processes. Although single-cell RNA-seq focuses on cell type heterogeneity in gene expression, it requires specialized and expensive resources and currently is not practical for a large number of samples or a routine clinical setting. Recently, computational deconvolution methodologies have been developed, while many of them only estimate cell type composition or cell type-specific GEPs by requiring the other as input. The development of more accurate deconvolution methods to infer cell type abundance and cell type-specific GEPs is still essential. RESULTS: We propose a new deconvolution algorithm, DSSC, which infers cell type-specific gene expression and cell type proportions of heterogeneous samples simultaneously by leveraging gene-gene and sample-sample similarities in bulk expression and single-cell RNA-seq data. Through comparisons with the other existing methods, we demonstrate that DSSC is effective in inferring both cell type proportions and cell type-specific GEPs across simulated pseudo-bulk data (including intra-dataset and inter-dataset simulations) and experimental bulk data (including mixture data and real experimental data). DSSC shows robustness to the change of marker gene number and sample size and also has cost and time efficiencies. CONCLUSIONS: DSSC provides a practical and promising alternative to the experimental techniques to characterize cellular composition and heterogeneity in the gene expression of heterogeneous samples.
Affiliation(s)
- Chenqi Wang
- Department of Automation, Xiamen University, Xiamen, China
- Yifan Lin
- Department of Automation, Xiamen University, Xiamen, China
- Shuchao Li
- Department of Automation, Xiamen University, Xiamen, China
- Jinting Guan
- Department of Automation, Xiamen University, Xiamen, China
- Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China
5
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. Genome Med 2024; 16:65. [PMID: 38685057 PMCID: PMC11057104 DOI: 10.1186/s13073-024-01338-z] [Received: 09/26/2023] [Accepted: 04/18/2024] [Indexed: 05/02/2024] Open
Abstract
Using computational tools, bulk transcriptomics can be deconvoluted to estimate the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, ignoring person-to-person heterogeneity. Here, we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. Simulation studies demonstrate reduced bias compared with existing methods. Real data analyses on longitudinal consortia show disparities in cell type proportions are associated with several disease phenotypes in Type 1 diabetes and Parkinson's disease. imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/ .
Affiliation(s)
- Guanqun Meng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
- Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
- Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
- Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
- Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
6
Eteleeb AM, Novotny BC, Tarraga CS, Sohn C, Dhungel E, Brase L, Nallapu A, Buss J, Farias F, Bergmann K, Bradley J, Norton J, Gentsch J, Wang F, Davis AA, Morris JC, Karch CM, Perrin RJ, Benitez BA, Harari O. Brain high-throughput multi-omics data reveal molecular heterogeneity in Alzheimer's disease. PLoS Biol 2024; 22:e3002607. [PMID: 38687811 PMCID: PMC11086901 DOI: 10.1371/journal.pbio.3002607] [Received: 09/06/2023] [Revised: 05/10/2024] [Accepted: 03/28/2024] [Indexed: 05/02/2024] Open
Abstract
Unbiased data-driven omic approaches are revealing the molecular heterogeneity of Alzheimer disease. Here, we used machine learning approaches to integrate high-throughput transcriptomic, proteomic, metabolomic, and lipidomic profiles with clinical and neuropathological data from multiple human AD cohorts. We discovered 4 unique multimodal molecular profiles, one of them showing signs of poor cognitive function, a faster pace of disease progression, shorter survival with the disease, severe neurodegeneration and astrogliosis, and reduced levels of metabolomic profiles. We found this molecular profile to be present in multiple affected cortical regions associated with higher Braak tau scores and significant dysregulation of synapse-related genes, endocytosis, phagosome, and mTOR signaling pathways altered in AD early and late stages. AD cross-omics data integration with transcriptomic data from an SNCA mouse model revealed an overlapping signature. Furthermore, we leveraged single-nuclei RNA-seq data to identify distinct cell-types that most likely mediate molecular profiles. Lastly, we identified that the multimodal clusters uncovered cerebrospinal fluid biomarkers poised to monitor AD progression and possibly cognition. Our cross-omics analyses provide novel critical molecular insights into AD.
Affiliation(s)
- Abdallah M. Eteleeb
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- Brenna C. Novotny
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Carolina Soriano Tarraga
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Christopher Sohn
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Eliza Dhungel
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Logan Brase
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Aasritha Nallapu
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Jared Buss
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- Fabiana Farias
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Kristy Bergmann
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Joseph Bradley
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Joanne Norton
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Jen Gentsch
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Fengxian Wang
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Albert A. Davis
- Department of Neurology, Washington University, St. Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- John C. Morris
- The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- Department of Neurology, Washington University, St. Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- Celeste M. Karch
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- NeuroGenomics and Informatics Center, Washington University, St. Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- Richard J. Perrin
- The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- Department of Neurology, Washington University, St. Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
- Department of Pathology and Immunology, Washington University, St. Louis, Missouri, United States of America
- Bruno A. Benitez
- Department of Neurology and Neuroscience, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Oscar Harari
- Department of Psychiatry, Washington University, Saint Louis, St. Louis, Missouri, United States of America
- The Charles F. and Joanne Knight Alzheimer Disease Research Center, Washington University, St. Louis, Missouri, United States of America
- Hope Center for Neurological Disorders, Washington University, St. Louis, Missouri, United States of America
7
Vathrakokoili Pournara A, Miao Z, Beker OY, Nolte N, Brazma A, Papatheodorou I. CATD: a reproducible pipeline for selecting cell-type deconvolution methods across tissues. Bioinformatics Advances 2024; 4:vbae048. [PMID: 38638280 PMCID: PMC11023940 DOI: 10.1093/bioadv/vbae048] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 03/21/2024] [Indexed: 04/20/2024]
Abstract
Motivation: Cell-type deconvolution methods aim to infer cell composition from bulk transcriptomic data. The proliferation of developed methods, coupled with the inconsistent results obtained in many cases, highlights the pressing need for guidance in the selection of appropriate methods. Additionally, the growing accessibility of single-cell RNA sequencing datasets, often accompanied by bulk expression from related samples, enables the benchmarking of existing methods. Results: In this study, we conduct a comprehensive assessment of 31 methods, utilizing single-cell RNA-sequencing data from diverse human and mouse tissues. Employing various simulation scenarios, we reveal the efficacy of regression-based deconvolution methods, highlighting their sensitivity to reference choices. We investigate the impact of bulk-reference differences, incorporating variables such as sample, study and technology. We provide validation using a gold standard dataset from mononuclear cells and suggest a consensus prediction of proportions when ground truth is not available. We validated the consensus method on data from the stomach and studied its spillover effect. Importantly, we propose the use of the critical assessment of transcriptomic deconvolution (CATD) pipeline, which encompasses functionalities for generating references and pseudo-bulks and running implemented deconvolution methods. CATD streamlines simultaneous deconvolution of numerous bulk samples, providing a practical solution for speeding up the evaluation of newly developed methods. Availability and implementation: https://github.com/Papatheodorou-Group/CATD_snakemake.
Affiliation(s)
- Anna Vathrakokoili Pournara
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Zhichao Miao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Open Targets, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- GMU-GIBH Joint School of Life Sciences, Guangzhou Laboratory, Guangzhou Medical University, Guangzhou, 511436, China
- Ozgur Yilimaz Beker
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla 34956, Turkey
- Nadja Nolte
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, 121-1000, Slovenia
- Alvis Brazma
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Open Targets, Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- Earlham Institute, Norwich Research Park, Norwich NR4 7UZ, United Kingdom
8
Morita K, Mizuno T, Azuma I, Suzuki Y, Kusuhara H. Rat Deconvolution as Knowledge Miner for Immune Cell Trafficking from Toxicogenomics Databases. Toxicol Sci 2023; 197:kfad117. [PMID: 37941435 PMCID: PMC10823770 DOI: 10.1093/toxsci/kfad117] [Indexed: 11/10/2023] Open
Abstract
Toxicogenomics databases are useful for understanding biological responses in individuals because they include a diverse spectrum of biological responses. Although these databases contain no information regarding immune cells in the liver, which are important in the progression of liver injury, deconvolution that estimates cell-type proportions from bulk transcriptome could extend immune information. However, deconvolution has been mainly applied to humans and mice and less often to rats, which are the main target of toxicogenomics databases. Here, we developed a deconvolution method for rats to retrieve information regarding immune cells from toxicogenomics databases. The rat-specific deconvolution showed high correlations for several types of immune cells between spleen and blood, and between liver treated with toxicants compared with those based on human and mouse data. Additionally, we found 4 clusters of compounds in the Open TG-GATEs database based on estimated immune cell trafficking, which are different from those based on transcriptome data itself. The contributions of this work are three-fold. First, we obtained the gene expression profiles of 6 rat immune cells necessary for deconvolution. Second, we clarified the importance of species differences on deconvolution. Third, we retrieved immune cell trafficking from toxicogenomics databases. Accumulated and comparable immune cell profiles of massive data of immune cell trafficking in rats could deepen our understanding and enable us to clarify the relationship between the order and the contribution rate of immune cells, chemokines and cytokines, and pathologies. Ultimately, these findings will lead to the evaluation of organ responses in Adverse Outcome Pathway.
Affiliation(s)
- Katsuhisa Morita
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
- Tadahaya Mizuno
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
- Iori Azuma
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
- Yutaka Suzuki
- Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Hiroyuki Kusuhara
- Department of Pharmaceutical Sciences, The University of Tokyo, Bunkyo, Tokyo, Japan
9
Meng G, Pan Y, Tang W, Zhang L, Cui Y, Schumacher FR, Wang M, Wang R, He S, Krischer J, Li Q, Feng H. imply: improving cell-type deconvolution accuracy using personalized reference profiles. bioRxiv 2023:2023.09.27.559579. [PMID: 37808714 PMCID: PMC10557724 DOI: 10.1101/2023.09.27.559579] [Indexed: 10/10/2023]
Abstract
Real-world clinical samples are often admixtures of signal mosaics from multiple pure cell types. Using computational tools, bulk transcriptomics can be deconvoluted to solve for the abundance of constituent cell types. However, existing deconvolution methods are conditioned on the assumption that the whole study population is served by a single reference panel, which ignores person-to-person heterogeneity. Here we present imply, a novel algorithm to deconvolute cell type proportions using personalized reference panels. imply can borrow information across repeatedly measured samples for each subject, and obtain precise cell type proportion estimations. Simulation studies demonstrate reduced bias in cell type abundance estimation compared with existing methods. Real data analyses on large longitudinal consortia show more realistic deconvolution results that align with biological facts. Our results suggest that disparities in cell type proportions are associated with several disease phenotypes in type 1 diabetes and Parkinson's disease. Our proposed tool imply is available through the R/Bioconductor package ISLET at https://bioconductor.org/packages/ISLET/.
Affiliation(s)
- Guanqun Meng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Yue Pan
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Wen Tang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Lijun Zhang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ying Cui
- Department of Biomedical Data Science, Stanford University, Stanford, 94305, CA, USA
- Fredrick R. Schumacher
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Ming Wang
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
- Rui Wang
- Department of Surgery, Division of Surgical Oncology, University Hospitals Cleveland Medical Center, Cleveland, 44106, OH, USA
- Sijia He
- Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA
- Jeffrey Krischer
- Health Informatics Institute, University of South Florida, Tampa, 38105, FL, USA
- Qian Li
- Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, 38105, TN, USA
- Hao Feng
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106, OH, USA
10
Wang J, Lu L, Zheng S, Wang D, Jin L, Zhang Q, Li M, Zhang Z. DeCOOC Deconvoluted Hi-C Map Characterizes the Chromatin Architecture of Cells in Physiologically Distinctive Tissues. Advanced Science (Weinheim, Baden-Württemberg, Germany) 2023; 10:e2301058. [PMID: 37515382 PMCID: PMC10520690 DOI: 10.1002/advs.202301058] [Received: 02/15/2023] [Revised: 07/06/2023] [Indexed: 07/30/2023]
Abstract
Deciphering variations in chromosome conformations based on bulk three-dimensional (3D) genomic data from heterogeneous tissues is a key to understanding cell-type specific genome architecture and dynamics. Surprisingly, computational deconvolution methods for high-throughput chromosome conformation capture (Hi-C) data remain very rare in the literature. Here, deCOOC (deconvolve bulk Hi-C data), a deep convolutional neural network (CNN) that remarkably outperformed all the state-of-the-art tools in the deconvolution task, is developed. Interestingly, it is noticed that the chromatin accessibility or the Hi-C contact frequency alone is insufficient to explain the power of deCOOC, suggesting the existence of a latent embedded layer of information pertaining to the cell type specific 3D genome architecture. By applying deCOOC to in-house-generated bulk Hi-C data from visceral and subcutaneous adipose tissues, it is found that the characteristic chromatin features of M2 cells in the two anatomical loci are distinctively bound to different physiological functionalities. Taken together, deCOOC is both a reliable Hi-C data deconvolution method and a powerful tool for functional extraction of 3D genome architecture.
Affiliation(s)
- Junmei Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Lu Lu
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu 611130, China
| | - Shiqi Zheng
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Danyang Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Sars‐Fang Centre & MOE Key Laboratory of Marine Genetics and Breeding, College of Marine Life Sciences, Ocean University of China, Qingdao 266100, China
| | - Long Jin
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu 611130, China
| | - Qing Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Mingzhou Li
- Livestock and Poultry Multiomics Key Laboratory of Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
- Animal Breeding and Genetics Key Laboratory of Sichuan Province, Institute of Animal Genetics and Breeding, Sichuan Agricultural University, Chengdu 611130, China
| | - Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- School of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
11
|
Chiu Y, Ni C, Huang Y. Deconvolution of bulk gene expression profiles reveals the association between immune cell polarization and the prognosis of hepatocellular carcinoma patients. Cancer Med 2023; 12:15736-15760. [PMID: 37366298 PMCID: PMC10417088 DOI: 10.1002/cam4.6197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 05/02/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023] Open
Abstract
BACKGROUND Many studies have utilized computational methods, including cell composition deconvolution (CCD), to correlate immune cell polarizations with the survival of cancer patients, including those with hepatocellular carcinoma (HCC). However, currently available cell deconvolution estimated (CDE) tools do not cover the wide range of immune cell changes that are known to influence tumor progression. RESULTS A new CCD tool, HCCImm, was designed to estimate the abundance of tumor cells and 16 immune cell types in the bulk gene expression profiles of HCC samples. HCCImm was validated using real datasets derived from human peripheral blood mononuclear cells (PBMCs) and HCC tissue samples, demonstrating that HCCImm outperforms other CCD tools. We used HCCImm to analyze the bulk RNA-seq datasets of The Cancer Genome Atlas (TCGA)-liver hepatocellular carcinoma (LIHC) samples. We found that the proportions of memory CD8+ T cells and Tregs were negatively associated with patient overall survival (OS). Furthermore, the proportion of naïve CD8+ T cells was positively associated with patient OS. In addition, the TCGA-LIHC samples with a high tumor mutational burden had a significantly high abundance of nonmacrophage leukocytes. CONCLUSIONS HCCImm was equipped with a new set of reference gene expression profiles that allowed for a more robust analysis of HCC patient expression data. The source code is provided at https://github.com/holiday01/HCCImm.
Affiliation(s)
- Yen‐Jung Chiu
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Biomedical Engineering, Ming Chuan University, Taoyuan, Taiwan
| | - Chung‐En Ni
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Yen‐Hua Huang
- Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Center for Systems and Synthetic Biology, National Yang Ming Chiao Tung University, Taipei, Taiwan
| |
|
12
|
Alonso-Moreda N, Berral-González A, De La Rosa E, González-Velasco O, Sánchez-Santos JM, De Las Rivas J. Comparative Analysis of Cell Mixtures Deconvolution and Gene Signatures Generated for Blood, Immune and Cancer Cells. Int J Mol Sci 2023; 24:10765. [PMID: 37445946 DOI: 10.3390/ijms241310765] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/15/2023] Open
Abstract
In the last two decades, many detailed full transcriptomic studies on complex biological samples have been published and included in large gene expression repositories. These studies primarily provide a bulk expression signal for each sample, including multiple cell types mixed within the global signal. The cellular heterogeneity in these mixtures does not allow the activity of specific genes in specific cell types to be identified. Therefore, inferring relative cellular composition is a very powerful tool to achieve a more accurate molecular profiling of complex biological samples. In recent decades, computational techniques have been developed to solve this problem by applying deconvolution methods, designed to decompose cell mixtures into their cellular components and calculate the relative proportions of these elements. Some of them only calculate the cell proportions (supervised methods), while other deconvolution algorithms can also identify the gene signatures specific for each cell type (unsupervised methods). In this work, five deconvolution methods (CIBERSORT, FARDEEP, DECONICA, LINSEED and ABIS) were implemented and used to analyze blood and immune cells, and also cancer cells, in complex mixture samples (using three bulk expression datasets). Our study provides three analytical tools (corrplots, cell-signature plots and bar-mixture plots) that allow a thorough comparative analysis of the cell mixture data. The work indicates that CIBERSORT is a robust method optimized for the identification of immune cell types, but not as efficient in the identification of cancer cells. We also found that LINSEED is a very powerful unsupervised method that provides precise and specific gene signatures for each of the main immune cell types tested: neutrophils and monocytes (of the myeloid lineage), B-cells, NK cells and T-cells (of the lymphoid lineage), and also for cancer cells.
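The supervised setting described in this abstract (known cell-type signatures, unknown proportions) can be illustrated with a minimal sketch. It is not the algorithm of any of the five benchmarked tools: it simply fits non-negative proportions by projected gradient descent on a least-squares objective. The function name `estimate_proportions`, the toy signature matrix and the mixing proportions are all hypothetical.

```python
def estimate_proportions(signature, bulk, steps=2000):
    """Estimate non-negative proportions p such that signature * p ~ bulk.

    signature: one row per gene, one column per cell type (toy values).
    bulk: one expression value per gene for a single mixed sample.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so this step size keeps plain gradient descent stable.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual r = S p - b, one entry per gene.
        residual = [
            sum(row[k] * p[k] for k in range(n_types)) - bulk[g]
            for g, row in enumerate(signature)
        ]
        # Gradient of 0.5 * ||S p - b||^2 is S^T r.
        grad = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates can be read as relative proportions.
    return [v / total for v in p] if total > 0 else p


# Hypothetical signature: 4 genes x 3 cell types.
signature = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
true_p = [0.5, 0.3, 0.2]
# Noise-free bulk profile built from the known proportions.
bulk = [sum(s * p for s, p in zip(row, true_p)) for row in signature]
estimated = estimate_proportions(signature, bulk)
print([round(v, 3) for v in estimated])
```

On noise-free toy data the estimate should match the mixing proportions closely; real bulk profiles are noisy, which is one reason published tools rely on more robust regression schemes.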
Affiliation(s)
- Natalia Alonso-Moreda
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
| | - Alberto Berral-González
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
| | - Enrique De La Rosa
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
| | - Oscar González-Velasco
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - José Manuel Sánchez-Santos
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
- Department of Statistics, University of Salamanca (USAL), 37008 Salamanca, Spain
| | - Javier De Las Rivas
- Cancer Research Center (CiC-IBMCC, CSIC/USAL & IBSAL), Consejo Superior de Investigaciones Científicas (CSIC), University of Salamanca (USAL), & Instituto de Investigación Biomédica de Salamanca (IBSAL), 37007 Salamanca, Spain
| |
|
13
|
Kumar N, Gann PH, McGregor SM, Sethi A. Quantification of subtype purity in Luminal A breast cancer predicts clinical characteristics and survival. Breast Cancer Res Treat 2023:10.1007/s10549-023-06961-9. [PMID: 37209182 DOI: 10.1007/s10549-023-06961-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Accepted: 04/26/2023] [Indexed: 05/22/2023]
Abstract
PURPOSE PAM50 profiling assigns each breast cancer to a single intrinsic subtype based on a bulk tissue sample. However, individual cancers may show evidence of admixture with an alternate subtype that could affect prognosis and treatment response. We developed a method to model subtype admixture using whole transcriptome data and associated it with tumor, molecular, and survival characteristics for Luminal A (LumA) samples. METHODS We combined TCGA and METABRIC cohorts and obtained transcriptome, molecular, and clinical data, which yielded 11,379 gene transcripts in common and 1,178 cases assigned to LumA. We used semi-supervised non-negative matrix factorization (ssNMF) to compute the subtype admixture proportions of the four major subtypes (pLumA, pLumB, pHER2, and pBasal) for each case and measured associations with tumor characteristics, molecular features, and survival. RESULTS Luminal A cases in the lowest versus highest quartile for pLumA transcriptomic proportion had a 27% higher prevalence of stage > 1, nearly a threefold higher prevalence of TP53 mutation, and a hazard ratio of 2.08 for overall mortality. We found positive associations between pHER2 and HER2 positivity by IHC or FISH; between pLumB and PR negativity; and between pBasal and younger age, node positivity, TP53 mutation, and EGFR expression. Predominant basal admixture, in contrast to predominant LumB or HER2 admixture, was not associated with shorter survival. CONCLUSION Bulk sampling for genomic analyses provides an opportunity to expose intratumor heterogeneity, as reflected by subtype admixture. Our results elucidate the striking extent of diversity among LumA cancers and suggest that determining the extent and type of admixture holds promise for refining individualized therapy. LumA cancers with a high degree of basal admixture appear to have distinct biological characteristics that warrant further study.
Affiliation(s)
- Neeraj Kumar
- Alberta Machine Intelligence Institute, Edmonton, AB, Canada
| | - Peter H Gann
- Department of Pathology, College of Medicine, University of Illinois Cancer Center, University of Illinois at Chicago, Chicago, IL, USA
| | - Stephanie M McGregor
- Department of Pathology and Laboratory Medicine, University of Wisconsin Carbone Cancer Center, University of Wisconsin-Madison, Madison, WI, USA
| | - Amit Sethi
- Department of Pathology, College of Medicine, University of Illinois Cancer Center, University of Illinois at Chicago, Chicago, IL, USA
- Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India
| |
|
14
|
Krum-Hansen S, Standahl Olsen K, Anderssen E, Frantzen JO, Lund E, Paulssen RH. Associations of breast cancer related exposures and gene expression profiles in normal breast tissue-The Norwegian Women and Cancer normal breast tissue study. Cancer Rep (Hoboken) 2023; 6:e1777. [PMID: 36617746 PMCID: PMC10075301 DOI: 10.1002/cnr2.1777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2022] [Revised: 11/11/2022] [Accepted: 12/12/2022] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Normal breast tissue is utilized in tissue-based studies of breast carcinogenesis. While gene expression in breast tumor tissue is well explored, our knowledge of transcriptomic signatures in normal breast tissue is still incomplete. The aim of this study was to investigate variability of gene expression in a large sample of normal breast tissue biopsies, according to breast cancer related exposures (obesity, smoking, alcohol, hormone therapy, and parity). METHODS We analyzed gene expression profiles from 311 normal breast tissue biopsies from cancer-free, post-menopausal women, using Illumina bead chip arrays. Principal component analysis and K-means clustering were used for initial analysis of the dataset. The association of exposures and covariates with gene expression was determined using linear models for microarrays. RESULTS Heterogeneity of the breast tissue and cell composition had the strongest influence on gene expression profiles. After adjusting for cell composition, obesity, smoking, and alcohol showed the highest numbers of associated genes and pathways, whereas hormone therapy and parity were associated with negligible gene expression differences. CONCLUSION Our results provide insight into associations between major exposures and gene expression profiles and provide an informative baseline for improved understanding of exposure-related molecular events in normal breast tissue of cancer-free, post-menopausal women.
Affiliation(s)
- Sanda Krum-Hansen
- Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway; Department of Hematology and Oncology, Stavanger University Hospital, Stavanger, Norway
| | - Karina Standahl Olsen
- Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
| | - Endre Anderssen
- Genomics Support Center Tromsø (GSCT), UiT The Arctic University of Norway, Tromsø, Norway
| | - Jan Ole Frantzen
- Narvik Hospital, University Hospital of North Norway, Narvik, Norway
| | - Eiliv Lund
- Department of Community Medicine, UiT The Arctic University of Norway, Tromsø, Norway
| | - Ruth H Paulssen
- Genomics Support Center Tromsø (GSCT), UiT The Arctic University of Norway, Tromsø, Norway; Department of Clinical Medicine, UiT The Arctic University of Norway, Tromsø, Norway
| |
|
15
|
Chen D, Li S, Wang X. GEOMETRIC STRUCTURE GUIDED MODEL AND ALGORITHMS FOR COMPLETE DECONVOLUTION OF GENE EXPRESSION DATA. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2022; 4:441-466. [PMID: 38250319 PMCID: PMC10798655 DOI: 10.3934/fods.2022013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/23/2024]
Abstract
Complete deconvolution analysis for bulk RNA-seq data is important and helpful to distinguish whether the differences of disease-associated GEPs (gene expression profiles) in tissues of patients and normal controls are due to changes in cellular composition of tissue samples, or due to GEP changes in specific cells. One of the major techniques to perform complete deconvolution is nonnegative matrix factorization (NMF), which also has a wide range of applications in the machine learning community. However, NMF is well known to be a strongly ill-posed problem, so a direct application of NMF to RNA-seq data will suffer severe difficulties in the interpretability of solutions. In this paper, we develop an NMF-based mathematical model and corresponding computational algorithms to improve the solution identifiability of deconvoluting bulk RNA-seq data. In our approach, we combine the biological concept of marker genes with the solvability conditions of the NMF theories, and develop a geometric-structure-guided optimization model. In this strategy, the geometric structure of bulk tissue data is first explored by the spectral clustering technique. Then, the identified information of marker genes is integrated as solvability constraints, while the overall correlation graph is used as manifold regularization. Both synthetic and biological data are used to validate the proposed model and algorithms, showing significantly improved solution interpretability and accuracy.
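As a point of reference for the NMF-based complete deconvolution discussed in this abstract, the following is a minimal sketch of plain NMF with the classical multiplicative updates for the Frobenius objective. It omits the marker-gene solvability constraints and the manifold regularization proposed in the paper, so it inherits the ill-posedness the authors describe. All function names and the toy matrices are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(x, w, h):
    """Frobenius norm of x - w h."""
    wh = matmul(w, h)
    return sum(
        (x[i][j] - wh[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0]))
    ) ** 0.5


def nmf(x, rank, iters=200, seed=0, eps=1e-9):
    """Factor x (genes x samples) as w (genes x rank) times h (rank x samples)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start so the updates are well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (X H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy bulk matrix: 6 genes x 5 samples built from 2 hypothetical cell types.
true_profiles = [[1, 0], [2, 1], [0, 3], [1, 1], [4, 0], [0, 2]]
true_mixing = [[1, 2, 0, 1, 3], [0, 1, 2, 1, 0]]
x = matmul(true_profiles, true_mixing)

w, h = nmf(x, rank=2, iters=200)
w_first, h_first = nmf(x, rank=2, iters=1)
print(frobenius_error(x, w_first, h_first), frobenius_error(x, w, h))
```

Even when the reconstruction error is small, the recovered factors are only determined up to scaling and permutation of the components (and possibly further ambiguities), which is the identifiability issue that additional constraints are meant to address.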
Affiliation(s)
- Duan Chen
- Department of Mathematics and Statistics, School of Data Science, University of North Carolina at Charlotte, USA
| | - Shaoyu Li
- Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
| | - Xue Wang
- Department of Quantitative Health Sciences, Mayo Clinic, Florida, 32224, USA
| |
|
16
|
Zhang Y, Sun H, Mandava A, Aevermann BD, Kollmann TR, Scheuermann RH, Qiu X, Qian Y. FastMix: a versatile data integration pipeline for cell type-specific biomarker inference. Bioinformatics 2022; 38:4735-4744. [PMID: 36018232 PMCID: PMC9801972 DOI: 10.1093/bioinformatics/btac585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2022] [Revised: 08/18/2022] [Accepted: 08/25/2022] [Indexed: 01/07/2023] Open
Abstract
MOTIVATION Flow cytometry (FCM) and transcription profiling are two widely used assays in translational immunology research. However, there is no data integration pipeline for analyzing these two types of assays together with experiment variables for biomarker inference. Current FCM data analysis mainly relies on subjective manual gating analysis, which is difficult to integrate directly with other automated computational methods. Existing deconvolutional analysis of bulk transcriptomics relies on predefined marker genes in the transcriptomics data, which are unavailable for novel cell types, and does not utilize the FCM data that provide canonical phenotypic definitions of the cell types. RESULTS We developed a novel analytics pipeline, FastMix, for computational immunology, which integrates flow cytometry, bulk transcriptomics and clinical covariates for identifying cell type-specific gene expression signatures and biomarker genes. FastMix addresses the 'large p, small n' problem in the gene expression and flow cytometry integration analysis via a linear mixed effects model (LMER) for both cross-sectional and longitudinal studies. Its novel moment-based estimator not only reduces bias in parameter estimation but also is more efficient than iterative optimization. The FastMix pipeline also includes a cutting-edge flow cytometry data analysis method, DAFi, for identifying cell populations of interest and their characteristics. Simulation studies showed that FastMix produced smaller type I/II errors than competing methods. Validation using real data of two vaccine studies showed that FastMix identified a set of signature genes consistent with independent single-cell RNA-seq analysis, producing additional interesting findings. AVAILABILITY AND IMPLEMENTATION Source code of FastMix is publicly available at https://github.com/terrysun0302/FastMix. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Aishwarya Mandava
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
| | - Brian D Aevermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA
| | - Tobias R Kollmann
- Systems Vaccinology, Telethon Kids Institute, Perth Children’s Hospital, University of Western Australia, Nedlands, WA 6009, Australia
| | - Richard H Scheuermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, USA; Department of Pathology, University of California, San Diego, La Jolla, CA 92093, USA
| | - Xing Qiu
- To whom correspondence should be addressed.
| | - Yu Qian
- To whom correspondence should be addressed.
| |
|
17
|
Yao T, Liu Q, Tian W. Deconvolution of a Large Cohort of Placental Microarray Data Reveals Clinically Distinct Subtypes of Preeclampsia. Front Bioeng Biotechnol 2022; 10:917086. [PMID: 35910034 PMCID: PMC9326345 DOI: 10.3389/fbioe.2022.917086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2022] [Accepted: 06/13/2022] [Indexed: 11/28/2022] Open
Abstract
It has been well established that the dysfunctional placenta plays an important role in the pathogenesis of preeclampsia (PE), a hypertensive disorder in pregnancy. However, it is not well understood how individual cell types in the placenta are involved in placenta dysfunction because of limited single-cell studies of placenta with PE. Given that a high-resolution single-cell atlas in the placenta is now available, deconvolution of publicly available bulk PE transcriptome data may provide us with the opportunity to investigate the contribution of individual placental cell types to PE. Recent benchmark studies on deconvolution have provided suggestions on the strategy of marker gene selection and the choice of methodologies. In this study, we experimented with these suggestions by using real bulk data with known cell-type proportions and established a deconvolution pipeline using CIBERSORT. Applying the deconvolution pipeline to a large cohort of PE placental microarray data, we found that the proportions of trophoblast cells in the placenta were significantly different between PE and normal controls. We then predicted cell-type-level expression profiles for each sample using CIBERSORTx and found that the activities of several canonical PE-related pathways were significantly altered in specific subtypes of trophoblasts in PE. Finally, we constructed an integrated expression profile for each PE sample by combining the predicted cell-type-level expression profiles of several clinically relevant placental cell types and identified four clusters likely representing four PE subtypes with clinically distinct features. As such, our study showed that deconvolution of a large cohort of placental microarray provided new insights about the molecular mechanism of PE that would not be obtained by analyzing bulk expression profiles.
Affiliation(s)
- Tian Yao
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Human Phenome Institute, Fudan University, Shanghai, China
| | - Qiming Liu
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
| | - Weidong Tian
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Computational Biology, School of Life Sciences, Fudan University, Shanghai, China
- Children’s Hospital of Fudan University, Shanghai, China
- Qilu Children’s Hospital of Shandong University, Jinan, China
- *Correspondence: Weidong Tian
| |
|
18
|
Karikomi M, Zhou P, Nie Q. DURIAN: an integrative deconvolution and imputation method for robust signaling analysis of single-cell transcriptomics data. Brief Bioinform 2022; 23:6609525. [PMID: 35709795 PMCID: PMC9294432 DOI: 10.1093/bib/bbac223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Revised: 04/29/2022] [Accepted: 05/11/2022] [Indexed: 01/31/2023] Open
Abstract
Single-cell RNA sequencing trades read-depth for dimensionality, often leading to loss of critical signaling gene information that is typically present in bulk data sets. We introduce DURIAN (Deconvolution and mUltitask-Regression-based ImputAtioN), an integrative method for recovery of gene expression in single-cell data. Through systematic benchmarking, we demonstrate the accuracy, robustness and empirical convergence of DURIAN using both synthetic and published data sets. We show that use of DURIAN improves single-cell clustering, low-dimensional embedding, and recovery of intercellular signaling networks. Our study resolves several inconsistent results of cell-cell communication analysis using single-cell or bulk data independently. The method has broad application in biomarker discovery and cell signaling analysis using single-cell transcriptomics data sets.
Affiliation(s)
| | - Peijie Zhou
- Corresponding authors: Peijie Zhou, 540P Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993; Qing Nie, 540F Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993
| | - Qing Nie
- Corresponding authors: Peijie Zhou, 540P Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993; Qing Nie, 540F Rowland Hall, University of California Irvine, Irvine CA 92697, USA. Tel: 949-824-5530; Fax: 949-8247993
| |
|
19
|
Ma W, Sharma S, Jin P, Gourley SL, Qin ZS. LRcell: detecting the source of differential expression at the sub-cell-type level from bulk RNA-seq data. Brief Bioinform 2022; 23:bbac063. [PMID: 35272348 PMCID: PMC9116223 DOI: 10.1093/bib/bbac063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2021] [Revised: 01/23/2022] [Accepted: 02/08/2022] [Indexed: 11/13/2022] Open
Abstract
Given that most tissues consist of abundant and diverse (sub-)cell types, an important yet unaddressed problem in bulk RNA-seq analysis is to identify in which (sub-)cell type(s) the differential expression occurs. Single-cell RNA-sequencing (scRNA-seq) technologies can answer the question, but they are often labor-intensive and cost-prohibitive. Here, we present LRcell, a computational method aiming to identify the specific (sub-)cell type(s) that drive the changes observed in a bulk RNA-seq experiment. In addition, LRcell provides pre-embedded marker genes computed from putative scRNA-seq experiments as options to execute the analyses. We conduct a simulation study to demonstrate the effectiveness and reliability of LRcell. Using three different real datasets, we show that LRcell successfully identifies known cell types involved in psychiatric disorders. Applying LRcell to bulk RNA-seq results can produce a hypothesis on which (sub-)cell type(s) contribute to the differential expression. LRcell is complementary to cell type deconvolution methods.
Affiliation(s)
- Wenjing Ma
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, USA
| | - Sumeet Sharma
- Graduate Program in Neuroscience, Emory University, 1462 Clifton Road NE, Atlanta, GA 30322, USA
| | - Peng Jin
- Department of Human Genetics, Emory University, 1365 Clifton Road, Atlanta, GA 30322, USA
| | - Shannon L Gourley
- Department of Pediatrics, School of Medicine, Emory University, 100 Woodruff Circle, Atlanta, GA 30322, USA; Yerkes National Primate Research Center, Atlanta, GA 30322, USA
| | - Zhaohui S Qin
- Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, USA
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA
| |
|
20
|
Boldina G, Fogel P, Rocher C, Bettembourg C, Luta G, Augé F. A2Sign: Agnostic Algorithms for Signatures-a universal method for identifying molecular signatures from transcriptomic datasets prior to cell-type deconvolution. Bioinformatics 2022; 38:1015-1021. [PMID: 34788798 DOI: 10.1093/bioinformatics/btab773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2021] [Revised: 09/17/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Molecular signatures are critical for inferring the proportions of cell types from bulk transcriptomics data. However, the identification of these signatures is based on a methodology that relies on prior biological knowledge of the cell types being studied. When working with less known biological material, a data-driven approach is required to uncover the underlying classes and generate ad hoc signatures from healthy or pathogenic tissue. RESULTS We present a new approach, A2Sign: Agnostic Algorithms for Signatures, based on a non-negative tensor factorization (NTF) strategy that allows us to identify cell-type-specific molecular signatures, greatly reduce collinearities and also account for inter-individual variability. We propose a global framework that can be applied to uncover molecular signatures for cell-type deconvolution in arbitrary tissues using bulk transcriptome data. We also present two new molecular signatures for deconvolution of up to 16 immune cell types using microarray or RNA-seq data. AVAILABILITY AND IMPLEMENTATION All steps of our analysis were implemented in annotated Python notebooks (https://github.com/paulfogel/A2SIGN). To perform NTF, we used the NMTF package, which can be downloaded using Python pip install. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Galina Boldina
- Sanofi, R&D Translational Sciences France, Bioinformatics, Sanofi, F-91385 Chilly-Mazarin Cedex, France
| | - Paul Fogel
- Consultant, F-75006 Paris, France; Advestis, F-75008 Paris, France; Quinten, F-75017 Paris, France
| | - Corinne Rocher
- Sanofi, R&D Translational Sciences France, Bioinformatics, Sanofi, F-91385 Chilly-Mazarin Cedex, France
| | - Charles Bettembourg
- Sanofi, R&D Translational Sciences France, Bioinformatics, Sanofi, F-91385 Chilly-Mazarin Cedex, France
| | - George Luta
- Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, DC 20057, USA
| | - Franck Augé
- Sanofi, R&D Translational Sciences France, Bioinformatics, Sanofi, F-91385 Chilly-Mazarin Cedex, France
| |
|
21
|
Jaakkola MK, Elo LL. Estimating cell type-specific differential expression using deconvolution. Brief Bioinform 2021; 23:6396788. [PMID: 34651640 PMCID: PMC8769698 DOI: 10.1093/bib/bbab433] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Revised: 09/17/2021] [Accepted: 09/23/2021] [Indexed: 12/02/2022] Open
Affiliation(s)
- Maria K Jaakkola
- Department of Mathematics and Statistics, University of Turku, Yliopistonmäki, 20014, Turku, Finland
| | - Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520, Turku, Finland.,Institute of Biomedicine, University of Turku, Kiinamyllynkatu 10, FI-20520, Turku, Finland
| |
|
22
|
Qiu Y, Wang J, Lei J, Roeder K. Identification of cell-type-specific marker genes from co-expression patterns in tissue samples. Bioinformatics 2021; 37:3228-3234. [PMID: 33904573 PMCID: PMC8504631 DOI: 10.1093/bioinformatics/btab257] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/08/2020] [Revised: 03/15/2021] [Accepted: 04/24/2021] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are nevertheless highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. RESULTS To capitalize on this pattern, we develop a new semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. AVAILABILITY AND IMPLEMENTATION We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
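The observation that marker genes of one cell type co-vary across bulk samples can be illustrated with a small simulation. The following is a minimal sketch and not the markerpen algorithm: the two cell types, the gene names, the per-cell expression levels and the noise level are all hypothetical values chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate_bulk(n_samples=50, seed=0):
    """Simulate bulk expression of three hypothetical marker genes.

    Each sample is a mixture of cell types A and B. Genes A1 and A2 are
    expressed only in type A and gene B1 only in type B, so the bulk value
    is (cell-type proportion) x (per-cell level) plus measurement noise.
    """
    rng = random.Random(seed)
    levels = {"A1": 10.0, "A2": 6.0, "B1": 8.0}  # assumed per-cell levels
    bulk = {gene: [] for gene in levels}
    for _ in range(n_samples):
        prop_a = rng.uniform(0.1, 0.9)  # proportion of type A in this sample
        for gene, level in levels.items():
            prop = prop_a if gene.startswith("A") else 1.0 - prop_a
            bulk[gene].append(prop * level + rng.gauss(0.0, 0.2))
    return bulk


bulk = simulate_bulk()
r_same = pearson(bulk["A1"], bulk["A2"])   # markers of the same cell type
r_other = pearson(bulk["A1"], bulk["B1"])  # markers of different cell types
print(f"A1 vs A2: {r_same:.2f}, A1 vs B1: {r_other:.2f}")
```

Under these assumptions the two type-A markers are strongly positively correlated, while a type-A and a type-B marker are negatively correlated, which is the kind of structure a correlation-based marker refinement can exploit.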
Affiliation(s)
- Yixuan Qiu
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
| | - Jing Lei
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
23
|
Doostparast Torshizi A, Duan J, Wang K. A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders. NAR Genom Bioinform 2021; 3:lqab056. [PMID: 34169279 PMCID: PMC8219045 DOI: 10.1093/nargab/lqab056] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/21/2021] [Indexed: 02/06/2023] Open
Abstract
The importance of cell type-specific gene expression in disease-relevant tissues is increasingly recognized in genetic studies of complex diseases. However, most gene expression studies are conducted on bulk tissues, without examining cell type-specific expression profiles. Several computational methods are available for cell type deconvolution (i.e. inference of cellular composition) from bulk RNA-Seq data, but few of them impute cell type-specific expression profiles. We hypothesize that with external prior information such as single cell RNA-seq and population-wide expression profiles, it can be computationally tractable to estimate both cellular composition and cell type-specific expression from bulk RNA-Seq data. Here we introduce CellR, which addresses cross-individual gene expression variations to adjust the weights of cell-specific gene markers. It then transforms the deconvolution problem into a linear programming model while taking into account inter/intra cellular correlations and uses a multi-variate stochastic search algorithm to estimate the cell type-specific expression profiles. Analyses on several complex diseases such as schizophrenia, Alzheimer’s disease, Huntington’s disease and type 2 diabetes validated the efficiency of CellR, while revealing how specific cell types contribute to different diseases. In summary, CellR compares favorably against competing approaches, enabling cell type-specific re-analysis of gene expression data on bulk tissues in complex diseases.
Affiliation(s)
- Abolfazl Doostparast Torshizi
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Jubao Duan
- Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, IL 60201, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
| |
|
24
|
Kang K, Huang C, Li Y, Umbach DM, Li L. CDSeqR: fast complete deconvolution for gene expression data from bulk tissues. BMC Bioinformatics 2021; 22:262. [PMID: 34030626 PMCID: PMC8142515 DOI: 10.1186/s12859-021-04186-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/24/2020] [Accepted: 05/12/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and an added new function to aid cell type annotation. The R package would be of interest for the broader R community. RESULTS We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types. This new function further allows users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic data, real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. CONCLUSIONS The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. Bulk-level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression. The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.
Affiliation(s)
- Kai Kang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA.
| | - Caizhi Huang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
| | - Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
| | - David M Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
| | - Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA.
| |
|
25
|
Kuksin M, Morel D, Aglave M, Danlos FX, Marabelle A, Zinovyev A, Gautheret D, Verlingue L. Applications of single-cell and bulk RNA sequencing in onco-immunology. Eur J Cancer 2021; 149:193-210. [PMID: 33866228 DOI: 10.1016/j.ejca.2021.03.005] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 12/22/2020] [Revised: 02/26/2021] [Accepted: 03/04/2021] [Indexed: 02/08/2023]
Abstract
The rising interest in the precise characterization of the tumour immune contexture has recently brought forward the high potential of RNA sequencing (RNA-seq) in identifying molecular mechanisms engaged in the response to immunotherapy. In this review, we provide an overview of the major principles of single-cell and conventional (bulk) RNA-seq applied to onco-immunology. We describe standard preprocessing and statistical analyses of data obtained from such techniques and highlight some computational challenges related to the sequencing of individual cells. We notably provide examples of gene expression analyses such as differential expression analysis, dimensionality reduction, clustering and enrichment analysis. Additionally, we use public data sets to exemplify how deconvolution algorithms can identify and quantify multiple immune subpopulations from either bulk or single-cell RNA-seq. We give examples of machine and deep learning models used to predict patient outcomes and treatment effect from high-dimensional data. Finally, we weigh the strengths and weaknesses of single-cell and bulk RNA-seq regarding their applications in the clinic.
Affiliation(s)
- Maria Kuksin
- ENS de Lyon, 15 Parvis René Descartes, 69007, Lyon, France; Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France
| | - Daphné Morel
- Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; Département de Radiothérapie, Gustave Roussy Cancer Campus, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM UMR1030, Molecular Radiotherapy and Therapeutic Innovations, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France
| | - Marine Aglave
- INSERM US23, CNRS UMS 3655, Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France
| | - Aurélien Marabelle
- Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM U1015, Gustave Roussy, Université Paris Saclay, France
| | - Andrei Zinovyev
- Institut Curie, PSL Research University, F-75005, Paris, France; INSERM, U900, F-75005, Paris, France; MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, F-75006, Paris, France; Laboratory of Advanced Methods for High-dimensional Data Analysis, Lobachevsky University, 603000, Nizhny Novgorod, Russia
| | - Daniel Gautheret
- Institute for Integrative Biology of the Cell, UMR 9198, CEA, CNRS, Université Paris-Saclay, Gif-Sur-Yvette, France; IHU PRISM, Gustave Roussy Cancer Campus, Gustave Roussy, 114 Rue Edouard Vaillant, 94800, Villejuif, France; Université Paris-Saclay, France
| | - Loïc Verlingue
- Département d'Innovations Thérapeutiques et Essais Précoces (DITEP), Gustave Roussy Cancer Campus, 114 rue Edouard Vaillant, 94800, Villejuif, France; INSERM UMR1030, Molecular Radiotherapy and Therapeutic Innovations, Gustave Roussy, 114 rue Edouard Vaillant, 94800, Villejuif, France; Institut Curie, PSL Research University, F-75005, Paris, France; Université Paris-Saclay, France.
| |
|
26
|
Tai AS, Tseng GC, Hsieh WP. BayICE: A Bayesian hierarchical model for semireference-based deconvolution of bulk transcriptomic data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- An-Shun Tai
- Institute of Statistics, National Tsing Hua University
| |
|
27
|
Jaakkola MK, Elo LL. Computational deconvolution to estimate cell type-specific gene expression from bulk data. NAR Genom Bioinform 2021; 3:lqaa110. [PMID: 33575652 PMCID: PMC7803005 DOI: 10.1093/nargab/lqaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/20/2020] [Revised: 12/14/2020] [Accepted: 12/17/2020] [Indexed: 12/24/2022] Open
Abstract
Computational deconvolution is a time- and cost-efficient approach to obtain cell type-specific information from bulk gene expression of heterogeneous tissues like blood. Deconvolution can aim to estimate cell type proportions or abundances in samples, to estimate how strongly each cell type present expresses different genes, or to do both simultaneously. Of the two separate goals, the estimation of cell type proportions/abundances is widely studied, but less attention has been paid to defining the cell type-specific expression profiles. Here, we address this gap by introducing a novel method, Rodeo, and empirically evaluating it and the other available tools from multiple perspectives utilizing diverse datasets.
Affiliation(s)
- Maria K Jaakkola
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
| | - Laura L Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, Tykistökatu 6, FI-20520 Turku, Finland
| |
|
28
|
Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 2020; 11:5650. [PMID: 33159064 PMCID: PMC7648640 DOI: 10.1038/s41467-020-19015-1] [Citation(s) in RCA: 230] [Impact Index Per Article: 46.0] [Received: 12/05/2019] [Accepted: 09/16/2020] [Indexed: 01/05/2023] Open
Abstract
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
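Pseudo-bulk mixtures of the kind described above are commonly built by summing the counts of cells drawn from a single-cell dataset, so that the true proportions are known by construction. The following is a minimal sketch on a toy dataset; the cell-type labels, the counts and the `make_pseudobulk` helper are hypothetical and are not taken from the benchmarked pipelines.

```python
import random
from collections import Counter


def make_pseudobulk(cells, n_cells, rng):
    """Sum the counts of randomly drawn cells into one pseudo-bulk profile.

    cells: list of (cell_type, counts) pairs, where counts is a list of
    per-gene counts. Cells are drawn with replacement. Returns the summed
    profile and the true cell-type proportions of the drawn cells.
    """
    drawn = [rng.choice(cells) for _ in range(n_cells)]
    n_genes = len(cells[0][1])
    profile = [sum(counts[g] for _, counts in drawn) for g in range(n_genes)]
    type_counts = Counter(cell_type for cell_type, _ in drawn)
    proportions = {t: c / n_cells for t, c in type_counts.items()}
    return profile, proportions


# Toy single-cell data: three genes, two hypothetical cell types.
toy_cells = [
    ("T cell", [5, 0, 1]),
    ("T cell", [4, 1, 1]),
    ("B cell", [0, 6, 1]),
    ("B cell", [1, 7, 1]),
]
profile, proportions = make_pseudobulk(toy_cells, n_cells=100, rng=random.Random(1))
print(profile, proportions)
```

Because raw counts are summed, the mixture is formed in linear scale; the returned proportions then serve as the ground truth against which a deconvolution estimate can be scored.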
|
29
|
Groth EE, Weber M, Bahmer T, Pedersen F, Kirsten A, Börnigen D, Rabe KF, Watz H, Ammerpohl O, Goldmann T. Exploration of the sputum methylome and omics deconvolution by quadratic programming in molecular profiling of asthma and COPD: the road to sputum omics 2.0. Respir Res 2020; 21:274. [PMID: 33076907 PMCID: PMC7574293 DOI: 10.1186/s12931-020-01544-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/15/2020] [Accepted: 10/11/2020] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND To date, most studies involving high-throughput analyses of sputum in asthma and COPD have focused on identifying transcriptomic signatures of disease. No whole-genome methylation analysis of sputum cells has been performed yet. In this context, the highly variable cellular composition of sputum has the potential to confound the molecular analyses. METHODS Whole-genome transcription (Agilent Human 4 × 44 k array) and methylation (Illumina 450 k BeadChip) analyses were performed on sputum samples of 9 asthmatics, 10 healthy and 10 COPD subjects. RNA integrity was checked by capillary electrophoresis and used to correct in silico for bias conferred by RNA degradation during biobank sample storage. Estimates of cell type-specific molecular profiles were derived via regression by quadratic programming based on sputum differential cell counts. All analyses were conducted using the open-source R/Bioconductor software framework. RESULTS A linear regression step was found to perform well in removing RNA degradation-related bias among the main principal components of the gene expression data, increasing the number of genes detectable as differentially expressed in asthma and COPD sputa (compared to controls). We observed a strong influence of the cellular composition on the results of mixed-cell sputum analyses. For example, upregulated genes derived from mixed-cell data in asthma were dominated by genes predominantly expressed in eosinophils after deconvolution. The deconvolution, however, made it possible to perform differential expression and methylation analyses at the level of individual cell types and, though we only analyzed a limited number of biological replicates, was found to provide good estimates compared to previously published data about gene expression in lung eosinophils in asthma. Analysis of the sputum methylome indicated the presence of differential methylation in genomic regions of interest, e.g. mapping to a number of human leukocyte antigen (HLA) genes related to both major histocompatibility complex (MHC) class I and II molecules in asthma and COPD macrophages. Furthermore, we found the SMAD3 (SMAD family member 3) gene, among others, to lie within differentially methylated regions, as has been previously reported in the context of asthma. CONCLUSIONS In this methodology-oriented study, we show that methylation profiling can be easily integrated into sputum analysis workflows and exhibits a strong potential to contribute to the profiling and understanding of pulmonary inflammation. Wherever RNA degradation is of concern, in silico correction can be effective in improving both sensitivity and specificity of downstream analyses. We suggest that deconvolution methods should be integrated in sputum omics analysis workflows whenever possible in order to facilitate the unbiased discovery and interpretation of molecular patterns of inflammation.
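Regressing bulk measurements on known cell proportions, with the cell type-specific levels constrained to be non-negative, is a small quadratic program. The sketch below is an illustration only and not the authors' R/Bioconductor workflow: it solves the non-negative least-squares problem for a single gene with projected gradient descent (a simple stand-in for a dedicated QP solver), and the cell proportions and expression values are made up.

```python
def nnls_projected_gradient(C, y, n_iter=500):
    """Minimise 0.5 * ||C x - y||^2 subject to x >= 0.

    C: list of rows, one per sample, holding cell-type proportions.
    y: measured bulk value of one gene (or CpG site) in each sample.
    Returns the estimated level of that feature in each cell type.
    """
    n_types = len(C[0])
    # The sum of squared entries (the trace of C^T C) bounds the largest
    # eigenvalue of C^T C, so 1 / trace is a safe step size.
    step = 1.0 / sum(v * v for row in C for v in row)
    x = [0.0] * n_types
    for _ in range(n_iter):
        residual = [sum(c * xi for c, xi in zip(row, x)) - yi
                    for row, yi in zip(C, y)]
        grad = [sum(row[k] * r for row, r in zip(C, residual))
                for k in range(n_types)]
        # Gradient step, then projection onto the non-negative orthant.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, grad)]
    return x


# Hypothetical differential cell counts (eosinophils, macrophages) per sample.
cell_props = [[0.9, 0.1], [0.7, 0.3], [0.5, 0.5],
              [0.3, 0.7], [0.1, 0.9], [0.6, 0.4]]
true_levels = [8.0, 0.0]  # assumed: gene expressed only in the first cell type
bulk_gene = [sum(p * t for p, t in zip(row, true_levels)) for row in cell_props]

estimated = nnls_projected_gradient(cell_props, bulk_gene)
print([round(v, 3) for v in estimated])
```

With noise-free toy data the estimate converges to the assumed cell type-specific levels; with real measurements the same regression would be repeated for every gene or methylation site, and the accuracy would depend on how much the cell composition varies across samples.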
Affiliation(s)
- Espen E Groth
- LungenClinic Grosshansdorf, Großhansdorf, Germany.,Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany.,Department of Oncology, Hematology and BMT with Section Pneumology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Melanie Weber
- Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ, USA
| | - Thomas Bahmer
- LungenClinic Grosshansdorf, Großhansdorf, Germany.,Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Department of Internal Medicine I, Pneumology, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany
| | - Frauke Pedersen
- LungenClinic Grosshansdorf, Großhansdorf, Germany.,Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
| | - Anne Kirsten
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
| | - Daniela Börnigen
- Bioinformatics Core Unit, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Klaus F Rabe
- LungenClinic Grosshansdorf, Großhansdorf, Germany.,Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany
| | - Henrik Watz
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Pulmonary Research Institute at LungenClinic Grosshansdorf, Großhansdorf, Germany
| | - Ole Ammerpohl
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Institute of Human Genetics, University Medical Center Ulm, Ulm, Germany
| | - Torsten Goldmann
- Airway Research Center North (ARCN), Member of the German Center for Lung Research (DZL), Großhansdorf, Germany.,Research Center Borstel, Pathology, Borstel, Germany
| |
|
30
|
Bortolomeazzi M, Keddar MR, Ciccarelli FD, Benedetti L. Identification of non-cancer cells from cancer transcriptomic data. Biochim Biophys Acta Gene Regul Mech 2020; 1863:194445. [PMID: 31654804 PMCID: PMC7346884 DOI: 10.1016/j.bbagrm.2019.194445] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2019] [Revised: 09/20/2019] [Accepted: 10/07/2019] [Indexed: 02/07/2023]
Abstract
Interactions between cancer cells and non-cancer cells composing the tumour microenvironment play a primary role in determining cancer progression and shaping the response to therapy. The qualitative and quantitative characterisation of the different cell populations in the tumour microenvironment is therefore crucial to understand its role in cancer. In recent years, many experimental and computational approaches have been developed to identify the cell populations composing heterogeneous tissue samples, such as cancer. In this review, we describe the state-of-the-art approaches for the quantification of non-cancer cells from bulk and single-cell cancer transcriptomic data, with a focus on immune cells. We illustrate the main features of these approaches and highlight their applications for the analysis of the tumour microenvironment in solid cancers. We also discuss techniques that are complementary and alternative to RNA sequencing, particularly focusing on approaches that can provide spatial information on the distribution of the cells within the tumour in addition to their qualitative and quantitative measurements. This article is part of a Special Issue entitled: Transcriptional Profiles and Regulatory Gene Networks edited by Dr. Federico Manuel Giorgi and Dr. Shaun Mahony.
Affiliation(s)
- Michele Bortolomeazzi
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| | - Mohamed Reda Keddar
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| | - Francesca D Ciccarelli
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| | - Lorena Benedetti
- Cancer Systems Biology Laboratory, The Francis Crick Institute, London NW1 1AT, UK; School of Cancer and Pharmaceutical Sciences, King's College London, London SE1 1UL, UK
| |
|
31
|
Li H, Sharma A, Luo K, Qin ZS, Sun X, Liu H. DeconPeaker, a Deconvolution Model to Identify Cell Types Based on Chromatin Accessibility in ATAC-Seq Data of Mixture Samples. Front Genet 2020; 11:392. [PMID: 32547592 PMCID: PMC7269180 DOI: 10.3389/fgene.2020.00392] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/09/2020] [Accepted: 03/30/2020] [Indexed: 12/26/2022] Open
Abstract
While our understanding of cellular and molecular processes has grown exponentially, issues related to the cell microenvironment and cellular heterogeneity have sparked a new debate concerning the cell identity. Cell composition (chromatin and nuclear architecture) poses a strong risk for dynamic changes in the diseased condition. Since chromatin accessibility patterns play a major role in human diseases, it is therefore anticipated that a deconvolution tool based on open chromatin data will provide better performance in identifying cell composition. Herein, we have designed the deconvolution tool "DeconPeaker," which can precisely define the uniqueness among subpopulations of cells using open chromatin datasets. Using this tool, we simultaneously evaluated chromatin accessibility and gene expression datasets to estimate cell types and their respective proportions in a mixture of samples. In comparison to other known deconvolution methods, we observed the lowest average root-mean-square error (RMSE = 0.042) and the highest average correlation coefficient (r = 0.919) between the prediction and "true" proportion. As a proof-of-concept, we also tested chromatin accessibility data from acute myeloid leukemia (AML) and successfully obtained unique cell types associated with AML progression. Furthermore, we showed that chromatin accessibility represents more essential characteristics in the identification of cell types than gene expression. Taken together, DeconPeaker as a powerful tool has the potential to combine different datasets (primarily, chromatin accessibility and gene expression) and define different cell types in mixtures. The Python package of DeconPeaker is now available at https://github.com/lihuamei/DeconPeaker.
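The two accuracy measures quoted above, the root-mean-square error and the Pearson correlation between predicted and true proportions, can be computed as in the following sketch. The example proportions are invented for illustration and are not DeconPeaker results.

```python
import math


def rmse(predicted, truth):
    """Root-mean-square error between two equal-length sequences."""
    squared = sum((p - t) ** 2 for p, t in zip(predicted, truth))
    return math.sqrt(squared / len(truth))


def pearson_r(predicted, truth):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(truth)
    mp, mt = sum(predicted) / n, sum(truth) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(predicted, truth))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    st = math.sqrt(sum((t - mt) ** 2 for t in truth))
    return cov / (sp * st)


# Hypothetical true and estimated proportions of four cell types in one mixture.
true_props = [0.10, 0.20, 0.30, 0.40]
pred_props = [0.12, 0.18, 0.33, 0.37]
print(f"RMSE = {rmse(pred_props, true_props):.4f}, "
      f"r = {pearson_r(pred_props, true_props):.4f}")
```

A lower RMSE reflects smaller absolute errors, whereas a high correlation only indicates that the ranking and relative spread of the proportions are preserved, which is why the two measures are usually reported together.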
Affiliation(s)
- Huamei Li
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Amit Sharma
- Department of Ophthalmology, University Hospital Bonn, Bonn, Germany
| | - Kun Luo
- Department of Neurosurgery, Xinjiang Evidence-Based Medicine Research Institute, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
| | - Zhaohui S. Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Xiao Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Hongde Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| |
|
32
|
Dong L, Kollipara A, Darville T, Zou F, Zheng X. Semi-CAM: A semi-supervised deconvolution method for bulk transcriptomic data with partial marker gene information. Sci Rep 2020; 10:5434. [PMID: 32214192 PMCID: PMC7096458 DOI: 10.1038/s41598-020-62330-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/23/2019] [Accepted: 02/26/2020] [Indexed: 01/03/2023] Open
Abstract
Deconvolution of bulk transcriptomics data from mixed cell populations is vital to identify the cellular mechanism of complex diseases. Existing deconvolution approaches can be divided into two major groups: supervised and unsupervised methods. Supervised deconvolution methods use cell type-specific prior information including cell proportions, reference cell type-specific gene signatures, or marker genes for each cell type, which may not be available in practice. Unsupervised methods, such as non-negative matrix factorization (NMF) and Convex Analysis of Mixtures (CAM), in contrast, completely disregard prior information and thus are not efficient for data with partial cell type-specific information. In this paper, we propose a semi-supervised deconvolution method, semi-CAM, that extends CAM by utilizing marker information for a subset of the cell types. Analyses of simulated data and two benchmark datasets demonstrated that semi-CAM outperforms CAM by yielding more accurate cell proportion estimates when markers for some or all cell types are available. In addition, when markers for all cell types are available, semi-CAM achieves better or similar accuracy compared to CIBERSORT, a supervised method using signature genes, and to the marker-based supervised methods semi-NMF and DSA. Furthermore, analysis of human chlamydia-infection data with bulk expression profiles from six cell types and prior marker information for only three cell types suggests that semi-CAM achieves more accurate cell proportion estimates than CAM.
Affiliation(s)
- Li Dong
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Avinash Kollipara
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Toni Darville
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Fei Zou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
| | - Xiaojing Zheng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
- Department of Pediatrics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
| |
|
33
|
Wang L, Sebra RP, Sfakianos JP, Allette K, Wang W, Yoo S, Bhardwaj N, Schadt EE, Yao X, Galsky MD, Zhu J. A reference profile-free deconvolution method to infer cancer cell-intrinsic subtypes and tumor-type-specific stromal profiles. Genome Med 2020; 12:24. [PMID: 32111252 PMCID: PMC7049190 DOI: 10.1186/s13073-020-0720-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 05/29/2019] [Accepted: 02/03/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Patient stratification based on molecular subtypes is an important strategy for cancer precision medicine. Deriving clinically informative cancer molecular subtypes from transcriptomic data generated on whole tumor tissue samples is a non-trivial task, especially given the various non-cancer cellular elements intertwined with cancer cells in the tumor microenvironment. METHODS We developed a computational deconvolution method, DeClust, that stratifies patients into subtypes based on cancer cell-intrinsic signals identified by distinguishing cancer-type-specific signals from non-cancer signals in bulk tumor transcriptomic data. DeClust differs from most existing methods by directly incorporating molecular subtyping of solid tumors into the deconvolution process and outputting molecular subtype-specific tumor reference profiles for the cohort rather than individual tumor profiles. In addition, DeClust does not require reference expression profiles or signature matrices as inputs and estimates cancer-type-specific microenvironment signals from bulk tumor transcriptomic data. RESULTS DeClust was evaluated on both simulated data and 13 solid tumor datasets from The Cancer Genome Atlas (TCGA). DeClust performed among the best, relative to existing methods, for estimation of cellular composition. Compared to molecular subtypes reported by TCGA or other similar approaches, the subtypes generated by DeClust had higher correlations with cancer-intrinsic genomic alterations (e.g., somatic mutations and copy number variations) and lower correlations with tumor purity. While DeClust-identified subtypes were not more significantly associated with survival in general, DeClust identified a poor prognosis subtype of clear cell renal cancer, papillary renal cancer, and lung adenocarcinoma, all of which were characterized by CDKN2A deletions. 
As a reference profile-free deconvolution method, the tumor-type-specific stromal profiles and cancer cell-intrinsic subtypes generated by DeClust were supported by single-cell RNA sequencing data. CONCLUSIONS DeClust is a useful tool for cancer cell-intrinsic molecular subtyping of solid tumors. DeClust subtypes, together with the tumor-type-specific stromal profiles generated by this pan-cancer study, may lead to mechanistic and clinical insights across multiple tumor types.
Affiliation(s)
- Li Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Sema4, a Mount Sinai venture, Stamford, CT, 06902, USA
| | - Robert P Sebra
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Sema4, a Mount Sinai venture, Stamford, CT, 06902, USA
| | - John P Sfakianos
- Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Kimaada Allette
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Wenhui Wang
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Seungyeul Yoo
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Nina Bhardwaj
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Eric E Schadt
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Sema4, a Mount Sinai venture, Stamford, CT, 06902, USA
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Xin Yao
- Department of Genitourinary Oncology, Tianjin Medical University Cancer Institute and Hospital, National Clinical Research Center for Cancer, Key Laboratory of Cancer Prevention and Therapy, Tianjin, China
| | - Matthew D Galsky
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Jun Zhu
- Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
- Sema4, a Mount Sinai venture, Stamford, CT, 06902, USA.
- Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
| |
|
34
|
Ibarra A, Zhuang J, Zhao Y, Salathia NS, Huang V, Acosta AD, Aballi J, Toden S, Karns AP, Purnajo I, Parks JR, Guo L, Mason J, Sigal D, Nova TS, Quake SR, Nerenberg M. Non-invasive characterization of human bone marrow stimulation and reconstitution by cell-free messenger RNA sequencing. Nat Commun 2020; 11:400. [PMID: 31964864 PMCID: PMC6972916 DOI: 10.1038/s41467-019-14253-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 01/09/2019] [Accepted: 12/17/2019] [Indexed: 01/13/2023] Open
Abstract
Circulating cell-free mRNA (cf-mRNA) holds great promise as a non-invasive diagnostic biomarker. However, cf-mRNA composition and its potential clinical applications remain largely unexplored. Here we show, using Next Generation Sequencing-based profiling, that cf-mRNA is enriched in transcripts derived from the bone marrow compared to circulating cells. Further, longitudinal studies involving bone marrow ablation followed by hematopoietic stem cell transplantation in multiple myeloma and acute myeloid leukemia patients indicate that cf-mRNA levels reflect the transcriptional activity of bone marrow-resident hematopoietic lineages during bone marrow reconstitution. Mechanistically, stimulation of specific bone marrow cell populations in vivo using growth factor pharmacotherapy shows that cf-mRNA reflects dynamic functional changes over time associated with cellular activity. Our results shed light on the biology of the circulating transcriptome and highlight the potential utility of cf-mRNA to non-invasively monitor bone marrow-involved pathologies. Circulating cell-free mRNA holds great promise as a non-invasive diagnostic biomarker. Here the authors show that cell-free mRNA captures transcripts from the bone marrow and can be used to non-invasively monitor dynamic changes in bone marrow physiology.
Affiliation(s)
- Arkaitz Ibarra
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA.
| | - Jiali Zhuang
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Yue Zhao
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Neeraj S Salathia
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Vera Huang
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Alexander D Acosta
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Jonathan Aballi
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Shusuke Toden
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Amy P Karns
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Intan Purnajo
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Julianna R Parks
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Lucy Guo
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - James Mason
- Scripps Clinic Medical Group, Scripps Green Hospital, 10666 N Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Darren Sigal
- Scripps Clinic Medical Group, Scripps Green Hospital, 10666 N Torrey Pines Road, La Jolla, CA, 92037, USA
| | - Tina S Nova
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA
| | - Stephen R Quake
- Department of Bioengineering and Department of Applied Physics, Stanford University and Chan Zuckerberg Biohub, 318 Campus Drive, Stanford, CA, 94305, USA
| | - Michael Nerenberg
- Molecular Stethoscope, Inc., 3210 Merryfield Row, San Diego, CA, 92121, USA.
| |
|
35
|
Lin X, Boutros PC. Optimization and expansion of non-negative matrix factorization. BMC Bioinformatics 2020; 21:7. [PMID: 31906867 PMCID: PMC6945623 DOI: 10.1186/s12859-019-3312-5] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 03/09/2019] [Accepted: 12/10/2019] [Indexed: 11/17/2022] Open
Abstract
Background Non-negative matrix factorization (NMF) is a technique widely used in various fields, including artificial intelligence (AI), signal processing and bioinformatics. However, existing algorithms and R packages cannot be applied to large matrices, because of their slow convergence, or to matrices with missing entries. In addition, most NMF research focuses only on blind decompositions, that is, decompositions that do not use prior knowledge. Finally, the lack of a well-validated methodology for choosing the rank hyperparameter raises concerns about derived results. Results We adapt the idea of sequential coordinate-wise descent to NMF to increase the convergence rate. We demonstrate that NMF can handle missing values naturally and that this property leads to a novel method to determine the rank hyperparameter. Further, we demonstrate some novel applications of NMF and show how to use masking to inject prior knowledge and desirable properties to achieve a more meaningful decomposition. Conclusions We show through complexity analysis and experiments that our implementation converges faster than well-known methods. We also show that using NMF for tumour content deconvolution can achieve results similar to existing methods like ISOpure. Our proposed missing value imputation is more accurate than conventional methods like multiple imputation and comparable to missForest while achieving significantly better computational efficiency. Finally, we argue that the suggested rank tuning method based on missing value imputation is theoretically superior to existing methods. All algorithms are implemented in the R package NNLM, which is freely available on CRAN and Github.
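As an illustration of how NMF can "handle missing values naturally", the following is a minimal sketch that fits the factorization on observed entries only and scores it on the hidden ones. It is not the NNLM implementation: the abstract describes sequential coordinate-wise descent, whereas this sketch swaps in the simpler masked multiplicative-update rule. The function names `masked_nmf` and `observed_rmse`, the toy data, and all parameter values are assumptions made for the example.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor X ~ W @ H using only the entries where mask is True.

    Assumption: masked multiplicative updates, not the coordinate-wise
    descent described in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    M = mask.astype(float)
    Xm = np.where(mask, X, 0.0)  # hidden entries contribute nothing to the fit
    for _ in range(n_iter):
        H *= (W.T @ Xm) / (W.T @ (M * (W @ H)) + eps)
        W *= (Xm @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

def observed_rmse(X, W, H, mask):
    """Root-mean-square error restricted to the entries selected by mask."""
    return float(np.sqrt(np.mean((X - W @ H)[mask] ** 2)))

# Synthetic rank-2 non-negative matrix with about 20% of entries hidden.
rng = np.random.default_rng(1)
X = rng.random((20, 2)) @ rng.random((2, 15))
mask = rng.random(X.shape) > 0.2

W, H = masked_nmf(X, mask, rank=2)
held_out_rmse = observed_rmse(X, W, H, ~mask)  # error on entries never seen in fitting
```

Repeating the fit for several candidate ranks and comparing `held_out_rmse` gives a rough, imputation-based way to choose the rank, in the spirit of the tuning idea the abstract mentions.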
Affiliation(s)
- Xihui Lin
- Informatics & Biocomputing, Ontario Institute for Cancer Research, Toronto, Canada.
| | - Paul C Boutros
- Informatics & Biocomputing, Ontario Institute for Cancer Research, Toronto, Canada
- Department of Human Genetics, University of California, Los Angeles, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, USA
| |
|
36
|
Chiu YJ, Hsieh YH, Huang YH. Improved cell composition deconvolution method of bulk gene expression profiles to quantify subsets of immune cells. BMC Med Genomics 2019; 12:169. [PMID: 31856824 PMCID: PMC6923925 DOI: 10.1186/s12920-019-0613-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/26/2019] [Accepted: 10/31/2019] [Indexed: 01/07/2023] Open
Abstract
Background To facilitate the investigation of the pathogenic roles played by various immune cells in complex tissues such as tumors, a few computational methods for deconvoluting bulk gene expression profiles to predict cell composition have been created. However, available methods were usually developed along with a set of reference gene expression profiles consisting of imbalanced replicates across different cell types. Therefore, the objective of this study was to create a new deconvolution method equipped with a new set of reference gene expression profiles that incorporate more microarray replicates of the immune cells that have been frequently implicated in the poor prognosis of cancers, such as T helper cells, regulatory T cells and macrophage M1/M2 cells. Methods Our deconvolution method was developed by choosing ε-support vector regression (ε-SVR) as the core algorithm, with a loss function subject to an L1-norm penalty. To construct the reference gene expression signature matrix for regression, a subset of differentially expressed genes was chosen from 148 microarray-based gene expression profiles for 9 types of immune cells by using ANOVA and minimizing the condition number. Agreement analyses including mean absolute percentage errors and Bland-Altman plots were carried out to compare the performances of our method and CIBERSORT. Results In silico cell mixtures, simulated bulk tissues, and real human samples with known immune-cell fractions were used as the test datasets for benchmarking. Our method outperformed CIBERSORT in the benchmarks using in silico breast tissue-immune cell mixtures in the proportions of 30:70 and 50:50, and in the benchmark using 164 human PBMC samples. Our results suggest that the performance of our method was at least comparable to that of a state-of-the-art tool, CIBERSORT. 
Conclusions We developed a new cell composition deconvolution method whose implementation is based entirely on publicly available R and Python packages. In addition, we compiled a new set of reference gene expression profiles, which might allow for a more robust prediction of the immune cell fractions from the expression profiles of cell mixtures. The source code of our method can be downloaded from https://github.com/holiday01/deconvolution-to-estimate-immune-cell-subsets.
Affiliation(s)
- Yen-Jung Chiu
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
| | - Yi-Hsuan Hsieh
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
| | - Yen-Hua Huang
- Institute of Biomedical Informatics, National Yang-Ming University, No.155, Sec. 2, Li-Nong St., Beitou Dist, Taipei, 11221, Taiwan
- Centre for Systems and Synthetic Biology, National Yang-Ming University, Taipei, 11221, Taiwan
| |
|
37
|
Kang K, Meng Q, Shats I, Umbach DM, Li M, Li Y, Li X, Li L. CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data. PLoS Comput Biol 2019; 15:e1007510. [PMID: 31790389 PMCID: PMC6907860 DOI: 10.1371/journal.pcbi.1007510] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 07/03/2019] [Revised: 12/12/2019] [Accepted: 10/25/2019] [Indexed: 11/18/2022] Open
Abstract
Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Current approaches that address tissue heterogeneity have drawbacks. Experimental techniques, such as fluorescence-activated cell sorting and single-cell RNA sequencing, are expensive. Computational approaches that use expression data from heterogeneous samples are promising, but most of the current methods estimate either cell-type proportions or cell-type-specific expression profiles by requiring the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-Seq data from bulk tissue samples to simultaneously estimate both cell-type proportions and cell-type-specific expression profiles. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we compared CDSeq’s complete deconvolution performance with seven other established deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples. CDSeq is available in a GitHub repository (MATLAB and Octave code): https://github.com/kkang7/CDSeq. Understanding the cellular composition of bulk tissues is critical to investigate the underlying mechanisms of many biological processes. Single-cell sequencing is a promising technique; however, it is expensive and the analysis of single-cell data is non-trivial. Therefore, tissue samples are still routinely processed in bulk. To estimate cell-type composition using bulk gene expression data, computational deconvolution methods are needed. 
Many deconvolution methods have been proposed; however, they often estimate only cell-type proportions using a reference cell-type gene expression profile, which in many cases may not be available. We present a novel complete deconvolution method that uses only bulk gene expression data to simultaneously estimate cell-type-specific gene expression profiles and sample-specific cell-type proportions. We showed that, using multiple RNA-Seq and microarray datasets where the cell-type composition was previously known, our method could accurately determine the cell-type composition. By providing a method that requires a single input to determine both cell-type proportion and cell-type-specific expression profiles, we expect that our method will be beneficial to biologists and facilitate the research and identification of mechanisms underlying many biological processes.
Affiliation(s)
- Kai Kang
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Qian Meng
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Igor Shats
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - David M. Umbach
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Melissa Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Yuanyuan Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Xiaoling Li
- Signal Transduction Laboratory, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| | - Leping Li
- Biostatistics and Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, North Carolina, United States of America
| |
|
38
|
Sompairac N, Nazarov PV, Czerwinska U, Cantini L, Biton A, Molkenov A, Zhumadilov Z, Barillot E, Radvanyi F, Gorban A, Kairov U, Zinovyev A. Independent Component Analysis for Unraveling the Complexity of Cancer Omics Datasets. Int J Mol Sci 2019; 20:E4414. [PMID: 31500324 PMCID: PMC6771121 DOI: 10.3390/ijms20184414] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 08/03/2019] [Revised: 09/02/2019] [Accepted: 09/04/2019] [Indexed: 12/13/2022] Open
Abstract
Independent component analysis (ICA) is a matrix factorization approach where the signals captured by each individual matrix factor are optimized to become as mutually independent as possible. Initially suggested for solving blind source separation problems in various fields, ICA was shown to be successful in analyzing functional magnetic resonance imaging (fMRI) and other types of biomedical data. In the last twenty years, ICA became a part of the standard machine learning toolbox, together with other matrix factorization methods such as principal component analysis (PCA) and non-negative matrix factorization (NMF). Here, we review a number of recent works where ICA was shown to be a useful tool for unraveling the complexity of cancer biology from the analysis of different types of omics data, mainly collected for tumoral samples. Such works highlight the use of ICA in dimensionality reduction, deconvolution, data pre-processing, meta-analysis, and others applied to different data types (transcriptome, methylome, proteome, single-cell data). We particularly focus on the technical aspects of ICA application in omics studies such as using different protocols, determining the optimal number of components, assessing and improving reproducibility of the ICA results, and comparison with other popular matrix factorization techniques. We discuss the emerging ICA applications to the integrative analysis of multi-level omics datasets and introduce a conceptual view on ICA as a tool for defining functional subsystems of a complex biological system and their interactions under various conditions. Our review is accompanied by a Jupyter notebook which illustrates the discussed concepts and provides a practical tool for applying ICA to the analysis of cancer omics datasets.
Affiliation(s)
- Nicolas Sompairac
- Institut Curie, PSL Research University, 75005 Paris, France.
- INSERM U900, 75248 Paris, France.
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
- Centre de Recherches Interdisciplinaires, Université Paris Descartes, 75004 Paris, France.
| | - Petr V Nazarov
- Multiomics Data Science Research Group, Quantitative Biology Unit, Luxembourg Institute of Health (LIH), L-1445 Strassen, Luxembourg.
| | - Urszula Czerwinska
- Institut Curie, PSL Research University, 75005 Paris, France.
- INSERM U900, 75248 Paris, France.
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
| | - Laura Cantini
- Computational Systems Biology Team, Institut de Biologie de l'Ecole Normale Supérieure, CNRS UMR8197, INSERM U1024, Ecole Normale Supérieure, PSL Research University, 75005 Paris, France.
| | - Anne Biton
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI, USR 3756 Institut Pasteur et CNRS), 75015 Paris, France.
| | - Askhat Molkenov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
| | - Zhaxybay Zhumadilov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
- University Medical Center, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
| | - Emmanuel Barillot
- Institut Curie, PSL Research University, 75005 Paris, France.
- INSERM U900, 75248 Paris, France.
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
| | - Francois Radvanyi
- Institut Curie, PSL Research University, 75005 Paris, France.
- CNRS, UMR 144, 75248 Paris, France.
| | - Alexander Gorban
- Center for Mathematical Modeling, University of Leicester, Leicester LE1 7RH, UK.
- Lobachevsky University, 603022 Nizhny Novgorod, Russia.
| | - Ulykbek Kairov
- Laboratory of Bioinformatics and Systems Biology, Center for Life Sciences, National Laboratory Astana, Nazarbayev University, 010000 Nur-Sultan, Kazakhstan.
| | - Andrei Zinovyev
- Institut Curie, PSL Research University, 75005 Paris, France.
- INSERM U900, 75248 Paris, France.
- CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France.
| |
|
39
|
Li Z, Wu H. TOAST: improving reference-free cell composition estimation by cross-cell type differential analysis. Genome Biol 2019; 20:190. [PMID: 31484546 PMCID: PMC6727351 DOI: 10.1186/s13059-019-1778-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 7.3] [Received: 04/24/2019] [Accepted: 07/30/2019] [Indexed: 02/07/2023] Open
Abstract
In the analysis of high-throughput data from complex samples, cell composition is an important factor that needs to be accounted for. Except for a limited number of tissues with known pure cell type profiles, a majority of genomics and epigenetics data relies on "reference-free deconvolution" methods to estimate cell composition. We develop a novel computational method to improve reference-free deconvolution, which iteratively searches for cell type-specific features and performs composition estimation. Simulation studies and applications to six real datasets including both DNA methylation and gene expression data demonstrate favorable performance of the proposed method. TOAST is available at https://bioconductor.org/packages/TOAST.
Affiliation(s)
- Ziyi Li
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA
| | - Hao Wu
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, 30322, GA, USA.
| |
|
40
|
Way GP, Greene CS. Discovering Pathway and Cell Type Signatures in Transcriptomic Compendia with Machine Learning. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-072018-021348] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Pathway and cell type signatures are patterns present in transcriptome data that are associated with biological processes or phenotypic consequences. These signatures result from specific cell type and pathway expression but can require large transcriptomic compendia to detect. Machine learning techniques can be powerful tools for signature discovery through their ability to provide accurate and interpretable results. In this review, we discuss various machine learning applications to extract pathway and cell type signatures from transcriptomic compendia. We focus on the biological motivations and interpretation for both supervised and unsupervised learning approaches in this setting. We consider recent advances, including deep learning, and their applications to expanding bulk and single-cell RNA data. As data and computational resources increase, there will be more opportunities for machine learning to aid in revealing biological signatures.
Affiliation(s)
- Gregory P. Way
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| |
|
41
|
Boufaied N, Takhar M, Nash C, Erho N, Bismar TA, Davicioni E, Thomson AA. Development of a predictive model for stromal content in prostate cancer samples to improve signature performance. J Pathol 2019; 249:411-424. [PMID: 31206668 PMCID: PMC6900085 DOI: 10.1002/path.5315] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/27/2018] [Revised: 05/27/2019] [Accepted: 06/13/2019] [Indexed: 01/23/2023]
Abstract
Prostate cancer is heterogeneous in both cellular composition and patient outcome, and development of biomarker signatures to distinguish indolent from aggressive tumours is a high priority. Stroma plays an important role during prostate cancer progression and undergoes histological and transcriptional changes associated with disease. However, identification and validation of stromal markers is limited by a lack of datasets with defined stromal/tumour ratio. We have developed a prostate‐selective signature to estimate the stromal content in cancer samples of mixed cellular composition. We identified stromal‐specific markers from transcriptomic datasets of developmental prostate mesenchyme and prostate cancer stroma. These were experimentally validated in cell lines, datasets of known stromal content, and by immunohistochemistry in tissue samples to verify stromal‐specific expression. Linear models based upon six transcripts were able to infer the stromal content and estimate stromal composition in mixed tissues. The best model had a coefficient of determination R² of 0.67. Application of our stromal content estimation model in various prostate cancer datasets led to improved performance of stromal predictive signatures for disease progression and metastasis. The stromal content of prostate tumours varies considerably; consequently, deconvolution of stromal proportion may yield better results than tumour cell deconvolution. We suggest that adjusting expression data for cell composition will improve stromal signature performance and lead to better prognosis and stratification of men with prostate cancer. © 2019 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
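To make the reported coefficient of determination concrete, here is a minimal sketch of fitting a linear model on six predictors and computing R² on the fitted samples. All data are synthetic and every variable name is hypothetical; the actual transcripts, coefficients, and fitting procedure of the study are not given in the abstract and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_transcripts = 40, 6

# Synthetic stand-ins: marker expression and a noisy stromal-content response.
expr = rng.normal(size=(n_samples, n_transcripts))
true_coef = rng.normal(size=n_transcripts)
stroma = expr @ true_coef + rng.normal(scale=0.5, size=n_samples)

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n_samples), expr])
coef, *_ = np.linalg.lstsq(A, stroma, rcond=None)
pred = A @ coef

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
ss_res = np.sum((stroma - pred) ** 2)
ss_tot = np.sum((stroma - stroma.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

Read this way, an R² of 0.67 would mean that about two thirds of the variance in stromal content is accounted for by the linear combination of the six transcripts.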
Affiliation(s)
- Nadia Boufaied
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
| | - Mandeep Takhar
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
| | - Claire Nash
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
| | - Nicholas Erho
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
| | - Tarek A Bismar
- Department of Pathology and Laboratory Medicine, University of Calgary Cumming School of Medicine, Calgary, Canada
- Department of Oncology, Biochemistry and Molecular Biology, University of Calgary Cumming School of Medicine, Calgary, Canada
| | - Elai Davicioni
- Research and Development, GenomeDx Biosciences, Vancouver, Canada
| | - Axel A Thomson
- Division of Urology and Cancer Research Program, McGill University Health Centre Research Institute, Quebec, Canada
| |
|
42
|
Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures. Nat Commun 2019; 10:2209. [PMID: 31101809 PMCID: PMC6525259 DOI: 10.1038/s41467-019-09990-5] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 04/11/2018] [Accepted: 04/11/2019] [Indexed: 11/08/2022] Open
Abstract
Changes in bulk transcriptional profiles of heterogeneous samples often reflect changes in proportions of individual cell types. Several robust techniques have been developed to dissect the composition of such mixed samples given transcriptional signatures of the pure components or their proportions. These approaches are insufficient, however, in situations when no information about individual mixture components is available. This problem is known as the complete deconvolution problem, where the composition is revealed without any a priori knowledge about cell types and their proportions. Here, we identify a previously unrecognized property of tissue-specific genes - their mutual linearity - and use it to reveal the structure of the topological space of mixed transcriptional profiles and provide a noise-robust approach to the complete deconvolution problem. Furthermore, our analysis reveals systematic bias of all deconvolution techniques due to differences in cell size or RNA-content, and we demonstrate how to address this bias at the experimental design level.
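The partial setting mentioned above, where signatures of the pure components are given, can be sketched under a linear mixing model as follows. This is not the complete deconvolution method of the paper: it assumes the signature matrix is known and uses plain least squares followed by clipping and renormalization. The data are synthetic and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_types, n_samples = 50, 3, 8

# Hypothetical pure cell-type signatures (genes x cell types), assumed known.
signatures = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_types))
# Synthetic ground-truth proportions (cell types x samples); columns sum to 1.
true_props = rng.dirichlet(np.ones(n_types), size=n_samples).T
# Linear mixing model: each bulk profile is a proportion-weighted sum of signatures.
bulk = signatures @ true_props

# Solve signatures @ P = bulk for P, then project onto valid proportions.
est, *_ = np.linalg.lstsq(signatures, bulk, rcond=None)
est = np.clip(est, 0.0, None)
est /= est.sum(axis=0, keepdims=True)
```

On this noise-free toy input the estimate matches `true_props` up to floating-point error. With real data, noise and differences in RNA content per cell type (the bias the abstract points out) would make the recovered values only approximate.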
|
43
|
Interleukin-8/CXCR2 signaling regulates therapy-induced plasticity and enhances tumorigenicity in glioblastoma. Cell Death Dis 2019; 10:292. [PMID: 30926789 PMCID: PMC6441047 DOI: 10.1038/s41419-019-1387-6] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 10/30/2018] [Revised: 12/21/2018] [Accepted: 01/17/2019] [Indexed: 02/01/2023]
Abstract
Emerging evidence reveals enrichment of glioma-initiating cells (GICs) following therapeutic intervention. One factor known to contribute to this enrichment is cellular plasticity, the ability of glioma cells to attain multiple phenotypes. To elucidate the molecular mechanisms governing therapy-induced cellular plasticity, we performed genome-wide chromatin immunoprecipitation sequencing (ChIP-Seq) and gene expression analysis (gene microarray analysis) during treatment with standard-of-care temozolomide (TMZ) chemotherapy. Analysis revealed significant enhancement of open-chromatin marks in known astrocytic enhancers for interleukin-8 (IL-8) loci as well as elevated expression during anti-glioma chemotherapy. The Cancer Genome Atlas and Ivy Glioblastoma Atlas Project data demonstrated that IL-8 transcript expression is negatively correlated with GBM patient survival (p = 0.001) and positively correlated with that of genes associated with the GIC phenotypes, such as KLF4, c-Myc, and HIF2α (p < 0.001). Immunohistochemical analysis of patient samples demonstrated elevated IL-8 expression in about 60% of recurrent GBM tumors relative to matched primary tumors and this expression also positively correlates with time to recurrence. Exposure to IL-8 significantly enhanced the self-renewing capacity of PDX GBM (average threefold, p < 0.0005), as well as increasing the expression of GIC markers in the CXCR2 population. Furthermore, IL-8 knockdown significantly delayed PDX GBM tumor growth in vivo (p < 0.0005). Finally, guided by in silico analysis of TCGA data, we examined the effect of therapy-induced IL-8 expression on the epigenomic landscape of GBM cells and observed increased trimethylation of H3K9 and H3K27. Our results show that autocrine IL-8 alters cellular plasticity and mediates alterations in histone status. 
These findings suggest that IL-8 signaling participates in regulating GBM adaptation to therapeutic stress and therefore represents a promising target for combination with conventional chemotherapy in order to limit GBM recurrence.
|
44
|
Recent Advances in Supervised Dimension Reduction: A Survey. Machine Learning and Knowledge Extraction 2019. [DOI: 10.3390/make1010020] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Indexed: 12/21/2022]
Abstract
Recently, we have witnessed an explosive growth in both the quantity and dimension of data generated, which aggravates the high dimensionality challenge in tasks such as predictive modeling and decision support. Up to now, a large number of unsupervised dimension reduction methods have been proposed and studied. However, there is no specific review focusing on the supervised dimension reduction problem. Most studies performed classification or regression after unsupervised dimension reduction methods. However, learning the low-dimensional representation and the classification/regression model simultaneously offers two advantages: high accuracy and effective representation. Considering classification or regression as being the main goal of dimension reduction, the purpose of this paper is to summarize and organize the current developments in the field into three main classes: PCA-based, Non-negative Matrix Factorization (NMF)-based, and manifold-based supervised dimension reduction methods, and to provide detailed discussions of their advantages and disadvantages. Moreover, we outline a dozen open problems that can be further explored to advance the development of this topic.
|
45
|
Jiang S, Wen N, Li Z, Dube U, Del Aguila J, Budde J, Martinez R, Hsu S, Fernandez MV, Cairns NJ, Harari O, Cruchaga C, Karch CM. Integrative system biology analyses of CRISPR-edited iPSC-derived neurons and human brains reveal deficiencies of presynaptic signaling in FTLD and PSP. Transl Psychiatry 2018; 8:265. [PMID: 30546007 PMCID: PMC6293323 DOI: 10.1038/s41398-018-0319-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/05/2018] [Accepted: 11/13/2018] [Indexed: 01/12/2023] Open
Abstract
Mutations in the microtubule-associated protein tau (MAPT) gene cause autosomal dominant frontotemporal lobar degeneration with tau inclusions (FTLD-tau). MAPT p.R406W carriers present clinically with progressive memory loss and neuropathologically with neuronal and glial tauopathy. However, the pathogenic events triggered by the expression of the mutant tau protein remain poorly understood. To identify the genes and pathways that are dysregulated in FTLD-tau, we performed transcriptomic analyses in induced pluripotent stem cell (iPSC)-derived neurons carrying MAPT p.R406W and CRISPR/Cas9-corrected isogenic controls. We found that the expression of the MAPT p.R406W mutation was sufficient to create a significantly different transcriptomic profile compared with that of the isogenic controls and to cause the differential expression of 328 genes. Sixty-one of these genes were also differentially expressed in the same direction between MAPT p.R406W carriers and pathology-free human control brains. We found that genes differentially expressed in the stem cell models and human brains were enriched for pathways involving gamma-aminobutyric acid (GABA) receptors and pre-synaptic function. The expression of GABA receptor genes, including GABRB2 and GABRG2, was consistently reduced in iPSC-derived neurons and brains from MAPT p.R406W carriers. Interestingly, we found that the expression of GABA receptor genes, including GABRB2 and GABRG2, is significantly lower in symptomatic mouse models of tauopathy, as well as in brains with progressive supranuclear palsy. Genome-wide association analyses reveal that common variants within GABRB2 are associated with increased risk for frontotemporal dementia (P < 1 × 10⁻³). Thus, our systems biology approach, which leverages molecular data from stem cells, animal models, and human brain tissue, can reveal novel disease mechanisms. Here, we demonstrate that MAPT p.R406W is sufficient to induce changes in GABA-mediated signaling and synaptic function, which may contribute to the pathogenesis of FTLD-tau and other primary tauopathies.
Affiliation(s)
- Shan Jiang
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Natalie Wen
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Zeran Li
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Umber Dube
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Jorge Del Aguila
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - John Budde
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Rita Martinez
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Simon Hsu
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Maria V Fernandez
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Nigel J Cairns
- Department of Pathology and Immunology, Washington University in St. Louis, School of Medicine, 660 S. Euclid Ave, Campus Box 8118, Saint Louis, MO, 63110, USA
| | - Oscar Harari
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| | - Celeste M Karch
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8134, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. Campus Box 8111, St. Louis, MO, 63110, USA
| |
|
46
|
Hunt GJ, Freytag S, Bahlo M, Gagnon-Bartsch JA. dtangle: accurate and robust cell type deconvolution. Bioinformatics 2018; 35:2093-2099. [DOI: 10.1093/bioinformatics/bty926] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 12/21/2017] [Revised: 10/20/2018] [Accepted: 11/06/2018] [Indexed: 11/14/2022] Open
Abstract
Motivation
Cell type composition of tissues is important in many biological processes. To help understand cell type composition using gene expression data, methods of estimating (deconvolving) cell type proportions have been developed. Such estimates are often used to adjust for confounding effects of cell type in differential expression analysis (DEA).
Results
We propose dtangle, a new cell type deconvolution method. dtangle works on a range of DNA microarray and bulk RNA-seq platforms. It estimates cell type proportions using publicly available, often cross-platform, reference data. We evaluate dtangle on 11 benchmark datasets showing that dtangle is competitive with published deconvolution methods, is robust to outliers and selection of tuning parameters, and is fast. As a case study, we investigate the human immune response to Lyme disease. dtangle’s estimates reveal a temporal trend consistent with previous findings and are important covariates for DEA across disease status.
Availability and implementation
dtangle is available on CRAN (cran.r-project.org/package=dtangle) or GitHub (dtangle.github.io).
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gregory J Hunt
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
| | - Saskia Freytag
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | - Melanie Bahlo
- Population Health and Immunity Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, VIC, Australia
- Department of Medical Biology, University of Melbourne, Parkville, VIC, Australia
| | | |
|
47
|
Dimitrakopoulou K, Wik E, Akslen LA, Jonassen I. Deblender: a semi-/unsupervised multi-operational computational method for complete deconvolution of expression data from heterogeneous samples. BMC Bioinformatics 2018; 19:408. [PMID: 30404611 PMCID: PMC6223087 DOI: 10.1186/s12859-018-2442-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/16/2018] [Accepted: 10/22/2018] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Towards discovering robust cancer biomarkers, it is imperative to unravel the cellular heterogeneity of patient samples and comprehend the interactions between cancer cells and the various cell types in the tumor microenvironment. The first generation of 'partial' computational deconvolution methods required prior information either on the cell/tissue type proportions or the cell/tissue type-specific expression signatures and the number of involved cell/tissue types. The second generation of 'complete' approaches allowed estimating both the cell/tissue type proportions and the cell/tissue type-specific expression profiles directly from the mixed gene expression data, based on known (or automatically identified) cell/tissue type-specific marker genes. RESULTS We present Deblender, a flexible complete deconvolution tool operating in semi-/unsupervised mode based on the user's access to known marker gene lists and information about cell/tissue composition. When no prior knowledge is available, global gene expression variability is used to cluster the mixed data, and cluster sets substitute for marker sets. In addition, we integrate a model selection criterion to predict the number of constituent cell/tissue types. Moreover, we provide a tailored algorithmic scheme to estimate mixture proportions for realistic experimental cases where the number of involved cell/tissue types exceeds the number of mixed samples. We assess the performance of Deblender and a set of state-of-the-art existing tools on a comprehensive set of benchmark and patient cancer mixture expression datasets (including TCGA). CONCLUSION Our results corroborate that Deblender can be a valuable tool to improve understanding of gene expression datasets with implications for prediction and clinical utilization. Deblender is implemented in MATLAB and is available from https://github.com/kondim1983/Deblender/.
Affiliation(s)
- Konstantina Dimitrakopoulou
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| | - Elisabeth Wik
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway
- Department of Pathology, Haukeland University Hospital, Bergen, Norway
| | - Lars A Akslen
- Centre for Cancer Biomarkers CCBIO, Department of Clinical Medicine, Section for Pathology, University of Bergen, Bergen, Norway
- Department of Pathology, Haukeland University Hospital, Bergen, Norway
| | - Inge Jonassen
- Centre for Cancer Biomarkers CCBIO, Department of Informatics, University of Bergen, Bergen, Norway
- Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
| |
|
48
|
Petitprez F, Sun CM, Lacroix L, Sautès-Fridman C, de Reyniès A, Fridman WH. Quantitative Analyses of the Tumor Microenvironment Composition and Orientation in the Era of Precision Medicine. Front Oncol 2018; 8:390. [PMID: 30319963 PMCID: PMC6167550 DOI: 10.3389/fonc.2018.00390] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/16/2018] [Accepted: 08/30/2018] [Indexed: 11/20/2022] Open
Abstract
Tumors are formed by aggregates of cells of various origins including malignant, stromal and immune cells. The number of therapies targeting the microenvironment is increasing as the tumor microenvironment is increasingly recognized as playing an essential role in tumor control. In the era of precision medicine, it is essential to precisely estimate the composition, organization and functionality of the individual patient tumor microenvironment and to find ways to therapeutically modulate it. To quantify the cell populations present in the tumor microenvironment, many tools are now available and the most recent approaches will be reviewed herein. We provide an overview of experimental and computational methodologies used to quantify tumor-associated cellular populations, including immunohistochemistry, flow and mass cytometry, and bulk and single-cell transcriptomic approaches. We illustrate their respective contributions to characterizing the microenvironment. We also discuss how these methods can guide therapeutic choices, in relation to the predictive value of some characteristics of the microenvironment.
Affiliation(s)
- Florent Petitprez
- INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
- University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
| | - Cheng-Ming Sun
- INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
- University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
| | - Laetitia Lacroix
- INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
- University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
| | - Catherine Sautès-Fridman
- INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
- University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
| | - Aurélien de Reyniès
- Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
| | - Wolf H Fridman
- INSERM, UMR_S 1138, Cordeliers Research Center, Team Cancer, Immune Control and Escape, Paris, France
- University Paris Descartes Paris 5, Sorbonne Paris Cite, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
- Sorbonne University, UMR_S 1138, Centre de Recherche des Cordeliers, Paris, France
| |
|
49
|
Finotello F, Trajanoski Z. Quantifying tumor-infiltrating immune cells from transcriptomics data. Cancer Immunol Immunother 2018; 67:1031-1040. [PMID: 29541787 PMCID: PMC6006237 DOI: 10.1007/s00262-018-2150-z] [Citation(s) in RCA: 269] [Impact Index Per Article: 38.4] [Received: 12/11/2017] [Accepted: 03/09/2018] [Indexed: 12/22/2022]
Abstract
By exerting pro- and anti-tumorigenic actions, tumor-infiltrating immune cells can profoundly influence tumor progression, as well as the success of anti-cancer therapies. Therefore, the quantification of tumor-infiltrating immune cells holds the promise to unveil the multi-faceted role of the immune system in human cancers and its involvement in tumor escape mechanisms and response to therapy. Tumor-infiltrating immune cells can be quantified from RNA sequencing data of human tumors using bioinformatics approaches. In this review, we describe state-of-the-art computational methods for the quantification of immune cells from transcriptomics data and discuss the open challenges that must be addressed to accurately quantify immune infiltrates from RNA sequencing data of human bulk tumors.
Affiliation(s)
- Francesca Finotello
- Biocenter, Division for Bioinformatics, Medical University of Innsbruck, Innrain 80, 6020, Innsbruck, Austria.
| | - Zlatko Trajanoski
- Biocenter, Division for Bioinformatics, Medical University of Innsbruck, Innrain 80, 6020, Innsbruck, Austria.
| |
|
50
|
Li Z, Del-Aguila JL, Dube U, Budde J, Martinez R, Black K, Xiao Q, Cairns NJ, Dougherty JD, Lee JM, Morris JC, Bateman RJ, Karch CM, Cruchaga C, Harari O. Genetic variants associated with Alzheimer's disease confer different cerebral cortex cell-type population structure. Genome Med 2018; 10:43. [PMID: 29880032 PMCID: PMC5992755 DOI: 10.1186/s13073-018-0551-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 03/05/2018] [Accepted: 05/15/2018] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Alzheimer’s disease (AD) is characterized by neuronal loss and astrocytosis in the cerebral cortex. However, the specific effects that pathological mutations and coding variants associated with AD have on the cellular composition of the brain are often ignored. METHODS We developed and optimized a cell-type-specific expression reference panel and employed digital deconvolution methods to determine brain cellular distribution in three independent transcriptomic studies. RESULTS We found that neuronal and astrocyte relative proportions differ between healthy and diseased brains and also among AD cases that carry specific genetic risk variants. Brains of carriers of pathogenic mutations in APP, PSEN1, or PSEN2 presented lower neuron and higher astrocyte relative proportions compared to sporadic AD. Similarly, carriers of the APOE ε4 allele also showed decreased neuronal and increased astrocyte relative proportions compared to AD non-carriers. In contrast, carriers of TREM2 risk variants showed a lower degree of neuronal loss compared to matched AD cases in multiple independent studies. CONCLUSIONS These findings suggest that genetic risk factors associated with AD etiology leave a specific imprint on the cellular composition of AD brains. Our digital deconvolution reference panel provides an enhanced understanding of the fundamental molecular mechanisms underlying neurodegeneration, enables the analysis of large bulk RNA-sequencing studies for cell composition, and suggests that correcting for the cellular structure when performing transcriptomic analysis will lead to novel insights into AD. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (10.1186/s13073-018-0551-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zeran Li
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Jorge L Del-Aguila
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Umber Dube
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
- Medical Scientist Training Program, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
| | - John Budde
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Rita Martinez
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Kathleen Black
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Qingli Xiao
- Department of Neurology, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
| | - Nigel J Cairns
- Department of Neurology, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Department of Pathology & Immunology, Washington University in St. Louis, School of Medicine, 510 S. Kingshighway, MC 8131, Saint Louis, MO, 63110, USA
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
| | - Joseph D Dougherty
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
- Department of Genetics, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
| | - Jin-Moo Lee
- Department of Neurology, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
| | - John C Morris
- Department of Neurology, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. B8111, St. Louis, MO, 63110, USA
| | - Randall J Bateman
- Department of Neurology, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. B8111, St. Louis, MO, 63110, USA
| | - Celeste M Karch
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| | - Carlos Cruchaga
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
- Knight Alzheimer's Disease Research Center, Washington University School of Medicine, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
- Hope Center for Neurological Disorders, Washington University School of Medicine, 660 S. Euclid Ave. B8111, St. Louis, MO, 63110, USA
| | - Oscar Harari
- Department of Psychiatry, Washington University School of Medicine, 660 S. Euclid Ave. B8134, St. Louis, MO, 63110, USA
| |
|