1
White BS, de Reyniès A, Newman AM, Waterfall JJ, Lamb A, Petitprez F, Lin Y, Yu R, Guerrero-Gimenez ME, Domanskyi S, Monaco G, Chung V, Banerjee J, Derrick D, Valdeolivas A, Li H, Xiao X, Wang S, Zheng F, Yang W, Catania CA, Lang BJ, Bertus TJ, Piermarocchi C, Caruso FP, Ceccarelli M, Yu T, Guo X, Bletz J, Coller J, Maecker H, Duault C, Shokoohi V, Patel S, Liliental JE, Simon S, Saez-Rodriguez J, Heiser LM, Guinney J, Gentles AJ. Community assessment of methods to deconvolve cellular composition from bulk gene expression. Nat Commun 2024; 15:7362. [PMID: 39191725 PMCID: PMC11350143 DOI: 10.1038/s41467-024-50618-0] [Received: 08/28/2023] [Accepted: 07/11/2024] [Indexed: 08/29/2024]
Abstract
We evaluate deconvolution methods, which infer levels of immune infiltration from bulk expression of tumor samples, through a community-wide DREAM Challenge. We assess six published and 22 community-contributed methods using in vitro and in silico transcriptional profiles of admixed cancer and healthy immune cells. Several published methods predict most cell types well, though they either were not trained to evaluate all functional CD8+ T cell states or do so with low accuracy. Several community-contributed methods address this gap, including a deep learning-based approach, whose strong performance establishes the applicability of this paradigm to deconvolution. Despite being developed largely using immune cells from healthy tissues, deconvolution methods predict levels of tumor-derived immune cells well. Our admixed and purified transcriptional profiles will be a valuable resource for developing deconvolution methods, including in response to common challenges we observe across methods, such as sensitive identification of functional CD4+ T cell states.
Affiliation(s)
- Brian S White
  - Sage Bionetworks, Seattle, WA, USA
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Aurélien de Reyniès
  - Centre de Recherche des Cordeliers, INSERM U1138, Université Paris Cité, Paris, France
- Aaron M Newman
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Joshua J Waterfall
  - INSERM U830 and Translational Research Department, Institut Curie, PSL Research University, Paris, France
- Florent Petitprez
  - Programme Cartes d'Identité des Tumeurs, Ligue Nationale Contre le Cancer, Paris, France
  - MRC Centre for Reproductive Health, the Queen's Medical Research Institute, University of Edinburgh, Edinburgh, UK
- Yating Lin
  - Xiamen University, Xiamen, Fujian, China
- Martin E Guerrero-Gimenez
  - Institute of Biochemistry and Biotechnology, School of Medicine, National University of Cuyo, Mendoza, Argentina
- Gianni Monaco
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Daniel Derrick
  - Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Alberto Valdeolivas
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Haojun Li
  - Xiamen University, Xiamen, Fujian, China
- Xu Xiao
  - Xiamen University, Xiamen, Fujian, China
- Shun Wang
  - Department of Pathology, Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, China
- Carlos A Catania
  - Laboratory of Intelligent Systems (LABSIN), Engineering School, National University of Cuyo, Mendoza, Argentina
- Benjamin J Lang
  - Department of Radiation Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Francesca P Caruso
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
- Michele Ceccarelli
  - BIOGEM Institute of Molecular Biology and Genetics, Ariano Irpino, AV, Italy
  - Sylvester Comprehensive Cancer Center, Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, Florida, USA
- John Coller
  - Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Holden Maecker
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Caroline Duault
  - Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA, USA
- Vida Shokoohi
  - Stanford Functional Genomics Facility, Stanford University School of Medicine, Stanford, CA, USA
- Shailja Patel
  - Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Joanna E Liliental
  - Translational Applications Service Center, Stanford University School of Medicine, Stanford, CA, USA
- Julio Saez-Rodriguez
  - Heidelberg University, Faculty of Medicine, and Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Laura M Heiser
  - Department of Biomedical Engineering, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Andrew J Gentles
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Pathology, Stanford University, Stanford, CA, USA
2
Garmire LX, Li Y, Huang Q, Xu C, Teichmann SA, Kaminski N, Pellegrini M, Nguyen Q, Teschendorff AE. Challenges and perspectives in computational deconvolution of genomics data. Nat Methods 2024; 21:391-400. [PMID: 38374264 DOI: 10.1038/s41592-023-02166-6] [Received: 11/04/2022] [Accepted: 12/26/2023] [Indexed: 02/21/2024]
Abstract
Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we enlist four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.
Affiliation(s)
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Yijun Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Qianhui Huang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chuan Xu
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Naftali Kaminski
  - Pulmonary, Critical Care & Sleep Medicine, Yale University School of Medicine, New Haven, CT, USA
- Matteo Pellegrini
  - Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Quan Nguyen
  - Institute for Molecular Bioscience, The University of Queensland and QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia
- Andrew E Teschendorff
  - CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - UCL Cancer Institute, University College London, London, UK
3
Feng S, Calinawan A, Pugliese P, Wang P, Ceccarelli M, Petralia F, Gosline SJC. Decomprolute is a benchmarking platform designed for multiomics-based tumor deconvolution. Cell Reports Methods 2024; 4:100708. [PMID: 38412834 PMCID: PMC10921018 DOI: 10.1016/j.crmeth.2024.100708] [Received: 02/13/2023] [Revised: 10/23/2023] [Accepted: 01/18/2024] [Indexed: 02/29/2024]
Abstract
Tumor deconvolution enables the identification of diverse cell types that comprise solid tumors. To date, however, both the algorithms developed to deconvolve tumor samples, and the gold-standard datasets used to assess the algorithms are geared toward the analysis of gene expression (e.g., RNA sequencing) rather than protein levels. Despite the popularity of gene expression datasets, protein levels often provide a more accurate view of rare cell types. To facilitate the use, development, and reproducibility of multiomic deconvolution algorithms, we introduce Decomprolute, a Common Workflow Language framework that leverages containerization to compare tumor deconvolution algorithms across multiomic datasets. Decomprolute incorporates the large-scale multiomic datasets produced by the Clinical Proteomic Tumor Analysis Consortium (CPTAC), which include matched mRNA expression and proteomic data from thousands of tumors across multiple cancer types to build a fully open-source, containerized proteogenomic tumor deconvolution benchmarking platform. http://pnnl-compbio.github.io/decomprolute.
Affiliation(s)
- Song Feng
  - Pacific Northwest National Laboratory, Seattle, WA, USA
- Anna Calinawan
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pei Wang
  - Icahn School of Medicine at Mount Sinai, New York, NY, USA
4
Xu Z, Escalera S, Pavão A, Richard M, Tu WW, Yao Q, Zhao H, Guyon I. Codabench: Flexible, easy-to-use, and reproducible meta-benchmark platform. Patterns 2022; 3:100543. [PMID: 35845844 PMCID: PMC9278500 DOI: 10.1016/j.patter.2022.100543] [Received: 02/25/2022] [Revised: 03/21/2022] [Accepted: 06/03/2022] [Indexed: 11/29/2022]
Abstract
Obtaining a standardized benchmark of computational methods is a major issue in data-science communities. Dedicated frameworks enabling fair benchmarking in a unified environment are yet to be developed. Here, we introduce Codabench, a meta-benchmark platform that is open sourced and community driven for benchmarking algorithms or software agents versus datasets or tasks. A public instance of Codabench is open to everyone free of charge and allows benchmark organizers to fairly compare submissions under the same setting (software, hardware, data, algorithms), with custom protocols and data formats. Codabench has unique features facilitating easy organization of flexible and reproducible benchmarks, such as the possibility of reusing templates of benchmarks and supplying compute resources on demand. Codabench has been used internally and externally on various applications, receiving more than 130 users and 2,500 submissions. As illustrative use cases, we introduce four diverse benchmarks covering graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning.
- Codabench facilitates flexible, easy, and reproducible benchmarking
- Organizers can customize benchmark design and submission format
- Organizers may host their own platform instance or use the public instance
- Four use cases in diverse domains are introduced to demonstrate the key features
In almost all communities working on data science, researchers face increasingly severe issues of reproducibility and fair comparison. Researchers work on their own version of hardware/software environment, code, and data, and consequently, the published results are hardly comparable. We introduce Codabench, a meta-benchmark platform, that is capable of flexible and easy benchmarking and supports reproducibility. Codabench is an important step toward benchmarking and reproducible research. It has been used in various communities including graph machine learning, cancer heterogeneity, clinical diagnosis, and reinforcement learning. Codabench is ready to help trendy research, e.g., artificial intelligence (AI) for science and data-centric AI.
Affiliation(s)
- Zhen Xu
  - 4Paradigm, Beijing 100085, China
  - Corresponding author
- Sergio Escalera
  - Computer Vision Center, Universitat de Barcelona, 08007 Barcelona, Spain
- Adrien Pavão
  - LISN/CNRS/INRIA, University Paris-Saclay, 91190 Gif-sur-Yvette, France
- Magali Richard
  - University Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, 38000 Grenoble, France
- Isabelle Guyon
  - LISN/CNRS/INRIA, University Paris-Saclay, 91190 Gif-sur-Yvette, France
  - ChaLearn, Berkeley, CA, USA
  - Corresponding author