1
Garg T, Weiss CR, Sheth RA. Techniques for Profiling the Cellular Immune Response and Their Implications for Interventional Oncology. Cancers (Basel) 2022; 14:3628. [PMID: 35892890 PMCID: PMC9332307 DOI: 10.3390/cancers14153628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 12/07/2022] Open
Abstract
In recent years, there has been increased interest in using the immune contexture of primary tumors to predict patient prognosis. The tumor microenvironment of patients with cancer consists of different types of lymphocytes, tumor-infiltrating leukocytes, dendritic cells, and others. Different technologies can be used to evaluate the tumor microenvironment, all of which require a tissue or cell sample. Image-guided tissue sampling is a cornerstone in the diagnosis, stratification, and longitudinal evaluation of therapeutic efficacy for cancer patients receiving immunotherapies. Therefore, interventional radiologists (IRs) play an essential role in the evaluation of patients treated with systemically administered immunotherapies. This review provides a detailed description of the technologies used for immune assessment and of the analysis of the data they produce. The detailed approach provided herein is intended to give the reader the knowledge necessary not only to interpret studies containing such data but also to design and apply these tools in clinical practice and future research studies.
Affiliation(s)
- Tushar Garg
- Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Clifford R. Weiss
- Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rahul A. Sheth
- Department of Interventional Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
2
Del Barrio E, Inouzhe H, Loubes JM, Matrán C, Mayo-Íscar A. optimalFlow: optimal transport approach to flow cytometry gating and population matching. BMC Bioinformatics 2020; 21:479. [PMID: 33109072 PMCID: PMC7590740 DOI: 10.1186/s12859-020-03795-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/05/2020] [Accepted: 10/01/2020] [Indexed: 11/12/2022] Open
Abstract
Background: Data obtained from flow cytometry present pronounced variability due to biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals, with different characteristics such as illness, age, sex, etc. The use of different settings for measurement, the variation of the conditions during experiments and the different types of flow cytometers are some of the technical causes of variability. This mixture of sources of variability makes the use of supervised machine learning for identification of cell populations difficult. The present work is conceived as a combination of strategies to facilitate the task of supervised gating. Results: We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning, restricted to the new groups, performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state-of-the-art techniques in the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package at https://bioconductor.org/packages/optimalFlow. Conclusions: optimalFlowTemplates + optimalFlowClassification addresses the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovation is the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.
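To make the Wasserstein barycenter idea mentioned in the abstract more concrete, here is a minimal Python sketch restricted to a single marker in one dimension. In that special case, the 2-Wasserstein barycenter of equally weighted samples of equal size can be obtained by averaging their sorted values. This is only an illustration under those assumptions, not the optimalFlow implementation (an R package that works on multivariate cytometries); the function name `barycenter_1d` and the toy data are hypothetical.

```python
def barycenter_1d(samples):
    """Average the order statistics of equally sized 1D samples.

    Assumes every sample has the same number of events and equal weight;
    under those assumptions this is the 1D 2-Wasserstein barycenter.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of events")
    sorted_samples = [sorted(s) for s in samples]
    # The i-th barycenter point is the mean of the i-th smallest values.
    return [sum(s[i] for s in sorted_samples) / len(samples) for i in range(n)]


# Hypothetical single-marker intensities from two cytometries.
cytometry_a = [1.0, 3.0, 2.0]
cytometry_b = [5.0, 4.0, 6.0]
prototype = barycenter_1d([cytometry_a, cytometry_b])
print(prototype)  # [2.5, 3.5, 4.5]
```

The resulting list plays the role of a "prototype" distribution lying between the two inputs, which is the intuition behind producing template cytometries for a group.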
Affiliation(s)
- Eustasio Del Barrio
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Hristo Inouzhe
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Jean-Michel Loubes
- Université Paul Sabatier, Route de Narbonne, Toulouse, France; IMT, Route de Narbonne, Toulouse, France
- Carlos Matrán
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Agustín Mayo-Íscar
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
3
Montante S, Brinkman RR. Flow cytometry data analysis: Recent tools and algorithms. Int J Lab Hematol 2019; 41 Suppl 1:56-62. [PMID: 31069980 DOI: 10.1111/ijlh.13016] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 12/18/2018] [Revised: 02/25/2019] [Accepted: 02/26/2019] [Indexed: 12/21/2022]
Abstract
Flow cytometry (FCM) allows scientists to rapidly quantify up to 50 parameters for millions of cells per sample. The bottleneck in the application of the technology is data analysis, and the high number of parameters measured by the current generation of instruments requires the use of advanced computational algorithms to make full use of their capabilities. This review summarizes the main steps of FCM data analysis, focusing on the use of the most recent bioinformatic tools developed for an R-based programming environment. In particular, for each stage of the data analysis, libraries and packages currently available are listed, and a brief description of their functioning is included.
Affiliation(s)
- Ryan R Brinkman
- Terry Fox Laboratory, BC Cancer, Vancouver, British Columbia, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
4
Azad A, Rajwa B, Pothen A. Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples. Front Oncol 2016; 6:188. [PMID: 27630823 PMCID: PMC5005935 DOI: 10.3389/fonc.2016.00188] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/29/2016] [Accepted: 08/08/2016] [Indexed: 01/22/2023] Open
Abstract
We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is on the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML.
Core algorithms used in our data analysis are available in the flowMatch package at www.bioconductor.org. It has been downloaded nearly 6,000 times since 2014.
Affiliation(s)
- Ariful Azad
- Lawrence Berkeley National Laboratory, Computational Research Division, Berkeley, CA, USA
- Bartek Rajwa
- Bindley Bioscience Center, Purdue University, West Lafayette, IN, USA
- Alex Pothen
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
5
Azad A, Rajwa B, Pothen A. flowVS: channel-specific variance stabilization in flow cytometry. BMC Bioinformatics 2016; 17:291. [PMID: 27465477 PMCID: PMC4964071 DOI: 10.1186/s12859-016-1083-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 08/06/2015] [Accepted: 05/14/2016] [Indexed: 01/21/2023] Open
Abstract
Background: Comparing phenotypes of heterogeneous cell populations from multiple biological conditions is at the heart of scientific discovery based on flow cytometry (FC). When the biological signal is measured by the average expression of a biomarker, standard statistical methods require that variance be approximately stabilized in populations to be compared. Since the mean and variance of a cell population are often correlated in fluorescence-based FC measurements, a preprocessing step is needed to stabilize the within-population variances. Results: We present a variance-stabilization algorithm, called flowVS, that removes the mean-variance correlations from cell populations identified in each fluorescence channel. flowVS transforms each channel from all samples of a data set by the inverse hyperbolic sine (asinh) transformation. For each channel, the parameters of the transformation are optimally selected by Bartlett’s likelihood-ratio test so that the populations attain homogeneous variances. The optimum parameters are then used to transform the corresponding channels in every sample. flowVS is therefore an explicit variance-stabilization method that stabilizes within-population variances in each channel by evaluating the homoskedasticity of clusters with a likelihood-ratio test. With two publicly available datasets, we show that flowVS removes the mean-variance dependence from raw FC data and makes the within-population variance relatively homogeneous. We demonstrate that alternative transformation techniques such as flowTrans, flowScape, logicle, and FCSTrans might not stabilize variance. Besides flow cytometry, flowVS can also be applied to stabilize variance in microarray data. With a publicly available data set we demonstrate that flowVS performs as well as the VSN software, a state-of-the-art approach developed for microarrays.
Conclusions: The homogeneity of variance in cell populations across FC samples is desirable when extracting features uniformly and comparing cell populations with different levels of marker expressions. The newly developed flowVS algorithm solves the variance-stabilization problem in FC and microarrays by optimally transforming data with the help of Bartlett’s likelihood-ratio test. On two publicly available FC datasets, flowVS stabilizes within-population variances more evenly than the available transformation and normalization techniques. flowVS-based variance stabilization can help in performing comparison and alignment of phenotypically identical cell populations across different samples. flowVS and the datasets used in this paper are publicly available in Bioconductor.
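The idea of picking an asinh parameter by Bartlett's test can be sketched in a few lines of Python. The sketch below is not the flowVS code (an R/Bioconductor package). It assumes that the cell populations in one channel have already been identified, that the transformation has the form asinh(x / cofactor), and that the cofactor is chosen from a small candidate grid. The names `bartlett_statistic`, `asinh_transform`, and `best_cofactor`, and the toy populations, are hypothetical.

```python
import math
import statistics


def bartlett_statistic(groups):
    """Bartlett's statistic for homogeneity of variances (smaller = more equal)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    dof = sum(sizes) - k
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / dof
    numerator = dof * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / dof) / (3 * (k - 1))
    return numerator / correction


def asinh_transform(values, cofactor):
    """Inverse hyperbolic sine transform with a scale parameter (assumed form)."""
    return [math.asinh(v / cofactor) for v in values]


def best_cofactor(populations, candidates):
    """Return the candidate cofactor giving the most homogeneous variances."""
    return min(
        candidates,
        key=lambda c: bartlett_statistic(
            [asinh_transform(p, c) for p in populations]
        ),
    )


# Toy channel with two populations whose spread grows with their mean.
dim_population = [90, 100, 110, 95, 105]
bright_population = [900, 1000, 1100, 950, 1050]
chosen = best_cofactor([dim_population, bright_population], [1, 100, 10000])
print(chosen)
```

On this toy data the spread is proportional to the mean, so the smallest cofactor (where asinh behaves like a logarithm) gives the most homogeneous variances. Real data with additive noise near zero would typically favor a larger value.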
Affiliation(s)
- Ariful Azad
- Computational Research Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Rd, Berkeley, 94720, CA, USA
- Bartek Rajwa
- Bindley Bioscience Center, Purdue University, West Lafayette, 47907, IN, USA
- Alex Pothen
- Department of Computer Science, Purdue University, West Lafayette, 47907, IN, USA
6
Orlova DY, Zimmerman N, Meehan S, Meehan C, Waters J, Ghosn EEB, Filatenkov A, Kolyagin GA, Gernez Y, Tsuda S, Moore W, Moss RB, Herzenberg LA, Walther G. Earth Mover's Distance (EMD): A True Metric for Comparing Biomarker Expression Levels in Cell Populations. PLoS One 2016; 11:e0151859. [PMID: 27008164 PMCID: PMC4805242 DOI: 10.1371/journal.pone.0151859] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 12/01/2015] [Accepted: 03/04/2016] [Indexed: 01/26/2023] Open
Abstract
Changes in the frequencies of cell subsets that (co)express characteristic biomarkers, or levels of the biomarkers on the subsets, are widely used as indices of drug response, disease prognosis, stem cell reconstitution, etc. However, although the currently available computational "gating" tools accurately reveal subset frequencies and marker expression levels, they fail to enable statistically reliable judgements as to whether these frequencies and expression levels differ significantly between/among subject groups. Here we introduce a flow cytometry data analysis pipeline that includes the Earth Mover's Distance (EMD) metric as a solution to this problem. EMD is well known as an informative quantitative measure of differences between distributions. We present three exemplary studies showing that EMD 1) reveals clinically relevant shifts in two markers on blood basophils responding to an offending allergen; 2) shows that ablative tumor radiation induces significant changes in the murine colon cancer tumor microenvironment; and 3) ranks immunological differences in mouse peritoneal cavity cells harvested from three genetically distinct mouse strains.
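For readers unfamiliar with EMD, the following Python sketch computes it for the simplest case: two one-dimensional samples of a single marker. In that case, the distance equals the area between the two empirical cumulative distribution functions. This is an illustrative assumption-laden sketch, not the pipeline described in the paper; the function name `emd_1d` and the example intensities are hypothetical.

```python
def emd_1d(sample_a, sample_b):
    """Earth Mover's Distance between two 1D empirical distributions.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b
    are the empirical cumulative distribution functions of the samples.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(a) | set(b))
    distance = 0.0
    ia = ib = 0
    for left, right in zip(grid, grid[1:]):
        # Count the events at or below the left edge of this interval.
        while ia < len(a) and a[ia] <= left:
            ia += 1
        while ib < len(b) and b[ib] <= left:
            ib += 1
        distance += abs(ia / len(a) - ib / len(b)) * (right - left)
    return distance


# Hypothetical marker intensities before and after a stimulus.
baseline = [0.0, 1.0, 2.0]
stimulated = [1.0, 2.0, 3.0]
shift = emd_1d(baseline, stimulated)
print(shift)
```

Here every event moves up by one unit, so the distance is approximately 1. Identical samples give 0, and the samples do not need to contain the same number of events.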
Affiliation(s)
- Darya Y. Orlova
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Noah Zimmerman
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Stephen Meehan
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Connor Meehan
- Department of Mathematics, California Institute of Technology, Pasadena, California, United States of America
- Jeffrey Waters
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Eliver E. B. Ghosn
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Alexander Filatenkov
- Division of Immunology and Rheumatology, Stanford University School of Medicine, Stanford, California, United States of America
- Gleb A. Kolyagin
- Independent Researcher, Menlo Park, California, United States of America
- Yael Gernez
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Allergy and Immunology, Mount Sinai Hospital, New York, New York, United States of America
- Shanel Tsuda
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Wayne Moore
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Richard B. Moss
- Center for Excellence in Pulmonary Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Leonore A. Herzenberg
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Guenther Walther
- Department of Statistics, Stanford University, Stanford, California, United States of America
7
Abstract
Multi-color flow cytometry has become a valuable and highly informative tool for diagnosis and therapeutic monitoring of patients with immune deficiencies or inflammatory disorders. However, the complexity of the method and the error-prone nature of conventional manual data analysis often result in high variability between analysts and research laboratories. Here, we provide strategies and guidelines aimed at more standardized multi-color flow cytometric staining and unsupervised data analysis for whole blood patient samples.
8
Hsiao C, Liu M, Stanton R, McGee M, Qian Y, Scheuermann RH. Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure. Cytometry A 2015; 89:71-88. [PMID: 26274018 PMCID: PMC5014134 DOI: 10.1002/cyto.a.22735] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 08/29/2014] [Revised: 04/26/2015] [Accepted: 07/22/2015] [Indexed: 12/05/2022]
Abstract
Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry
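The core of the Friedman–Rafsky approach can be sketched as follows: pool the events of two populations, build a minimum spanning tree (MST) over the pooled points, and count how many tree edges join events from different populations. The Python sketch below is a simplified illustration, not the FlowMap‐FR package. It stops at the raw run count and its expectation under the hypothesis of identical distributions, and omits the variance normalization used for the full test statistic. The function names and toy coordinates are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns index pairs."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def fr_runs(sample_a, sample_b):
    """Number of runs: 1 + MST edges that connect the two samples."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    cross = sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])
    return cross + 1


def expected_runs(m, n):
    """Expected run count when both samples come from one distribution."""
    return 2 * m * n / (m + n) + 1


# Two well-separated toy populations in a two-marker space.
population_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
population_b = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
observed = fr_runs(population_a, population_b)
print(observed, expected_runs(3, 3))  # 2 4.0
```

An observed run count well below the expectation indicates that the two populations occupy different regions of marker space. A count near the expectation suggests they are interleaved and may be equivalent.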
Affiliation(s)
- Chiaowen Hsiao
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland; Applied Mathematics, Applied Statistics, and Scientific Computing, University of Maryland, College Park, Maryland
- Mengya Liu
- Department of Statistical Science, Southern Methodist University, Dallas, Texas
- Rick Stanton
- Department of Informatics, J. Craig Venter Institute, La Jolla, California
- Monnie McGee
- Department of Statistical Science, Southern Methodist University, Dallas, Texas
- Yu Qian
- Department of Informatics, J. Craig Venter Institute, La Jolla, California
- Richard H Scheuermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, California; Department of Pathology, University of California, San Diego, California
9
Dundar M, Akova F, Yerebakan HZ, Rajwa B. A non-parametric Bayesian model for joint cell clustering and cluster matching: identification of anomalous sample phenotypes with random effects. BMC Bioinformatics 2014; 15:314. [PMID: 25248977 PMCID: PMC4262223 DOI: 10.1186/1471-2105-15-314] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 02/06/2014] [Accepted: 09/16/2014] [Indexed: 12/13/2022] Open
Abstract
Background: Flow cytometry (FC)-based computer-aided diagnostics is an emerging technique utilizing modern multiparametric cytometry systems. The major difficulty in using machine-learning approaches for classification of FC data arises from limited access to a wide variety of anomalous samples for training. In consequence, any learning with an abundance of normal cases and a limited set of specific anomalous cases is biased towards the types of anomalies represented in the training set. Such models do not accurately identify anomalies, whether previously known or unknown, that may exist in future samples tested. Although one-class classifiers trained using only normal cases would avoid such a bias, robust sample characterization is critical for a generalizable model. Owing to sample heterogeneity and instrumental variability, arbitrary characterization of samples usually introduces feature noise that may lead to poor predictive performance. Herein, we present a non-parametric Bayesian algorithm called ASPIRE (anomalous sample phenotype identification with random effects) that identifies phenotypic differences across a batch of samples in the presence of random effects. Our approach involves simultaneous clustering of cellular measurements in individual samples and matching of discovered clusters across all samples in order to recover global clusters using probabilistic sampling techniques in a systematic way. Results: We demonstrate the performance of the proposed method in identifying anomalous samples in two different FC data sets, one of which represents a set of samples including acute myeloid leukemia (AML) cases, and the other a generic 5-parameter peripheral-blood immunophenotyping. Results are evaluated in terms of the area under the receiver operating characteristic curve (AUC). ASPIRE achieved AUCs of 0.99 and 1.0 on the AML and generic blood immunophenotyping data sets, respectively.
Conclusions: These results demonstrate that anomalous samples can be identified by ASPIRE with almost perfect accuracy without a priori access to samples of anomalous subtypes in the training set. The ASPIRE approach is unique in its ability to form generalizations regarding normal and anomalous states given only very weak assumptions regarding sample characteristics and origin. Thus, ASPIRE could become highly instrumental in providing unique insights about observed biological phenomena in the absence of full information about the investigated samples. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2105-15-314) contains supplementary material, which is available to authorized users.
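Since the results above are reported as AUC values, a short Python sketch of how an AUC can be computed from anomaly scores may help in reading them. It uses the rank interpretation: the AUC is the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal sample, with ties counted as one half. The function name `auc` and the scores are hypothetical and unrelated to the ASPIRE data.

```python
def auc(anomalous_scores, normal_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(anomalous_scores) * len(normal_scores))


# Hypothetical anomaly scores (higher = more anomalous).
print(auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
print(auc([0.9, 0.3], [0.5, 0.1]))  # 0.75
```

An AUC of 1.0 therefore means every anomalous sample was scored above every normal one, while 0.5 corresponds to chance-level ranking.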
Affiliation(s)
- Murat Dundar
- Computer Science Department, IUPUI, 723 W. Michigan St., 46037 Indianapolis, IN, US
10
Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R, Scheuermann RH. Critical assessment of automated flow cytometry data analysis techniques. Nat Methods 2013; 10:228-38. [PMID: 23396282 PMCID: PMC3906045 DOI: 10.1038/nmeth.2365] [Citation(s) in RCA: 361] [Impact Index Per Article: 32.8] [Received: 05/08/2012] [Accepted: 01/14/2013] [Indexed: 12/14/2022]
Abstract
In this analysis, the authors directly compared the performance of flow cytometry data processing algorithms to manual gating approaches. The results offer information of practical utility about the performance of the algorithms as applied to different data sets and challenges. Traditional methods for flow cytometry (FCM) data processing rely on subjective manual gating. Recently, several groups have developed computational methods for identifying cell populations in multidimensional FCM data. The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of these methods on two tasks: (i) mammalian cell population identification, to determine whether automated algorithms can reproduce expert manual gating and (ii) sample classification, to determine whether analysis pipelines can identify characteristics that correlate with external variables (such as clinical outcome). This analysis presents the results of the first FlowCAP challenges. Several methods performed well as compared to manual gating or external variables using statistical performance measures, which suggests that automated methods have reached a sufficient level of maturity and accuracy for reliable use in FCM data analysis.
Affiliation(s)
- Nima Aghaeepour
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, British Columbia, Canada