1
Zhang M, Zhang Y, Zhang J, Zhang J, Gao S, Li Z, Tao K, Liang X, Pan J, Zhu M. An automatic analysis and quality assurance method for lymphocyte subset identification. Clin Chem Lab Med 2024; 62:1411-1420. [PMID: 38217085 DOI: 10.1515/cclm-2023-1141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/20/2023] [Indexed: 01/15/2024]
Abstract
OBJECTIVES Lymphocyte subsets are predictors of disease diagnosis, treatment, and prognosis, and they are usually determined by flow cytometry. Despite recent advances in flow cytometry analysis, manual gating of most flow cytometry data remains challenging: it is labor-intensive, time-consuming, and error-prone. This study aimed to develop an automated method to identify lymphocyte subsets. METHODS We propose a method that combines knowledge-driven and data-driven approaches to gate automatically and identify subsets. To improve accuracy and stability, we implemented a Loop Adjustment Gating step that optimizes the gating result for the lymphocyte population. Furthermore, we incorporated an anomaly detection mechanism that issues warnings for samples that may not have been analyzed successfully, ensuring the quality of the results. RESULTS On a dataset of 2,000 individual cases from lymphocyte subset assays, the results of our method showed a 99.2 % correlation with manual analysis. The method attained 97.7 % accuracy for all cases and 100 % for the high-confidence cases. With the automated method, 99.1 % of manual labor can be saved when only the low-confidence cases are reviewed, and the average turnaround time is only 29 s, a reduction of 83.7 %. CONCLUSIONS Our proposed method achieves high accuracy on flow cytometry data from lymphocyte subset assays. It also saves manual labor and reduces turnaround time, giving it potential for application in the laboratory.
Affiliation(s)
- MinYang Zhang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- YaLi Zhang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JingWen Zhang
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JiaLi Zhang
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- SiYuan Gao
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- ZeChao Li
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- KangPei Tao
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- XiaoDan Liang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JianHua Pan
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- Min Zhu
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
2
Caligola S, Giacobazzi L, Canè S, Vella A, Adamo A, Ugel S, Giugno R, Bronte V. GateMeClass: Gate Mining and Classification of cytometry data. Bioinformatics 2024; 40:btae322. [PMID: 38775676 PMCID: PMC11136448 DOI: 10.1093/bioinformatics/btae322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 03/28/2024] [Accepted: 05/17/2024] [Indexed: 05/31/2024]
Abstract
MOTIVATION Cytometry comprises powerful techniques for analyzing the cell heterogeneity of a biological sample by examining the expression of protein markers. These technologies especially impact the field of oncoimmunology, where cell identification is essential to analyze the tumor microenvironment. Several classification tools have been developed for the annotation of cytometry datasets, including supervised tools that require a training set as a reference (i.e. reference-based) and semisupervised tools based on the manual definition of a marker table. The latter are closer to the traditional annotation of cytometry data based on manual gating. However, they require a manually defined marker table, which cannot be extracted automatically in a reference-based fashion. Therefore, we lack methods that allow both classification approaches while maintaining the high biological interpretability given by the marker table. RESULTS We present a new tool called GateMeClass (Gate Mining and Classification), which overcomes this limitation of current cytometry classification methods by allowing both semisupervised and supervised annotation based on a marker table that can be defined manually or extracted from an external annotated dataset. We measured the accuracy of GateMeClass for annotating three well-established benchmark mass cytometry datasets and one flow cytometry dataset. The performance of GateMeClass is comparable to that of reference-based methods and marker table-based techniques, while offering greater flexibility and rapid execution times. AVAILABILITY AND IMPLEMENTATION GateMeClass is implemented in R and is publicly available at https://github.com/simo1c/GateMeClass.
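The marker-table idea can be illustrated with a small Python sketch: each cell type is a +/- pattern over markers, and a cell is labeled only when its thresholded expression matches exactly one pattern. This is a generic illustration, not GateMeClass's R implementation or its gate-mining step; the markers, thresholds, and the `classify_cell` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical marker table: "+" means expressed, "-" means not expressed.
# Markers omitted from a row are treated as "don't care".
MARKER_TABLE: Dict[str, Dict[str, str]] = {
    "CD4 T cell": {"CD3": "+", "CD4": "+", "CD8": "-"},
    "CD8 T cell": {"CD3": "+", "CD4": "-", "CD8": "+"},
    "B cell": {"CD3": "-", "CD19": "+"},
}

# Hypothetical per-marker positivity thresholds on transformed intensities.
THRESHOLDS: Dict[str, float] = {"CD3": 1.0, "CD4": 1.0, "CD8": 1.0, "CD19": 1.0}


def classify_cell(expr: Dict[str, float]) -> str:
    """Return the single cell type whose +/- pattern the cell matches, else 'unknown'."""
    # Binarize each measured marker against its threshold.
    signs = {m: ("+" if v > THRESHOLDS[m] else "-") for m, v in expr.items()}
    matches: List[str] = [
        cell_type
        for cell_type, pattern in MARKER_TABLE.items()
        if all(signs.get(m) == s for m, s in pattern.items())
    ]
    # Ambiguous or unmatched cells are left unlabeled rather than forced into a class.
    return matches[0] if len(matches) == 1 else "unknown"
```

In a reference-based setting, a table of this shape could in principle be derived from an annotated dataset instead of written by hand (for example, by thresholding each labeled population's median expression), which is roughly analogous to the extraction the abstract describes.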
Affiliation(s)
- Luca Giacobazzi
- Section of Immunology, Department of Medicine, University of Verona, Verona, Italy
- Stefania Canè
- Veneto Institute of Oncology IOV-IRCCS, Padova, Italy
- Antonio Vella
- Section of Immunology, Azienda Ospedaliera Universitaria Integrata (AOUI), Verona, Italy
- Annalisa Adamo
- Section of Immunology, Department of Medicine, University of Verona, Verona, Italy
- Stefano Ugel
- Section of Immunology, Department of Medicine, University of Verona, Verona, Italy
- Rosalba Giugno
- Department of Computer Science, University of Verona, Verona, Italy
3
Chen Y, De Spiegelaere W, Trypsteen W, Gleerup D, Vandesompele J, Lievens A, Vynck M, Thas O. Benchmarking digital PCR partition classification methods with empirical and simulated duplex data. Brief Bioinform 2024; 25:bbae120. [PMID: 38555473 PMCID: PMC10981767 DOI: 10.1093/bib/bbae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/09/2024] [Accepted: 02/26/2024] [Indexed: 04/02/2024]
Abstract
Digital PCR (dPCR) is a highly accurate technique for the quantification of target nucleic acid(s). It has shown great potential in clinical applications such as tumor liquid biopsy and validation of biomarkers. Accurate classification of partitions based on end-point fluorescence intensities is crucial to avoid biased estimators of the concentration of the target molecules. We have evaluated many clustering methods, from general-purpose methods to specific methods for dPCR and flow cytometry, on both simulated and real-life data. Clustering method performance was evaluated by simulating various scenarios. Based on our extensive comparison of clustering methods, we describe the limits of these methods and formulate guidelines for choosing an appropriate method. In addition, we have developed a novel method for simulating realistic dPCR data. The method is based on a mixture distribution of a Poisson point process and a skew-t distribution, which enables the generation of irregularly shaped clusters and of randomly placed partitions between clusters ('rain'), as commonly observed in dPCR data. Users can fine-tune the model parameters and generate labeled datasets, using their own data as a template. Furthermore, the database of experimental dPCR data augmented with the labeled simulated data can serve as training and testing data for new clustering methods. The simulation method is available as an R Shiny app.
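Why misclassified partitions bias the concentration estimate can be seen from the standard Poisson correction used in dPCR: if a fraction p of partitions is called positive, the mean number of target copies per partition is estimated as λ = -ln(1 - p), and the concentration as λ divided by the partition volume. The Python sketch below implements only that estimator; the partition volume is an input the user must supply, and this is not the authors' simulation model.

```python
import math


def copies_per_partition(n_positive: int, n_total: int) -> float:
    """Poisson-corrected mean copies per partition: lambda = -ln(1 - p)."""
    if not 0 <= n_positive < n_total:
        raise ValueError("need 0 <= n_positive < n_total")
    p = n_positive / n_total
    return -math.log(1.0 - p)


def concentration_per_ul(n_positive: int, n_total: int, partition_nl: float) -> float:
    """Copies per microliter, given the partition volume in nanoliters."""
    lam = copies_per_partition(n_positive, n_total)
    return lam / (partition_nl * 1e-3)  # 1 nL = 1e-3 uL
```

For example, with 20,000 partitions, calling 4,200 instead of 4,000 positive (200 'rain' partitions assigned to the wrong cluster) raises λ from -ln(0.80) ≈ 0.223 to -ln(0.79) ≈ 0.236, an upward bias of about 5.6%.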
Affiliation(s)
- Yao Chen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium
- Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Ward De Spiegelaere
- Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Wim Trypsteen
- Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Department of Internal Medicine, Ghent University and University Hospital, Belgium
- David Gleerup
- Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Jo Vandesompele
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Department of Biomolecular Medicine, Ghent University and University Hospital, Belgium
- Cancer Research Institute Ghent (CRIG), Ghent University and University Hospital, Belgium
- Pxlence, Belgium
- Antoon Lievens
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Matthijs Vynck
- Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- Olivier Thas
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium
- Ghent University Digital PCR Consortium, Ghent University, Belgium
- I-BioStat, Data Science Institute, Hasselt University, Belgium
- National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, Australia
4
Zhang J, Li J, Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum Vaccin Immunother 2023:2234792. [PMID: 37485833 PMCID: PMC10373621 DOI: 10.1080/21645515.2023.2234792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2023] [Revised: 06/30/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023]
Abstract
Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships and potentially developing personalized intervention strategies. Single-cell data have emerged as a promising source for identifying immune response biomarkers. In this review, we discuss the current state-of-the-art methods for immunoprofiling, including those for reducing the dimensionality of high-dimensional single-cell data and methods for clustering, classification, and prediction. We also draw attention to recent developments in data integration.
Affiliation(s)
- Jingxuan Zhang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jia Li
- Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Lin Lin
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
5
Robinson JP, Ostafe R, Iyengar SN, Rajwa B, Fischer R. Flow Cytometry: The Next Revolution. Cells 2023; 12:1875. [PMID: 37508539 PMCID: PMC10378642 DOI: 10.3390/cells12141875] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/21/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
Unmasking the subtleties of the immune system requires both a comprehensive knowledge base and the ability to interrogate that system with intimate sensitivity. That task, to a considerable extent, has been handled by an iterative expansion in flow cytometry methods, both in technological capability and in accompanying advances in informatics. As the field of fluorescence-based cytomics matured, it reached a technological barrier at around 30-parameter analyses, which stalled the field until spectral flow cytometry created a fundamental transformation that will likely make 100-parameter simultaneous analyses possible within a few years. The simultaneous advance in informatics has now become a watershed moment for the field as it competes with mature systematic approaches such as genomics and proteomics, allowing cytomics to take a seat at the multi-omics table. In addition, recent technological advances try to combine the speed of flow systems with detection methods other than fluorescence, which will make flow-based instruments even more indispensable in any biological laboratory. This paper outlines current approaches in cell analysis and detection methods, discusses traditional and microfluidic sorting approaches as well as next-generation instruments, and provides an early look at future opportunities that are likely to arise.
Affiliation(s)
- J Paul Robinson
- Department of Basic Medical Sciences, Purdue University, West Lafayette, IN 47907, USA
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA
- Raluca Ostafe
- Molecular Evolution, Protein Engineering and Production Facility (PI4D), Purdue University, West Lafayette, IN 47907, USA
- Bartek Rajwa
- Bindley Bioscience Center, Purdue University, West Lafayette, IN 47907, USA
- Rainer Fischer
- Department of Comparative Pathobiology, College of Veterinary Medicine, Purdue University, West Lafayette, IN 47907, USA
- Purdue Institute of Inflammation, Immunology and Infectious Diseases, Purdue University, West Lafayette, IN 47907, USA
6
Fuda F, Chen M, Chen W, Cox A. Artificial intelligence in clinical multiparameter flow cytometry and mass cytometry-key tools and progress. Semin Diagn Pathol 2023; 40:120-128. [PMID: 36894355 DOI: 10.1053/j.semdp.2023.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/07/2023]
Abstract
There are many research studies and emerging tools using artificial intelligence (AI) and machine learning to augment flow and mass cytometry workflows. Emerging AI tools can quickly identify common cell populations with continuous improvement of accuracy, uncover patterns in high-dimensional cytometric data that are undetectable by human analysis, facilitate the discovery of cell subpopulations, perform semi-automated immune cell profiling, and demonstrate potential to automate aspects of the clinical multiparameter flow cytometric (MFC) diagnostic workflow. Utilizing AI in the analysis of cytometry samples can reduce subjective variability and assist in breakthroughs in understanding diseases. Here we review the diverse types of AI that are being applied to clinical cytometry data and how AI is driving advances in data analysis to improve diagnostic sensitivity and accuracy. We review supervised and unsupervised clustering algorithms for cell population identification; various dimensionality reduction techniques and their utility in visualization and machine learning pipelines; and supervised learning approaches for classifying entire cytometry samples. Understanding the AI landscape will enable pathologists to better utilize open source and commercially available tools, plan exploratory research projects to characterize diseases, and work with machine learning and data scientists to implement clinical data analysis pipelines.
Affiliation(s)
- Franklin Fuda
- Department of Pathology and Laboratory Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Mingyi Chen
- Department of Pathology and Laboratory Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Weina Chen
- Department of Pathology and Laboratory Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Andrew Cox
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA; Department of Cell and Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA.
7
Bruckmann C, Müller S, zu Siederdissen CH. Automatic, fast, hierarchical, and non-overlapping gating of flow cytometric data with flowEMMiv2. Comput Struct Biotechnol J 2022; 20:6473-6489. [DOI: 10.1016/j.csbj.2022.11.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022]
8
Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat Commun 2022; 13:5455. [PMID: 36114209 PMCID: PMC9481560 DOI: 10.1038/s41467-022-33136-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 09/05/2022] [Indexed: 11/30/2022]
Abstract
Clustering is a powerful machine learning method for discovering similar patterns according to the proximity of elements in feature space. It is widely used in computer science, bioscience, geoscience, and economics. Although state-of-the-art partition-based and connectivity-based clustering methods have been developed, weak connectivity and heterogeneous density in data impede their effectiveness. In this work, we propose a boundary-seeking Clustering algorithm using the local Direction Centrality (CDC). It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages that bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly connected clusters. We demonstrate the validity of CDC by detecting complex structured clusters in challenging synthetic datasets, identifying cell types from single-cell RNA sequencing (scRNA-seq) and mass cytometry (CyTOF) data, recognizing speakers in voice corpora, and testing on various types of real-world benchmarks.
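The internal-versus-boundary distinction can be sketched for 2D data in Python. Assumption: direction centrality is measured here as the variance of the angular gaps between a point's K nearest-neighbor directions, which captures the idea in the abstract; the paper's exact definition and normalization, and its cage-based connection step, may differ and are not reproduced. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def knn(points: List[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbors of points[i] (brute force, excludes i)."""
    xi, yi = points[i]
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2)
    return others[:k]


def direction_centrality(points: List[Point], i: int, k: int) -> float:
    """Variance of the angular gaps between a point's k nearest-neighbor directions.

    An evenly surrounded (internal) point has gaps near 2*pi/k and scores near 0;
    a boundary point leaves a large empty sector and scores higher.
    """
    xi, yi = points[i]
    angles = sorted(
        math.atan2(points[j][1] - yi, points[j][0] - xi) for j in knn(points, i, k)
    )
    # Gaps between consecutive directions, wrapping around the full circle (sum = 2*pi).
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    mean_gap = 2 * math.pi / k
    return sum((g - mean_gap) ** 2 for g in gaps) / k
```

On a regular 5 x 5 grid with K = 8, the center point's neighbors are evenly spaced at 45° and score essentially 0, while a corner point, whose neighbors all lie in one quadrant, scores much higher; thresholding such a score is one way to separate internal from boundary points.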
9
Hu Z, Bhattacharya S, Butte AJ. Application of Machine Learning for Cytometry Data. Front Immunol 2022; 12:787574. [PMID: 35046945 PMCID: PMC8761933 DOI: 10.3389/fimmu.2021.787574] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/30/2021] [Accepted: 12/14/2021] [Indexed: 01/23/2023]
Abstract
Modern cytometry technologies present opportunities to profile the immune system at a single-cell resolution with more than 50 protein markers, and have been widely used in both research and clinical settings. The number of publicly available cytometry datasets is growing. However, the analysis of cytometry data remains a bottleneck due to its high dimensionality, large cell numbers, and heterogeneity between datasets. Machine learning techniques are well suited to analyze complex cytometry data and have been used in multiple facets of cytometry data analysis, including dimensionality reduction, cell population identification, and sample classification. Here, we review the existing machine learning applications for analyzing cytometry data and highlight the importance of publicly available cytometry data that enable researchers to develop and validate machine learning methods.
Affiliation(s)
- Zicheng Hu
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, United States
- Sanchita Bhattacharya
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Atul J. Butte
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
10
Cheung M, Campbell JJ, Whitby L, Thomas RJ, Braybrook J, Petzing J. Current trends in flow cytometry automated data analysis software. Cytometry A 2021; 99:1007-1021. [PMID: 33606354 DOI: 10.1002/cyto.a.24320] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 11/26/2020] [Revised: 01/21/2021] [Accepted: 01/28/2021] [Indexed: 12/16/2022]
Abstract
Automated flow cytometry (FC) data analysis tools for cell population identification and characterization are increasingly being used in academic, biotechnology, pharmaceutical, and clinical laboratories. These computational methods are designed to overcome reproducibility and process bottleneck issues in manual gating; however, the take-up of these tools remains (anecdotally) low. Here, we performed a comprehensive literature survey of state-of-the-art computational tools typically published by research, clinical, and biomanufacturing laboratories for automated FC data analysis and identified popular tools based on literature citation counts. Dimensionality reduction methods ranked highly, such as generic t-distributed stochastic neighbor embedding (t-SNE) and viSNE, its initial Matlab-based implementation for cytometry data. Software with graphical user interfaces also ranked highly, including PhenoGraph, SPADE1, FlowSOM, and Citrus, with unsupervised learning methods outnumbering supervised learning methods, and algorithm type popularity spread across K-Means, hierarchical, density-based, model-based, and other classes of clustering algorithms. Additionally, to illustrate actual usage within clinical settings alongside citation frequency, a survey issued by UK NEQAS Leucocyte Immunophenotyping to identify software usage trends among clinical laboratories was completed. The survey revealed that 53% of laboratories have not yet taken up automated cell population identification methods; among those that have, Infinicyt software is the most frequently identified. Survey respondents considered data output quality to be the most important factor when using automated FC data analysis software, followed by software speed and level of technical support.
This review found differences in software usage between biomedical institutions: tools for discovery, data exploration, and visualization were more popular in academia, whereas automated tools for specialized targeted analysis that apply supervised learning methods were more widely used in clinical settings.
Affiliation(s)
- Melissa Cheung
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
- Liam Whitby
- UK NEQAS for Leucocyte Immunophenotyping, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Robert J Thomas
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
- Julian Braybrook
- National Measurement Laboratory, LGC, Teddington, United Kingdom
- Jon Petzing
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
11
Liu P, Liu S, Fang Y, Xue X, Zou J, Tseng G, Konnikova L. Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data. Front Cell Dev Biol 2020; 8:234. [PMID: 32411698 PMCID: PMC7198724 DOI: 10.3389/fcell.2020.00234] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/13/2019] [Accepted: 03/20/2020] [Indexed: 11/13/2022]
Abstract
The progress in the field of high-dimensional cytometry has greatly increased the number of markers that can be analyzed simultaneously, producing datasets with large numbers of parameters. Traditional biaxial manual gating might not be optimal for such datasets. To overcome this, a large number of automated tools have been developed to aid with cellular clustering of multi-dimensional datasets. Here we review two large categories of such tools: unsupervised and supervised clustering tools. After a thorough review of the popularity and use of each of the available unsupervised clustering tools, we focus on the top six tools to discuss their advantages and limitations. Furthermore, we employ a publicly available dataset to directly compare the usability, speed, and relative effectiveness of the available unsupervised and supervised tools. Finally, we discuss the current challenges for existing methods and future directions for the new generation of cell type identification approaches.
Affiliation(s)
- Peng Liu
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
- Silvia Liu
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, United States
- Yusi Fang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
- Xiangning Xue
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
- Jian Zou
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
- George Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
- Liza Konnikova
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, United States
12
Abstract
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a novel approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the event that the observed data are generated by the model, one conditions on the event that the model generates data close to the observed data, in a distributional sense. When closeness is defined in terms of relative entropy, the resulting "coarsened" posterior can be approximated by simply tempering the likelihood, that is, by raising the likelihood to a fractional power. Thus, inference can usually be implemented via standard algorithms, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data using mixture models and autoregressive models of unknown order.
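A conjugate example shows why tempering keeps inference simple. Assuming a normal model x_i ~ N(μ, σ²) with σ known and a prior μ ~ N(m0, s0²), raising the likelihood to a power ζ in (0, 1] yields another normal posterior, with precision 1/s0² + ζn/σ² and mean (m0/s0² + ζΣx_i/σ²) divided by that precision. How ζ should be chosen is addressed in the paper and not reproduced here; the Python sketch simply takes ζ as an input, and the function name is hypothetical.

```python
import math
from typing import List, Tuple


def tempered_normal_posterior(
    data: List[float], sigma: float, prior_mean: float, prior_sd: float, zeta: float
) -> Tuple[float, float]:
    """Posterior (mean, sd) of a normal mean under a likelihood raised to power zeta.

    Model: x_i ~ N(mu, sigma^2) with sigma known; prior mu ~ N(prior_mean, prior_sd^2).
    zeta = 1 recovers the standard posterior; 0 < zeta < 1 down-weights the data.
    """
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta must lie in (0, 1]")
    n = len(data)
    prior_prec = 1.0 / prior_sd**2
    data_prec = zeta * n / sigma**2  # tempering scales the data's precision by zeta
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + zeta * sum(data) / sigma**2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)
```

With ζ = 1 this is the ordinary conjugate posterior; smaller ζ widens it, so a slightly misspecified model yields less overconfident inference, and no sampler is needed.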
13
Minoura K, Abe K, Maeda Y, Nishikawa H, Shimamura T. Model-based cell clustering and population tracking for time-series flow cytometry data. BMC Bioinformatics 2019; 20:633. [PMID: 31881827 PMCID: PMC6933651 DOI: 10.1186/s12859-019-3294-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/30/2023]
Abstract
Background Modern flow cytometry technology has enabled the simultaneous analysis of multiple cell markers at the single-cell level, and it is widely used in a broad range of research fields. The detection of cell populations in flow cytometry data has long depended on “manual gating” by visual inspection. Recently, numerous software tools have been developed for automatic, computationally guided detection of cell populations; however, they are not designed for time-series flow cytometry data. Time-series flow cytometry data are indispensable for investigating cell population dynamics that cannot be elucidated by static, single-time-point analysis. Therefore, there is a great need for tools to systematically analyze time-series flow cytometry data. Results We propose a simple and efficient statistical framework, named CYBERTRACK (CYtometry-Based Estimation and Reasoning for TRACKing cell populations), to perform clustering and cell population tracking for time-series flow cytometry data. CYBERTRACK assumes that flow cytometry data are generated from a multivariate Gaussian mixture distribution whose mixture proportion at the current time depends on that at the previous timepoint. Using simulation data, we evaluate the performance of CYBERTRACK when estimating parameters for a multivariate Gaussian mixture distribution, tracking time-dependent transitions of mixture proportions, and detecting change-points in the overall mixture proportion. CYBERTRACK's performance is validated using two real flow cytometry datasets, which demonstrate that the population dynamics detected by CYBERTRACK are consistent with our prior knowledge of lymphocyte behavior. Conclusions Our results indicate that, by systematically analyzing time-series flow cytometry data, CYBERTRACK gives cytometry users a better understanding of time-dependent cell population dynamics.
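The modelling assumption, mixture proportions at time t that depend on those at t - 1, can be illustrated by simulating data of that shape in Python. Here the dependence is assumed to be a Dirichlet draw centered on the previous proportions with a hypothetical concentration parameter `kappa`, and each component is a two-marker Gaussian with identity covariance; CYBERTRACK's actual transition model and its inference procedure are not reproduced.

```python
import random
from typing import List, Sequence, Tuple


def dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet distribution via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_time_series(
    means: List[Tuple[float, float]],
    init_props: List[float],
    n_timepoints: int,
    cells_per_timepoint: int,
    kappa: float = 200.0,
    seed: int = 0,
) -> List[Tuple[List[float], List[Tuple[float, float]]]]:
    """Simulate two-marker Gaussian-mixture samples whose proportions drift over time.

    Assumed transition (not CYBERTRACK's): pi_t ~ Dirichlet(kappa * pi_{t-1}).
    Each cell is drawn from N(mean_k, I) for its sampled component k.
    """
    rng = random.Random(seed)
    props = list(init_props)  # must be strictly positive for the Dirichlet step
    series = []
    for t in range(n_timepoints):
        if t > 0:
            # Larger kappa keeps proportions closer to the previous timepoint.
            props = dirichlet([kappa * p for p in props], rng)
        comps = rng.choices(range(len(means)), weights=props, k=cells_per_timepoint)
        cells = [(rng.gauss(means[k][0], 1.0), rng.gauss(means[k][1], 1.0)) for k in comps]
        series.append((props, cells))
    return series
```

Data simulated this way, with known proportions at every timepoint, is the kind of ground truth against which a tracking method's estimated transitions can be checked.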
Affiliation(s)
- Kodai Minoura
- Division of Systems Biology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan; Division of Immunology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Ko Abe
- Division of Systems Biology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
- Yuka Maeda
- Division of Cancer Immunology, Research Institute/EPOC, National Cancer Center, Tokyo/Chiba, 1040045/2778577, Japan
- Hiroyoshi Nishikawa
- Division of Immunology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan; Division of Cancer Immunology, Research Institute/EPOC, National Cancer Center, Tokyo/Chiba, 1040045/2778577, Japan
- Teppei Shimamura
- Division of Systems Biology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-cho, Showa-ku, Nagoya, 4668550, Japan
14
Ludwig J, Zu Siederdissen CH, Liu Z, Stadler PF, Müller S. flowEMMi: an automated model-based clustering tool for microbial cytometric data. BMC Bioinformatics 2019; 20:643. [PMID: 31815609 PMCID: PMC6902487 DOI: 10.1186/s12859-019-3152-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/06/2018] [Accepted: 10/10/2019] [Indexed: 12/17/2022]
Abstract
Background Flow cytometry (FCM) is a powerful single-cell based measurement method to ascertain multidimensional optical properties of millions of cells. FCM is widely used in medical diagnostics and health research. There is also a broad range of applications in the analysis of complex microbial communities. The main concern in microbial community analyses is to track the dynamics of microbial subcommunities. So far, this can be achieved with the help of time-consuming manual clustering procedures that require extensive user-dependent input. In addition, several tools have recently been developed by using different approaches which, however, focus mainly on the clustering of medical FCM data or of microbial samples with a well-known background, while much less work has been done on high-throughput, online algorithms for two-channel FCM. Results We bridge this gap with flowEMMi, a model-based clustering tool based on multivariate Gaussian mixture models with subsampling and foreground/background separation. These extensions provide a fast and accurate identification of cell clusters in FCM data, in particular for microbial community FCM data that are often affected by irrelevant information like technical noise, beads or cell debris. flowEMMi outperforms other available tools with regard to running time and information content of the clustering results and provides near-online results and optional heuristics to reduce the running-time further. Conclusions flowEMMi is a useful tool for the automated cluster analysis of microbial FCM data. It overcomes the user-dependent and time-consuming manual clustering procedure and provides consistent results with ancillary information and statistical proof.
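The foreground/background idea can be illustrated with a one-dimensional EM sketch: Gaussian components model cell clusters, and one uniform component spanning the observed range absorbs scattered events such as noise or debris. This is a simplified illustration, not flowEMMi's code or API; it assumes a single channel (flowEMMi works on two-channel data), hypothetical initial means, and no subsampling.

```python
import math
from typing import List, Sequence, Tuple


def em_with_background(
    x: Sequence[float], init_means: Sequence[float], n_iter: int = 100
) -> Tuple[List[float], List[float], List[int]]:
    """Fit K Gaussians plus one uniform background component by EM.

    Returns (means, weights, labels); weights[-1] and label K refer to the background.
    """
    lo, hi = min(x), max(x)
    background_density = 1.0 / (hi - lo)  # uniform density over the observed range
    k = len(init_means)
    mu = list(init_means)
    var = [((hi - lo) / (4 * k)) ** 2] * k  # broad initial variances
    w = [0.9 / k] * k + [0.1]               # the last weight belongs to the background
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each event
        resp = []
        for xi in x:
            dens = [
                w[j] * math.exp(-((xi - mu[j]) ** 2) / (2 * var[j])) / math.sqrt(2 * math.pi * var[j])
                for j in range(k)
            ]
            dens.append(w[k] * background_density)
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, then means and variances of the Gaussian components
        n = len(x)
        for j in range(k + 1):
            nj = max(sum(r[j] for r in resp), 1e-12)
            w[j] = nj / n
            if j < k:
                mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
                var[j] = max(sum(r[j] * (xi - mu[j]) ** 2 for r, xi in zip(resp, x)) / nj, 1e-6)
    labels = [max(range(k + 1), key=lambda j: r[j]) for r in resp]
    return mu, w, labels


if __name__ == "__main__":
    cluster_a = [1.8, 1.9, 2.0, 2.1, 2.2] * 2  # hypothetical events of one subcommunity
    cluster_b = [7.8, 7.9, 8.0, 8.1, 8.2] * 2  # hypothetical events of another
    noise = [0.0, 5.0, 10.0]                   # hypothetical scattered debris/noise
    means, weights, labels = em_with_background(cluster_a + cluster_b + noise, [3.0, 7.0])
    print([round(m, 3) for m in means], labels[-3:])
```

Events whose most probable component is the uniform one (label `K`) are treated as background rather than being forced into a cell cluster.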
Affiliation(s)
- Joachim Ludwig
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
- Zishu Liu
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
- Peter F Stadler
- Department of Computer Science, University Leipzig, Härtelstr. 16-18, Leipzig, 04107, Germany
- Susann Müller
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
15
High throughput pSTAT signaling profiling by fluorescent cell barcoding and computational analysis. J Immunol Methods 2019; 477:112667. [PMID: 31726053 DOI: 10.1016/j.jim.2019.112667] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 05/20/2019] [Revised: 07/08/2019] [Accepted: 09/12/2019] [Indexed: 12/31/2022]
Abstract
Fluorescent cell barcoding (FCB) is a multiplexing technique for high-throughput flow cytometry (FCM). Although powerful in minimizing staining variability, it remains a subjective FCM technique because of inter-operator variability and differences in data analysis. FCB was implemented by combining two-dye barcoding (DyLight 350 plus Pacific Orange) with five-color surface marker antibody and intracellular staining for phosphoprotein signaling analysis. We proposed a robust method to measure intra- and inter-assay variability of FCB in T/B cells and monocytes by combining the range and ratio of variability with standard statistical analyses. Data analysis was carried out with conventional and semi-automated workflows built in R. Results obtained from both analyses were compared to assess the feasibility and reproducibility of FCB data analysis by machine-learning methods. Our results showed efficient FCB using DyLight 350 and Pacific Orange at concentrations of 0, 15 or 30, and 250 μg/mL, and high reproducibility of FCB in combination with surface marker and intracellular antibodies. Inter-operator variability was minimized by adding an internal control, bridged across matrices, that served as a rejection criterion when significant differences were present between runs. Computational workflows showed results comparable to those of conventional gating strategies. FCB can be used to study phosphoprotein signaling in T/B cells and monocytes with high reproducibility across operators, and the addition of bridge internal controls can further minimize inter-operator variability. This FCB protocol, which offers high-throughput analysis and low intra- and inter-assay variability, can be a powerful tool for clinical trial studies. Moreover, FCB data can be reliably analyzed using computational software.
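The abstract names two quantitative ingredients: a variability measure and a bridge internal control used as a rejection criterion. A minimal sketch follows, assuming the coefficient of variation (CV) as the variability measure and a hypothetical symmetric fold-change tolerance `max_fold` for the bridge control; the paper's actual statistics and thresholds are not stated here.

```python
import statistics
from typing import Sequence


def cv_percent(values: Sequence[float]) -> float:
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def accept_run(bridge_mfi: float, reference_mfi: float, max_fold: float = 1.5) -> bool:
    """Accept a run if the bridge control's MFI is within max_fold of the reference run.

    max_fold is a hypothetical tolerance chosen for illustration only.
    """
    ratio = bridge_mfi / reference_mfi
    return 1.0 / max_fold <= ratio <= max_fold


if __name__ == "__main__":
    replicate_mfi = [980.0, 1010.0, 1005.0, 995.0]  # hypothetical replicate MFIs of one barcode
    print(round(cv_percent(replicate_mfi), 2))
    print(accept_run(1200.0, 1000.0))  # True: 1.2-fold
    print(accept_run(2000.0, 1000.0))  # False: 2.0-fold
```

Using a symmetric fold window treats an increase and a decrease of the bridge signal alike.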
16
Quantification and isolation of Bacillus subtilis spores using cell sorting and automated gating. PLoS One 2019; 14:e0219892. [PMID: 31356641 PMCID: PMC6663000 DOI: 10.1371/journal.pone.0219892] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/18/2019] [Accepted: 07/04/2019] [Indexed: 01/22/2023]
Abstract
The Gram-positive bacterium Bacillus subtilis is able to form endospores, which have a variety of biotechnological applications. Owing to this ability, B. subtilis is also a model organism for cellular differentiation processes. Sporulating cultures of B. subtilis form subpopulations that include vegetative cells, sporulating cells, and spores. To quantify spore formation readily and rapidly, we employed flow cytometric and fluorescence-activated cell sorting techniques in combination with nucleic acid fluorescent staining to investigate the distribution of sporulating cultures at the single-cell level. Automated gating procedures using Gaussian mixture modeling (GMM) were employed to avoid subjective gating and to allow the simultaneous measurement of controls. We used the presented method to monitor sporulation over time in germination-deficient strains harboring different genome modifications. A decrease in sporulation efficiency was observed for strain Bs02018, which is used to display sfGFP on the spore surface. In contrast, a double knock-out mutant of the phosphatase gene encoding Spo0E and of the spore killing factor SkfA (Bs02025) exhibited the highest sporulation efficiency: within 24 h of cultivation in sporulation medium, cultures of Bs02025 already consisted of 80% spores, as opposed to 18% for the control strain. We confirmed the identity of the different subpopulations formed during sporulation by employing sorting and microscopy.
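Once a two-component GMM has been fitted to a fluorescence channel, gating reduces to finding where the two weighted component densities are equal. The sketch below assumes equal component variances, which gives a closed-form threshold, and uses hypothetical fitted parameters; it illustrates GMM-based thresholding in general and is not the authors' pipeline.

```python
import math
from typing import Sequence


def equal_variance_threshold(mu1: float, mu2: float, sigma: float, w1: float, w2: float) -> float:
    """Return x where w1*N(x; mu1, sigma) equals w2*N(x; mu2, sigma), assuming mu1 < mu2.

    Equating the log-densities gives
    t = (mu1 + mu2) / 2 + sigma**2 * ln(w1 / w2) / (mu2 - mu1).
    """
    return (mu1 + mu2) / 2.0 + sigma**2 * math.log(w1 / w2) / (mu2 - mu1)


def fraction_above(values: Sequence[float], threshold: float) -> float:
    """Fraction of events assigned to the upper component (value > threshold)."""
    return sum(v > threshold for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical fitted parameters on a log-fluorescence scale
    t = equal_variance_threshold(mu1=2.0, mu2=4.0, sigma=0.5, w1=0.7, w2=0.3)
    events = [1.8, 2.1, 2.4, 3.9, 4.2, 2.0]
    print(round(t, 3), fraction_above(events, t))
```

With equal weights the threshold is the midpoint of the two means; a larger weight on one component shifts the threshold away from it, so the more populous subpopulation claims more of the overlap region.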
17
Reiter M, Diem M, Schumich A, Maurer-Granofszky M, Karawajew L, Rossi JG, Ratei R, Groeneveld-Krentz S, Sajaroff EO, Suhendra S, Kampel M, Dworzak MN. Automated Flow Cytometric MRD Assessment in Childhood Acute B-Lymphoblastic Leukemia Using Supervised Machine Learning. Cytometry A 2019; 95:966-975. [PMID: 31282025 DOI: 10.1002/cyto.a.23852] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/04/2019] [Revised: 04/30/2019] [Accepted: 05/28/2019] [Indexed: 12/22/2022]
Abstract
Minimal residual disease (MRD) as measured by multiparameter flow cytometry (FCM) is an independent and strong prognostic factor in B-cell acute lymphoblastic leukemia (B-ALL). However, reliable flow cytometric detection of MRD strongly depends on operator skills and expert knowledge. Hence, an objective, automated tool for reliable FCM-MRD quantification, able to overcome technical diversity and analytical subjectivity, would be most helpful. We developed a supervised machine learning approach using a combination of multiple Gaussian Mixture Models (GMM) as a parametric density model. The approach was used to find the weights of a linear combination of multiple GMMs that represents new, "unseen" samples as an interpolation of stored samples. The experimental data set contained FCM-MRD data of 337 bone marrow samples collected at day 15 of induction therapy in three different laboratories from pediatric patients with B-ALL, for which accurate, expert-set gates existed. We compared MRD quantification by our proposed GMM approach with operator assessments, evaluated its performance on data from different laboratories, and compared it with other state-of-the-art automated read-out methods. Our proposed GMM-combination approach proved superior to support vector machines, deep neural networks, and a single-GMM approach in terms of precision and average F1-scores. A high correlation between expert operator-based and automated MRD assessment was achieved, with reliable automated MRD quantification (F1-scores >0.5 in more than 95% of samples) in the clinically relevant range. Although the best performance was found when test and training samples were from the same system (i.e., flow cytometer and staining panel; lowest median F1-score 0.92), cross-system performance remained high, with a median F1-score above 0.85 in all settings.
In conclusion, our proposed automated approach could potentially be used to assess FCM-MRD in B-ALL in an objective and standardized manner across different laboratories. © 2019 International Society for Advancement of Cytometry.
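The idea of representing a new sample as a weighted combination of stored sample densities can be sketched with the standard EM update for mixture weights when the component densities are held fixed. This is a one-dimensional simplification, not the authors' implementation: each stored sample is reduced to a single hypothetical Gaussian (the paper combines multivariate GMMs), and events are assumed to lie within the support of at least one stored density. An event-level F1-score, the metric quoted in the abstract, is included for comparing automated and expert labels.

```python
import math
from typing import Callable, List, Sequence


def gaussian_pdf(mu: float, sigma: float) -> Callable[[float], float]:
    """Return a 1D normal density function with mean mu and standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def combination_weights(
    events: Sequence[float], densities: Sequence[Callable[[float], float]], n_iter: int = 200
) -> List[float]:
    """EM for the weights a_k of p(x) = sum_k a_k * f_k(x), with every f_k held fixed."""
    k = len(densities)
    a = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for x in events:
            terms = [a_j * f(x) for a_j, f in zip(a, densities)]
            s = sum(terms)  # assumed > 0, i.e. x is explained by some stored density
            for j in range(k):
                totals[j] += terms[j] / s
        a = [t / len(events) for t in totals]
    return a


def f1_score(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """F1-score for binary labels (1 = positive, e.g. an event gated as MRD)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(predicted, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(predicted, truth))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    stored = [gaussian_pdf(0.0, 1.0), gaussian_pdf(10.0, 1.0)]  # hypothetical stored samples
    new_events = [0.1, -0.3, 0.2, 0.0, 9.8, 10.1]
    print(combination_weights(new_events, stored))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```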
Affiliation(s)
- Michael Reiter
- Immunological Diagnostics, Children's Cancer Research Institute, Vienna, Austria; Computer Vision Lab, Faculty of Informatics, Technical University of Vienna, Vienna, Austria
- Markus Diem
- Immunological Diagnostics, Children's Cancer Research Institute, Vienna, Austria; Computer Vision Lab, Faculty of Informatics, Technical University of Vienna, Vienna, Austria
- Angela Schumich
- Immunological Diagnostics, Children's Cancer Research Institute, Vienna, Austria
- Leonid Karawajew
- Department of Pediatric Oncology/Hematology, Charité Universitätsmedizin Berlin, Berlin, Germany
- Jorge G Rossi
- Cellular Immunology Laboratory, Hospital de Pediatria "Dr. Juan P. Garrahan", Buenos Aires, Argentina
- Richard Ratei
- Department of Hematology, Oncology and Tumor Immunology, HELIOS Klinikum Berlin-Buch, Berlin, Germany
- Elisa O Sajaroff
- Cellular Immunology Laboratory, Hospital de Pediatria "Dr. Juan P. Garrahan", Buenos Aires, Argentina
- Martin Kampel
- Computer Vision Lab, Faculty of Informatics, Technical University of Vienna, Vienna, Austria
- Michael N Dworzak
- Immunological Diagnostics, Children's Cancer Research Institute, Vienna, Austria; Labdia Labordiagnostik GmbH, Vienna, Austria
18
Jimenez-Carretero D, Ligos JM, Martínez-López M, Sancho D, Montoya MC. Flow Cytometry Data Preparation Guidelines for Improved Automated Phenotypic Analysis. J Immunol 2018; 200:3319-3331. [PMID: 29735643 DOI: 10.4049/jimmunol.1800446] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/23/2018] [Accepted: 03/23/2018] [Indexed: 12/22/2022]
Abstract
Advances in flow cytometry (FCM) increasingly demand adoption of computational analysis tools to tackle the ever-growing data dimensionality. In this study, we tested different data input modes to evaluate how cytometry acquisition configuration and data compensation procedures affect the performance of unsupervised phenotyping tools. An analysis workflow was set up and tested for the detection of changes in reference bead subsets and in a rare subpopulation of murine lymph node CD103+ dendritic cells acquired by conventional or spectral cytometry. Raw spectral data or pseudospectral data acquired with the full set of available detectors by conventional cytometry consistently outperformed datasets acquired and compensated according to FCM standards. Our results thus challenge the paradigm of one-fluorochrome/one-parameter acquisition in FCM for unsupervised cluster-based analysis. Instead, we propose to configure instrument acquisition to use all available fluorescence detectors and to avoid integration and compensation procedures, thereby using raw spectral or pseudospectral data for improved automated phenotypic analysis.
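To make concrete what the "compensation procedures" questioned in this abstract do, here is a minimal two-channel sketch. It assumes the common linear spillover model, observed = true × S, where S[i][j] is the fraction of fluorochrome i's signal detected in channel j; compensation multiplies the observed vector by the inverse of S. The spillover values are hypothetical, and real tools handle many channels and estimate S from single-stained controls.

```python
from typing import List, Sequence


def compensate_2ch(observed: Sequence[float], spill: Sequence[Sequence[float]]) -> List[float]:
    """Recover true signals from two observed channels: true = observed @ inverse(spill)."""
    (a, b), (c, d) = spill
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # closed-form 2x2 inverse
    y1, y2 = observed
    # Row vector times matrix: component j = y1 * inv[0][j] + y2 * inv[1][j]
    return [y1 * inv[0][0] + y2 * inv[1][0], y1 * inv[0][1] + y2 * inv[1][1]]


if __name__ == "__main__":
    # Hypothetical: 20% of dye 1 spills into channel 2, 10% of dye 2 into channel 1
    spill = [[1.0, 0.2], [0.1, 1.0]]
    true_signal = [100.0, 50.0]
    observed = [
        true_signal[0] * spill[0][0] + true_signal[1] * spill[1][0],
        true_signal[0] * spill[0][1] + true_signal[1] * spill[1][1],
    ]
    print(observed)                         # [105.0, 70.0]
    print(compensate_2ch(observed, spill))  # approximately [100.0, 50.0]
```

The raw spectral or pseudospectral input favored in the abstract corresponds to skipping this inversion and feeding all detector readings directly to the clustering tool.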
Affiliation(s)
- Daniel Jimenez-Carretero
- Unidad de Celómica, Área de Biología Celular y del Desarrollo, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid E28029, Spain
- José M Ligos
- Unidad de Celómica, Área de Biología Celular y del Desarrollo, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid E28029, Spain
- María Martínez-López
- Laboratorio de Inmunobiología, Área de Fisiopatología del Miocardio, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid E28029, Spain
- David Sancho
- Laboratorio de Inmunobiología, Área de Fisiopatología del Miocardio, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid E28029, Spain
- María C Montoya
- Unidad de Celómica, Área de Biología Celular y del Desarrollo, Centro Nacional de Investigaciones Cardiovasculares Carlos III, Madrid E28029, Spain
19
Hejblum BP, Alkhassim C, Gottardo R, Caron F, Thiébaut R. Sequential Dirichlet process mixtures of multivariate skew t-distributions for model-based clustering of flow cytometry data. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1209] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/19/2022]
20
Commenges D, Alkhassim C, Gottardo R, Hejblum B, Thiébaut R. cytometree: A binary tree algorithm for automatic gating in cytometry analysis. Cytometry A 2018; 93:1132-1140. [DOI: 10.1002/cyto.a.23601] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/17/2018] [Revised: 07/19/2018] [Accepted: 08/20/2018] [Indexed: 12/13/2022]
Affiliation(s)
- Daniel Commenges
- Inserm, Bordeaux Population Health Research Center, UMR 1219, INRIA SISTM, University of Bordeaux, ISPED; 33000 Bordeaux France
- Vaccine Research Institute (VRI), Groupe Henri-Mondor Albert-Chenevier; 94010 Creteil France
- Chariff Alkhassim
- Inserm, Bordeaux Population Health Research Center, UMR 1219, INRIA SISTM, University of Bordeaux, ISPED; 33000 Bordeaux France
- Vaccine Research Institute (VRI), Groupe Henri-Mondor Albert-Chenevier; 94010 Creteil France
- Raphael Gottardo
- Vaccine and Infectious Disease Division; Fred Hutchinson Cancer Research Center; 1100 Fairview Avenue N, Seattle Washington 98109 USA
- Boris Hejblum
- Inserm, Bordeaux Population Health Research Center, UMR 1219, INRIA SISTM, University of Bordeaux, ISPED; 33000 Bordeaux France
- Vaccine Research Institute (VRI), Groupe Henri-Mondor Albert-Chenevier; 94010 Creteil France
- Rodolphe Thiébaut
- Inserm, Bordeaux Population Health Research Center, UMR 1219, INRIA SISTM, University of Bordeaux, ISPED; 33000 Bordeaux France
- Vaccine Research Institute (VRI), Groupe Henri-Mondor Albert-Chenevier; 94010 Creteil France
21
Lee AJ, Chang I, Burel JG, Lindestam Arlehamn CS, Mandava A, Weiskopf D, Peters B, Sette A, Scheuermann RH, Qian Y. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data. Cytometry A 2018; 93:597-610. [PMID: 29665244 PMCID: PMC6030426 DOI: 10.1002/cyto.a.23371] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 09/25/2017] [Revised: 02/05/2018] [Accepted: 03/15/2018] [Indexed: 11/10/2022]
Abstract
Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both issues stand in the way of a cytometry lab adopting the data clustering approach for routine cell population identification. We found that combining recursive data filtering and clustering with constraints converted from the user's manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists by mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters that are otherwise difficult to resolve in a single run of the data clustering method because of statistical interference from the irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided abrupt cut-offs at the boundaries.
DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry.
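The directed recursive filtering idea can be sketched as follows: cluster the current events, keep only the clusters whose centroids fall inside the user's gate, then re-cluster the survivors for the next gate in the hierarchy. This is a simplified illustration, not DAFi's code: it uses plain k-means with deterministic farthest-point seeding, rectangular 2D gates, and hypothetical data, gate bounds, and cluster counts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Gate = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def centroid(group: Sequence[Point]) -> Point:
    return (sum(p[0] for p in group) / len(group), sum(p[1] for p in group) / len(group))


def farthest_point_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: first event, then repeatedly the event farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points: Sequence[Point], k: int, n_iter: int = 20) -> List[List[Point]]:
    """Plain Lloyd's k-means; returns the non-empty clusters as lists of events."""
    centers = farthest_point_init(points, k)
    groups: List[List[Point]] = []
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return [g for g in groups if g]


def inside(p: Point, gate: Gate) -> bool:
    x_min, x_max, y_min, y_max = gate
    return x_min <= p[0] <= x_max and y_min <= p[1] <= y_max


def directed_filter(points: Sequence[Point], hierarchy: Sequence[Tuple[Gate, int]]) -> List[Point]:
    """For each (gate, k) in order: cluster the surviving events, keep clusters whose centroid is in the gate."""
    current = list(points)
    for gate, k in hierarchy:
        if not current:
            break
        kept: List[Point] = []
        for group in kmeans(current, min(k, len(current))):
            if inside(centroid(group), gate):
                kept.extend(group)
        current = kept
    return current


if __name__ == "__main__":
    a = [(1.0, 1.0), (1.2, 1.0), (1.0, 1.2), (0.8, 1.0)]
    b = [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (4.8, 5.0)]
    c = [(9.0, 1.0), (9.2, 1.0), (9.0, 1.2), (8.8, 1.0)]
    # Hypothetical two-level hierarchy: a parent gate, then a child gate on the survivors
    hierarchy = [((0.0, 6.0, 0.0, 6.0), 3), ((0.0, 10.0, 3.0, 6.0), 2)]
    print(directed_filter(a + b + c, hierarchy))  # the four events of cluster b
```

Because whole clusters are kept or dropped by their centroids, events near a gate edge follow their cluster rather than being cut at the boundary, which mirrors the "no abrupt cut-offs" point in the abstract.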
Affiliation(s)
- Ivan Chang
- J. Craig Venter Institute, La Jolla, California
- Julie G. Burel
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Daniela Weiskopf
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Alessandro Sette
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Department of Medicine, University of California, San Diego, California
- Richard H. Scheuermann
- J. Craig Venter Institute, La Jolla, California
- Department of Pathology, University of California, San Diego, California
- Yu Qian
- J. Craig Venter Institute, La Jolla, California
22
Automated analysis of acute myeloid leukemia minimal residual disease using a support vector machine. Oncotarget 2016; 7:71915-71921. [PMID: 27713120 PMCID: PMC5342132 DOI: 10.18632/oncotarget.12430] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/04/2016] [Accepted: 09/29/2016] [Indexed: 11/29/2022]
Abstract
We investigated the ability of support vector machines (SVM) to analyze minimal residual disease (MRD) in flow cytometry data from patients with acute myeloid leukemia (AML) in an automatic, objective, and standardized manner. The initial disease data and MRD review data, in the form of 159 Flow Cytometry Standard 3.0 files from 36 CD7-positive AML patients in whom MRD was detected more than once, were exported. For SVM training, the initial disease data were flagged as 1 and data from 15 healthy persons were flagged as 0. Based on these two training groups, parameters were optimized and a predictive model was built to analyze the MRD data from each patient. The automated analysis results from the SVM model were compared with those obtained through conventional analysis to determine reliability. Automated analysis results based on the model did not differ from, and were correlated with, results obtained through conventional analysis (correlation coefficient c = 0.986, P > 0.05). Thus, the SVM model could potentially be used to analyze flow cytometry-based AML MRD data automatically, objectively, and in a standardized manner.
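The training setup described above (initial-disease events flagged 1, healthy-donor events flagged 0) can be illustrated with a minimal linear SVM trained by full-batch subgradient descent on the regularized hinge loss. This is a generic sketch with hypothetical two-marker data and hyperparameters (`lam`, `lr`, `epochs`); the study's kernel, features, and parameter optimization are not specified here, and a real analysis would use an established SVM library.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on lam/2*||w||^2 + mean(hinge loss).

    Labels y are 0/1 and are mapped to s = -1/+1 internally.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            s = 1.0 if yi == 1 else -1.0
            margin = s * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this event
                grad_w = [g - s * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= s / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 (disease-like) if x lies on the positive side of the hyperplane, else 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    # Hypothetical two-marker events: 1 = initial-disease-like, 0 = healthy-like
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])
```

Applying `predict` to the events of a follow-up sample and counting the positives would give a crude MRD-like fraction under these assumptions.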
23
Rahim A, Meskas J, Drissler S, Yue A, Lorenc A, Laing A, Saran N, White J, Abeler-Dörner L, Hayday A, Brinkman RR. High throughput automated analysis of big flow cytometry data. Methods 2018; 134-135:164-176. [PMID: 29287915 PMCID: PMC5815930 DOI: 10.1016/j.ymeth.2017.12.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 10/14/2017] [Revised: 12/07/2017] [Accepted: 12/15/2017] [Indexed: 11/20/2022]
Abstract
The rapid expansion of flow cytometry applications has outpaced the functionality of traditional manual analysis tools used to interpret flow cytometry data. Scientists are faced with the daunting prospect of manually identifying interesting cell populations in 50-dimensional datasets, equalling the complexity previously only reached in mass cytometry. Data can no longer be analyzed or interpreted fully by manual approaches. While automated gating has been the focus of intense efforts, there are many significant additional steps to the analytical pipeline (e.g., cleaning the raw files, event outlier detection, extracting immunophenotypes). We review the components of a customized automated analysis pipeline that can be generally applied to large scale flow cytometry data. We demonstrate these methodologies on data collected by the International Mouse Phenotyping Consortium (IMPC).
Affiliation(s)
- Albina Rahim
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada; Department of Bioinformatics, University of British Columbia, Vancouver, BC, Canada
- Justin Meskas
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada
- Sibyl Drissler
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada
- Alice Yue
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada; School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Anna Lorenc
- Department of Immunobiology, King's College London, United Kingdom
- Adam Laing
- Department of Immunobiology, King's College London, United Kingdom
- Namita Saran
- Department of Immunobiology, King's College London, United Kingdom
- Jacqui White
- Wellcome Trust Sanger Institute, Hinxton, United Kingdom
- Adrian Hayday
- Department of Immunobiology, King's College London, United Kingdom; The Francis Crick Institute, London, United Kingdom
- Ryan R Brinkman
- Terry Fox Laboratory, British Columbia Cancer Agency, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
24
Cossarizza A, Chang HD, Radbruch A, Akdis M, Andrä I, Annunziato F, Bacher P, Barnaba V, Battistini L, Bauer WM, Baumgart S, Becher B, Beisker W, Berek C, Blanco A, Borsellino G, Boulais PE, Brinkman RR, Büscher M, Busch DH, Bushnell TP, Cao X, Cavani A, Chattopadhyay PK, Cheng Q, Chow S, Clerici M, Cooke A, Cosma A, Cosmi L, Cumano A, Dang VD, Davies D, De Biasi S, Del Zotto G, Della Bella S, Dellabona P, Deniz G, Dessing M, Diefenbach A, Di Santo J, Dieli F, Dolf A, Donnenberg VS, Dörner T, Ehrhardt GRA, Endl E, Engel P, Engelhardt B, Esser C, Everts B, Dreher A, Falk CS, Fehniger TA, Filby A, Fillatreau S, Follo M, Förster I, Foster J, Foulds GA, Frenette PS, Galbraith D, Garbi N, García-Godoy MD, Geginat J, Ghoreschi K, Gibellini L, Goettlinger C, Goodyear CS, Gori A, Grogan J, Gross M, Grützkau A, Grummitt D, Hahn J, Hammer Q, Hauser AE, Haviland DL, Hedley D, Herrera G, Herrmann M, Hiepe F, Holland T, Hombrink P, Houston JP, Hoyer BF, Huang B, Hunter CA, Iannone A, Jäck HM, Jávega B, Jonjic S, Juelke K, Jung S, Kaiser T, Kalina T, Keller B, Khan S, Kienhöfer D, Kroneis T, Kunkel D, Kurts C, Kvistborg P, Lannigan J, Lantz O, Larbi A, LeibundGut-Landmann S, Leipold MD, Levings MK, Litwin V, Liu Y, Lohoff M, Lombardi G, Lopez L, Lovett-Racke A, Lubberts E, Ludewig B, Lugli E, Maecker HT, Martrus G, Matarese G, Maueröder C, McGrath M, McInnes I, Mei HE, Melchers F, Melzer S, Mielenz D, Mills K, Mirrer D, Mjösberg J, Moore J, Moran B, Moretta A, Moretta L, Mosmann TR, Müller S, Müller W, Münz C, Multhoff G, Munoz LE, Murphy KM, Nakayama T, Nasi M, Neudörfl C, Nolan J, Nourshargh S, O'Connor JE, Ouyang W, Oxenius A, Palankar R, Panse I, Peterson P, Peth C, Petriz J, Philips D, Pickl W, Piconese S, Pinti M, Pockley AG, Podolska MJ, Pucillo C, Quataert SA, Radstake TRDJ, Rajwa B, Rebhahn JA, Recktenwald D, Remmerswaal EBM, Rezvani K, Rico LG, Robinson JP, Romagnani C, Rubartelli A, Ruckert B, Ruland J, Sakaguchi S, Sala-de-Oyanguren F, Samstag Y, Sanderson S, Sawitzki B, Scheffold A, Schiemann M, Schildberg F, Schimisky E, Schmid SA, Schmitt S, Schober K, Schüler T, Schulz AR, Schumacher T, Scotta C, Shankey TV, Shemer A, Simon AK, Spidlen J, Stall AM, Stark R, Stehle C, Stein M, Steinmetz T, Stockinger H, Takahama Y, Tarnok A, Tian Z, Toldi G, Tornack J, Traggiai E, Trotter J, Ulrich H, van der Braber M, van Lier RAW, Veldhoen M, Vento-Asturias S, Vieira P, Voehringer D, Volk HD, von Volkmann K, Waisman A, Walker R, Ward MD, Warnatz K, Warth S, Watson JV, Watzl C, Wegener L, Wiedemann A, Wienands J, Willimsky G, Wing J, Wurst P, Yu L, Yue A, Zhang Q, Zhao Y, Ziegler S, Zimmermann J. Guidelines for the use of flow cytometry and cell sorting in immunological studies. Eur J Immunol 2017; 47:1584-1797. [PMID: 29023707 PMCID: PMC9165548 DOI: 10.1002/eji.201646632] [Citation(s) in RCA: 397] [Impact Index Per Article: 56.7] [Indexed: 02/05/2023]
Affiliation(s)
- Andrea Cossarizza
- Department of Medical and Surgical Sciences for Children and Adults, Univ. of Modena and Reggio Emilia School of Medicine, Modena, Italy
- Hyun-Dong Chang
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Andreas Radbruch
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Mübeccel Akdis
- Swiss Institute of Allergy and Asthma Research (SIAF), University Zurich, Davos, Switzerland
- Immanuel Andrä
- Institut für Medizinische Mikrobiologie, Immunologie und Hygiene, Technische Universität München, Munich, Germany
- Vincenzo Barnaba
- Dipartimento di Medicina Interna e Specialità Mediche, Sapienza Università di Roma, Via Regina Elena 324, 00161 Rome, Italy
- Istituto Pasteur Italia-Fondazione Cenci Bolognetti, Rome, Italy
- Luca Battistini
- Neuroimmunology and Flow Cytometry Units, Santa Lucia Foundation, Rome, Italy
- Wolfgang M Bauer
- Division of Immunology, Allergy and Infectious Diseases, Department of Dermatology, Medical University of Vienna, Vienna, Austria
- Sabine Baumgart
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Burkhard Becher
- University of Zurich, Institute of Experimental Immunology, Zürich, Switzerland
- Wolfgang Beisker
- Flow Cytometry Laboratory, Institute of Molecular Toxicology and Pharmacology, Helmholtz Zentrum München, German Research Center for Environmental Health
- Claudia Berek
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Alfonso Blanco
- Flow Cytometry Core Technologies, UCD Conway Institute, University College Dublin, Dublin, Ireland
- Giovanna Borsellino
- Neuroimmunology and Flow Cytometry Units, Santa Lucia Foundation, Rome, Italy
- Philip E Boulais
- Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- The Ruth L. and David S. Gottesman Institute for Stem Cell and Regenerative Medicine Research, Bronx, New York, USA
- Ryan R Brinkman
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Martin Büscher
- Biophysics, R&D Engineering, Miltenyi Biotec GmbH, Bergisch Gladbach, Germany
- Dirk H Busch
- Institut für Medizinische Mikrobiologie, Immunologie und Hygiene, Technische Universität München, Munich, Germany
- DZIF - National Centre for Infection Research, Munich, Germany
- Focus Group "Clinical Cell Processing and Purification", Institute for Advanced Study, Technische Universität München, Munich, Germany
- Timothy P Bushnell
- Department of Pediatrics and Shared Resource Laboratories, University of Rochester Medical Center, Rochester NY, United States of America
- Xuetao Cao
- Institute of Immunology, Zhejiang University School of Medicine, Hangzhou 310058, China
- National Key Laboratory of Medical Immunology & Institute of Immunology, Second Military Medical University, Shanghai 200433, China
- Department of Immunology & Center for Immunotherapy, Institute of Basic Medical Sciences, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing 100005, China
- Qingyu Cheng
- Medizinische Klinik mit Schwerpunkt Rheumatologie und Medizinische Immunologie Charité Universitätsmedizin Berlin, Berlin, Germany
- Sue Chow
- Division of Medical Oncology and Hematology, Princess Margaret Hospital, Toronto, Ontario, Canada
| | - Mario Clerici
- University of Milano and Don C Gnocchi Foundation IRCCS, Milano, Italy
| | - Anne Cooke
- Department of Pathology, University of Cambridge, Cambridge, United Kingdom
| | - Antonio Cosma
- CEA - Université Paris Sud - INSERM U, Immunology of viral infections and autoimmune diseases, France
| | - Lorenzo Cosmi
- Department of Experimental and Clinical Medicine, University of Firenze, Firenze, Italia
| | - Ana Cumano
- Lymphopoiesis Unit, Immunology Department Pasteur Institute, Paris, France
| | - Van Duc Dang
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
| | - Derek Davies
- Flow Cytometry Facility, The Francis Crick Institute, London, United Kingdom
| | - Sara De Biasi
- Department of Surgery, Medicine, Dentistry and Morphological Sciences, Univ. of Modena and Reggio Emilia, Modena, Italy
| | | | - Silvia Della Bella
- University of Milan, Department of Medical Biotechnologies and Translational Medicine
- Humanitas Clinical and Research Center, Lab of Clinical and Experimental Immunology, Rozzano, Milan, Italy
| | - Paolo Dellabona
- Experimental Immunology Unit, Head, Division of Immunology, Transplantation and Infectious Diseases, San Raffaele Scientific Institute, Milano, Italy
| | - Günnur Deniz
- Istanbul University, Aziz Sancar Institute of Experimental Medicine, Department of Immunology, Istanbul, Turkey
| | | | | | | | - Francesco Dieli
- University of Palermo, Department of Biopathology, Palermo, Italy
| | - Andreas Dolf
- Institute of Experimental Immunology, University Bonn, Bonn, Germany
| | - Vera S Donnenberg
- Department of Cardiothoracic Surgery, School of Medicine, University of Pittsburgh, PA
| | - Thomas Dörner
- Department of Medicine/Rheumatology and Clinical Immunology, Charite Universitätsmedizin Berlin, Germany
| | | | - Elmar Endl
- Department of Molecular Medicine and Experimental Immunology, (Core Facility Flow Cytometry) University of Bonn, Germany
| | - Pablo Engel
- Department of Biomedical Sciences, University of Barcelona, Barcelona, Spain
| | - Britta Engelhardt
- Professor for Immunobiology, Director, Theodor Kocher Institute, University of Bern, Bern, Switzerland
| | - Charlotte Esser
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Bart Everts
- Leiden University Medical Center, Department of Parasitology, Leiden, The Netherlands
- Anita Dreher
- Swiss Institute of Allergy and Asthma Research (SIAF), University Zurich, Davos, Switzerland
- Christine S Falk
- Institute of Transplant Immunology, IFB-Tx, MHH Hannover Medical School, Hannover, Germany
- German Center for Infectious diseases (DZIF), TTU-IICH, Hannover, Germany
- Todd A Fehniger
- Divisions of Hematology & Oncology, Department of Medicine, Washington University School of Medicine, St Louis, MO
- Andrew Filby
- The Flow Cytometry Core Facility, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
- Simon Fillatreau
- Institut Necker-Enfants Malades (INEM), INSERM U-CNRS UMR, Paris, France
- Université Paris Descartes, Sorbonne Paris Cité, Faculté de Médecine, Paris, France
- Assistance Publique - Hôpitaux de Paris (AP-HP), Hôpital Necker Enfants Malades, Paris, France
- Marie Follo
- Department of Medicine I, Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Irmgard Förster
- Immunology and Environment, Life & Medical Sciences (LIMES) Institute, University of Bonn, Bonn, Germany
- Gemma A Foulds
- John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, UK
- Paul S Frenette
- Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York, USA
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA
- David Galbraith
- University of Arizona, Bio Institute, School of Plant Sciences and Arizona Cancer Center, Tucson, Arizona, USA
- Natalio Garbi
- Institute of Experimental Immunology, University Bonn, Bonn, Germany
- Department of Molecular Immunology, Institute of Experimental Immunology, Bonn, Germany
- Jens Geginat
- INGM, Istituto Nazionale Genetica Molecolare "Romeo ed Enrica Invernizzi", Milan, Italy
- Kamran Ghoreschi
- Flow Cytometry Core Facility, Department of Dermatology, University Medical Center, Eberhard Karls University Tübingen, Germany
- Lara Gibellini
- Department of Surgery, Medicine, Dentistry and Morphological Sciences, Univ. of Modena and Reggio Emilia, Modena, Italy
- Carl S Goodyear
- Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow
- Andrea Gori
- Clinic of Infectious Diseases, "San Gerardo" Hospital - ASST Monza, University Milano-Bicocca, Monza, Italy
- Jane Grogan
- Genentech, Department of Cancer Immunology, South San Francisco, California, USA
- Mor Gross
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Andreas Grützkau
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Jonas Hahn
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Quirin Hammer
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Anja E Hauser
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Immundynamics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- David Hedley
- Division of Medical Oncology and Hematology, Princess Margaret Hospital, Toronto, Ontario, Canada
- Guadalupe Herrera
- Cytometry Service, Incliva Foundation, Clinic Hospital and Faculty of Medicine, The University of Valencia, Av. Blasco Ibáñez, Valencia, Spain
- Martin Herrmann
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Falk Hiepe
- Medizinische Klinik mit Schwerpunkt Rheumatologie und Medizinische Immunologie, Charité Universitätsmedizin Berlin, Berlin, Germany
- Tristan Holland
- Department of Molecular Immunology, Institute of Experimental Immunology, Bonn, Germany
- Pleun Hombrink
- Department of Hematopoiesis, Sanquin Research and Landsteiner Laboratory, Amsterdam, The Netherlands
- Jessica P Houston
- Chemical and Materials Engineering, New Mexico State University, Las Cruces, NM, 88003, USA
- Bimba F Hoyer
- Medizinische Klinik mit Schwerpunkt Rheumatologie und Medizinische Immunologie, Charité Universitätsmedizin Berlin, Berlin, Germany
- Bo Huang
- Department of Biochemistry and Molecular Biology, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Department of Immunology, Institute of Basic Medical Sciences & State Key Laboratory of Medical Molecular Biology, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Clinical Immunology Center, Chinese Academy of Medical Sciences, Beijing, China
- Christopher A Hunter
- Department of Pathobiology, School of Veterinary Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Anna Iannone
- Department of Diagnostic Medicine, Clinical and Public Health, Univ. of Modena and Reggio Emilia, Modena, Italy
- Hans-Martin Jäck
- Division of Molecular Immunology, Internal Medicine III, Nikolaus-Fiebiger-Center of Molecular Medicine, University Hospital Erlangen, Erlangen, Germany
- Beatriz Jávega
- Laboratory of Cytomics, Joint Research Unit CIPF-UVEG, Department of Biochemistry and Molecular Biology, The University of Valencia, Av. Blasco Ibáñez, Valencia, Spain
- Stipan Jonjic
- Faculty of Medicine, Center for Proteomics, University of Rijeka, Rijeka, Croatia
- Department for Histology and Embryology, Faculty of Medicine, University of Rijeka, Rijeka, Croatia
- Kerstin Juelke
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Steffen Jung
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Toralf Kaiser
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Tomas Kalina
- Department of Paediatric Haematology and Oncology, Second Faculty of Medicine, Charles University and University Hospital Motol, Prague, Czech Republic
- Baerbel Keller
- Center for Chronic Immunodeficiency (CCI), Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Srijit Khan
- Department of Immunology, University of Toronto, Toronto, Canada
- Deborah Kienhöfer
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Thomas Kroneis
- Medical University of Graz, Institute of Cell Biology, Histology & Embryology, Graz, Austria
- Désirée Kunkel
- BCRT Flow Cytometry Lab, Berlin-Brandenburg Center for Regenerative Therapies, Charité - Universitätsmedizin Berlin
- Christian Kurts
- Institute of Experimental Immunology, University Bonn, Bonn, Germany
- Pia Kvistborg
- Division of Immunology, The Netherlands Cancer Institute, Amsterdam
- Joanne Lannigan
- University of Virginia School of Medicine, Flow Cytometry Shared Resource, Charlottesville, VA, USA
- Olivier Lantz
- INSERM U932, Institut Curie, Paris 75005, France
- Laboratoire d'immunologie clinique, Institut Curie, Paris 75005, France
- Centre d'investigation Clinique en Biothérapie Gustave-Roussy Institut Curie (CIC-BT1428), Institut Curie, Paris 75005, France
- Anis Larbi
- Singapore Immunology Network (SIgN), Principal Investigator, Biology of Aging Program
- Director, Flow Cytometry Platform, Immunomonitoring Platform, Agency for Science Technology and Research (A*STAR), Singapore
- Department of Medicine, University of Sherbrooke, Qc, Canada
- Faculty of Sciences, ElManar University, Tunis, Tunisia
- Michael D Leipold
- The Human Immune Monitoring Center (HIMC), Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, CA, USA
- Megan K Levings
- Department of Surgery, University of British Columbia & British Columbia Children's Hospital Research Institute, Vancouver, BC, Canada
- Yanling Liu
- Department of Immunology, University of Toronto, Toronto, Canada
- Michael Lohoff
- Institute for Medical Microbiology and Hospital Hygiene, University of Marburg, Marburg 35043, Germany
- Giovanna Lombardi
- MRC Centre for Transplantation, King's College London, Guy's Hospital, SE1 9RT London, UK
- Amy Lovett-Racke
- Department of Microbial Infection and Immunity, Ohio State University, Columbus, OH, USA
- Erik Lubberts
- Erasmus MC, University Medical Center, Department of Rheumatology, Rotterdam, The Netherlands
- Burkhard Ludewig
- Institute of Immunobiology, Kantonsspital St. Gallen, St. Gallen, Switzerland
- Enrico Lugli
- Laboratory of Translational Immunology, Humanitas Clinical and Research Center, Rozzano, Milan, Italy
- Humanitas Flow Cytometry Core, Humanitas Clinical and Research Center, Rozzano, Milan, Italy
- Holden T Maecker
- The Human Immune Monitoring Center (HIMC), Institute for Immunity, Transplantation and Infection, Stanford University School of Medicine, CA, USA
- Glòria Martrus
- Department of Virus Immunology, Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- Giuseppe Matarese
- Dipartimento di Medicina Molecolare e Biotecnologie Mediche, Università di Napoli Federico II, Napoli, Italy and Istituto per l'Endocrinologia e l'Oncologia Sperimentale, Consiglio Nazionale delle Ricerche (IEOS-CNR), Napoli, Italy
- Christian Maueröder
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Mairi McGrath
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Iain McInnes
- Institute of Infection, Immunity and Inflammation, College of Medical, Veterinary and Life Sciences, University of Glasgow
- Henrik E Mei
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Fritz Melchers
- Senior Group on Lymphocyte Development, Max Planck Institute for Infection Biology, Berlin, Germany
- Susanne Melzer
- Clinical Trial Center Leipzig, University Leipzig, Leipzig, Germany
- Dirk Mielenz
- Division of Molecular Immunology, Nikolaus-Fiebiger-Center, Dept. of Internal Medicine III, University of Erlangen-Nuremberg, Erlangen, Germany
- Kingston Mills
- Trinity Biomedical Sciences Institute, Trinity College Dublin, the University of Dublin, Dublin, Ireland
- David Mirrer
- Swiss Institute of Allergy and Asthma Research (SIAF), University Zurich, Davos, Switzerland
- Jenny Mjösberg
- Center for Infectious Medicine, Department of Medicine, Karolinska Institute, Stockholm, Sweden
- Department of Clinical and Experimental Medicine, Linköping University, Sweden
- Jonni Moore
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine of the University of Pennsylvania, Philadelphia, Pennsylvania
- Barry Moran
- Trinity Biomedical Sciences Institute, Trinity College Dublin, the University of Dublin, Dublin, Ireland
- Alessandro Moretta
- Department of Experimental Medicine, University of Genova, Genova, Italy
- Centro di Eccellenza per la Ricerca Biomedica-CEBR, Genova, Italy
- Lorenzo Moretta
- Department of Immunology, IRCCS Bambino Gesu Children's Hospital, Rome, Italy
- Tim R Mosmann
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, NY, USA
- Susann Müller
- Centre for Environmental Research - UFZ, Department of Environmental Microbiology, Leipzig, Germany
- Werner Müller
- Bill Ford Chair in Cellular Immunology, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom
- Christian Münz
- University of Zurich, Institute of Experimental Immunology, Zürich, Switzerland
- Gabriele Multhoff
- Department of Radiation Oncology, Klinikum rechts der Isar, Technische Universität München (TUM), Munich, Germany
- Institute for Innovative Radiotherapy (iRT), Experimental Immune Biology, Helmholtz Zentrum München, Neuherberg, Germany
- Luis Enrique Munoz
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Kenneth M Murphy
- Department of Pathology and Immunology, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Howard Hughes Medical Institute, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Toshinori Nakayama
- Department of Immunology, Graduate School of Medicine, Chiba University, 1-8-1 Inohana, Chuo-ku, Chiba, 260-8670, Japan
- Milena Nasi
- Department of Surgery, Medicine, Dentistry and Morphological Sciences, Univ. of Modena and Reggio Emilia, Modena, Italy
- Christine Neudörfl
- Institute of Transplant Immunology, IFB-Tx, MHH Hannover Medical School, Hannover, Germany
- John Nolan
- The Scintillon Institute, Nancy Ridge Drive, San Diego, CA, USA
- Sussan Nourshargh
- Centre for Microvascular Research, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- José-Enrique O'Connor
- Laboratory of Cytomics, Joint Research Unit CIPF-UVEG, Department of Biochemistry and Molecular Biology, The University of Valencia, Av. Blasco Ibáñez, Valencia, Spain
- Wenjun Ouyang
- Department of Inflammation and Oncology, Amgen Inc., South San Francisco, CA, USA
- Raghav Palankar
- Institute for Immunology and Transfusion Medicine, University Medicine Greifswald, Ferdinand-Sauerbruch-Straße, 17489, Greifswald, Germany
- Isabel Panse
- Kennedy Institute of Rheumatology, University of Oxford, Oxford, United Kingdom
- Pärt Peterson
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Christian Peth
- Biophysics, R&D Engineering, Miltenyi Biotec GmbH, Bergisch Gladbach, Germany
- Jordi Petriz
- Josep Carreras Leukemia Research Institute, Barcelona, Spain
- Daisy Philips
- Division of Immunology, The Netherlands Cancer Institute, Amsterdam
- Winfried Pickl
- Institute of Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna, Austria
- Silvia Piconese
- Dipartimento di Medicina Interna e Specialità Mediche, Sapienza Università di Roma, Via Regina Elena 324, 00161 Rome, Italy
- Istituto Pasteur Italia-Fondazione Cenci Bolognetti, Rome, Italy
- Marcello Pinti
- Department of Life Sciences, Univ. of Modena and Reggio Emilia, Modena, Italy
- A Graham Pockley
- John van Geest Cancer Research Centre, Nottingham Trent University, Nottingham, UK
- Chromocyte Limited, Electric Works, Sheffield, UK
- Malgorzata Justyna Podolska
- Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Department of Internal Medicine, Rheumatology and Immunology, Universitätsklinikum Erlangen, Erlangen
- Carlo Pucillo
- University of Udine - Department of Medicine, Lab of Immunology, Udine, Italy
- Sally A Quataert
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, NY, USA
- Timothy R D J Radstake
- Department of Rheumatology and Clinical Immunology, University Medical Center Utrecht, Utrecht, The Netherlands; Laboratory of Translational Immunology, University Medical Center Utrecht, Utrecht, The Netherlands
- Bartek Rajwa
- Bindley Biosciences Center, Purdue University, West Lafayette, IN, USA
- Jonathan A Rebhahn
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, NY, USA
- Ester B M Remmerswaal
- Department of Experimental Immunology and Renal Transplant Unit, Division of Internal Medicine, Academic Medical Centre, The Netherlands
- Katy Rezvani
- Department of Stem Cell Transplantation and Cellular Therapy, The University of Texas M. D. Anderson Cancer Center, Houston, TX, USA
- Laura G Rico
- Josep Carreras Leukemia Research Institute, Barcelona, Spain
- J Paul Robinson
- The SVM Professor of Cytomics & Professor of Biomedical Engineering, Purdue University Cytometry Laboratories, Purdue University, West Lafayette, IN, USA
- Chiara Romagnani
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Beate Ruckert
- Swiss Institute of Allergy and Asthma Research (SIAF), University Zurich, Davos, Switzerland
- Jürgen Ruland
- Institut für Klinische Chemie und Pathobiochemie, Klinikum rechts der Isar, Technische Universität München, Munich, Germany
- German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany
- German Center for Infection Research (DZIF), partner site Munich, Munich, Germany
- Shimon Sakaguchi
- Laboratory of Experimental Immunology, WPI Immunology Frontier Research Center (IFReC), Osaka University, Suita 565-0871, Japan
- Department of Experimental Pathology, Institute for Frontier Medical Sciences, Kyoto University, Kyoto 606-8507, Japan
- Francisco Sala-de-Oyanguren
- Laboratory of Cytomics, Joint Research Unit CIPF-UVEG, Department of Biochemistry and Molecular Biology, The University of Valencia, Av. Blasco Ibáñez, Valencia, Spain
- Yvonne Samstag
- Institute of Immunology, Section Molecular Immunology, Ruprecht-Karls-University, D-69120, Heidelberg, Germany
- Sharon Sanderson
- Translational Immunology Laboratory, NIHR BRC, University of Oxford, Kennedy Institute of Rheumatology, Oxford, United Kingdom
- Birgit Sawitzki
- Charité-Universitaetsmedizin Berlin, Corporate Member of Freie Universitaet Berlin, Humboldt-Universitaet zu Berlin
- Berlin Institute of Health, Institute of Medical Immunology, Augustenburger Platz 1, 13353 Berlin, Germany
- Alexander Scheffold
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Charité - Universitätsmedizin Berlin, Germany
- Matthias Schiemann
- Institut für Medizinische Mikrobiologie, Immunologie und Hygiene, Technische Universität München, Munich, Germany
- Frank Schildberg
- Harvard Medical School, Department of Microbiology and Immunobiology, Boston, MA, USA
- Stephan A Schmid
- Klinik und Poliklinik für Innere Medizin I, Universitätsklinikum Regensburg, Regensburg, Germany
- Steffen Schmitt
- Imaging and Cytometry Core Facility, Flow Cytometry Unit, German Cancer Research Centre (DKFZ), Heidelberg, Germany
- Kilian Schober
- Institut für Medizinische Mikrobiologie, Immunologie und Hygiene, Technische Universität München, Munich, Germany
- Thomas Schüler
- Institute of Molecular and Clinical Immunology, Otto-von-Guericke University, Magdeburg, Germany
- Axel Ronald Schulz
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Ton Schumacher
- Division of Immunology, The Netherlands Cancer Institute, Amsterdam
- Cristiano Scotta
- MRC Centre for Transplantation, King's College London, Guy's Hospital, SE1 9RT London, UK
- Anat Shemer
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- Josef Spidlen
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, Canada
- Regina Stark
- Department of Hematopoiesis, Sanquin Research and Landsteiner Laboratory, Amsterdam, The Netherlands
- Christina Stehle
- Deutsches Rheuma-Forschungszentrum (DRFZ), an Institute of the Leibniz Association, Berlin, Germany
- Merle Stein
- Division of Molecular Immunology, Nikolaus-Fiebiger-Center, Dept. of Internal Medicine III, University of Erlangen-Nuremberg, Erlangen, Germany
- Tobit Steinmetz
- Division of Molecular Immunology, Nikolaus-Fiebiger-Center, Dept. of Internal Medicine III, University of Erlangen-Nuremberg, Erlangen, Germany
- Hannes Stockinger
- Institute for Hygiene and Applied Immunology, Center for Pathophysiology, Infectiology and Immunology, Medical University of Vienna, Vienna, Austria
- Yousuke Takahama
- Division of Experimental Immunology, Institute of Advanced Medical Sciences, University of Tokushima, Tokushima, Japan
- Attila Tarnok
- Department for Therapy Validation, Fraunhofer Institute for Cell Therapy and Immunology IZI, Leipzig, Germany
- Institute for Medical Informatics, IMISE, Leipzig, Germany
- ZhiGang Tian
- School of Life Sciences and Medical Center, Institute of Immunology, Key Laboratory of Innate Immunity and Chronic Disease of Chinese Academy of Science, University of Science and Technology of China, Hefei, China
- Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- Gergely Toldi
- University of Birmingham, Institute of Immunology and Immunotherapy, Birmingham, UK
- Julia Tornack
- Senior Group on Lymphocyte Development, Max Planck Institute for Infection Biology, Berlin, Germany
- Henning Ulrich
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo
- René A W van Lier
- Department of Hematopoiesis, Sanquin Research and Landsteiner Laboratory, Amsterdam, The Netherlands
- Paulo Vieira
- Unité Lymphopoiese, Institut Pasteur, Paris, France
- David Voehringer
- Department of Infection Biology, University Hospital Erlangen, Wasserturmstr. 3/5, 91054 Erlangen, Germany
- Ari Waisman
- Institute for Molecular Medicine, University Medical Center of the Johannes Gutenberg University of Mainz, Mainz, Germany
- Klaus Warnatz
- Center for Chronic Immunodeficiency (CCI), Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Sarah Warth
- BCRT Flow Cytometry Lab, Berlin-Brandenburg Center for Regenerative Therapies, Charité - Universitätsmedizin Berlin
- Carsten Watzl
- Leibniz Research Centre for Working Environment and Human Factors at TU Dortmund, IfADo, Department of Immunology, Dortmund, Germany
- Leonie Wegener
- Biophysics, R&D Engineering, Miltenyi Biotec GmbH, Bergisch Gladbach, Germany
- Annika Wiedemann
- Department of Medicine/Rheumatology and Clinical Immunology, Charité Universitätsmedizin Berlin, Germany
- Jürgen Wienands
- Universitätsmedizin Göttingen, Georg-August-Universität, Abt. Zelluläre und Molekulare Immunologie, Humboldtallee 34, 37073 Göttingen, Germany
- Gerald Willimsky
- Cooperation Unit for Experimental and Translational Cancer Immunology, Institute of Immunology (Charité - Universitätsmedizin Berlin) and German Cancer Research Center (DKFZ), Berlin, Germany
- James Wing
- Laboratory of Experimental Immunology, WPI Immunology Frontier Research Center (IFReC), Osaka University, Suita 565-0871, Japan
- Department of Experimental Pathology, Institute for Frontier Medical Sciences, Kyoto University, Kyoto 606-8507, Japan
- Peter Wurst
- Institute of Experimental Immunology, University Bonn, Bonn, Germany
- Alice Yue
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Yi Zhao
- Department of Rheumatology & Immunology, West China Hospital, Sichuan University, Chengdu, China
- Susanne Ziegler
- Department of Virus Immunology, Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany
- Jakob Zimmermann
- Maurice Müller Laboratories (DKF), Universitätsklinik für Viszerale Chirurgie und Medizin Inselspital, University of Bern, Murtenstrasse, Bern
25
Pouyan MB, Nourani M. Identifying Cell Populations in Flow Cytometry Data Using Phenotypic Signatures. IEEE/ACM Trans Comput Biol Bioinform 2017; 14:880-891. [PMID: 27076456 DOI: 10.1109/tcbb.2016.2550428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Single-cell flow cytometry is a technology that measures the expression of several cellular markers simultaneously for a large number of cells. Identification of homogeneous cell populations, currently done by manual biaxial gating, is highly subjective and time consuming. To overcome the shortcomings of manual gating, automatic algorithms have been proposed. However, the performance of these methods highly depends on the shape of populations and the dimension of the data. In this paper, we have developed a time-efficient method that accurately identifies cellular populations. This is done based on a novel technique that estimates the initial number of clusters in high dimension and identifies the final clusters by merging clusters using their phenotypic signatures in low dimension. The proposed method is called SigClust. We have applied SigClust to four public datasets and compared it with five well known methods in the field. The results are promising and indicate higher performance and accuracy compared to similar approaches reported in literature.
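The merge step summarized in this abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a cluster's phenotypic signature is a per-marker +/- call made by comparing the cluster's median expression with one fixed threshold, and that clusters with identical signatures are merged into one population. The functions `phenotypic_signature` and `merge_by_signature` and the threshold of 0.5 are hypothetical; SigClust's own signature definition and its high-dimensional estimate of the initial cluster count are not reproduced here.

```python
import numpy as np

def phenotypic_signature(cells, threshold):
    """Return one '+'/'-' call per marker for a single cluster.

    Hypothetical definition: a marker is '+' when the cluster's median
    expression exceeds `threshold`. This is an illustrative assumption,
    not SigClust's published signature.
    """
    medians = np.median(cells, axis=0)
    return tuple('+' if m > threshold else '-' for m in medians)

def merge_by_signature(data, labels, threshold=0.5):
    """Relabel cells so that clusters sharing a signature form one population."""
    signatures = {
        lab: phenotypic_signature(data[labels == lab], threshold)
        for lab in np.unique(labels)
    }
    # One merged id per distinct signature, in order of first appearance.
    sig_to_id = {}
    merged = np.empty_like(labels)
    for lab, sig in signatures.items():
        if sig not in sig_to_id:
            sig_to_id[sig] = len(sig_to_id)
        merged[labels == lab] = sig_to_id[sig]
    populations = {pid: sig for sig, pid in sig_to_id.items()}
    return merged, populations

# Two markers; initial clusters 0 and 1 are both (+, -), cluster 2 is (-, +).
data = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.9, 0.3],
                 [0.1, 0.9], [0.2, 0.8]])
labels = np.array([0, 0, 1, 1, 2, 2])
merged, populations = merge_by_signature(data, labels)
# merged.tolist() -> [0, 0, 0, 0, 1, 1]
# populations     -> {0: ('+', '-'), 1: ('-', '+')}
```

Because the merge criterion in this sketch is the marker profile rather than a distance, two clusters that a distance-based method would keep apart, for example two halves of an elongated population, are rejoined whenever their profiles agree.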
26
Pouyan MB, Jindal V, Nourani M. Clinical Outcome Prediction Using Single-Cell Data. IEEE Trans Biomed Circuits Syst 2016; 10:1012-1022. [PMID: 27654975 DOI: 10.1109/tbcas.2016.2577641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Single-cell technologies like flow cytometry (FCM) provide valuable biological data for knowledge discovery in complex cellular systems like tissues and organs. FCM data contains multi-dimensional information about the cellular heterogeneity of intricate cellular systems. It is possible to correlate single-cell markers with phenotypic properties of those systems. Cell population identification and clinical outcome prediction from single-cell measurements are challenging problems in the field of single cell analysis. In this paper, we propose a hybrid learning approach to predict clinical outcome using samples' single-cell FCM data. The proposed method is efficient in both i) identification of cellular clusters in each sample's FCM data and ii) prediction of clinical outcome (healthy versus unhealthy) for each subject. Our method is robust and the experimental results indicate promising performance.
27
Pouyan MB, Jindal V, Birjandtalab J, Nourani M. Single and multi-subject clustering of flow cytometry data for cell-type identification and anomaly detection. BMC Med Genomics 2016; 9 Suppl 2:41. [PMID: 27510222 PMCID: PMC4980779 DOI: 10.1186/s12920-016-0201-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 01/11/2023]
Abstract
BACKGROUND Measurement of various markers of single cells using flow cytometry has several biological applications. These applications include improving our understanding of behavior of cellular systems, identifying rare cell populations and personalized medication. A common critical issue in the existing methods is identification of the number of cellular populations which heavily affects the accuracy of results. Furthermore, anomaly detection is crucial in flow cytometry experiments. In this work, we propose a two-stage clustering technique for cell type identification in single subject flow cytometry data and extend it for anomaly detection among multiple subjects. RESULTS Our experimentation on 42 flow cytometry datasets indicates high performance and accurate clustering (F-measure > 91 %) in identifying main cellular populations. Furthermore, our anomaly detection technique evaluated on Acute Myeloid Leukemia dataset results in only <2 % false positives.
Affiliation(s)
- Maziyar Baran Pouyan
- Quality of Life Technology Laboratory, The University of Texas at Dallas, Richardson, Texas USA
- Vasu Jindal
- Quality of Life Technology Laboratory, The University of Texas at Dallas, Richardson, Texas USA
- Department of Computer Science, The University of Texas at Dallas, Richardson, Texas, USA
- Javad Birjandtalab
- Quality of Life Technology Laboratory, The University of Texas at Dallas, Richardson, Texas USA
- Mehrdad Nourani
- Quality of Life Technology Laboratory, The University of Texas at Dallas, Richardson, Texas USA
28
Saeys Y, Van Gassen S, Lambrecht BN. Computational flow cytometry: helping to make sense of high-dimensional immunology data. Nat Rev Immunol 2016; 16:449-62. [PMID: 27320317 DOI: 10.1038/nri.2016.56] [Citation(s) in RCA: 305] [Impact Index Per Article: 38.1] [Indexed: 12/13/2022]
Abstract
Recent advances in flow cytometry allow scientists to measure an increasing number of parameters per cell, generating huge and high-dimensional datasets. To analyse, visualize and interpret these data, newly available computational techniques should be adopted, evaluated and improved upon by the immunological community. Computational flow cytometry is emerging as an important new field at the intersection of immunology and computational biology; it allows new biological knowledge to be extracted from high-throughput single-cell data. This Review provides non-experts with a broad and practical overview of the many recent developments in computational flow cytometry.
Affiliation(s)
- Yvan Saeys
- VIB Inflammation Research Center, Technologiepark 927, Ghent B-9052, Belgium
- Department of Internal Medicine, Ghent University, De Pintelaan 185, Ghent B-9000, Belgium
- Sofie Van Gassen
- VIB Inflammation Research Center, Technologiepark 927, Ghent B-9052, Belgium
- Department of Information Technology, Technologiepark 15, Ghent B-9052, Belgium
- Bart N Lambrecht
- VIB Inflammation Research Center, Technologiepark 927, Ghent B-9052, Belgium
- Department of Internal Medicine, Ghent University, De Pintelaan 185, Ghent B-9000, Belgium
- Department of Pulmonary Medicine, Erasmus MC Rotterdam, Dr Molewaterplein 50, Rotterdam 3015 GE, The Netherlands
29
Automated mapping of phenotype space with single-cell data. Nat Methods 2016; 13:493-6. [PMID: 27183440 PMCID: PMC4896314 DOI: 10.1038/nmeth.3863] [Citation(s) in RCA: 245] [Impact Index Per Article: 30.6] [Received: 01/13/2016] [Accepted: 04/12/2016] [Indexed: 01/20/2023]
Abstract
Accurate and rapid identification of cell populations is key to discovering novelty in multidimensional single cell experiments. We present a population finding algorithm X-shift that can process large datasets using fast KNN estimation of cell event density and automatically arranges populations by a marker-based classification system. X-shift analysis of mouse bone marrow data resolved the majority of known and several previously undescribed cell populations. Interestingly, previously known cell populations, as well as intermediate cell populations in early hematopoietic development, were described via novel marker combinations that were defined via routes to their locations in expressed marker space. X-shift provides a rapid, reliable approach to managed cell subset analysis that maximizes automation that not only best mimics human intuition, but as we show provides access to novel insights that “prior knowledge” might prevent the researcher from visualizing.
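The abstract says X-shift starts from a fast k-nearest-neighbour (KNN) estimate of cell event density. As a rough illustration only, the sketch below computes a brute-force KNN density estimate, assuming Euclidean distance and the textbook estimator k / (n · V_d · r_k^d), where r_k is the distance to the k-th nearest neighbour and V_d is the volume of the d-dimensional unit ball. The function name `knn_density` is hypothetical; this O(n²) version is not the X-shift implementation and omits everything X-shift does after the density step.

```python
import math


def knn_density(points, k):
    """Brute-force k-nearest-neighbour density estimate (hypothetical helper).

    For each event, density ~= k / (n * V_d * r_k**d), where r_k is the
    Euclidean distance to its k-th nearest neighbour (the event itself is
    excluded) and V_d is the volume of the d-dimensional unit ball.
    """
    n = len(points)
    d = len(points[0])
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    densities = []
    for i, p in enumerate(points):
        # Distances to every other event, sorted ascending: O(n^2) overall.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        # Guard against duplicate events, which would give r_k == 0.
        r_k = max(dists[k - 1], 1e-12)
        densities.append(k / (n * unit_ball * r_k ** d))
    return densities


if __name__ == "__main__":
    # Four tightly packed events and one isolated event in a 2-marker space.
    events = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    dens = knn_density(events, k=2)
    print([round(v, 4) for v in dens])
```

The isolated event receives a far lower density than the four packed events; density-based methods use exactly this contrast to separate population cores from sparse regions.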
30
Pouyan MB, Nourani M. Clustering Single-Cell Expression Data Using Random Forest Graphs. IEEE J Biomed Health Inform 2016; 21:1172-1181. [PMID: 28113735 DOI: 10.1109/jbhi.2016.2565561]
Abstract
Complex tissues such as brain and bone marrow are made up of multiple cell types. As the study of biological tissue structure progresses, the role of cell-type-specific research becomes increasingly important. Novel sequencing technology such as single-cell cytometry provides researchers with access to valuable biological data. Applying machine-learning techniques to these high-throughput datasets provides deep insights into the cellular landscape of the tissue of which those cells are a part. In this paper, we propose the use of random-forest-based single-cell profiling, a new machine-learning-based technique, to profile different cell types of intricate tissues using single-cell cytometry data. Our technique utilizes random forests to capture cell marker dependencies and model the cellular populations using the cell network concept. This cellular network helps us discover what cell types are in the tissue. Our experimental results on public-domain datasets indicate promising performance and accuracy of our technique in extracting cell populations of complex tissues.
31
BayesFlow: latent modeling of flow cytometry cell populations. BMC Bioinformatics 2016; 17:25. [PMID: 26755197 PMCID: PMC4709953 DOI: 10.1186/s12859-015-0862-z] [Received: 05/20/2015] [Accepted: 12/17/2015]
Abstract
BACKGROUND Flow cytometry is a widespread single-cell measurement technology with a multitude of clinical and research applications. Interpretation of flow cytometry data is hard; the instrumentation is delicate and cannot render absolute measurements, hence samples can only be interpreted in relation to each other while at the same time comparisons are confounded by inter-sample variation. Despite this, most automated flow cytometry data analysis methods either treat samples individually or ignore the variation by, for example, pooling the data. A key requirement for models that include multiple samples is the ability to visualize and assess inferred variation, since what could be technical variation in one setting would be different phenotypes in another. RESULTS We introduce BayesFlow, a pipeline for latent modeling of flow cytometry cell populations built upon a Bayesian hierarchical model. The model systematizes variation in location as well as shape. Expert knowledge can be incorporated through informative priors and the results can be supervised through compact and comprehensive visualizations. BayesFlow is applied to two synthetic and two real flow cytometry data sets. For the first real data set, taken from the FlowCAP I challenge, BayesFlow not only gives a gating that would place it among the top performers in FlowCAP I for this dataset, but also gives a more consistent treatment of different samples than either manual gating or other automated gating methods. The second real data set contains replicated flow cytometry measurements of samples from healthy individuals. Here, BayesFlow gives cell populations with clear expression patterns and small technical intra-donor variation compared with biological inter-donor variation. CONCLUSIONS Modeling latent relations between samples through BayesFlow enables a systematic analysis of inter-sample variation.
As opposed to other joint gating methods, effort is put into ensuring that the obtained partition of the data corresponds to actual cell populations, and the result is therefore directly biologically interpretable. BayesFlow is freely available at GitHub. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/s12859-015-0862-z) contains supplementary material, which is available to authorized users.
32
Abstract
Multi-color flow cytometry has become a valuable and highly informative tool for diagnosis and therapeutic monitoring of patients with immune deficiencies or inflammatory disorders. However, the method complexity and error-prone conventional manual data analysis often result in a high variability between different analysts and research laboratories. Here, we provide strategies and guidelines aiming at a more standardized multi-color flow cytometric staining and unsupervised data analysis for whole blood patient samples.
33
Mair F, Hartmann FJ, Mrdjen D, Tosevski V, Krieg C, Becher B. The end of gating? An introduction to automated analysis of high dimensional cytometry data. Eur J Immunol 2015; 46:34-43. [DOI: 10.1002/eji.201545774] [Received: 07/27/2015] [Revised: 10/15/2015] [Accepted: 11/03/2015]
Affiliation(s)
- Florian Mair
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Felix J. Hartmann
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Dunja Mrdjen
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Vinko Tosevski
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Carsten Krieg
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Burkhard Becher
- Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
34
Hyrkas J, Clayton S, Ribalet F, Halperin D, Armbrust EV, Howe B. Scalable clustering algorithms for continuous environmental flow cytometry. Bioinformatics 2015; 32:417-23. [PMID: 26476780 DOI: 10.1093/bioinformatics/btv594] [Received: 05/19/2015] [Accepted: 10/12/2015]
Abstract
MOTIVATION Recent technological innovations in flow cytometry now allow oceanographers to collect high-frequency flow cytometry data from particles in aquatic environments on a scale far surpassing conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach in cytometric analysis and highlight the need for scalable clustering algorithms to extract population information from these large-scale, high-frequency flow cytometers. RESULTS We explore how available algorithms commonly used for medical applications perform at classifying such large-scale, environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data. AVAILABILITY AND IMPLEMENTATION Source code available for download at https://github.com/jhyrkas/seaflow_cluster, implemented in Java for use with Hadoop. CONTACT hyrkas@cs.washington.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
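The approach above is built on Gaussian mixture models (GMMs). To make the modelling step concrete, here is a minimal single-machine sketch of fitting a one-dimensional GMM by expectation–maximization (EM) in plain Python. It assumes one marker, a number of components fixed by the initial means, and hard labels taken from the final responsibilities. It does not reproduce the authors' Java/Hadoop code or their partitioning step, and `fit_gmm_1d` is a hypothetical name.

```python
import math


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, init_means, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM (hypothetical helper, illustration only).

    Returns (weights, means, variances, labels), where labels[i] is the index of
    the component with the highest responsibility for xs[i] in the final E-step.
    """
    n, k = len(xs), len(init_means)
    means = list(init_means)
    overall = sum(xs) / n
    start_var = max(sum((x - overall) ** 2 for x in xs) / n, min_var)
    variances = [start_var] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: resp[i][j] = P(component j | x_i) under the current parameters.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # avoid division by zero if every term underflows
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has collapsed; keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var
            )
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


if __name__ == "__main__":
    xs = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
    weights, means, variances, labels = fit_gmm_1d(xs, init_means=[1.0, 8.0])
    print(labels, [round(m, 3) for m in means])
```

EM is used here because it is the standard way to fit a GMM; the scalability the abstract emphasises comes from distributing this kind of computation, which the sketch deliberately leaves out.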
Affiliation(s)
- Daniel Halperin
- Department of Computer Science and Engineering, eScience Institute, University of Washington, Seattle, WA 98195, USA
- E Virginia Armbrust
- School of Oceanography and eScience Institute, University of Washington, Seattle, WA 98195, USA
- Bill Howe
- Department of Computer Science and Engineering, eScience Institute, University of Washington, Seattle, WA 98195, USA
35
Rebhahn JA, Roumanes DR, Qi Y, Khan A, Thakar J, Rosenberg A, Lee FEH, Quataert SA, Sharma G, Mosmann TR. Competitive SWIFT cluster templates enhance detection of aging changes. Cytometry A 2015; 89:59-70. [PMID: 26441030 PMCID: PMC4737406 DOI: 10.1002/cyto.a.22740] [Received: 02/03/2015] [Revised: 04/21/2015] [Accepted: 08/05/2015]
Abstract
Clustering‐based algorithms for automated analysis of flow cytometry datasets have achieved more efficient and objective analysis than manual processing. Clustering organizes flow cytometry data into subpopulations with substantially homogenous characteristics but does not directly address the important problem of identifying the salient differences in subpopulations between subjects and groups. Here, we address this problem by augmenting SWIFT—a mixture model based clustering algorithm reported previously. First, we show that SWIFT clustering using a “template” mixture model, in which all subpopulations are represented, identifies small differences in cell numbers per subpopulation between samples. Second, we demonstrate that resolution of inter‐sample differences is increased by “competition” wherein a joint model is formed by combining the mixture model templates obtained from different groups. In the joint model, clusters from individual groups compete for the assignment of cells, sharpening differences between samples, particularly differences representing subpopulation shifts that are masked under clustering with a single template model. The benefit of competition was demonstrated first with a semisynthetic dataset obtained by deliberately shifting a known subpopulation within an actual flow cytometry sample. Single templates correctly identified changes in the number of cells in the subpopulation, but only the competition method detected small changes in median fluorescence. In further validation studies, competition identified a larger number of significantly altered subpopulations between young and elderly subjects. This enrichment was specific, because competition between templates from consensus male and female samples did not improve the detection of age‐related differences. 
Several changes between the young and elderly identified by SWIFT template competition were consistent with known alterations in the elderly, and additional altered subpopulations were also identified. Alternative algorithms detected far fewer significantly altered clusters. Thus SWIFT template competition is a powerful approach to sharpen comparisons between selected groups in flow cytometry datasets. © 2015 The Authors. Published by Wiley Periodicals, Inc.
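The “competition” idea can be illustrated with a toy example: the component lists (templates) from two groups are pooled into one joint mixture, and each cell is assigned to whichever component, from either group, gives it the highest weighted likelihood. The sketch below assumes one-dimensional Gaussian components given as hypothetical (weight, mean, sd) tuples and uses hard maximum-posterior assignment. It is a simplification for illustration, not the SWIFT algorithm itself, which works with multidimensional mixture models; `compete` is a hypothetical name.

```python
import math


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd**2) at x."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def compete(cells, templates):
    """Assign each cell to the best component of a joint model built from templates.

    templates maps a group name to a list of (weight, mean, sd) components
    (hypothetical format). Returns a dict {(group, component_index): cell_count}.
    """
    # Pool every group's components into one joint mixture.
    joint = [
        (group, idx, w, mu, sd)
        for group, comps in templates.items()
        for idx, (w, mu, sd) in enumerate(comps)
    ]
    # Renormalising the pooled weights makes them a proper joint mixture;
    # a common factor does not change which component wins.
    total_w = sum(c[2] for c in joint)
    counts = {(group, idx): 0 for group, idx, *_ in joint}
    for x in cells:
        # Components from different groups compete for each cell.
        best = max(joint, key=lambda c: (c[2] / total_w) * normal_pdf(x, c[3], c[4]))
        counts[(best[0], best[1])] += 1
    return counts


if __name__ == "__main__":
    templates = {
        "young": [(0.5, 1.0, 0.2), (0.5, 3.0, 0.2)],
        "elderly": [(0.5, 1.4, 0.2), (0.5, 3.0, 0.2)],
    }
    sample = [0.9, 1.0, 1.1, 1.45, 1.5]
    print(compete(sample, templates))
```

With a single template, the cells at 1.0 and 1.45 would likely fall into the same cluster; pooling the two groups' templates lets a small shift in the subpopulation's location show up as different per-component counts, which is the effect the abstract describes.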
Affiliation(s)
- Jonathan A Rebhahn
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, New York
- David R Roumanes
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, New York
- Yilin Qi
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, New York
- Atif Khan
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York
- Juilee Thakar
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York; Department of Microbiology and Immunology, University of Rochester
- F Eun-Hyung Lee
- Department of Medicine, Emory University School of Medicine, Atlanta, Georgia
- Sally A Quataert
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, New York
- Gaurav Sharma
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, New York; Department of Electrical and Computer Engineering, University of Rochester
- Tim R Mosmann
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, Rochester, New York; Department of Microbiology and Immunology, University of Rochester
36
Verschoor CP, Lelic A, Bramson JL, Bowdish DME. An Introduction to Automated Flow Cytometry Gating Tools and Their Implementation. Front Immunol 2015; 6:380. [PMID: 26284066 PMCID: PMC4515551 DOI: 10.3389/fimmu.2015.00380] [Received: 06/01/2015] [Accepted: 07/12/2015]
Abstract
Current flow cytometry (FCM) reagents and instrumentation allow for the measurement of an unprecedented number of parameters for any given cell within a homogenous or heterogeneous population. While this provides a great deal of power for hypothesis testing, it also generates a vast amount of data, which is typically analyzed manually through a process called “gating.” For large experiments, such as high-content screens, in which many parameters are measured, the time required for manual analysis as well as the technical variability inherent to manual gating can increase dramatically, even becoming prohibitive depending on the clinical or research goal. In the following article, we aim to provide the reader an overview of automated FCM analysis as well as an example of the implementation of FLOw Clustering without K, a tool that we consider accessible to researchers of all levels of computational expertise. In most cases, computational assistance methods are more reproducible and much faster than manual gating, and for some, also allow for the discovery of cellular populations that might not be expected or evident to the researcher. We urge any researcher who is planning or has previously performed large FCM experiments to consider implementing computational assistance into their analysis pipeline.
Affiliation(s)
- Chris P Verschoor
- Department of Pathology and Molecular Medicine, McMaster Immunology Research Centre (MIRC), McMaster University, Hamilton, ON, Canada
- Alina Lelic
- Department of Pathology and Molecular Medicine, McMaster Immunology Research Centre (MIRC), McMaster University, Hamilton, ON, Canada
- Jonathan L Bramson
- Department of Pathology and Molecular Medicine, McMaster Immunology Research Centre (MIRC), McMaster University, Hamilton, ON, Canada
- Dawn M E Bowdish
- Department of Pathology and Molecular Medicine, McMaster Immunology Research Centre (MIRC), McMaster University, Hamilton, ON, Canada
37
Lin L, Frelinger J, Jiang W, Finak G, Seshadri C, Bart PA, Pantaleo G, McElrath J, DeRosa S, Gottardo R. Identification and visualization of multidimensional antigen-specific T-cell populations in polychromatic cytometry data. Cytometry A 2015; 87:675-82. [PMID: 25908275 PMCID: PMC4482785 DOI: 10.1002/cyto.a.22623] [Received: 09/02/2014] [Revised: 10/24/2014] [Accepted: 12/10/2014]
Abstract
An important aspect of immune monitoring for vaccine development, clinical trials, and research is the detection, measurement, and comparison of antigen-specific T-cells from subject samples under different conditions. Antigen-specific T-cells compose a very small fraction of total T-cells. Developments in cytometry technology over the past five years have enabled the measurement of single cells in a multivariate and high-throughput manner. This growth in both dimensionality and quantity of data continues to pose a challenge for effective identification and visualization of rare cell subsets, such as antigen-specific T-cells. Dimension reduction and feature extraction play a pivotal role in both identifying and visualizing cell populations of interest in large, multi-dimensional cytometry datasets. However, the automated identification and visualization of rare, high-dimensional cell subsets remains challenging. Here we demonstrate how a systematic and integrated approach combining targeted feature extraction with dimension reduction can be used to identify and visualize biological differences in rare, antigen-specific cell populations. By using OpenCyto to perform semi-automated gating and feature extraction of flow cytometry data, followed by dimensionality reduction with t-SNE, we are able to identify polyfunctional subpopulations of antigen-specific T-cells and visualize treatment-specific differences between them.
Affiliation(s)
- Lin Lin
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Jacob Frelinger
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Wenxin Jiang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Greg Finak
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Chetan Seshadri
- Division of Allergy and Infectious Diseases, University of Washington, Seattle, Washington
- Julie McElrath
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Steve DeRosa
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
- Raphael Gottardo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington
38
Lin L, Chan C, West M. Discriminative variable subsets in Bayesian classification with mixture models, with application in flow cytometry studies. Biostatistics 2015; 17:40-53. [PMID: 26040910 PMCID: PMC4679067 DOI: 10.1093/biostatistics/kxv021] [Received: 09/24/2014] [Accepted: 04/20/2015]
Abstract
We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of design and selection of variables in biomolecular studies, particularly involving widely used assays of large-scale single-cell data generated using flow cytometry technology. For such studies and for mixture modeling generally, we define discriminative analysis that overlays fitted mixture models using a natural measure of concordance between mixture component densities, and define an effective and computationally feasible method for assessing and prioritizing subsets of variables according to their roles in discrimination of one or more mixture components. We relate the new discriminative information measures to Bayesian classification probabilities and error rates, and exemplify their use in Bayesian analysis of Dirichlet process mixture models fitted via Markov chain Monte Carlo methods as well as using a novel Bayesian expectation–maximization algorithm. We present a series of theoretical and simulated data examples to fix concepts and exhibit the utility of the approach, and compare with prior approaches. We demonstrate application in the context of automatic classification and discriminative variable selection in high-throughput systems biology using large flow cytometry datasets.
Affiliation(s)
- Lin Lin
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Cliburn Chan
- Biostatistics & Bioinformatics, Duke University Medical Center, Durham, NC 27710-2721, and Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
- Mike West
- Department of Statistical Science, Duke University, Durham, NC 27708-0251, USA
39
Van Gassen S, Callebaut B, Van Helden MJ, Lambrecht BN, Demeester P, Dhaene T, Saeys Y. FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data. Cytometry A 2015; 87:636-45. [PMID: 25573116 DOI: 10.1002/cyto.a.22625]
Abstract
The number of markers measured in both flow and mass cytometry keeps increasing steadily. Although this provides a wealth of information, it becomes infeasible to analyze these datasets manually. When using 2D scatter plots, the number of possible plots increases exponentially with the number of markers and therefore, relevant information that is present in the data might be missed. In this article, we introduce a new visualization technique, called FlowSOM, which analyzes flow or mass cytometry data using a Self-Organizing Map. Using a two-level clustering and star charts, our algorithm helps to obtain a clear overview of how all markers are behaving on all cells, and to detect subsets that might be missed otherwise. R code is available at https://github.com/SofieVG/FlowSOM and will be made available at Bioconductor.
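FlowSOM's first clustering level is a self-organizing map (SOM). To show what training a SOM involves, here is a minimal plain-Python sketch: a small rectangular grid of nodes, each holding a vector in marker space. For every cell, the best-matching node is found, and that node and its grid neighbours are pulled toward the cell, with a learning rate and a Gaussian neighbourhood radius that shrink linearly over training. The schedule, the initialization from randomly sampled cells, and the names `train_som` and `best_matching_unit` are assumptions for illustration. This is not the FlowSOM R package API, and the second-level metaclustering and star-chart visualization are omitted.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose codebook vector is closest to x (Euclidean)."""
    return min(range(len(nodes)), key=lambda n: math.dist(nodes[n], x))


def train_som(data, grid_w, grid_h, n_epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM (hypothetical helper, illustration only)."""
    rng = random.Random(seed)
    coords = [(gx, gy) for gy in range(grid_h) for gx in range(grid_w)]
    # Initialise each node with a distinct, randomly chosen cell.
    nodes = [list(p) for p in rng.sample(data, len(coords))]
    radius0 = max(grid_w, grid_h) / 2
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1 - frac)  # learning rate decays linearly towards 0
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, with a floor
            bx, by = coords[best_matching_unit(nodes, x)]
            for n, (cx, cy) in enumerate(coords):
                # Gaussian neighbourhood on the grid, centred on the best-matching node.
                h = math.exp(-((cx - bx) ** 2 + (cy - by) ** 2) / (2 * radius ** 2))
                nodes[n] = [w + lr * h * (xi - w) for w, xi in zip(nodes[n], x)]
            step += 1
    return nodes


if __name__ == "__main__":
    rng = random.Random(1)
    cluster_a = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
    cluster_b = [(rng.gauss(10, 0.3), rng.gauss(10, 0.3)) for _ in range(20)]
    nodes = train_som(cluster_a + cluster_b, grid_w=2, grid_h=1)
    print([best_matching_unit(nodes, c) for c in cluster_a[:3] + cluster_b[:3]])
```

After training, each cell is summarised by its best-matching node; a tool built along these lines would then cluster the nodes themselves rather than the raw cells, which is what keeps the approach fast on large datasets.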
Affiliation(s)
- Sofie Van Gassen
- Department of Information Technology, Ghent University, iMinds, Ghent, Belgium; Inflammation Research Center, VIB, Ghent, Belgium; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium
- Britt Callebaut
- Department of Information Technology, Ghent University, iMinds, Ghent, Belgium
- Mary J Van Helden
- Inflammation Research Center, VIB, Ghent, Belgium; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium
- Bart N Lambrecht
- Inflammation Research Center, VIB, Ghent, Belgium; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium
- Piet Demeester
- Department of Information Technology, Ghent University, iMinds, Ghent, Belgium
- Tom Dhaene
- Department of Information Technology, Ghent University, iMinds, Ghent, Belgium
- Yvan Saeys
- Inflammation Research Center, VIB, Ghent, Belgium; Department of Respiratory Medicine, Ghent University Hospital, Ghent, Belgium
40
Dundar M, Akova F, Yerebakan HZ, Rajwa B. A non-parametric Bayesian model for joint cell clustering and cluster matching: identification of anomalous sample phenotypes with random effects. BMC Bioinformatics 2014; 15:314. [PMID: 25248977 PMCID: PMC4262223 DOI: 10.1186/1471-2105-15-314] [Received: 02/06/2014] [Accepted: 09/16/2014]
Abstract
BACKGROUND Flow cytometry (FC)-based computer-aided diagnostics is an emerging technique utilizing modern multiparametric cytometry systems. The major difficulty in using machine-learning approaches for classification of FC data arises from limited access to a wide variety of anomalous samples for training. In consequence, any learning with an abundance of normal cases and a limited set of specific anomalous cases is biased towards the types of anomalies represented in the training set. Such models do not accurately identify anomalies, whether previously known or unknown, that may exist in future samples tested. Although one-class classifiers trained using only normal cases would avoid such a bias, robust sample characterization is critical for a generalizable model. Owing to sample heterogeneity and instrumental variability, arbitrary characterization of samples usually introduces feature noise that may lead to poor predictive performance. Herein, we present a non-parametric Bayesian algorithm called ASPIRE (anomalous sample phenotype identification with random effects) that identifies phenotypic differences across a batch of samples in the presence of random effects. Our approach involves simultaneous clustering of cellular measurements in individual samples and matching of discovered clusters across all samples in order to recover global clusters using probabilistic sampling techniques in a systematic way. RESULTS We demonstrate the performance of the proposed method in identifying anomalous samples in two different FC data sets, one of which represents a set of samples including acute myeloid leukemia (AML) cases, and the other a generic 5-parameter peripheral-blood immunophenotyping. Results are evaluated in terms of the area under the receiver operating characteristic curve (AUC). ASPIRE achieved AUCs of 0.99 and 1.0 on the AML and generic blood immunophenotyping data sets, respectively.
CONCLUSIONS These results demonstrate that anomalous samples can be identified by ASPIRE with almost perfect accuracy without a priori access to samples of anomalous subtypes in the training set. The ASPIRE approach is unique in its ability to form generalizations regarding normal and anomalous states given only very weak assumptions regarding sample characteristics and origin. Thus, ASPIRE could become highly instrumental in providing unique insights about observed biological phenomena in the absence of full information about the investigated samples. ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/1471-2105-15-314) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Murat Dundar
- Computer Science Department, IUPUI, 723 W. Michigan St., Indianapolis, IN 46037, USA
41
Di Palma S, Bodenmiller B. Unraveling cell populations in tumors by single-cell mass cytometry. Curr Opin Biotechnol 2014; 31:122-9. [PMID: 25123841 DOI: 10.1016/j.copbio.2014.07.004] [Received: 06/25/2014] [Accepted: 07/22/2014]
Abstract
The development of new biotechnologies for the analysis of individual cells in heterogeneous populations is an important direction of life science research. This review provides a critical overview of relevant and recent advances in the field of single-cell mass cytometry, focusing on the latest applications in the study of cell heterogeneity. New approaches for multiparameter single-cell imaging, alongside advanced computational tools for deep mining of high-dimensional mass cytometric data, are facilitating the visualization of specific cell types and their interactions in complex cellular assemblies, such as tumors, potentially revealing new insights into cancer biology.
Affiliation(s)
- Serena Di Palma
- Institute of Molecular Life Sciences, University of Zürich, Zürich, Switzerland
- Bernd Bodenmiller
- Institute of Molecular Life Sciences, University of Zürich, Zürich, Switzerland
42
Finak G, Frelinger J, Jiang W, Newell EW, Ramey J, Davis MM, Kalams SA, De Rosa SC, Gottardo R. OpenCyto: an open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput Biol 2014; 10:e1003806. [PMID: 25167361 PMCID: PMC4148203 DOI: 10.1371/journal.pcbi.1003806] [Received: 03/31/2014] [Accepted: 07/10/2014]
Abstract
Flow cytometry is used increasingly in clinical research for cancer, immunology and vaccines. Technological advances in cytometry instrumentation are increasing the size and dimensionality of data sets, posing a challenge for traditional data management and analysis. Automated analysis methods, despite a general consensus of their importance to the future of the field, have been slow to gain widespread adoption. Here we present OpenCyto, a new BioConductor infrastructure and data analysis framework designed to lower the barrier of entry to automated flow data analysis algorithms by addressing key areas that we believe have held back wider adoption of automated approaches. OpenCyto supports end-to-end data analysis that is robust and reproducible while generating results that are easy to interpret. We have improved the existing, widely used core BioConductor flow cytometry infrastructure by allowing analysis to scale in a memory efficient manner to the large flow data sets that arise in clinical trials, and integrating domain-specific knowledge as part of the pipeline through the hierarchical relationships among cell populations. Pipelines are defined through a text-based csv file, limiting the need to write data-specific code, and are data agnostic to simplify repetitive analysis for core facilities. We demonstrate how to analyze two large cytometry data sets: an intracellular cytokine staining (ICS) data set from a published HIV vaccine trial focused on detecting rare, antigen-specific T-cell populations, where we identify a new subset of CD8 T-cells with a vaccine-regimen specific response that could not be identified through manual analysis, and a CyTOF T-cell phenotyping data set where a large staining panel and many cell populations are a challenge for traditional analysis. The substantial improvements to the core BioConductor flow cytometry packages give OpenCyto the potential for wide adoption. 
It can rapidly leverage new developments in computational cytometry and facilitate reproducible analysis in a unified environment.
Affiliation(s)
- Greg Finak
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Jacob Frelinger
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Wenxin Jiang
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Evan W. Newell
- Agency for Science, Technology and Research, Singapore Immunology Network, Singapore
- John Ramey
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Mark M. Davis
- Department of Microbiology and Immunology, Stanford University, Stanford, California, United States of America
- Institute for Immunity, Transplantation and Infection, Stanford University, Stanford, California, United States of America
- The Howard Hughes Medical Institute, Stanford University, Stanford, California, United States of America
- Spyros A. Kalams
- Infectious Diseases Division, Department of Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Pathology, Microbiology, and Immunology, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Stephen C. De Rosa
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Laboratory Medicine, University of Washington, Seattle, Washington, United States of America
- Raphael Gottardo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Department of Statistics, University of Washington, Seattle, Washington, United States of America
| |
Collapse
|
43
Anchang B, Do MT, Zhao X, Plevritis SK. CCAST: a model-based gating strategy to isolate homogeneous subpopulations in a heterogeneous population of single cells. PLoS Comput Biol 2014; 10:e1003664. [PMID: 25078380 PMCID: PMC4117418 DOI: 10.1371/journal.pcbi.1003664] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/26/2013] [Accepted: 04/25/2014] [Indexed: 12/12/2022]
Abstract
A model-based gating strategy is developed for sorting cells and analyzing populations of single cells. The strategy, named CCAST, for Clustering, Classification and Sorting Tree, identifies a gating strategy for isolating homogeneous subpopulations from a heterogeneous population of single cells using a data-derived decision tree representation that can be applied to cell sorting. Because CCAST does not rely on expert knowledge, it removes human bias and variability when determining the gating strategy. It combines any clustering algorithm with silhouette measures to identify underlying homogeneous subpopulations, then applies recursive partitioning techniques to generate a decision tree that defines the gating strategy. CCAST produces an optimal strategy for cell sorting by automating the selection of gating markers, the corresponding gating thresholds and gating sequence; all of these parameters are typically manually defined. Even though CCAST is optimized for cell sorting, it can be applied for the identification and analysis of homogeneous subpopulations among heterogeneous single cell data. We apply CCAST on single cell data from both breast cancer cell lines and normal human bone marrow. On the SUM159 breast cancer cell line data, CCAST indicates at least five distinct cell states based on two surface markers (CD24 and EPCAM) and provides a gating sorting strategy that produces more homogeneous subpopulations than previously reported. When applied to normal bone marrow data, CCAST reveals an efficient strategy for gating T-cells without prior knowledge of the major T-cell subtypes and the markers that best define them. On the normal bone marrow data, CCAST also reveals two major mature B-cell subtypes, namely CD123+ and CD123- cells, which were not revealed by manual gating but show distinct intracellular signaling responses. More generally, the CCAST framework could be used on other biological and non-biological high dimensional data types that are mixtures of unknown homogeneous subpopulations.
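CCAST pairs a clustering algorithm with silhouette measures and then applies recursive partitioning. The pure-Python sketch below is not CCAST's implementation; it only illustrates the two ingredients under stated assumptions. K-means with a deterministic farthest-point initialisation (our choice) is run for each candidate number of clusters, the number with the highest mean silhouette width is kept, and a single best one-marker threshold is found by Gini impurity, i.e. the first split a recursive-partitioning tree would make. All function names are hypothetical; a full tree would apply `best_gate` recursively to each side.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point (maximin) initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; higher means tighter, better-separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return -1.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in g) / len(g) for other, g in groups.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, ks=(2, 3, 4, 5)):
    """Pick the cluster count with the highest mean silhouette (first one on ties)."""
    labels_by_k = {k: kmeans(points, k) for k in ks}
    best_k = max(ks, key=lambda k: mean_silhouette(points, labels_by_k[k]))
    return best_k, labels_by_k[best_k]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_gate(points, labels):
    """One-marker threshold (a decision stump) that best separates the clusters."""
    best = (None, None, float("inf"))
    n = len(points)
    for d in range(len(points[0])):
        values = sorted(set(p[d] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[d] < t]
            right = [lab for p, lab in zip(points, labels) if p[d] >= t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (d, t, score)
    return best
```

The returned gate is a `(marker index, threshold, weighted Gini)` triple; a weighted Gini of 0 means the threshold separates the clusters perfectly on that marker.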
Affiliation(s)
- Benedict Anchang
- Department of Radiology, Center for Cancer Systems Biology, Stanford University, Stanford, California, United States of America
- Mary T. Do
- Department of Radiology, Center for Cancer Systems Biology, Stanford University, Stanford, California, United States of America
- Xi Zhao
- Department of Radiology, Center for Cancer Systems Biology, Stanford University, Stanford, California, United States of America
- Sylvia K. Plevritis
- Department of Radiology, Center for Cancer Systems Biology, Stanford University, Stanford, California, United States of America
44
Automated identification of stratifying signatures in cellular subpopulations. Proc Natl Acad Sci U S A 2014; 111:E2770-7. [PMID: 24979804 DOI: 10.1073/pnas.1408792111] [Citation(s) in RCA: 325] [Impact Index Per Article: 32.5] [Indexed: 01/05/2023]
Abstract
Elucidation and examination of cellular subpopulations that display condition-specific behavior can play a critical contributory role in understanding disease mechanism, as well as provide a focal point for development of diagnostic criteria linking such a mechanism to clinical prognosis. Despite recent advancements in single-cell measurement technologies, the identification of relevant cell subsets through manual efforts remains standard practice. As new technologies such as mass cytometry increase the parameterization of single-cell measurements, the scalability and subjectivity inherent in manual analyses slow both analysis and progress. We therefore developed Citrus (cluster identification, characterization, and regression), a data-driven approach for the identification of stratifying subpopulations in multidimensional cytometry datasets. The methodology of Citrus is demonstrated through the identification of known and unexpected pathway responses in a dataset of stimulated peripheral blood mononuclear cells measured by mass cytometry. Additionally, the performance of Citrus is compared with that of existing methods through the analysis of several publicly available datasets. As the complexity of flow cytometry datasets continues to increase, methods such as Citrus will be needed to aid investigators in the performance of unbiased--and potentially more thorough--correlation-based mining and inspection of cell subsets nested within high-dimensional datasets.
45
Richards AJ, Staats J, Enzor J, McKinnon K, Frelinger J, Denny TN, Weinhold KJ, Chan C. Setting objective thresholds for rare event detection in flow cytometry. J Immunol Methods 2014; 409:54-61. [PMID: 24727143 DOI: 10.1016/j.jim.2014.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 08/07/2013] [Revised: 03/05/2014] [Accepted: 04/01/2014] [Indexed: 12/11/2022]
Abstract
The accurate identification of rare antigen-specific cytokine positive cells from peripheral blood mononuclear cells (PBMC) after antigenic stimulation in an intracellular staining (ICS) flow cytometry assay is challenging, as cytokine positive events may be fairly diffusely distributed and lack an obvious separation from the negative population. Traditionally, the approach by flow operators has been to manually set a positivity threshold to partition events into cytokine-positive and cytokine-negative. This approach suffers from subjectivity and inconsistency across different flow operators. The use of statistical clustering methods does not remove the need to find an objective threshold between positive and negative events since consistent identification of rare event subsets is highly challenging for automated algorithms, especially when there is distributional overlap between the positive and negative events ("smear"). We present a new approach, based on the Fβ measure, that is similar to manual thresholding in providing a hard cutoff, but has the advantage of being determined objectively. The performance of this algorithm is compared with results obtained by expert visual gating. Several ICS data sets from the External Quality Assurance Program Oversight Laboratory (EQAPOL) proficiency program were used to make the comparisons. We first show that visually determined thresholds are difficult to reproduce and pose a problem when comparing results across operators or laboratories, as well as problems that occur with the use of commonly employed clustering algorithms. In contrast, a single parameterization for the Fβ method performs consistently across different centers, samples, and instruments because it optimizes the precision/recall tradeoff by using both negative and positive controls.
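The Fβ measure weighs precision P against recall R as Fβ = (1 + β²)·P·R / (β²·P + R), so β > 1 favors recall and β < 1 favors precision. The minimal sketch below picks a hard cutoff by scanning candidate thresholds and keeping the one with the highest Fβ. It assumes, as a simplification rather than the paper's exact formulation, that events from a positive control count as true positives when at or above the cutoff and events from a negative control count as false positives; `f_beta` and `best_threshold` are hypothetical names.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 weights recall more heavily, beta < 1 precision."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def best_threshold(positive_events, negative_events, beta=1.0):
    """Scan candidate cutoffs and return (cutoff, score) with the highest F-beta.

    Simplifying assumption: an event is called positive when value >= cutoff;
    positive-control events above the cutoff are true positives and
    negative-control events above it are false positives.
    """
    candidates = sorted(set(positive_events) | set(negative_events))
    n_pos = len(positive_events)
    best_cutoff, best_score = None, -1.0
    for cutoff in candidates:
        tp = sum(1 for v in positive_events if v >= cutoff)
        fp = sum(1 for v in negative_events if v >= cutoff)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n_pos if n_pos else 0.0
        score = f_beta(precision, recall, beta)
        if score > best_score:
            best_cutoff, best_score = cutoff, score
    return best_cutoff, best_score
```

With overlapping controls such as `[3, 4, 5, 6]` (positive) and `[1, 2, 3, 4]` (negative), β = 1 selects a cutoff of 3, while β = 0.5 moves it up to 5, trading recall for fewer false positives.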
Affiliation(s)
- Adam J Richards
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA; Duke Center for AIDS Research, Duke University, Durham, NC, USA; Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA.
- Janet Staats
- Duke Center for AIDS Research, Duke University, Durham, NC, USA; Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA; Department of Surgery, Duke University Medical Center, Durham, NC, USA
- Jennifer Enzor
- Duke Center for AIDS Research, Duke University, Durham, NC, USA; Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA; Department of Surgery, Duke University Medical Center, Durham, NC, USA
- Jacob Frelinger
- Institute for Genome Sciences and Policy, Duke University, NC, USA
- Thomas N Denny
- Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA; Duke Human Vaccine Institute, Duke University, Durham, NC, USA
- Kent J Weinhold
- Duke Center for AIDS Research, Duke University, Durham, NC, USA; Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA; Department of Surgery, Duke University Medical Center, Durham, NC, USA
- Cliburn Chan
- Department of Biostatistics & Bioinformatics, Duke University, Durham, NC, USA; Duke Center for AIDS Research, Duke University, Durham, NC, USA; Duke External Quality Assurance Program Oversight Laboratory, Duke University, Durham, NC, USA
46
Naim I, Datta S, Rebhahn J, Cavenaugh JS, Mosmann TR, Sharma G. SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 1: algorithm design. Cytometry A 2014; 85:408-21. [PMID: 24677621 PMCID: PMC4238829 DOI: 10.1002/cyto.a.22446] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 05/30/2013] [Revised: 11/08/2013] [Accepted: 01/02/2013] [Indexed: 01/05/2023]
Abstract
We present a model-based clustering method, SWIFT (Scalable Weighted Iterative Flow-clustering Technique), for digesting high-dimensional large-sized datasets obtained via modern flow cytometry into more compact representations that are well-suited for further automated or manual analysis. Key attributes of the method include the following: (a) the analysis is conducted in the multidimensional space retaining the semantics of the data, (b) an iterative weighted sampling procedure is utilized to maintain modest computational complexity and to retain discrimination of extremely small subpopulations (hundreds of cells from datasets containing tens of millions), and (c) a splitting and merging procedure is incorporated in the algorithm to preserve distinguishability between biologically distinct populations, while still providing a significant compaction relative to the original data. This article presents a detailed algorithmic description of SWIFT, outlining the application-driven motivations for the different design choices, a discussion of computational complexity of the different steps, and results obtained with SWIFT for synthetic data and relatively simple experimental data that allow validation of the desirable attributes. A companion paper (Part 2) highlights the use of SWIFT, in combination with additional computational tools, for more challenging biological problems.
Affiliation(s)
- Iftekhar Naim
- Department of Computer Science, University of Rochester, Rochester, New York
47
Mosmann TR, Naim I, Rebhahn J, Datta S, Cavenaugh JS, Weaver JM, Sharma G. SWIFT-scalable clustering for automated identification of rare cell populations in large, high-dimensional flow cytometry datasets, part 2: biological evaluation. Cytometry A 2014; 85:422-33. [PMID: 24532172 PMCID: PMC4238823 DOI: 10.1002/cyto.a.22445] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 05/30/2013] [Revised: 11/15/2013] [Accepted: 01/02/2014] [Indexed: 01/27/2023]
Abstract
A multistage clustering and data processing method, SWIFT (detailed in a companion manuscript), has been developed to detect rare subpopulations in large, high-dimensional flow cytometry datasets. An iterative sampling procedure initially fits the data to multidimensional Gaussian distributions, then splitting and merging stages use a criterion of unimodality to optimize the detection of rare subpopulations, to converge on a consistent cluster number, and to describe non-Gaussian distributions. Probabilistic assignment of cells to clusters, visualization, and manipulation of clusters by their cluster medians, facilitate application of expert knowledge using standard flow cytometry programs. The dual problems of rigorously comparing similar complex samples, and enumerating absent or very rare cell subpopulations in negative controls, were solved by assigning cells in multiple samples to a cluster template derived from a single or combined sample. Comparison of antigen-stimulated and control human peripheral blood cell samples demonstrated that SWIFT could identify biologically significant subpopulations, such as rare cytokine-producing influenza-specific T cells. A sensitivity of better than one part per million was attained in very large samples. Results were highly consistent on biological replicates, yet the analysis was sensitive enough to show that multiple samples from the same subject were more similar than samples from different subjects. A companion manuscript (Part 1) details the algorithmic development of SWIFT. © 2014 The Authors. Published by Wiley Periodicals Inc.
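To compare samples, SWIFT assigns cells probabilistically to the clusters of a template built from one or more samples. As a rough sketch only, assuming a template given as a list of Gaussian components with diagonal covariance, each written as `(weight, mean, variance)`, each event's posterior membership follows from Bayes' rule and can be summed per cluster to give expected counts for a stimulated sample and its control. SWIFT's weighted iterative sampling and its split/merge stages are not reproduced, and the function names are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, template):
    """Posterior membership of event x in each (weight, mean, var) template cluster."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in template]
    top = max(logs)
    # Subtract the maximum before exponentiating (log-sum-exp) to avoid underflow.
    exps = [math.exp(lg - top) for lg in logs]
    total = sum(exps)
    return [e / total for e in exps]


def expected_counts(events, template):
    """Sum of soft assignments per template cluster for one sample."""
    counts = [0.0] * len(template)
    for x in events:
        for k, r in enumerate(responsibilities(x, template)):
            counts[k] += r
    return counts
```

Comparing `expected_counts` for a stimulated sample with those for its unstimulated control, cluster by cluster, is one way to look for rare responding populations; with soft assignment, an event lying between two clusters contributes fractionally to both.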
Affiliation(s)
- Tim R Mosmann
- David H. Smith Center for Vaccine Biology and Immunology, University of Rochester Medical Center, University of Rochester, Rochester, New York
48
Finak G, Jiang W, Krouse K, Wei C, Sanz I, Phippard D, Asare A, De Rosa SC, Self S, Gottardo R. High-throughput flow cytometry data normalization for clinical trials. Cytometry A 2013; 85:277-86. [PMID: 24382714 DOI: 10.1002/cyto.a.22433] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 09/11/2013] [Revised: 11/18/2013] [Accepted: 12/13/2013] [Indexed: 01/08/2023]
Abstract
Flow cytometry datasets from clinical trials generate very large datasets and are usually highly standardized, focusing on endpoints that are well defined a priori. Staining variability of individual markers is not uncommon and complicates manual gating, requiring the analyst to adapt gates for each sample, which is unwieldy for large datasets. It can lead to unreliable measurements, especially if a template-gating approach is used without further correction to the gates. In this article, a computational framework is presented for normalizing the fluorescence intensity of multiple markers in specific cell populations across samples that is suitable for high-throughput processing of large clinical trial datasets. Previous approaches to normalization have been global and applied to all cells or data with debris removed. They provided no mechanism to handle specific cell subsets. This approach integrates tightly with the gating process so that normalization is performed during gating and is local to the specific cell subsets exhibiting variability. This improves peak alignment and the performance of the algorithm. The performance of this algorithm is demonstrated on two clinical trial datasets from the HIV Vaccine Trials Network (HVTN) and the Immune Tolerance Network (ITN). In the ITN data set we show that local normalization combined with template gating can account for sample-to-sample variability as effectively as manual gating. In the HVTN dataset, it is shown that local normalization mitigates false-positive vaccine response calls in an intracellular cytokine staining assay. In both datasets, local normalization performs better than global normalization. The normalization framework allows the use of template gates even in the presence of sample-to-sample staining variability, mitigates the subjectivity and bias of manual gating, and decreases the time necessary to analyze large datasets.
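In local normalization, marker intensities are aligned within a specific gated subset rather than across all events. A minimal sketch, assuming each input list already holds one marker's values for a single cell subset in one sample, and swapping in a constant shift for whatever more flexible warping the published framework may use: estimate each sample's main peak with a Gaussian kernel density evaluated on a grid, then shift every sample so its peak sits at the median peak across samples. The bandwidth, grid step, and function names are illustrative assumptions.

```python
import math
import statistics


def kde_peak(values, bandwidth=0.25, step=0.01):
    """Grid location of the highest point of a Gaussian kernel density estimate."""
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    best_x, best_density = lo, -1.0
    for i in range(int((hi - lo) / step) + 1):
        x = lo + i * step
        # An unnormalised density is enough, since only the argmax matters.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def align_peaks(samples, bandwidth=0.25, step=0.01):
    """Shift each sample so its main peak lands on the median peak across samples."""
    peaks = [kde_peak(s, bandwidth, step) for s in samples]
    target = statistics.median(peaks)
    aligned = [[v + (target - p) for v in s] for s, p in zip(samples, peaks)]
    return aligned, peaks, target
```

A constant shift preserves each sample's spread; aligning several landmarks per sample (for example, both a negative and a positive peak) would need a piecewise transformation instead.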
Affiliation(s)
- Greg Finak
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109
49
Application of user-guided automated cytometric data analysis to large-scale immunoprofiling of invariant natural killer T cells. Proc Natl Acad Sci U S A 2013; 110:19030-5. [PMID: 24191009 DOI: 10.1073/pnas.1318322110] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/22/2023]
Abstract
Defining and characterizing pathologies of the immune system requires precise and accurate quantification of abundances and functions of cellular subsets via cytometric studies. At this time, data analysis relies on manual gating, which is a major source of variability in large-scale studies. We devised an automated, user-guided method, X-Cyt, which specializes in rapidly and robustly identifying targeted populations of interest in large data sets. We first applied X-Cyt to quantify CD4(+) effector and central memory T cells in 236 samples, demonstrating high concordance with manual analysis (r = 0.91 and 0.95, respectively) and superior performance to other available methods. We then quantified the rare mucosal associated invariant T cell population in 35 samples, achieving manual concordance of 0.98. Finally, we characterized the population dynamics of invariant natural killer T (iNKT) cells, a particularly rare peripheral lymphocyte, in 110 individuals by assaying 19 markers. We demonstrated that although iNKT cell numbers and marker expression are highly variable in the population, iNKT abundance correlates with sex and age, and the expression of phenotypic and functional markers correlates closely with CD4 expression.
50
Cron A, Gouttefangeas C, Frelinger J, Lin L, Singh SK, Britten CM, Welters MJP, van der Burg SH, West M, Chan C. Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples. PLoS Comput Biol 2013; 9:e1003130. [PMID: 23874174 PMCID: PMC3708855 DOI: 10.1371/journal.pcbi.1003130] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 11/27/2012] [Accepted: 05/17/2013] [Indexed: 11/21/2022]
Abstract
Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing enrichment, and the ability to align cell subsets across multiple data samples for comparative analysis. In this manuscript, we develop hierarchical modeling extensions to the Dirichlet Process Gaussian Mixture Model (DPGMM) approach we have previously described for cell subset identification, and show that the hierarchical DPGMM (HDPGMM) naturally generates an aligned data model that captures both commonalities and variations across multiple samples. HDPGMM also increases the sensitivity to extremely low frequency events by sharing information across multiple samples analyzed simultaneously. We validate the accuracy and reproducibility of HDPGMM estimates of antigen-specific T cells on clinically relevant reference peripheral blood mononuclear cell (PBMC) samples with known frequencies of antigen-specific T cells. These cell samples take advantage of retrovirally TCR-transduced T cells spiked into autologous PBMC samples to give a defined number of antigen-specific T cells detectable by HLA-peptide multimer binding. We provide open source software that can take advantage of both multiple processors and GPU-acceleration to perform the numerically-demanding computations. We show that hierarchical modeling is a useful probabilistic approach that can provide a consistent labeling of cell subsets and increase the sensitivity of rare event detection in the context of quantifying antigen-specific immune responses.
Affiliation(s)
- Andrew Cron
- Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Cécile Gouttefangeas
- Interfaculty Institute for Cell Biology, Department of Immunology, Eberhard Karls University, Tuebingen, Germany
- Jacob Frelinger
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Lin Lin
- Population Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Satwinder K. Singh
- Department of Clinical Oncology, Leiden University Medical Center, Leiden, The Netherlands
- Cedrik M. Britten
- Translational Oncology at the University Medical Center of the Johannes Gutenberg-University Mainz gGmbH, Mainz, Germany
- Marij J. P. Welters
- Department of Clinical Oncology, Leiden University Medical Center, Leiden, The Netherlands
- Sjoerd H. van der Burg
- Department of Clinical Oncology, Leiden University Medical Center, Leiden, The Netherlands
- Mike West
- Department of Statistical Science, Duke University, Durham, North Carolina, United States of America
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Cliburn Chan
- Program in Computational Biology and Bioinformatics, Duke University, Durham, North Carolina, United States of America
- Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina, United States of America