1
Liu P, Pan Y, Chang HC, Wang W, Fang Y, Xue X, Zou J, Toothaker JM, Olaloye O, Santiago EG, McCourt B, Mitsialis V, Presicce P, Kallapur SG, Snapper SB, Liu JJ, Tseng GC, Konnikova L, Liu S. Comprehensive evaluation and practical guideline of gating methods for high-dimensional cytometry data: manual gating, unsupervised clustering, and auto-gating. Brief Bioinform 2024; 26:bbae633. [PMID: 39656848 DOI: 10.1093/bib/bbae633] [Received: 08/13/2024] [Revised: 11/13/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024] Open
Abstract
Cytometry is an advanced technique for simultaneously identifying and quantifying many cell surface and intracellular proteins at a single-cell resolution. Analyzing high-dimensional cytometry data involves identifying and quantifying cell populations based on their marker expressions. This study provided a quantitative review and comparison of various ways to phenotype cellular populations within the cytometry data, including manual gating, unsupervised clustering, and supervised auto-gating. Six datasets from diverse species and sample types were included in the study, and manual gating with two hierarchical layers was used as the truth for evaluation. For manual gating, results from five researchers were compared to illustrate the gating consistency among different raters. For unsupervised clustering, 23 tools were quantitatively compared in terms of accuracy with the truth and computing cost. While no method outperformed all others, several tools, including PAC-MAN, CCAST, FlowSOM, flowClust, and DEPECHE, generally demonstrated strong performance. For supervised auto-gating methods, four algorithms were evaluated, where DeepCyTOF and CyTOF Linear Classifier performed the best. We further provided practical recommendations on prioritizing gating methods based on different application scenarios. This study offers comprehensive insights for biologists to understand diverse gating methods and choose the best-suited ones for their applications.
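The comparison of clustering output against manually gated truth can be illustrated with a small sketch. The abstract does not say which accuracy metric the study used, so the best-match F-measure below is an assumption chosen only because it is a common way to score clusters against gates; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def best_match_f_measure(truth, clusters):
    """Size-weighted mean, over manually gated populations, of the best F1
    that any single cluster achieves against that population."""
    if len(truth) != len(clusters):
        raise ValueError("truth and clusters must label the same cells")
    truth_sizes = Counter(truth)
    cluster_sizes = Counter(clusters)
    # Number of cells shared by each (population, cluster) pair.
    overlap = Counter(zip(truth, clusters))
    total = len(truth)
    score = 0.0
    for population, n_pop in truth_sizes.items():
        best = 0.0
        for cluster, n_clu in cluster_sizes.items():
            shared = overlap[(population, cluster)]
            if shared == 0:
                continue
            precision = shared / n_clu
            recall = shared / n_pop
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_pop / total) * best
    return score


# Hypothetical toy example: six cells, three manual gates, three clusters.
manual_gates = ["T", "T", "T", "B", "B", "NK"]
cluster_ids = [0, 0, 1, 1, 1, 2]
print(round(best_match_f_measure(manual_gates, cluster_ids), 3))
```

A score of 1.0 means every gated population is reproduced exactly by one cluster; splitting or merging populations lowers it.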
Affiliation(s)
- Peng Liu
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yuchen Pan
- Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, US
- Hung-Ching Chang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Wenjia Wang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yusi Fang
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Xiangning Xue
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jian Zou
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jessica M Toothaker
- Department of Immunology, University of Pittsburgh, 5051 Centre Avenue, Pittsburgh, PA 15213, US
- Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Oluwabunmi Olaloye
- Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Black McCourt
- Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Vanessa Mitsialis
- Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women's Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Pietro Presicce
- Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Suhas G Kallapur
- Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Scott B Snapper
- Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women's Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Jia-Jun Liu
- Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
- Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
- George C Tseng
- Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
- Liza Konnikova
- Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Department of Obstetrics, Gynecology and Reproductive Sciences, Yale University, 333 Cedar Street, New Haven, CT 06510, US
- Department of Immunobiology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
- Program in Human and Translational Immunology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
- Program in Translational Biomedicine, Yale University, 300 Cedar Street, New Haven, CT 06520, US
- Center for Systems and Engineering Immunology, Yale University, 100 College St., New Haven, CT 06510, US
- Silvia Liu
- Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
- Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
- Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
- Department of Pharmacology and Chemical Biology, School of Medicine, University of Pittsburgh, 200 Lothrop St., Pittsburgh, PA 15261, US
- Hillman Cancer Center, University of Pittsburgh, 5150 Centre Ave., Pittsburgh, PA 15232, US
2
Sun J, Choy D, Sompairac N, Jamshidi S, Mishto M, Kordasti S. ImmCellTyper facilitates systematic mass cytometry data analysis for deep immune profiling. eLife 2024; 13:RP95494. [PMID: 39240985 PMCID: PMC11379455 DOI: 10.7554/elife.95494] [Indexed: 09/08/2024] Open
Abstract
Mass cytometry is a cutting-edge high-dimensional technology for profiling marker expression at the single-cell level, advancing clinical research in immune monitoring. Nevertheless, the vast data generated by cytometry by time-of-flight (CyTOF) poses a significant analytical challenge. To address this, we describe ImmCellTyper (https://github.com/JingAnyaSun/ImmCellTyper), a novel toolkit for CyTOF data analysis. This framework incorporates BinaryClust, an in-house developed semi-supervised clustering tool that automatically identifies main cell types. BinaryClust outperforms existing clustering tools in accuracy and speed, as shown in benchmarks with two datasets of approximately 4 million cells, matching the precision of manual gating by human experts. Furthermore, ImmCellTyper offers various visualisation and analytical tools, spanning from quality control to differential analysis, tailored to users' specific needs for a comprehensive CyTOF data analysis solution. The workflow includes five key steps: (1) batch effect evaluation and correction, (2) data quality control and pre-processing, (3) main cell lineage characterisation and quantification, (4) in-depth investigation of specific cell types; and (5) differential analysis of cell abundance and functional marker expression across study groups. Overall, ImmCellTyper combines expert biological knowledge in a semi-supervised approach to accurately deconvolute well-defined main cell lineages, while maintaining the potential of unsupervised methods to discover novel cell subsets, thus facilitating high-dimensional immune profiling.
Affiliation(s)
- Jing Sun
- Centre for Inflammation Biology and Cancer Immunology & Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom
- Desmond Choy
- School of Cancer and Pharmaceutical Sciences, King's College London, London, United Kingdom
- Nicolas Sompairac
- School of Cancer and Pharmaceutical Sciences, King's College London, London, United Kingdom
- Shirin Jamshidi
- School of Cancer and Pharmaceutical Sciences, King's College London, London, United Kingdom
- Michele Mishto
- Centre for Inflammation Biology and Cancer Immunology & Peter Gorer Department of Immunobiology, King's College London, London, United Kingdom
- Research Group of Molecular Immunology, Francis Crick Institute, London, United Kingdom
- Shahram Kordasti
- School of Cancer and Pharmaceutical Sciences, King's College London, London, United Kingdom
- Haematology Department, Guy's Hospital, London, United Kingdom
- Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Italy
3
Magrill J, Moldoveanu D, Gu J, Lajoie M, Watson IR. Mapping the single cell spatial immune landscapes of the melanoma microenvironment. Clin Exp Metastasis 2024; 41:301-312. [PMID: 38217840 PMCID: PMC11374855 DOI: 10.1007/s10585-023-10252-4] [Received: 07/15/2023] [Accepted: 11/27/2023] [Indexed: 01/15/2024]
Abstract
Melanoma is a highly immunogenic malignancy with an elevated mutational burden, diffuse lymphocytic infiltration, and one of the highest response rates to immune checkpoint inhibitors (ICIs). However, over half of all late-stage patients treated with ICIs will either not respond or develop progressive disease. Spatial imaging technologies are being increasingly used to study the melanoma tumor microenvironment (TME). The goal of such studies is to understand the complex interplay between the stroma, melanoma cells, and immune cell-types as well as their association with treatment response. Investigators seeking a better understanding of the role of cell location within the TME and the importance of spatial expression of biomarkers are increasingly turning to highly multiplexed imaging approaches to more accurately measure immune infiltration as well as to quantify receptor-ligand interactions (such as PD-1 and PD-L1) and cell-cell contacts. CyTOF-IMC (Cytometry by Time of Flight - Imaging Mass Cytometry) has enabled high-dimensional profiling of melanomas, allowing researchers to identify complex cellular subpopulations and immune cell interactions with unprecedented resolution. Other spatial imaging technologies, such as multiplexed immunofluorescence and spatial transcriptomics, have revealed distinct patterns of immune cell infiltration, highlighting the importance of spatial relationships, and their impact in modulating immunotherapy responses. Overall, spatial imaging technologies are just beginning to transform our understanding of melanoma biology, providing new avenues for biomarker discovery and therapeutic development. These technologies hold great promise for advancing personalized medicine to improve patient outcomes in melanoma and other solid malignancies.
Affiliation(s)
- Jamie Magrill
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montréal, QC, Canada
- Department of Human Genetics, McGill University, Montréal, QC, Canada
- Dan Moldoveanu
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montréal, QC, Canada
- Jiayao Gu
- Department of Human Genetics, McGill University, Montréal, QC, Canada
- Mathieu Lajoie
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montréal, QC, Canada
- Ian R Watson
- Rosalind and Morris Goodman Cancer Institute, McGill University, Montréal, QC, Canada
- Department of Human Genetics, McGill University, Montréal, QC, Canada
- Department of Biochemistry, McGill University, Montréal, QC, Canada
- Research Institute of the McGill University Health Centre, Montréal, QC, Canada
4
Barbetta A, Bangerth S, Lee JTC, Rocque B, Roussos Torres ET, Kohli R, Akbari O, Emamaullee J. IMmuneCite: an integrated workflow for analysis of immune enriched spatial proteomic data. RESEARCH SQUARE 2024:rs.3.rs-4571625. [PMID: 39041033 PMCID: PMC11261960 DOI: 10.21203/rs.3.rs-4571625/v2] [Indexed: 07/24/2024]
Abstract
Spatial proteomics enables detailed analysis of tissue at single-cell resolution. However, creating reliable segmentation masks and assigning accurate phenotypes to discrete cells can be challenging. We introduce IMmuneCite, a computational framework for comprehensive image pre-processing and single-cell dataset creation, focused on defining complex immune landscapes when using spatial proteomics platforms. We demonstrate that IMmuneCite facilitates the identification of 32 discrete immune cell phenotypes using data from human liver samples while substantially reducing nonbiological cell clusters arising from co-localization of markers for different cell lineages. We established its versatility and ability to accommodate any antibody panel and different species by applying IMmuneCite to data from murine liver tissue. This approach enabled deep characterization of different functional states in each immune compartment, uncovering key features of the immune microenvironment in clinical liver transplantation and murine hepatocellular carcinoma. In conclusion, we demonstrated that IMmuneCite is a user-friendly, integrated computational platform that facilitates investigation of the immune microenvironment across species, while ensuring the creation of an immune-focused, spatially resolved single-cell proteomic dataset to provide high-fidelity, biologically relevant analyses.
5
Dinalankara W, Ng DP, Marchionni L, Simonson PD. Comparison of three machine learning algorithms for classification of B-cell neoplasms using clinical flow cytometry data. CYTOMETRY. PART B, CLINICAL CYTOMETRY 2024; 106:282-293. [PMID: 38721890 DOI: 10.1002/cyto.b.22177] [Received: 11/29/2023] [Revised: 03/22/2024] [Accepted: 04/12/2024] [Indexed: 05/18/2024]
Abstract
Multiparameter flow cytometry data is visually inspected by expert personnel as part of standard clinical disease diagnosis practice. This is a demanding and costly process, and recent research has demonstrated that it is possible to utilize artificial intelligence (AI) algorithms to assist in the interpretive process. Here we report our examination of three previously published machine learning methods for classification of flow cytometry data and apply these to a B-cell neoplasm dataset to obtain predicted disease subtypes. Each of the examined methods classifies samples according to specific disease categories using ungated flow cytometry data. We compare and contrast the three algorithms with respect to their architectures, and we report the multiclass classification accuracies and relative required computation times. Despite different architectures, two of the methods, flowCat and EnsembleCNN, had similarly good accuracies with relatively fast computational times. We note a speed advantage for EnsembleCNN, particularly in the case of addition of training data and retraining of the classifier.
Affiliation(s)
- Wikum Dinalankara
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
- David P Ng
- Department of Pathology, University of Utah, Salt Lake City, Utah, USA
- Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
- Paul D Simonson
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
6
Zhang M, Zhang Y, Zhang J, Zhang J, Gao S, Li Z, Tao K, Liang X, Pan J, Zhu M. An automatic analysis and quality assurance method for lymphocyte subset identification. Clin Chem Lab Med 2024; 62:1411-1420. [PMID: 38217085 DOI: 10.1515/cclm-2023-1141] [Received: 10/12/2023] [Accepted: 12/20/2023] [Indexed: 01/15/2024]
Abstract
OBJECTIVES: Lymphocyte subsets are predictors of disease diagnosis, treatment, and prognosis. Determination of lymphocyte subsets is usually carried out by flow cytometry. Despite recent advances in flow cytometry analysis, analyzing most flow cytometry data by manual gating remains challenging because it is labor-intensive, time-consuming, and error-prone. This study aimed to develop an automated method to identify lymphocyte subsets.
METHODS: We propose a method that combines knowledge-driven and data-driven approaches to gate automatically and identify subsets. To improve accuracy and stability, we implemented a Loop Adjustment Gating step to optimize the gating result of the lymphocyte population. Furthermore, we incorporated an anomaly detection mechanism that issues warnings for samples that might not have been analyzed successfully, ensuring the quality of the results.
RESULTS: The evaluation showed a 99.2% correlation between our method's results and manual analysis on a dataset of 2,000 individual cases from lymphocyte subset assays. Our proposed method attained 97.7% accuracy for all cases and 100% for the high-confidence cases. With our automated method, 99.1% of manual labor can be saved when reviewing only the low-confidence cases, while the average turnaround time is only 29 s, a reduction of 83.7%.
CONCLUSIONS: Our proposed method can achieve high accuracy on flow cytometry data from lymphocyte subset assays. Additionally, it can save manual labor and reduce turnaround time, giving it potential for application in the laboratory.
Affiliation(s)
- MinYang Zhang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- YaLi Zhang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JingWen Zhang
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JiaLi Zhang
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- SiYuan Gao
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- ZeChao Li
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- KangPei Tao
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- XiaoDan Liang
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- JianHua Pan
- Department of Clinical Hematology and Flow Cytometry Lab, Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
- Min Zhu
- Department of Digital Management Center, Guangzhou KingMed Diagnostics Group Co., Ltd., Guangzhou Kingmed Center for Clinical Laboratory Co., Ltd., Guangzhou, Guangdong, P.R. China
7
Ali KA, Shah RD, Dhar A, Myers NM, Nguyen C, Paul A, Mancuso JE, Scott Patterson A, Brody JP, Heiser D. Ex vivo discovery of synergistic drug combinations for hematologic malignancies. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2024; 29:100129. [PMID: 38101570 DOI: 10.1016/j.slasd.2023.12.001] [Received: 09/26/2023] [Revised: 11/13/2023] [Accepted: 12/09/2023] [Indexed: 12/17/2023]
Abstract
Combination therapies have improved outcomes for patients with acute myeloid leukemia (AML). However, these patients still have poor overall survival. Although many combination therapies are identified with high-throughput screening (HTS), these approaches are constrained to disease models that can be grown in large volumes (e.g., immortalized cell lines), which have limited translational utility. To identify more effective and personalized treatments, we need better strategies for screening and exploring potential combination therapies. Our objective was to develop an HTS platform for identifying effective combination therapies with highly translatable ex vivo disease models that use size-limited, primary samples from patients with leukemia (AML and myelodysplastic syndrome). We developed a system, ComboFlow, that comprises three main components: MiniFlow, ComboPooler, and AutoGater. MiniFlow conducts ex vivo drug screening with a miniaturized flow-cytometry assay that uses minimal amounts of patient sample to maximize throughput. ComboPooler incorporates computational methods to design efficient screens of pooled drug combinations. AutoGater is an automated gating classifier for flow cytometry that uses machine learning to rapidly analyze the large datasets generated by the assay. We used ComboFlow to efficiently screen more than 3000 drug combinations across 20 patient samples using only 6 million cells per patient sample. In this screen, ComboFlow identified the known synergistic combination of bortezomib and panobinostat. ComboFlow also identified a novel drug combination, dactinomycin and fludarabine, that synergistically killed leukemic cells in 35 % of AML samples. This combination also had limited effects in normal, hematopoietic progenitors. In conclusion, ComboFlow enables exploration of massive landscapes of drug combinations that were previously inaccessible in ex vivo models. 
We envision that ComboFlow can be used to discover more effective and personalized combination therapies for cancers amenable to ex vivo models.
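The abstract reports synergistic combinations but does not state which synergy model ComboFlow uses. As an illustration only, the sketch below scores a drug pair with the Bliss independence model, a widely used reference model; treating it as the scoring rule here is an assumption, and the function name and kill fractions are hypothetical.

```python
def bliss_excess(kill_a, kill_b, kill_combo):
    """Observed combination effect minus the effect expected if the two drugs
    acted independently (Bliss model). All inputs are fractions of cells
    killed, between 0 and 1. A positive result suggests synergy."""
    for value in (kill_a, kill_b, kill_combo):
        if not 0.0 <= value <= 1.0:
            raise ValueError("kill fractions must lie between 0 and 1")
    # Under independence, a cell survives only if it survives both drugs.
    expected = kill_a + kill_b - kill_a * kill_b
    return kill_combo - expected


# Hypothetical measurements: drug A kills 20%, drug B kills 30%,
# and the combination kills 60% of leukemic cells.
print(bliss_excess(0.20, 0.30, 0.60))
```

A value near zero indicates an additive effect under this model, and a negative value indicates antagonism.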
Affiliation(s)
- Kamran A Ali
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
- Department of Biomedical Engineering, University of California, Irvine, 3120 Natural Sciences II, Irvine, CA, 92697, USA
- Reecha D Shah
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
- Anukriti Dhar
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
- Nina M Myers
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
- Arisa Paul
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
- James P Brody
- Department of Biomedical Engineering, University of California, Irvine, 3120 Natural Sciences II, Irvine, CA, 92697, USA
- Diane Heiser
- Notable Labs, 320 Hatch Dr, Foster City, CA, 94404, USA
8
Na S, Choo Y, Yoon TH, Paek E. CyGate Provides a Robust Solution for Automatic Gating of Single Cell Cytometry Data. Anal Chem 2023; 95:16918-16926. [PMID: 37946317 PMCID: PMC10666088 DOI: 10.1021/acs.analchem.3c03006] [Received: 07/10/2023] [Revised: 10/12/2023] [Accepted: 10/23/2023] [Indexed: 11/12/2023]
Abstract
To gain a better understanding of the complex human immune system, it is necessary to measure and interpret numerous cellular protein expressions at the single cell level. Mass cytometry is a relatively new technology that offers unprecedented information about the protein expression of a single cell. Conversely, the analysis of high-dimensional and multiparametric mass cytometric data sets presents a new computational challenge. For instance, conventional "manual gating" analysis was inefficient and unreliable for multiparametric phenotyping of the heterogeneous immune cellular system; consequently, automated methods have been developed to address the high dimensionality of mass cytometry data and enhance the reproducibility of the analysis. Here, we present CyGate, a semiautomated method for classifying single cells into their respective cell types. CyGate learns a gating strategy from a reference data set, trains a model for cell classification, and then automatically analyzes additional data sets using the trained model. CyGate also supports the machine learning framework for the classification of "ungated" cells, which are typically disregarded by automated methods. CyGate's utility was demonstrated by its high performance in cell type classification and the lowest generalization error on various public data sets when compared to the state-of-the-art semiautomated methods. Notably, CyGate had the shortest execution time, allowing it to scale with a growing number of samples. CyGate is available at https://github.com/seungjinna/cygate.
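The learn-from-reference workflow described above (train on a gated reference set, then label new cells, including an "ungated" class) can be sketched generically. This is not CyGate's algorithm, which the abstract does not detail; the nearest-centroid rule, the distance cutoff for "ungated", and all names and values below are assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_centroids(cells, labels):
    """Learn one mean marker-expression vector per gated cell type."""
    sums = {}
    counts = defaultdict(int)
    for cell, label in zip(cells, labels):
        if label not in sums:
            sums[label] = [0.0] * len(cell)
        for i, value in enumerate(cell):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(cell, centroids, max_distance):
    """Assign the nearest cell type, or 'ungated' if no centroid is close."""
    label, distance = min(
        ((name, math.dist(cell, centroid)) for name, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else "ungated"


# Hypothetical two-marker reference data set with manual gate labels.
reference = [(5.0, 0.0), (6.0, 0.0), (0.0, 5.0), (0.0, 6.0)]
gates = ["T cell", "T cell", "B cell", "B cell"]
model = train_centroids(reference, gates)

for new_cell in [(5.2, 0.3), (3.0, 3.0)]:
    print(new_cell, classify(new_cell, model, max_distance=2.0))
```

Once trained, the model can be reused on additional samples without repeating the manual gating, which is the property the abstract emphasizes.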
Affiliation(s)
- Seungjin Na
- Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Yujin Choo
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Republic of Korea
- Tae Hyun Yoon
- Department of Chemistry, College of Natural Sciences, Hanyang University, Seoul 04763, Republic of Korea
- Institute of Next Generation Material Design, Hanyang University, Seoul 04763, Republic of Korea
- Yoon Idea Lab Co., Ltd., Seoul 04763, Republic of Korea
- Eunok Paek
- Institute for Artificial Intelligence Research, Hanyang University, Seoul 04763, Republic of Korea
- Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
- Department of Artificial Intelligence, Hanyang University, Seoul 04763, Republic of Korea
9
Yang Y, Wang K, Lu Z, Wang T, Wang X. Cytomulate: accurate and efficient simulation of CyTOF data. Genome Biol 2023; 24:262. [PMID: 37974276 PMCID: PMC10652542 DOI: 10.1186/s13059-023-03099-1] [Received: 09/05/2022] [Accepted: 10/24/2023] [Indexed: 11/19/2023] Open
Abstract
Recently, many analysis tools have been devised to offer insights into data generated via cytometry by time-of-flight (CyTOF). However, objective evaluations of these methods remain absent as most evaluations are conducted against real data where the ground truth is generally unknown. In this paper, we develop Cytomulate, a reproducible and accurate simulation algorithm of CyTOF data, which could serve as a foundation for future method development and evaluation. We demonstrate that Cytomulate can capture various characteristics of CyTOF data and is superior in learning overall data distributions than single-cell RNA-seq-oriented methods such as scDesign2, Splatter, and generative models like LAMBDA.
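The value of simulated data with known ground truth can be shown with a toy generator. The sketch below is not Cytomulate's model, which the abstract does not describe; it assumes a simple mixture of Gaussian populations clipped at zero, and the function name, population parameters, and seed are hypothetical.

```python
import random


def simulate_cells(populations, n_cells, seed=0):
    """Draw cells from a mixture of populations and return the expression
    matrix together with the true population label of every cell."""
    rng = random.Random(seed)  # fixed seed makes the simulation reproducible
    names = list(populations)
    weights = [populations[name]["weight"] for name in names]
    cells, labels = [], []
    for _ in range(n_cells):
        name = rng.choices(names, weights=weights, k=1)[0]
        means = populations[name]["means"]
        sd = populations[name]["sd"]
        # Clip at zero because measured marker intensities are non-negative.
        cells.append([max(0.0, rng.gauss(mean, sd)) for mean in means])
        labels.append(name)
    return cells, labels


# Hypothetical two-marker panel with two populations.
panel = {
    "T cell": {"weight": 0.7, "means": [5.0, 0.5], "sd": 1.0},
    "B cell": {"weight": 0.3, "means": [0.5, 5.0], "sd": 1.0},
}
expression, truth = simulate_cells(panel, n_cells=1000, seed=42)
print(len(expression), truth[:5])
```

Because the true label of each simulated cell is known, the output of any clustering or gating tool on `expression` can be scored directly against `truth`.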
Affiliation(s)
- Yuqiu Yang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Kaiwen Wang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
- Zeyu Lu
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Tao Wang
- Quantitative Biomedical Research Center, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, TX, 75390, USA
- Xinlei Wang
- Department of Statistics and Data Science, Southern Methodist University, Dallas, TX, 75275, USA
- Department of Mathematics, University of Texas at Arlington, Arlington, 76019, USA
- Center for Data Science Research and Education, College of Science, University of Texas at Arlington, Arlington, 76019, USA
10
Dutta S, Box AC, Li Y, Sardiu ME. Identifying dynamical persistent biomarker structures for rare events using modern integrative machine learning approach. Proteomics 2023; 23:e2200290. [PMID: 36852539 PMCID: PMC11503472 DOI: 10.1002/pmic.202200290] [Received: 09/23/2022] [Revised: 01/30/2023] [Accepted: 02/17/2023] [Indexed: 03/01/2023]
Abstract
The evolution of omics and computational competency has accelerated discoveries of the underlying biological processes in an unprecedented way. High throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, thereby allowing opportunities for scientific discoveries related to health and diseases. However, working with cytometry data often imposes complex computational challenges due to high-dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalances which can further complicate the problem. The existing methods of cytometry data analysis either predict cell population or perform feature selection. Through this study, we propose a "wisdom of the crowd" approach to simultaneously predict rare cell populations and perform feature selection by integrating a pool of modern machine learning (ML) algorithms. Given that our approach integrates superior performing ML models across different normalization techniques based on entropy and rank, our method can detect diverse patterns existing across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies indicating the role of each biomarker in rare cell prediction, which can subsequently aid in studies of disease progression.
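The division of biomarkers into persistently selected, unselected, and fluctuating groups can be sketched as a simple tally across models. The abstract does not give the authors' exact rule, so the all-or-none criterion below is an assumption, and the function name, marker names, and selections are hypothetical.

```python
def categorize_features(features, selections):
    """Label each feature by how consistently a pool of models selected it.

    `selections` holds one set of chosen features per model (or per
    model and normalization combination)."""
    if not selections:
        raise ValueError("at least one model selection is required")
    n_runs = len(selections)
    categories = {}
    for feature in features:
        hits = sum(feature in chosen for chosen in selections)
        if hits == n_runs:
            categories[feature] = "persistently selected"
        elif hits == 0:
            categories[feature] = "unselected"
        else:
            categories[feature] = "fluctuating"
    return categories


# Hypothetical marker panel and feature selections from three models.
markers = ["CD3", "CD19", "CD56", "CD45"]
chosen_by_model = [{"CD3", "CD56"}, {"CD3"}, {"CD3", "CD56", "CD19"}]
print(categorize_features(markers, chosen_by_model))
```

In practice a softer threshold (for example, selected in at least 90% of runs) may be preferable to the strict all-or-none rule used here.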
Affiliation(s)
- Sreejata Dutta
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Andrew C. Box
- Stowers Institute for Medical Research, Kansas City, Missouri, USA
| | - Yanming Li
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- University of Kansas Cancer Center, Kansas City, Kansas, USA
| | - Mihaela E. Sardiu
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- University of Kansas Cancer Center, Kansas City, Kansas, USA
- Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
|
11
|
Robles EE, Jin Y, Smyth P, Scheuermann RH, Bui JD, Wang HY, Oak J, Qian Y. A cell-level discriminative neural network model for diagnosis of blood cancers. Bioinformatics 2023; 39:btad585. [PMID: 37756695 PMCID: PMC10563151 DOI: 10.1093/bioinformatics/btad585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 09/12/2023] [Accepted: 09/22/2023] [Indexed: 09/29/2023] Open
Abstract
MOTIVATION: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. RESULTS: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes sample-level training data and predicts the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. AVAILABILITY AND IMPLEMENTATION: The source code of CSNN and datasets used in the experiments are publicly available on GitHub (http://github.com/erobl/csnn). Raw FCS files can be downloaded from FlowRepository (ID: FR-FCM-Z6YK).
Affiliation(s)
- Edgar E Robles
- Department of Computer Science, University of California, Irvine, CA 92697, United States
| | - Ye Jin
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
| | - Padhraic Smyth
- Department of Computer Science, University of California, Irvine, CA 92697, United States
| | - Richard H Scheuermann
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, United States
- Department of Pathology, University of California, San Diego, CA 92093, United States
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA 92037, United States
| | - Jack D Bui
- Department of Pathology, University of California, San Diego, CA 92093, United States
| | - Huan-You Wang
- Department of Pathology, University of California, San Diego, CA 92093, United States
| | - Jean Oak
- Department of Pathology, Stanford University, Stanford, CA 94305, United States
| | - Yu Qian
- Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, United States
|
12
|
Puccio S, Grillo G, Alvisi G, Scirgolea C, Galletti G, Mazza EMC, Consiglio A, De Simone G, Licciulli F, Lugli E. CRUSTY: a versatile web platform for the rapid analysis and visualization of high-dimensional flow cytometry data. Nat Commun 2023; 14:5102. [PMID: 37666818 PMCID: PMC10477295 DOI: 10.1038/s41467-023-40790-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/16/2022] [Accepted: 08/10/2023] [Indexed: 09/06/2023] Open
Abstract
Flow cytometry (FCM) can investigate dozens of parameters from millions of cells and hundreds of specimens in a short time and at a reasonable cost, but the amount of data that is generated is considerable. Computational approaches are useful to identify novel subpopulations and molecular biomarkers, but generally require deep expertise in bioinformatics and the use of different platforms. To overcome these limitations, we introduce CRUSTY, an interactive, user-friendly webtool incorporating the most popular algorithms for FCM data analysis, and capable of visualizing graphical and tabular results and automatically generating publication-quality figures within minutes. CRUSTY also hosts an interactive interface for the exploration of results in real time. Thus, CRUSTY enables a large number of users to mine complex datasets and reduce the time required for data exploration and interpretation. CRUSTY is accessible at https://crusty.humanitas.it/.
Affiliation(s)
- Simone Puccio
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy.
- Institute of Genetic and Biomedical Research, UoS Milan, National Research Council, via Manzoni 56, 20089, Rozzano, Milan, Italy.
| | - Giorgio Grillo
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Giorgia Alvisi
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Caterina Scirgolea
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Giovanni Galletti
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- School of Biological Sciences, Department of Molecular Biology, University of California San Diego, San Diego, CA, USA
| | - Emilia Maria Cristina Mazza
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Arianna Consiglio
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Gabriele De Simone
- Flow Cytometry Core, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
| | - Flavio Licciulli
- Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
| | - Enrico Lugli
- Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy.
|
13
|
Berson E, Gajera CR, Phongpreecha T, Perna A, Bukhari SA, Becker M, Chang AL, De Francesco D, Espinosa C, Ravindra NG, Postupna N, Latimer CS, Shively CA, Register TC, Craft S, Montine KS, Fox EJ, Keene CD, Bendall SC, Aghaeepour N, Montine TJ. Cross-species comparative analysis of single presynapses. Sci Rep 2023; 13:13849. [PMID: 37620363 PMCID: PMC10449792 DOI: 10.1038/s41598-023-40683-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/13/2023] [Accepted: 08/16/2023] [Indexed: 08/26/2023] Open
Abstract
Comparing brain structure across species and regions enables key functional insights. Leveraging publicly available data from a novel mass cytometry-based method, synaptometry by time of flight (SynTOF), we applied an unsupervised machine learning approach to conduct a comparative study of presynapse molecular abundance across three species and three brain regions. We used neural networks and their attractive properties to model complex relationships among high dimensional data to develop a unified, unsupervised framework for comparing the profile of more than 4.5 million single presynapses among normal human, macaque, and mouse samples. An extensive validation showed the feasibility of performing cross-species comparison using SynTOF profiling. Integrative analysis of the abundance of 20 presynaptic proteins revealed near-complete separation between primates and mice involving synaptic pruning, cellular energy, lipid metabolism, and neurotransmission. In addition, our analysis revealed a strong overlap between the presynaptic composition of human and macaque in the cerebral cortex and neostriatum. Our unique approach illuminates species- and region-specific variation in presynapse molecular composition.
Affiliation(s)
- Eloïse Berson
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Chandresh R Gajera
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
| | - Thanaphong Phongpreecha
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Amalia Perna
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
| | - Syed A Bukhari
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
| | - Martin Becker
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Alan L Chang
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Davide De Francesco
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Camilo Espinosa
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Neal G Ravindra
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Nadia Postupna
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Caitlin S Latimer
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Carol A Shively
- Department of Pathology/Comparative Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Thomas C Register
- Department of Pathology/Comparative Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Suzanne Craft
- Department of Internal Medicine-Geriatrics, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Kathleen S Montine
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
| | - Edward J Fox
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA
| | - C Dirk Keene
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA, USA
| | - Sean C Bendall
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
| | - Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Stanford University, Stanford, CA, USA
- Department of Pediatrics, Stanford University, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Thomas J Montine
- Department of Pathology, Stanford University, 300 Pasteur Dr., Stanford, CA, 94304, USA.
|
14
|
Abstract
Advances in single-cell proteomics technologies have resulted in high-dimensional datasets comprising millions of cells that are capable of answering key questions about biology and disease. The advent of these technologies has prompted the development of computational tools to process and visualize the complex data. In this review, we outline the steps of single-cell and spatial proteomics analysis pipelines. In addition to describing available methods, we highlight benchmarking studies that have identified advantages and pitfalls of the currently available computational toolkits. As these technologies continue to advance, robust analysis tools should be developed in tandem to take full advantage of the potential biological insights provided by these data.
Affiliation(s)
- Sophia M Guldberg
- Department of Otolaryngology-Head and Neck Surgery and Department of Microbiology and Immunology, University of California, San Francisco, California, USA;
- Biomedical Sciences Graduate Program, University of California, San Francisco, California, USA
- Gladstone-UCSF Institute for Genomic Immunology, San Francisco, California, USA
| | - Trine Line Hauge Okholm
- Department of Otolaryngology-Head and Neck Surgery and Department of Microbiology and Immunology, University of California, San Francisco, California, USA;
- Gladstone-UCSF Institute for Genomic Immunology, San Francisco, California, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, USA
| | - Elizabeth E McCarthy
- Department of Otolaryngology-Head and Neck Surgery and Department of Microbiology and Immunology, University of California, San Francisco, California, USA;
- Biomedical Sciences Graduate Program, University of California, San Francisco, California, USA
- Institute for Human Genetics; Division of Rheumatology, Department of Medicine; Medical Scientist Training Program; and Biological and Medical Informatics Graduate Program, University of California, San Francisco, California, USA
| | - Matthew H Spitzer
- Department of Otolaryngology-Head and Neck Surgery and Department of Microbiology and Immunology, University of California, San Francisco, California, USA;
- Gladstone-UCSF Institute for Genomic Immunology, San Francisco, California, USA
- Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, California, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, California, USA
- Chan Zuckerberg Biohub, San Francisco, California, USA
|
15
|
Robinson JP, Ostafe R, Iyengar SN, Rajwa B, Fischer R. Flow Cytometry: The Next Revolution. Cells 2023; 12:1875. [PMID: 37508539 PMCID: PMC10378642 DOI: 10.3390/cells12141875] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 06/21/2023] [Revised: 07/06/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023] Open
Abstract
Unmasking the subtleties of the immune system requires both a comprehensive knowledge base and the ability to interrogate that system with intimate sensitivity. That task, to a considerable extent, has been handled by an iterative expansion in flow cytometry methods, both in technological capability and also in accompanying advances in informatics. As the field of fluorescence-based cytomics matured, it reached a technological barrier at around 30 parameter analyses, which stalled the field until spectral flow cytometry created a fundamental transformation that will likely lead to the potential of 100 simultaneous parameter analyses within a few years. The simultaneous advance in informatics has now become a watershed moment for the field as it competes with mature systematic approaches such as genomics and proteomics, allowing cytomics to take a seat at the multi-omics table. In addition, recent technological advances try to combine the speed of flow systems with other detection methods, in addition to fluorescence alone, which will make flow-based instruments even more indispensable in any biological laboratory. This paper outlines current approaches in cell analysis and detection methods, discusses traditional and microfluidic sorting approaches as well as next-generation instruments, and provides an early look at future opportunities that are likely to arise.
Affiliation(s)
- J Paul Robinson
- Department of Basic Medical Sciences, Purdue University, West Lafayette, IN 47907, USA
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA
| | - Raluca Ostafe
- Molecular Evolution, Protein Engineering and Production Facility (PI4D), Purdue University, West Lafayette, IN 47907, USA
| | | | - Bartek Rajwa
- Bindley Bioscience Center, Purdue University, West Lafayette, IN 47907, USA
| | - Rainer Fischer
- Department of Comparative Pathobiology, College of Veterinary Medicine, Purdue University, West Lafayette, IN 47907, USA
- Purdue Institute of Inflammation, Immunology and Infectious Diseases, Purdue University, West Lafayette, IN 47907, USA
|
16
|
Baldzhieva A, Burnusuzov HA, Murdjeva MA, Dimcheva TD, Taskov HB. A concise review of flow cytometric methods for minimal residual disease assessment in childhood B-cell precursor acute lymphoblastic leukemia. Folia Med (Plovdiv) 2023; 65:355-361. [PMID: 38351809 DOI: 10.3897/folmed.65.e96440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 01/04/2023] [Indexed: 02/16/2024] Open
Abstract
Minimal residual disease (MRD) refers to a leukemia cell population that is resistant to chemotherapy or radiotherapy and leads to disease relapse. The assessment of MRD is crucial for making an accurate prognosis of the disease and for choosing the optimal treatment strategy. Here, we review the advantages and disadvantages of the available genetic and phenotypic methods and focus on multiparametric flow cytometry as a promising method with greater sensitivity, speed, and standardization options. In addition, we discuss how the application of automated data analysis outweighs the use of complex combinations of windows and gates in classical analysis, thus eliminating subjective evaluation.
|
17
|
Lui A, Lee J, Thall PF, Daher M, Rezvani K, Basar R. A Bayesian feature allocation model for identifying cell subpopulations using CyTOF data. J R Stat Soc Ser C Appl Stat 2023; 72:718-738. [PMID: 37325776 PMCID: PMC10264057 DOI: 10.1093/jrsssc/qlad029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 04/02/2023] [Indexed: 06/17/2023]
Abstract
A Bayesian feature allocation model (FAM) is presented for identifying cell subpopulations based on multiple samples of cell surface or intracellular marker expression level data obtained by cytometry by time of flight (CyTOF). Cell subpopulations are characterized by differences in marker expression patterns, and cells are clustered into subpopulations based on their observed expression levels. A model-based method is used to construct cell clusters within each sample by modeling subpopulations as latent features, using a finite Indian buffet process. Non-ignorable missing data due to technical artifacts in mass cytometry instruments are accounted for by defining a static missingship mechanism. In contrast with conventional cell clustering methods, which cluster observed marker expression levels separately for each sample, the FAM-based method can be applied simultaneously to multiple samples, and also identify important cell subpopulations likely to be otherwise missed. The proposed FAM-based method is applied to jointly analyse three CyTOF datasets to study natural killer (NK) cells. Because the subpopulations identified by the FAM may define novel NK cell subsets, this statistical analysis may provide useful information about the biology of NK cells and their potential role in cancer immunotherapy which may lead, in turn, to development of improved NK cell therapies.
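The finite Indian buffet process mentioned in this abstract is commonly written as a beta-Bernoulli construction: each of K latent features gets an inclusion probability drawn from Beta(alpha/K, 1), and each row of a binary matrix Z switches that feature on independently with that probability. The sketch below draws one such matrix. It illustrates only this prior, not the authors' full model; what the rows index, the value of `alpha`, and the function name are assumptions made for the example.

```python
import random


def sample_finite_ibp(n_rows, n_features, alpha=2.0, seed=0):
    """Draw a binary matrix Z from the finite beta-Bernoulli
    approximation to the Indian buffet process."""
    rng = random.Random(seed)
    # One inclusion probability per latent feature: pi_k ~ Beta(alpha / K, 1).
    probs = [rng.betavariate(alpha / n_features, 1.0) for _ in range(n_features)]
    # Each row switches feature k on independently with probability pi_k.
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_rows)]


# Hypothetical sizes: 6 rows and 4 latent features (subpopulations).
Z = sample_finite_ibp(n_rows=6, n_features=4)
```

Because each column shares a single inclusion probability, some latent features end up widely used and others nearly empty, which is what lets a feature allocation model leave unneeded subpopulations effectively switched off.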
Affiliation(s)
- Arthur Lui
- Department of Statistics, Baskin School of Engineering, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA, 95064, USA
| | - Juhee Lee
- Department of Statistics, University of California at Santa Cruz, Santa Cruz, CA, USA
| | - Peter F Thall
- Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX, USA
| | - May Daher
- Department of Stem Cell Transplantation and Cellular Therapy, M.D. Anderson Cancer Center, Houston, TX, USA
| | - Katy Rezvani
- Department of Stem Cell Transplantation and Cellular Therapy, M.D. Anderson Cancer Center, Houston, TX, USA
| | - Rafet Basar
- Department of Stem Cell Transplantation and Cellular Therapy, M.D. Anderson Cancer Center, Houston, TX, USA
|
18
|
Gazeau S, Deng X, Ooi HK, Mostefai F, Hussin J, Heffernan J, Jenner AL, Craig M. The race to understand immunopathology in COVID-19: Perspectives on the impact of quantitative approaches to understand within-host interactions. Immunoinformatics (Amsterdam, Netherlands) 2023; 9:100021. [PMID: 36643886 PMCID: PMC9826539 DOI: 10.1016/j.immuno.2023.100021] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/17/2022] [Revised: 11/16/2022] [Accepted: 01/03/2023] [Indexed: 01/09/2023]
Abstract
The COVID-19 pandemic has revealed the need for the increased integration of modelling and data analysis into public health, experimental, and clinical studies. Throughout the first two years of the pandemic, there has been a concerted effort to improve our understanding of the within-host immune response to the SARS-CoV-2 virus to provide better predictions of COVID-19 severity, treatment and vaccine development questions, and insights into viral evolution and the impacts of variants on immunopathology. Here we provide perspectives on what has been accomplished using quantitative methods, including predictive modelling, population genetics, machine learning, and dimensionality reduction techniques, in the first 26 months of the COVID-19 pandemic, and on where we go from here to improve our responses to this and future pandemics.
Affiliation(s)
- Sonia Gazeau
- Department of Mathematics and Statistics, Université de Montréal, Montréal, Canada
- Sainte-Justine University Hospital Research Centre, Montréal, Canada
| | - Xiaoyan Deng
- Department of Mathematics and Statistics, Université de Montréal, Montréal, Canada
- Sainte-Justine University Hospital Research Centre, Montréal, Canada
| | - Hsu Kiang Ooi
- Digital Technologies Research Centre, National Research Council Canada, Toronto, Canada
| | - Fatima Mostefai
- Montréal Heart Institute Research Centre, Montréal, Canada
- Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, Canada
| | - Julie Hussin
- Montréal Heart Institute Research Centre, Montréal, Canada
- Department of Medicine, Faculty of Medicine, Université de Montréal, Montréal, Canada
| | - Jane Heffernan
- Modelling Infection and Immunity Lab, Mathematics Statistics, York University, Toronto, Canada
- Centre for Disease Modelling (CDM), Mathematics Statistics, York University, Toronto, Canada
| | - Adrianne L Jenner
- School of Mathematical Sciences, Queensland University of Technology, Brisbane Australia
| | - Morgan Craig
- Department of Mathematics and Statistics, Université de Montréal, Montréal, Canada
- Sainte-Justine University Hospital Research Centre, Montréal, Canada
|
19
|
Rao NS, Ermann Lundberg L, Tomasson J, Tullberg C, Brink DP, Palmkron SB, van Niel EWJ, Håkansson S, Carlquist M. Non-inhibitory levels of oxygen during cultivation increase freeze-drying stress tolerance in Limosilactobacillus reuteri DSM 17938. Front Microbiol 2023; 14:1152389. [PMID: 37125176 PMCID: PMC10140318 DOI: 10.3389/fmicb.2023.1152389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/22/2023] [Indexed: 05/02/2023] Open
Abstract
The physiological effects of oxygen on Limosilactobacillus reuteri DSM 17938 during cultivation and the ensuing properties of the freeze-dried probiotic product were investigated. On-line flow cytometry with k-means clustering gating was used to follow growth and viability in real time during cultivation. The bacterium tolerated aeration at 500 mL/min, with a growth rate of 0.74 ± 0.13 h⁻¹, which demonstrated that low levels of oxygen did not influence the growth kinetics of the bacterium. Modulation of the redox metabolism was, however, already seen at non-inhibitory oxygen levels, with 1.5-fold higher production of acetate and 1.5-fold lower ethanol production. A significantly higher survival rate in the freeze-dried product was observed for cells cultivated in the presence of oxygen compared to the absence of oxygen (61.8% ± 2.4% vs. 11.5% ± 4.3%), coinciding with a higher degree of unsaturated fatty acids (UFA:SFA ratio of 10 for air-sparged vs. 3.59 for N2-sparged conditions). Oxygen also resulted in improved bile tolerance and boosted 5'-nucleotidase activity (370 U/L vs. 240 U/L in N2-sparged conditions) but lower tolerance to acidic conditions compared to bacteria grown under complete anaerobic conditions, which survived up to 90 min of exposure at pH 2. Overall, our results indicate that the controlled supply of oxygen during production may be used as a means for optimizing the probiotic activity of L. reuteri DSM 17938.
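As a rough illustration of k-means gating, the sketch below clusters cytometry events with Lloyd's algorithm so that each cluster can be read as a gate. The abstract does not state which channels, how many clusters, or which initialization were used; the two-channel events, `k=2`, the deterministic start, and the function name are all assumptions made for the example.

```python
import math


def kmeans_gate(events, k=2, max_iter=100):
    """Cluster events (tuples of channel intensities) with Lloyd's k-means.

    Requires k >= 2. Returns one cluster label per event and the final centres.
    """
    ordered = sorted(events)
    # Deterministic start (an assumption): k events spread across the sorted list.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each event joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(e, centers[c])) for e in events]
        # Update step: each centre moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [e for e, lab in zip(events, labels) if lab == c]
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centers[c]
            )
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Hypothetical events: (log10 scatter, log10 fluorescence) for six cells.
events = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans_gate(events)
```

Counting the events per label at each sampling time would then give a per-gate cell count over the course of a cultivation.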
Affiliation(s)
- Nikhil Seshagiri Rao
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
| | - Ludwig Ermann Lundberg
- The Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
- BioGaia, SE-103 64, Stockholm, Sweden
| | | | - Cecilia Tullberg
- Division of Biotechnology, Department of Chemistry, Lund University, Lund, Sweden
| | - Daniel P. Brink
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
| | - Shuai Bai Palmkron
- Department of Food Technology, Engineering and Nutrition, Department of Chemistry, Lund University, Lund, Sweden
| | - Ed W. J. van Niel
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
| | - Sebastian Håkansson
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
- BioGaia, SE-241 38, Eslöv, Sweden
| | - Magnus Carlquist
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
|
20
|
Barbetta A, Rocque B, Sarode D, Bartlett JA, Emamaullee J. Revisiting transplant immunology through the lens of single-cell technologies. Semin Immunopathol 2023; 45:91-109. [PMID: 35980400 PMCID: PMC9386203 DOI: 10.1007/s00281-022-00958-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/11/2022] [Accepted: 07/06/2022] [Indexed: 11/03/2022]
Abstract
Solid organ transplantation (SOT) is the standard of care for end-stage organ disease. The most frequent complication of SOT involves allograft rejection, which may occur via T cell- and/or antibody-mediated mechanisms. Diagnosis of rejection in the clinical setting requires an invasive biopsy as there are currently no reliable biomarkers to detect rejection episodes. Likewise, it is virtually impossible to identify patients who exhibit operational tolerance and may be candidates for reduced or complete withdrawal of immunosuppression. Emerging single-cell technologies, including cytometry by time-of-flight (CyTOF), imaging mass cytometry, and single-cell RNA sequencing, represent a new opportunity for deep characterization of pathogenic immune populations involved in both allograft rejection and tolerance in clinical samples. These techniques enable examination of both individual cellular phenotypes and cell-to-cell interactions, ultimately providing new insights into the complex pathophysiology of allograft rejection. However, working with these large, highly dimensional datasets requires expertise in advanced data processing and analysis using computational biology techniques. Machine learning algorithms represent an optimal strategy to analyze and create predictive models using these complex datasets and will likely be essential for future clinical application of patient level results based on single-cell data. Herein, we review the existing literature on single-cell techniques in the context of SOT.
Affiliation(s)
- Arianna Barbetta
- Department of Surgery, Division of Abdominal Organ Transplant, University of Southern California, 1510 San Pablo St. Suite 412, Los Angeles, CA, 90033, USA
- University of Southern California, Los Angeles, CA, USA
| | - Brittany Rocque
- Department of Surgery, Division of Abdominal Organ Transplant, University of Southern California, 1510 San Pablo St. Suite 412, Los Angeles, CA, 90033, USA
- University of Southern California, Los Angeles, CA, USA
| | - Deepika Sarode
- Department of Surgery, Division of Abdominal Organ Transplant, University of Southern California, 1510 San Pablo St. Suite 412, Los Angeles, CA, 90033, USA
- University of Southern California, Los Angeles, CA, USA
| | - Johanna Ascher Bartlett
- Pediatric Gastroenterology, Hepatology and Nutrition, Children's Hospital of Los Angeles, Los Angeles, CA, USA
| | - Juliet Emamaullee
- Department of Surgery, Division of Abdominal Organ Transplant, University of Southern California, 1510 San Pablo St. Suite 412, Los Angeles, CA, 90033, USA.
- University of Southern California, Los Angeles, CA, USA.
- Division of Hepatobiliary and Abdominal Organ Transplantation Surgery, Children's Hospital Los Angeles, Los Angeles, CA, USA.
|
21
|
Bruckmann C, Müller S, zu Siederdissen CH. Automatic, fast, hierarchical, and non-overlapping gating of flow cytometric data with flowEMMiv2. Comput Struct Biotechnol J 2022; 20:6473-6489. [DOI: 10.1016/j.csbj.2022.11.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] Open
|
22
|
Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat Commun 2022; 13:5455. [PMID: 36114209 PMCID: PMC9481560 DOI: 10.1038/s41467-022-33136-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 09/05/2022] [Indexed: 11/30/2022] Open
Abstract
Clustering is a powerful machine learning method for discovering similar patterns according to the proximity of elements in feature space. It is widely used in computer science, bioscience, geoscience, and economics. Although state-of-the-art partition-based and connectivity-based clustering methods have been developed, weak connectivity and heterogeneous density in data impede their effectiveness. In this work, we propose a boundary-seeking Clustering algorithm using the local Direction Centrality (CDC). It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly-connected clusters. We demonstrate the validity of CDC by detecting complex structured clusters in challenging synthetic datasets, identifying cell types from single-cell RNA sequencing (scRNA-seq) and mass cytometry (CyTOF) data, recognizing speakers on voice corpuses, and testing on various types of real-world benchmarks. Here the authors propose a local direction centrality clustering algorithm that copes with heterogeneous density and weak connectivity issues.
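The idea of a density-independent metric built from the distribution of K-nearest neighbors can be sketched in two dimensions as follows. The sketch assumes the metric is the variance of the angular gaps between the directions from a point to its neighbors: neighbors spread evenly around an internal point give a low score, while neighbors bunched on one side of a boundary point give a high score. The function name, the choice of `k`, and the restriction to 2D are assumptions for illustration; the cage-building and cluster-assignment steps of CDC are not shown.

```python
import math
from statistics import pvariance


def direction_centrality(points, k=8):
    """Return, for each 2D point, the variance of the angular gaps
    between the directions to its k nearest neighbours."""
    scores = []
    for i, (px, py) in enumerate(points):
        # Brute-force k nearest neighbours; adequate for a small illustration.
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)
        angles = sorted(math.atan2(qy - py, qx - px) for qx, qy in others[:k])
        # Gaps between consecutive directions, including the wrap-around gap.
        gaps = [b - a for a, b in zip(angles, angles[1:])]
        gaps.append(angles[0] + 2 * math.pi - angles[-1])
        scores.append(pvariance(gaps))
    return scores


# A 5 x 5 grid: index 12 is the centre point, index 0 is a corner.
grid = [(x, y) for y in range(5) for x in range(5)]
scores = direction_centrality(grid, k=8)
```

Since the score depends only on neighbor directions and not on neighbor distances, uniformly rescaling the point set leaves it unchanged, which is one way to read "density-independent" here.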
23
Kwong HS, Nadarajah S. Finite mixtures of multivariate skew Student’s t distributions with independent logistic skewing functions. Braz J Probab Stat 2022. [DOI: 10.1214/22-bjps542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Hok Shing Kwong
  - Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
- Saralees Nadarajah
  - Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
24
Wang X, Xu Z, Hu H, Zhou X, Zhang Y, Lafyatis R, Chen K, Huang H, Ding Y, Duerr RH, Chen W. SECANT: a biology-guided semi-supervised method for clustering, classification, and annotation of single-cell multi-omics. PNAS Nexus 2022; 1:pgac165. [PMID: 36157595 PMCID: PMC9491696 DOI: 10.1093/pnasnexus/pgac165] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2022] [Accepted: 08/16/2022] [Indexed: 01/29/2023]
Abstract
The recent advance of single cell sequencing (scRNA-seq) technology such as Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq) allows researchers to quantify cell surface protein abundance and RNA expression simultaneously at single cell resolution. Although CITE-seq and other similar technologies have gained enormous popularity, novel methods for analyzing this type of single cell multi-omics data are in urgent need. A limited number of available tools utilize data-driven approach, which may undermine the biological importance of surface protein data. In this study, we developed SECANT, a biology-guided SEmi-supervised method for Clustering, classification, and ANnoTation of single-cell multi-omics. SECANT is used to analyze CITE-seq data, or jointly analyze CITE-seq and scRNA-seq data. The novelties of SECANT include (1) using confident cell type label identified from surface protein data as guidance for cell clustering, (2) providing general annotation of confident cell types for each cell cluster, (3) utilizing cells with uncertain or missing cell type label to increase performance, and (4) accurate prediction of confident cell types for scRNA-seq data. Besides, as a model-based approach, SECANT can quantify the uncertainty of the results through easily interpretable posterior probability, and our framework can be potentially extended to handle other types of multi-omics data. We successfully demonstrated the validity and advantages of SECANT via simulation studies and analysis of public and in-house datasets from multiple tissues. We believe this new method will be complementary to existing tools for characterizing novel cell types and make new biological discoveries using single-cell multi-omics data.
Affiliation(s)
- Xinjun Wang
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Zhongli Xu
  - Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA 15224, USA
  - School of Medicine, Tsinghua University, Beijing 100084, China
- Haoran Hu
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Xueping Zhou
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Yanfu Zhang
  - Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Robert Lafyatis
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Kong Chen
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Heng Huang
  - Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Ying Ding
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Richard H Duerr
  - Department of Medicine, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Wei Chen
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213, USA
  - Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA 15224, USA
25
Garg T, Weiss CR, Sheth RA. Techniques for Profiling the Cellular Immune Response and Their Implications for Interventional Oncology. Cancers (Basel) 2022; 14:3628. [PMID: 35892890 PMCID: PMC9332307 DOI: 10.3390/cancers14153628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/19/2022] [Accepted: 07/20/2022] [Indexed: 12/07/2022] Open
Abstract
In recent years there has been increased interest in using the immune contexture of the primary tumors to predict the patient's prognosis. The tumor microenvironment of patients with cancers consists of different types of lymphocytes, tumor-infiltrating leukocytes, dendritic cells, and others. Different technologies can be used for the evaluation of the tumor microenvironment, all of which require a tissue or cell sample. Image-guided tissue sampling is a cornerstone in the diagnosis, stratification, and longitudinal evaluation of therapeutic efficacy for cancer patients receiving immunotherapies. Therefore, interventional radiologists (IRs) play an essential role in the evaluation of patients treated with systemically administered immunotherapies. This review provides a detailed description of different technologies used for immune assessment and analysis of the data collected from the use of these technologies. The detailed approach provided herein is intended to provide the reader with the knowledge necessary to not only interpret studies containing such data but also design and apply these tools for clinical practice and future research studies.
Affiliation(s)
- Tushar Garg
  - Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Clifford R. Weiss
  - Division of Vascular and Interventional Radiology, Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rahul A. Sheth
  - Department of Interventional Radiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
26
Zhang W, Li I, Reticker-Flynn NE, Good Z, Chang S, Samusik N, Saumyaa S, Li Y, Zhou X, Liang R, Kong CS, Le QT, Gentles AJ, Sunwoo JB, Nolan GP, Engleman EG, Plevritis SK. Identification of cell types in multiplexed in situ images by combining protein expression and spatial information using CELESTA. Nat Methods 2022; 19:759-769. [PMID: 35654951 PMCID: PMC9728133 DOI: 10.1038/s41592-022-01498-z] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 04/14/2021] [Accepted: 04/15/2022] [Indexed: 12/21/2022]
Abstract
Advances in multiplexed in situ imaging are revealing important insights in spatial biology. However, cell type identification remains a major challenge in imaging analysis, with most existing methods involving substantial manual assessment and subjective decisions for thousands of cells. We developed an unsupervised machine learning algorithm, CELESTA, which identifies the cell type of each cell, individually, using the cell's marker expression profile and, when needed, its spatial information. We demonstrate the performance of CELESTA on multiplexed immunofluorescence images of colorectal cancer and head and neck squamous cell carcinoma (HNSCC). Using the cell types identified by CELESTA, we identify tissue architecture associated with lymph node metastasis in HNSCC, and validate our findings in an independent cohort. By coupling our spatial analysis with single-cell RNA-sequencing data on proximal sections of the same specimens, we identify cell-cell crosstalk associated with lymph node metastasis, demonstrating the power of CELESTA to facilitate identification of clinically relevant interactions.
Affiliation(s)
- Weiruo Zhang
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Irene Li
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
  - Cancer Biology Program, School of Medicine, Stanford University, Stanford, CA, USA
- Zinaida Good
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Serena Chang
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Division of Head and Neck Surgery, Department of Otolaryngology, School of Medicine, Stanford University, Stanford, CA, USA
- Nikolay Samusik
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Saumyaa Saumyaa
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Division of Head and Neck Surgery, Department of Otolaryngology, School of Medicine, Stanford University, Stanford, CA, USA
- Yuanyuan Li
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
- Xin Zhou
  - Department of Computer Science, Stanford University, Stanford, CA, USA
- Rachel Liang
  - Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
- Christina S Kong
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
- Quynh-Thu Le
  - Department of Radiation Oncology, School of Medicine, Stanford University, Stanford, CA, USA
- Andrew J Gentles
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Division of Head and Neck Surgery, Department of Otolaryngology, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Medicine, Quantitative Sciences Unit, Stanford University, Stanford, CA, USA
- John B Sunwoo
  - Stanford Cancer Institute, Stanford University, Stanford, CA, USA
  - Division of Head and Neck Surgery, Department of Otolaryngology, School of Medicine, Stanford University, Stanford, CA, USA
- Garry P Nolan
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Edgar G Engleman
  - Department of Pathology, School of Medicine, Stanford University, Stanford, CA, USA
- Sylvia K Plevritis
  - Department of Biomedical Data Science, School of Medicine, Stanford University, Stanford, CA, USA
  - Department of Radiology, School of Medicine, Stanford University, Stanford, CA, USA
27
Cheung M, Campbell JJ, Thomas RJ, Braybrook J, Petzing J. Assessment of Automated Flow Cytometry Data Analysis Tools within Cell and Gene Therapy Manufacturing. Int J Mol Sci 2022; 23:3224. [PMID: 35328645 PMCID: PMC8955358 DOI: 10.3390/ijms23063224] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/27/2022] [Revised: 03/03/2022] [Accepted: 03/11/2022] [Indexed: 12/21/2022] Open
Abstract
Flow cytometry is widely used within the manufacturing of cell and gene therapies to measure and characterise cells. Conventional manual data analysis relies heavily on operator judgement, presenting a major source of variation that can adversely impact the quality and predictive potential of therapies given to patients. Computational tools have the capacity to minimise operator variation and bias in flow cytometry data analysis; however, in many cases, confidence in these technologies has yet to be fully established mirrored by aspects of regulatory concern. Here, we employed synthetic flow cytometry datasets containing controlled population characteristics of separation, and normal/skew distributions to investigate the accuracy and reproducibility of six cell population identification tools, each of which implement different unsupervised clustering algorithms: Flock2, flowMeans, FlowSOM, PhenoGraph, SPADE3 and SWIFT (density-based, k-means, self-organising map, k-nearest neighbour, deterministic k-means, and model-based clustering, respectively). We found that outputs from software analysing the same reference synthetic dataset vary considerably and accuracy deteriorates as the cluster separation index falls below zero. Consequently, as clusters begin to merge, the flowMeans and Flock2 software platforms struggle to identify target clusters more than other platforms. Moreover, the presence of skewed cell populations resulted in poor performance from SWIFT, though FlowSOM, PhenoGraph and SPADE3 were relatively unaffected in comparison. These findings illustrate how novel flow cytometry synthetic datasets can be utilised to validate a range of automated cell identification methods, leading to enhanced confidence in the data quality of automated cell characterisations and enumerations.
Affiliation(s)
- Melissa Cheung
  - Centre for Biological Engineering, Loughborough University, Loughborough LE11 3TU, Leicestershire, UK
- Jonathan J. Campbell
  - National Measurement Laboratory, LGC, Queens Road, Teddington TW11 0LY, Middlesex, UK
- Robert J. Thomas
  - Centre for Biological Engineering, Loughborough University, Loughborough LE11 3TU, Leicestershire, UK
- Julian Braybrook
  - National Measurement Laboratory, LGC, Queens Road, Teddington TW11 0LY, Middlesex, UK
- Jon Petzing
  - Centre for Biological Engineering, Loughborough University, Loughborough LE11 3TU, Leicestershire, UK
28
Heins A, Hoang MD, Weuster‐Botz D. Advances in automated real-time flow cytometry for monitoring of bioreactor processes. Eng Life Sci 2022; 22:260-278. [PMID: 35382548 PMCID: PMC8961054 DOI: 10.1002/elsc.202100082] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/09/2021] [Revised: 10/22/2021] [Accepted: 10/27/2021] [Indexed: 12/18/2022] Open
Abstract
Flow cytometry and its technological possibilities have greatly advanced in the past decade as analysis tool for single cell properties and population distributions of different cell types in bioreactors. Along the way, some solutions for automated real-time flow cytometry (ART-FCM) were developed for monitoring of bioreactor processes without operator interference over extended periods with variable sampling frequency. However, there is still great potential for ART-FCM to evolve and possibly become a standard application in bioprocess monitoring and process control. This review first addresses different components of an ART-FCM, including the sampling device, the sample-processing unit, the unit for sample delivery to the flow cytometer and the settings for measurement of pre-processed samples. Also, available algorithms are presented for automated data analysis of multi-parameter fluorescence datasets derived from ART-FCM experiments. Furthermore, challenges are discussed for integration of fluorescence-activated cell sorting into an ART-FCM setup for isolation and separation of interesting subpopulations that can be further characterized by for instance omics-methods. As the application of ART-FCM is especially of interest for bioreactor process monitoring, including investigation of population heterogeneity and automated process control, a summary of already existing setups for these purposes is given. Additionally, the general future potential of ART-FCM is addressed.
Affiliation(s)
- Anna‐Lena Heins
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
- Manh Dat Hoang
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
- Dirk Weuster‐Botz
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
29
Hu Z, Bhattacharya S, Butte AJ. Application of Machine Learning for Cytometry Data. Front Immunol 2022; 12:787574. [PMID: 35046945 PMCID: PMC8761933 DOI: 10.3389/fimmu.2021.787574] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Received: 09/30/2021] [Accepted: 12/14/2021] [Indexed: 01/23/2023] Open
Abstract
Modern cytometry technologies present opportunities to profile the immune system at a single-cell resolution with more than 50 protein markers, and have been widely used in both research and clinical settings. The number of publicly available cytometry datasets is growing. However, the analysis of cytometry data remains a bottleneck due to its high dimensionality, large cell numbers, and heterogeneity between datasets. Machine learning techniques are well suited to analyze complex cytometry data and have been used in multiple facets of cytometry data analysis, including dimensionality reduction, cell population identification, and sample classification. Here, we review the existing machine learning applications for analyzing cytometry data and highlight the importance of publicly available cytometry data that enable researchers to develop and validate machine learning methods.
Affiliation(s)
- Zicheng Hu
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
  - Department of Microbiology and Immunology, University of California, San Francisco, San Francisco, CA, United States
- Sanchita Bhattacharya
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
- Atul J. Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, United States
30
Greene E, Finak G, D'Amico LA, Bhardwaj N, Church CD, Morishima C, Ramchurren N, Taube JM, Nghiem PT, Cheever MA, Fling SP, Gottardo R. New interpretable machine-learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy. Patterns (N Y) 2021; 2:100372. [PMID: 34950900 PMCID: PMC8672150 DOI: 10.1016/j.patter.2021.100372] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/22/2021] [Revised: 08/09/2021] [Accepted: 09/30/2021] [Indexed: 12/14/2022]
Abstract
We introduce a new method for single-cell cytometry studies, FAUST, which performs unbiased cell population discovery and annotation. FAUST processes experimental data on a per-sample basis and returns biologically interpretable cell phenotypes, making it well suited for the analysis of complex datasets. We provide simulation studies that compare FAUST with existing methodology, exemplifying its strength. We apply FAUST to data from a Merkel cell carcinoma anti-PD-1 trial and discover pre-treatment effector memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. Using FAUST, we then validate these correlates in cryopreserved peripheral blood mononuclear cell samples from the same study, as well as an independent CyTOF dataset from a published metastatic melanoma trial. Finally, we show how FAUST's phenotypes can be used to perform cross-study data integration in the presence of diverse staining panels. Together, these results establish FAUST as a powerful new approach for unbiased discovery in single-cell cytometry.
- An interpretable machine-learning method for cytometry data analysis is developed.
- Using this, candidate biomarkers of response to therapy are identified and visualized.
- The method is used to validate our findings on two additional cytometry datasets.
- It is shown how to integrate findings across datasets with heterogeneous marker panels.
Our article introduces a new method, FAUST, which combines novel algorithms for clustering, cluster matching, variable selection, and feature selection. While these algorithms were developed for application to high-dimensional single-cell data—and our article validates this application area with multiple case studies—they are general purpose and can be applied to any collection of related real-valued matrices one wishes to partition. Some useful features of these algorithms to the broader data science community include the following: they estimate the number of clusters across a dataset, they can be applied independently to each matrix in the set of matrices one wishes to cluster, they match clusters across matrices on the basis of data-driven annotations, and the annotations are interpretable in relation to the initial measurement variables. We provide an open-source implementation of our method, https://github.com/RGLab/FAUST, targeting data structures optimized for use in cytometry data analysis.
Affiliation(s)
- Evan Greene
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Biostatistics Bioinformatics and Epidemiology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Greg Finak
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Biostatistics Bioinformatics and Epidemiology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Leonard A D'Amico
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Cancer Immunotherapy Trials Network, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Nina Bhardwaj
  - Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Candice D Church
  - Division of Dermatology, Department of Medicine, University of Washington, Seattle, WA, USA
- Chihiro Morishima
  - Division of Dermatology, Department of Medicine, University of Washington, Seattle, WA, USA
- Nirasha Ramchurren
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Cancer Immunotherapy Trials Network, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Janis M Taube
  - Bloomberg Kimmel Institute for Cancer Immunotherapy and the Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Paul T Nghiem
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Division of Dermatology, Department of Medicine, University of Washington, Seattle, WA, USA
- Martin A Cheever
  - Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Cancer Immunotherapy Trials Network, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Steven P Fling
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Cancer Immunotherapy Trials Network, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Raphael Gottardo
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Biostatistics Bioinformatics and Epidemiology Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
  - Centre Hospitalier Universitaire Vaudois et Université de Lausanne, Lausanne, Switzerland
31
Chang CC, Huang TH, Shueng PW, Chen SH, Chen CC, Lu CJ, Tseng YJ. Developing a Stacked Ensemble-Based Classification Scheme to Predict Second Primary Cancers in Head and Neck Cancer Survivors. Int J Environ Res Public Health 2021; 18:12499. [PMID: 34886225 PMCID: PMC8657249 DOI: 10.3390/ijerph182312499] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/16/2021] [Revised: 11/21/2021] [Accepted: 11/25/2021] [Indexed: 12/16/2022]
Abstract
Despite a considerable expansion in the present therapeutic repertoire for other malignancy managements, mortality from head and neck cancer (HNC) has not significantly improved in recent decades. Moreover, the second primary cancer (SPC) diagnoses increased in patients with HNC, but studies providing evidence to support SPCs prediction in HNC are lacking. Several base classifiers are integrated forming an ensemble meta-classifier using a stacked ensemble method to predict SPCs and find out relevant risk features in patients with HNC. The balanced accuracy and area under the curve (AUC) are over 0.761 and 0.847, with an approximately 2% and 3% increase, respectively, compared to the best individual base classifier. Our study found the top six ensemble risk features, such as body mass index, primary site of HNC, clinical nodal (N) status, primary site surgical margins, sex, and pathologic nodal (N) status. This will help clinicians screen HNC survivors before SPCs occur.
Affiliation(s)
- Chi-Chang Chang
  - School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
  - Department of Information Management, Ming Chuan University, Taoyuan 33300, Taiwan
- Tse-Hung Huang
  - Department of Traditional Chinese Medicine, Chang Gung Memorial Hospital, Keelung 20401, Taiwan
  - School of Traditional Chinese Medicine, Chang Gung University, Taoyuan 33300, Taiwan
  - School of Nursing, National Taipei University of Nursing and Health Sciences, Taipei 11200, Taiwan
  - Graduate Institute of Health Industry Technology, Chang Gung University of Science and Technology, Taoyuan 33300, Taiwan
- Pei-Wei Shueng
  - Department of Radiology, Division of Radiation Oncology, Far Eastern Memorial Hospital, New Taipei 22060, Taiwan
  - Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei 22060, Taiwan
- Ssu-Han Chen
  - Department of Industrial Engineering and Management, Ming Chi University of Technology, New Taipei 24330, Taiwan
  - Center for Artificial Intelligence & Data Science, Ming Chi University of Technology, New Taipei 24330, Taiwan
- Chun-Chia Chen
  - Institute of Medicine, Chung Shan Medical University, Taichung 40201, Taiwan
  - Department of Surgery, Division of Plastic Surgery, Chung Shan Medical University Hospital, Taichung 40201, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei 242062, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei 242062, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei 242062, Taiwan
- Yi-Ju Tseng
  - Department of Information Management, National Central University, Taoyuan 32031, Taiwan
32
Tinnevelt GH, Wouters K, Postma GJ, Folcarelli R, Jansen JJ. High-throughput single cell data analysis - A tutorial. Anal Chim Acta 2021; 1185:338872. [PMID: 34711307 DOI: 10.1016/j.aca.2021.338872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 06/28/2021] [Accepted: 07/21/2021] [Indexed: 11/30/2022]
Abstract
White blood cells protect the body against disease but may also cause chronic inflammation, auto-immune diseases or leukemia. There are many different white blood cell types whose identity and function can be studied by measuring their protein expression. Therefore, high-throughput analytical instruments were developed to measure multiple proteins on millions of single cells. The information-rich biochemistry information may only be fully extracted using multivariate statistics. Here we show an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environment or biotech research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data it is essential to consider the design and focus on the cell type of interest by removing all unwanted events. After pre-processing, the ten-thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping a clustering method such as Self-Organizing Maps is useful and for cell activation a model that describes the covariance such as Principal Component Analysis is useful. In rare cell discovery it is useful to first model all common cells and remove them to find the rare cells. Finally discriminant analysis based on the cellular distribution may highlight which cell (sub)types are different between groups.
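The step of converting the single cells of a sample into a cellular distribution can be sketched as follows. This is only an illustration under stated assumptions, not the tutorial's own code: it assumes each cell has already been assigned an integer cluster label (for example by a Self-Organizing Map), and the function name `cellular_distribution` is hypothetical. The result is one proportion per cluster, which gives every sample a fixed-length vector regardless of how many cells it contains.

```python
from collections import Counter


def cellular_distribution(cluster_labels, n_clusters):
    """Turn per-cell cluster labels (0 .. n_clusters-1) into cluster proportions."""
    if not cluster_labels:
        raise ValueError("a sample must contain at least one cell")
    counts = Counter(cluster_labels)
    total = len(cluster_labels)
    # Clusters absent from this sample get a proportion of 0.
    return [counts.get(c, 0) / total for c in range(n_clusters)]


# Hypothetical labels for two samples of different size, three clusters each.
sample_a = [0, 0, 1, 2]
sample_b = [2, 2, 2, 2, 1, 1, 1, 1]
dist_a = cellular_distribution(sample_a, n_clusters=3)
dist_b = cellular_distribution(sample_b, n_clusters=3)
```

Vectors like `dist_a` and `dist_b` are the kind of per-sample summary on which a discriminant analysis between groups could then operate.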
Affiliation(s)
- Gerjen H Tinnevelt
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
- Kristiaan Wouters
  - Department of Internal Medicine, Laboratory of Metabolism and Vascular Medicine, P.O. Box 616 (UNS50/14), 6200 MD Maastricht, the Netherlands
- Geert J Postma
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
- Rita Folcarelli
  - Corbion, Arkelsedijk 46, 4206 AC Gorinchem, the Netherlands
- Jeroen J Jansen
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
33
Olusoji OD, Spaak JW, Holmes M, Neyens T, Aerts M, De Laender F. cyanoFilter: An R package to identify phytoplankton populations from flow cytometry data using cell pigmentation and granularity. Ecol Modell 2021. [DOI: 10.1016/j.ecolmodel.2021.109743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
34
Ionita M, Schretzenmair R, Jones D, Moore J, Wang LS, Rogers W. Tailor: Targeting heavy tails in flow cytometry data with fast, interpretable mixture modeling. Cytometry A 2021; 99:133-144. [PMID: 33476090 DOI: 10.1002/cyto.a.24307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2020] [Revised: 12/22/2020] [Accepted: 01/05/2021] [Indexed: 11/11/2022]
Abstract
Automated clustering workflows are increasingly used for the analysis of high parameter flow cytometry data. This trend calls for algorithms which are able to quickly process tens of millions of data points, to compare results across subjects or time points, and to provide easily actionable interpretations of the results. To this end, we created Tailor, a model-based clustering algorithm specialized for flow cytometry data. Our approach leverages a phenotype-aware binning scheme to provide a coarse model of the data, which is then refined using a multivariate Gaussian mixture model. We benchmark Tailor using a simulation study and two flow cytometry data sets, and show that the results are robust to moderate departures from normality and inter-sample variation. Moreover, Tailor provides automated, non-overlapping annotations of its clusters, which facilitates interpretation of results and downstream analysis. Tailor is released as an R package, and the source code is publicly available at www.github.com/matei-ionita/Tailor.
Affiliation(s)
- Matei Ionita
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Richard Schretzenmair
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Derek Jones
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Jonni Moore
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Wade Rogers
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.,Corporate/Research, Still Pond Cytomics, West Chester, PA, USA
|
35
|
Béné MC, Lacombe F, Porwit A. Unsupervised flow cytometry analysis in hematological malignancies: A new paradigm. Int J Lab Hematol 2021; 43 Suppl 1:54-64. [PMID: 34288436 DOI: 10.1111/ijlh.13548] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/01/2021] [Revised: 03/13/2021] [Accepted: 03/28/2021] [Indexed: 01/10/2023]
Abstract
Ever since hematopoietic cells became "events" enumerated and characterized in suspension by cell counters or flow cytometers, researchers and engineers have strived to refine the acquisition and display of the electronic signals generated. A large array of solutions was then developed to best identify the numerous cell subsets that can be delineated, notably among hematopoietic cells. As instruments became more and more stable and robust, the focus moved to analytic software. Almost concomitantly, the capacity to use large panels (both with mass and classical cytometry) and to apply artificial intelligence/machine learning for their analysis increased. The combination of these concepts raised new analytical possibilities, opening an unprecedented field of subtle exploration for many conditions, including hematopoiesis and hematological disorders. In this review, the general concepts and progress achieved in the development of new analytical approaches for exploring high-dimensional data sets at the single-cell level will be described as they appeared over the past few years. A larger and more practical part will detail the various steps that need to be mastered, both in data acquisition and in the preanalytical check of data files. Finally, a step-by-step explanation of the solution in development to combine the Bioconductor clustering algorithm FlowSOM and the popular and widely used software Kaluza® (Beckman Coulter) will be presented. The aim of this review was to point out that the day when these advances reach routine hematology laboratories does not seem far away.
Affiliation(s)
- Marie C Béné
- Hematology Biology, Nantes University Hospital, Nantes, France.,CRCINA Inserm, Nantes, France
| | - Francis Lacombe
- Hematology Biology, Cytometry Department, Bordeaux University Hospital, Bordeaux, France
| | - Anna Porwit
- Department of Clinical Sciences, Oncology and Pathology, Faculty of Medicine, Lund University, Lund, Sweden.,Department of Clinical Genetics and Pathology, Skåne University Hospital, Lund, Sweden
|
36
|
Quintelier K, Couckuyt A, Emmaneel A, Aerts J, Saeys Y, Van Gassen S. Analyzing high-dimensional cytometry data using FlowSOM. Nat Protoc 2021; 16:3775-3801. [PMID: 34172973 DOI: 10.1038/s41596-021-00550-0] [Citation(s) in RCA: 72] [Impact Index Per Article: 24.0] [Received: 01/04/2021] [Accepted: 03/31/2021] [Indexed: 02/06/2023]
Abstract
The dimensionality of cytometry data has strongly increased in the last decade, and in many situations the traditional manual downstream analysis becomes insufficient. The field is therefore slowly moving toward more automated approaches, and in this paper we describe the protocol for analyzing high-dimensional cytometry data using FlowSOM, a clustering and visualization algorithm based on a self-organizing map. FlowSOM is used to distinguish cell populations from cytometry data in an unsupervised way and can help to gain deeper insights in fields such as immunology and oncology. Since the original FlowSOM publication (2015), we have validated the tool on a wide variety of datasets, and to write this protocol, we made use of this experience to improve the user-friendliness of the package (e.g., comprehensive functions replacing commonly required scripts). Where the original paper focused mainly on the algorithm description, this protocol offers user guidelines on how to implement the procedure, detailed parameter descriptions and troubleshooting recommendations. The protocol provides clearly annotated R code, and is therefore relevant for all scientists interested in computational high-dimensional analyses without requiring a strong bioinformatics background. We demonstrate the complete workflow, starting from data preparation (such as compensation, transformation and quality control), including detailed discussion of the different FlowSOM parameters and visualization options, and concluding with how the results can be further used to answer biological questions, such as statistical comparison between groups of interest. An average FlowSOM analysis takes 1-3 h to complete, though quality issues can increase this time considerably.
Affiliation(s)
- Katrien Quintelier
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Data Mining and Modeling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium.,Department of Pulmonary Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Artuur Couckuyt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Data Mining and Modeling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
| | - Annelies Emmaneel
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Data Mining and Modeling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
| | - Joachim Aerts
- Department of Pulmonary Medicine, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Yvan Saeys
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Data Mining and Modeling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
| | - Sofie Van Gassen
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium.,Data Mining and Modeling for Biomedicine Group, VIB Center for Inflammation Research, Ghent, Belgium
|
37
|
Jin T, Zhang C, Liu F, Chen X, Liang G, Ren F, Liang S, Song C, Shi J, Qiu W, Jiang X, Li K, Xi L. On-Chip Multicolor Photoacoustic Imaging Flow Cytometry. Anal Chem 2021; 93:8134-8142. [PMID: 34048649 DOI: 10.1021/acs.analchem.0c05218] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
On-chip imaging flow cytometry has been widely used in cancer biology, immunology, microbiology, and drug discovery. Pure optical imaging combined with flow cytometry to derive chemical, structural, and morphological features of cells provides systematic insights into biological processes. However, due to the high concentration and strong optical attenuation of red blood cells, preprocessing is necessary for optical flow cytometry while dealing with whole blood. In this study, we develop an on-chip photoacoustic imaging flow cytometry (PAIFC), which combines multicolor high-speed photoacoustic microscopy and microfluidics for cell imaging. The device employs a micro-optical scanner to achieve a miniaturized outer size of 30 × 17 × 24 mm³ and ultrafast cross-sectional imaging at a frame rate of 1758 Hz and provides lateral and axial resolutions of 2.2 and 33 μm, respectively. Using a multicolor strategy, PAIFC is able to differentiate cells labeled by external contrast agents, detect melanoma cells with an endogenous contrast in whole blood, and image melanoma cells in blood samples from tumor-bearing mice. The results suggest that PAIFC has sufficient sensitivity and specificity for future cell-on-chip applications.
Affiliation(s)
- Tian Jin
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Chen Zhang
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Fei Liu
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Xingxing Chen
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Guangru Liang
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Fei Ren
- School of Materials Science and Engineering, Beijing Institute of Technology, Beijing 100081, China
| | - Suzi Liang
- Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Shenzhen, Guangdong 518055, China
| | - Chaolong Song
- School of Mechanical Engineering and Electronic Information, China University of Geosciences (Wuhan), Wuhan, Hubei 430074, China
| | - Jianbing Shi
- School of Materials Science and Engineering, Beijing Institute of Technology, Beijing 100081, China
| | - Weibao Qiu
- Research Center for Biomedical Imaging, Shenzhen Institutes of Advanced Technology, Shenzhen, Guangdong 518055, China
| | - Xingyu Jiang
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Kai Li
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
| | - Lei Xi
- Department of Biomedical Engineering, Southern University of Science and Technology, Shenzhen, Guangdong 518055, China
|
38
|
Gerber R, Robinson MD. Censcyt: censored covariates in differential abundance analysis in cytometry. BMC Bioinformatics 2021; 22:235. [PMID: 33971812 PMCID: PMC8108359 DOI: 10.1186/s12859-021-04125-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Accepted: 04/08/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Innovations in single cell technologies have led to a flurry of datasets and computational tools to process and interpret them, including analyses of cell composition changes and transition in cell states. The diffcyt workflow for differential discovery in cytometry data consists of several steps, including preprocessing, cell population identification and differential testing for an association with a binary or continuous covariate. However, the commonly measured quantity of survival time in clinical studies often results in a censored covariate where classical differential testing is inapplicable. RESULTS: To overcome this limitation, multiple methods to directly include censored covariates in differential abundance analysis were examined with the use of simulation studies and a case study. Results show that multiple imputation based methods offer on-par performance with the Cox proportional hazards model in terms of sensitivity and error control, while offering flexibility to account for covariates. The tested methods are implemented in the R package censcyt as an extension of diffcyt and are available at https://bioconductor.org/packages/censcyt . CONCLUSION: Methods for the direct inclusion of a censored variable as a predictor in GLMMs are a valid alternative to classical survival analysis methods, such as the Cox proportional hazards model, while allowing for more flexibility in the differential analysis.
Affiliation(s)
- Reto Gerber
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Mark D Robinson
- Department of Molecular Life Sciences, University of Zurich, Zurich, Switzerland.
- SIB Swiss Institute of Bioinformatics, Zurich, Switzerland.
|
39
|
Dai Y, Xu A, Li J, Wu L, Yu S, Chen J, Zhao W, Sun XJ, Huang J. CytoTree: an R/Bioconductor package for analysis and visualization of flow and mass cytometry data. BMC Bioinformatics 2021; 22:138. [PMID: 33752602 PMCID: PMC7983272 DOI: 10.1186/s12859-021-04054-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/10/2020] [Accepted: 02/26/2021] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND: The rapidly increasing dimensionality and throughput of flow and mass cytometry data necessitate new bioinformatics tools for analysis and interpretation, and the recently emerging single-cell-based algorithms provide a powerful strategy to meet this challenge. RESULTS: Here, we present CytoTree, an R/Bioconductor package designed to analyze and interpret multidimensional flow and mass cytometry data. CytoTree provides multiple computational functionalities that integrate most of the commonly used techniques in unsupervised clustering and dimensionality reduction and, more importantly, support the construction of a tree-shaped trajectory based on the minimum spanning tree algorithm. A graph-based algorithm is also implemented to estimate the pseudotime and infer intermediate-state cells. We apply CytoTree to several examples of mass cytometry and time-course flow cytometry data on heterogeneity-based cytology and differentiation/reprogramming experiments to illustrate the practical utility achieved in a fast and convenient manner. CONCLUSIONS: CytoTree represents a versatile tool for analyzing multidimensional flow and mass cytometry data and for producing heuristic results for trajectory construction and pseudotime estimation in an integrated workflow.
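The tree-shaped trajectory mentioned in this abstract rests on a minimum spanning tree. A minimal sketch of that idea, assuming the tree is built over cluster centroids with Euclidean distances, is given below. The function name `minimum_spanning_tree` and the toy centroids are hypothetical and chosen for illustration only; this is not CytoTree's R implementation or API.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (parent_index, child_index, distance) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from each node to the tree
    best_parent = [-1] * n
    best_dist[0] = 0.0           # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] != -1:
            edges.append((best_parent[u], u, best_dist[u]))
        # Update the cheapest link for every node still outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


# Hypothetical 2-D cluster centroids (illustrative values, not real data).
centroids = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
tree = minimum_spanning_tree(centroids)
for parent, child, dist in tree:
    print(f"cluster {parent} -> cluster {child} (distance {dist:.2f})")
```

In a trajectory setting, the resulting edges connect neighbouring clusters, and a walk along the tree from a chosen root gives an ordering of cell states.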
Affiliation(s)
- Yuting Dai
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
| | - Aining Xu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
| | - Jianfeng Li
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
| | - Liang Wu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
| | - Shanhe Yu
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China
| | - Jun Chen
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, 200 1st St SW, Rochester, MN, 55905, USA
| | - Weili Zhao
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
| | - Xiao-Jian Sun
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
| | - Jinyan Huang
- Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 197 Ruijin Er Road, Shanghai, 200025, China.
|
40
|
Cheung M, Campbell JJ, Whitby L, Thomas RJ, Braybrook J, Petzing J. Current trends in flow cytometry automated data analysis software. Cytometry A 2021; 99:1007-1021. [PMID: 33606354 DOI: 10.1002/cyto.a.24320] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 11/26/2020] [Revised: 01/21/2021] [Accepted: 01/28/2021] [Indexed: 12/16/2022]
Abstract
Automated flow cytometry (FC) data analysis tools for cell population identification and characterization are increasingly being used in academic, biotechnology, pharmaceutical, and clinical laboratories. The development of these computational methods is designed to overcome reproducibility and process bottleneck issues in manual gating; however, the take-up of these tools remains (anecdotally) low. Here, we performed a comprehensive literature survey of state-of-the-art computational tools typically published by research, clinical, and biomanufacturing laboratories for automated FC data analysis and identified popular tools based on literature citation counts. Dimensionality reduction methods ranked highly, such as generic t-distributed stochastic neighbor embedding (t-SNE) and its initial Matlab-based implementation for cytometry data viSNE. Software with graphical user interfaces also ranked highly, including PhenoGraph, SPADE1, FlowSOM, and Citrus, with unsupervised learning methods outnumbering supervised learning methods, and algorithm type popularity spread across K-Means, hierarchical, density-based, model-based, and other classes of clustering algorithms. Additionally, to illustrate the actual use typically within clinical spaces alongside frequent citations, a survey issued by UK NEQAS Leucocyte Immunophenotyping to identify software usage trends among clinical laboratories was completed. The survey revealed 53% of laboratories have not yet taken up automated cell population identification methods, though among those that have, Infinicyt software is the most frequently identified. Survey respondents considered data output quality to be the most important factor when using automated FC data analysis software, followed by software speed and level of technical support. This review found differences in software usage between biomedical institutions, with tools for discovery, data exploration, and visualization more popular in academia, whereas automated tools for specialized targeted analysis that apply supervised learning methods were more used in clinical settings.
Affiliation(s)
- Melissa Cheung
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
| | | | - Liam Whitby
- UK NEQAS for Leucocyte Immunophenotyping, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
| | - Robert J Thomas
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
| | - Julian Braybrook
- National Measurement Laboratory, LGC, Teddington, United Kingdom
| | - Jon Petzing
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
|
41
|
Del Barrio E, Inouzhe H, Loubes JM, Matrán C, Mayo-Íscar A. optimalFlow: optimal transport approach to flow cytometry gating and population matching. BMC Bioinformatics 2020; 21:479. [PMID: 33109072 PMCID: PMC7590740 DOI: 10.1186/s12859-020-03795-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/05/2020] [Accepted: 10/01/2020] [Indexed: 11/12/2022] Open
Abstract
Background: Data obtained from flow cytometry present pronounced variability due to biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals, with different characteristics such as illness, age, sex, etc. The use of different settings for measurement, the variation of the conditions during experiments and the different types of flow cytometers are some of the technical causes of variability. This mixture of sources of variability makes the use of supervised machine learning for identification of cell populations difficult. The present work is conceived as a combination of strategies to facilitate the task of supervised gating. Results: We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning, restricted to the new groups, performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state of the art techniques in the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package at https://bioconductor.org/packages/optimalFlow. Conclusions: optimalFlowTemplates + optimalFlowClassification addresses the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovation is the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.
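The optimal-transport distance underlying this approach can be illustrated in its simplest setting. The sketch below computes the 1-Wasserstein distance between two equal-sized one-dimensional samples, where it reduces to the mean absolute difference between sorted values. It is a toy illustration under that assumption: the function name `wasserstein_1d` and the sample values are hypothetical, and the multivariate similarity distance and barycenter computations of optimalFlow itself are not reproduced here.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-sized 1-D empirical samples.

    For equal sample sizes, the optimal transport plan pairs the i-th
    smallest value of `xs` with the i-th smallest value of `ys`.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Hypothetical marker intensities for one population in two cytometries.
reference = [0.1, 0.4, 0.5, 0.9]
shifted = [1.9, 1.1, 1.5, 1.4]   # same shape, shifted by about one unit

print(wasserstein_1d(reference, reference))
print(wasserstein_1d(reference, shifted))
```

A distance of this kind is small when two samples differ only by minor shifts, which is why transport-based distances are a natural way to group cytometries before supervised gating.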
Affiliation(s)
- Eustasio Del Barrio
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain.,IMUVA, Calle Paseo de Belén, Valladolid, Spain
| | - Hristo Inouzhe
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain.,IMUVA, Calle Paseo de Belén, Valladolid, Spain
| | - Jean-Michel Loubes
- Université Paul Sabatier, Route de Narbonne, Toulouse, France.,IMT, Route de Narbonne, Toulouse, France
| | - Carlos Matrán
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain.,IMUVA, Calle Paseo de Belén, Valladolid, Spain
| | - Agustín Mayo-Íscar
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain.,IMUVA, Calle Paseo de Belén, Valladolid, Spain
|
42
|
Keyes TJ, Domizi P, Lo YC, Nolan GP, Davis KL. A Cancer Biologist's Primer on Machine Learning Applications in High-Dimensional Cytometry. Cytometry A 2020; 97:782-799. [PMID: 32602650 PMCID: PMC7416435 DOI: 10.1002/cyto.a.24158] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/07/2019] [Revised: 03/10/2020] [Accepted: 05/12/2020] [Indexed: 12/11/2022]
Abstract
The application of machine learning and artificial intelligence to high-dimensional cytometry data sets has increasingly become a staple of bioinformatic data analysis over the past decade. This is especially true in the field of cancer biology, where protocols for collecting multiparameter single-cell data in a high-throughput fashion are rapidly developed. As the use of machine learning methodology in cytometry becomes increasingly common, there is a need for cancer biologists to understand the basic theory and applications of a variety of algorithmic tools for analyzing and interpreting cytometry data. We introduce the reader to several keystone machine learning-based analytic approaches with an emphasis on defining key terms and introducing a conceptual framework for making translational or clinically relevant discoveries. The target audience consists of cancer cell biologists and physician-scientists interested in applying these tools to their own data, but who may have limited training in bioinformatics. © 2020 International Society for Advancement of Cytometry.
Affiliation(s)
- Timothy J Keyes
- Medical Scientist Training Program, Stanford University School of Medicine, Stanford, California
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
| | - Pablo Domizi
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
| | - Yu-Chen Lo
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
| | - Garry P Nolan
- Department of Microbiology and Immunology | Baxter Laboratory for Stem Cell Biology, Stanford University School of Medicine, Stanford, California
| | - Kara L Davis
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California
|
43
|
Stassen SV, Siu DMD, Lee KCM, Ho JWK, So HKH, Tsia KK. PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells. Bioinformatics 2020; 36:2778-2786. [PMID: 31971583 PMCID: PMC7203756 DOI: 10.1093/bioinformatics/btaa042] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 09/03/2019] [Revised: 11/24/2019] [Accepted: 01/16/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | | | - Joshua W K Ho
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | | | - Kevin K Tsia
- Department of Electrical and Electronic Engineering
|
44
|
Liu P, Liu S, Fang Y, Xue X, Zou J, Tseng G, Konnikova L. Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data. Front Cell Dev Biol 2020; 8:234. [PMID: 32411698 PMCID: PMC7198724 DOI: 10.3389/fcell.2020.00234] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/13/2019] [Accepted: 03/20/2020] [Indexed: 11/13/2022] Open
Abstract
The progress in the field of high-dimensional cytometry has greatly increased the number of markers that can be simultaneously analyzed, producing datasets with large numbers of parameters. Traditional biaxial manual gating might not be optimal for such datasets. To overcome this, a large number of automated tools have been developed to aid with cellular clustering of multi-dimensional datasets. Here we review two large categories of such tools: unsupervised and supervised clustering tools. After a thorough review of the popularity and use of each of the available unsupervised clustering tools, we focus on the top six tools to discuss their advantages and limitations. Furthermore, we employ a publicly available dataset to directly compare the usability, speed, and relative effectiveness of the available unsupervised and supervised tools. Finally, we discuss the current challenges for existing methods and future directions for the new generation of cell type identification approaches.
Affiliation(s)
- Peng Liu
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Silvia Liu
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, United States
| | - Yusi Fang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Xiangning Xue
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Jian Zou
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - George Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Liza Konnikova
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, United States
|
45
|
Lucchesi S, Furini S, Medaglini D, Ciabattini A. From Bivariate to Multivariate Analysis of Cytometric Data: Overview of Computational Methods and Their Application in Vaccination Studies. Vaccines (Basel) 2020; 8:E138. [PMID: 32244919 PMCID: PMC7157606 DOI: 10.3390/vaccines8010138] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/27/2020] [Revised: 03/17/2020] [Accepted: 03/18/2020] [Indexed: 12/15/2022] Open
Abstract
Flow and mass cytometry are used to quantify the expression of multiple extracellular or intracellular molecules on single cells, allowing the phenotypic and functional characterization of complex cell populations. Multiparametric flow cytometry is particularly suitable for deep analysis of immune responses after vaccination, as it allows measurement of the frequency, the phenotype, and the functional features of antigen-specific cells. When many parameters are investigated simultaneously, it is not feasible to analyze all the possible bi-dimensional combinations of marker expression with classical manual analysis, and the adoption of advanced automated tools to process and analyze high-dimensional data sets becomes necessary. In recent years, the development of many tools for the automated analysis of multiparametric cytometry data has been reported, with an increasing record of publications starting from 2014. However, the use of these tools has been largely restricted to bioinformaticians, while few of them are routinely employed by the biomedical community. Filling the gap between algorithm developers and final users is fundamental for exploiting the advantages of computational tools in the analysis of cytometry data. The potential of automated analyses ranges from the improvement of data quality in the pre-processing steps up to the unbiased, data-driven examination of complex datasets using a variety of algorithms based on different approaches. In this review, an overview of the automated analysis pipeline is provided, spanning from the pre-processing phase to the automated population analysis. Analysis based on computational tools might overcome both the subjectivity of manual gating and the operator-biased exploration of expected populations. Examples of applications of automated tools that have successfully improved the characterization of different cell populations in vaccination studies are also presented.
Affiliation(s)
- Simone Lucchesi
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Simone Furini
- Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Donata Medaglini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Annalisa Ciabattini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| |
|
46
|
Qi Y, Fang Y, Sinclair DR, Guo S, Alberich-Jorda M, Lu J, Tenen DG, Kharas MG, Pyne S. High-speed automatic characterization of rare events in flow cytometric data. PLoS One 2020; 15:e0228651. [PMID: 32045462 PMCID: PMC7012421 DOI: 10.1371/journal.pone.0228651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2019] [Accepted: 01/21/2020] [Indexed: 11/19/2022]
Abstract
A new computational framework for FLow cytometric Analysis of Rare Events (FLARE) has been developed specifically for fast and automatic identification of rare cell populations in very large samples generated by platforms like multi-parametric flow cytometry. Using a hierarchical Bayesian model and information-sharing via parallel computation, FLARE rapidly explores the high-dimensional marker-space to detect highly rare populations that are consistent across multiple samples. Further, it can focus within specified regions of interest in marker-space to detect subpopulations with desired precision.
Affiliation(s)
- Yuan Qi
- Department of Computer Science, Purdue University, West Lafayette, IN, United States of America
- Department of Statistics, Purdue University, West Lafayette, IN, United States of America
- * E-mail: (YQ); (SP)
| | - Youhan Fang
- Department of Computer Science, Purdue University, West Lafayette, IN, United States of America
| | - David R. Sinclair
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
- Department of Health Policy and Management, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
| | - Shangqin Guo
- Department of Cell Biology, Yale University School of Medicine, New Haven, CT, United States of America
| | | | - Jun Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, United States of America
- Yale Stem Cell Center, Yale University School of Medicine, New Haven, CT, United States of America
| | - Daniel G. Tenen
- Center for Life Sciences, Harvard Medical School, Boston, MA, United States of America
- Harvard Stem Cell Institute, Harvard Medical School, Boston, MA, United States of America
- Cancer Science Institute, National University of Singapore, Singapore, Singapore
| | - Michael G. Kharas
- Molecular Pharmacology Program, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America
| | - Saumyadipta Pyne
- Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
- * E-mail: (YQ); (SP)
| |
|
47
|
Abstract
The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a novel approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the event that the observed data are generated by the model, one conditions on the event that the model generates data close to the observed data, in a distributional sense. When closeness is defined in terms of relative entropy, the resulting "coarsened" posterior can be approximated by simply tempering the likelihood (that is, by raising the likelihood to a fractional power); thus, inference can usually be implemented via standard algorithms, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data using mixture models and autoregressive models of unknown order.
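The tempering idea described in this abstract can be illustrated with a conjugate model. The following is a minimal sketch, assuming a Beta prior with Bernoulli data; the function names and the exponent `zeta` are illustrative choices, not taken from the paper. The abstract does not state how the fractional power is chosen, so it is left here as a user-supplied value.

```python
def tempered_beta_posterior(successes, failures, zeta, a=1.0, b=1.0):
    """Beta posterior when the Bernoulli likelihood is raised to the power zeta.

    Assumptions for illustration only: a Beta(a, b) prior and a fixed
    exponent zeta in (0, 1]. Setting zeta = 1 recovers the standard posterior.
    """
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta must lie in (0, 1]")
    # likelihood**zeta is proportional to p**(zeta*s) * (1-p)**(zeta*f),
    # so the Beta prior stays conjugate and the counts are down-weighted.
    return a + zeta * successes, b + zeta * failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1.0))


if __name__ == "__main__":
    standard = tempered_beta_posterior(70, 30, zeta=1.0)
    tempered = tempered_beta_posterior(70, 30, zeta=0.1)
    print("standard:", standard, "variance:", beta_variance(*standard))
    print("tempered:", tempered, "variance:", beta_variance(*tempered))
```

With a fractional exponent the posterior is wider than the standard one, which is one way to see why tempering makes the inference less sensitive to small departures from the assumed model.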
|
48
|
Liu X, Song W, Wong BY, Zhang T, Yu S, Lin GN, Ding X. A comparison framework and guideline of clustering methods for mass cytometry data. Genome Biol 2019; 20:297. [PMID: 31870419 PMCID: PMC6929440 DOI: 10.1186/s13059-019-1917-7] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 08/15/2019] [Accepted: 12/09/2019] [Indexed: 12/31/2022]
Abstract
Background: With the expanding applications of mass cytometry in medical research, a wide variety of clustering methods, both semi-supervised and unsupervised, have been developed for data analysis. Selecting the optimal clustering method can accelerate the identification of meaningful cell populations. Results: To address this issue, we compared three classes of performance measures, “precision” as external evaluation, “coherence” as internal evaluation, and stability, of nine methods based on six independent benchmark datasets. Seven unsupervised methods (Accense, Xshift, PhenoGraph, FlowSOM, flowMeans, DEPECHE, and kmeans) and two semi-supervised methods (Automated Cell-type Discovery and Classification and linear discriminant analysis (LDA)) are tested on six mass cytometry datasets. We compute and compare all defined performance measures against random subsampling, varying sample sizes, and the number of clusters for each method. LDA reproduces the manual labels most precisely but does not rank top in internal evaluation. PhenoGraph and FlowSOM perform better than other unsupervised tools in precision, coherence, and stability. PhenoGraph and Xshift are more robust when detecting refined sub-clusters, whereas DEPECHE and FlowSOM tend to group similar clusters into meta-clusters. The performances of PhenoGraph, Xshift, and flowMeans are impacted by increased sample size, but FlowSOM is relatively stable as sample size increases. Conclusion: All the evaluations, including precision, coherence, stability, and clustering resolution, should be considered together when choosing an appropriate tool for cytometry data analysis. Thus, we provide decision guidelines based on these characteristics for the general reader to more easily choose the most suitable clustering tools.
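An external evaluation of the kind mentioned in this abstract compares predicted cluster labels against manual labels. The following is a minimal sketch using the adjusted Rand index, a commonly used external index; whether it matches the exact "precision" measures used by the authors is an assumption, and the function name is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    Illustrative helper, not the benchmark's own code. Returns 1.0 for
    identical partitions (up to label renaming) and values near 0 for
    agreement no better than chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0
    # Count pairs of cells that share a cell of the contingency table,
    # a manual label, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        # Both partitions are trivial (one cluster or all singletons).
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    manual = ["T", "T", "B", "B"]
    clusters = [0, 0, 1, 2]  # the B cells were split into two clusters
    print(adjusted_rand_index(manual, clusters))
```

Because the index is invariant to how clusters are named, it can be applied directly to the arbitrary cluster IDs that unsupervised tools return.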
Affiliation(s)
- Xiao Liu
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China
| | - Weichen Song
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, 200030, China
| | - Brandon Y Wong
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China; Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, 21218, USA
| | - Ting Zhang
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China
| | - Shunying Yu
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, 200030, China
| | - Guan Ning Lin
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, 600 South Wanping Road, Shanghai, 200030, China
| | - Xianting Ding
- State Key Laboratory of Oncogenes and Related Genes, Institute for Personalized Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai, 200030, China
| |
|
49
|
Ludwig J, Zu Siederdissen CH, Liu Z, Stadler PF, Müller S. flowEMMi: an automated model-based clustering tool for microbial cytometric data. BMC Bioinformatics 2019; 20:643. [PMID: 31815609 PMCID: PMC6902487 DOI: 10.1186/s12859-019-3152-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/06/2018] [Accepted: 10/10/2019] [Indexed: 12/17/2022]
Abstract
Background: Flow cytometry (FCM) is a powerful single-cell based measurement method to ascertain multidimensional optical properties of millions of cells. FCM is widely used in medical diagnostics and health research. There is also a broad range of applications in the analysis of complex microbial communities. The main concern in microbial community analyses is to track the dynamics of microbial subcommunities. So far, this can be achieved with the help of time-consuming manual clustering procedures that require extensive user-dependent input. In addition, several tools have recently been developed by using different approaches which, however, focus mainly on the clustering of medical FCM data or of microbial samples with a well-known background, while much less work has been done on high-throughput, online algorithms for two-channel FCM. Results: We bridge this gap with flowEMMi, a model-based clustering tool based on multivariate Gaussian mixture models with subsampling and foreground/background separation. These extensions provide a fast and accurate identification of cell clusters in FCM data, in particular for microbial community FCM data that are often affected by irrelevant information like technical noise, beads or cell debris. flowEMMi outperforms other available tools with regard to running time and information content of the clustering results and provides near-online results and optional heuristics to reduce the running time further. Conclusions: flowEMMi is a useful tool for the automated cluster analysis of microbial FCM data. It overcomes the user-dependent and time-consuming manual clustering procedure and provides consistent results with ancillary information and statistical proof.
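The model-based clustering named in this abstract rests on fitting a Gaussian mixture with expectation-maximization. The following is a deliberately simplified sketch, assuming a single channel (one dimension) and omitting the subsampling and foreground/background separation the abstract describes; it is not flowEMMi's implementation, and all function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, n_iter=100, min_var=1e-6):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (weights, means, variances). Deterministic start: component
    means are spread evenly over the observed data range.
    """
    lo, hi = min(data), max(data)
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    grand_mean = sum(data) / len(data)
    grand_var = sum((x - grand_mean) ** 2 for x in data) / len(data)
    variances = [max(grand_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each event.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances


def assign_cluster(x, weights, means, variances):
    """Index of the most probable component for one event."""
    scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic single-channel intensities from two populations.
    events = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
    w, m, v = fit_gmm_1d(events, k=2)
    print("weights:", w)
    print("means:", m)
```

A multivariate version would replace the scalar variance with a covariance matrix per component; the alternation between E-step and M-step stays the same.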
Affiliation(s)
- Joachim Ludwig
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
| | | | - Zishu Liu
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
| | - Peter F Stadler
- Department of Computer Science, University Leipzig, Härtelstr. 16-18, Leipzig, 04107, Germany
| | - Susann Müller
- Department of Environmental Microbiology, Research Group Flow Cytometry, Helmholtz Centre for Environmental Research, Permoserstraße 15, Leipzig, 04318, Germany
| |
|
50
|
Lucchesi S, Nolfi E, Pettini E, Pastore G, Fiorino F, Pozzi G, Medaglini D, Ciabattini A. Computational Analysis of Multiparametric Flow Cytometric Data to Dissect B Cell Subsets in Vaccine Studies. Cytometry A 2019; 97:259-267. [PMID: 31710181 PMCID: PMC7079172 DOI: 10.1002/cyto.a.23922] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/22/2019] [Revised: 09/11/2019] [Accepted: 10/07/2019] [Indexed: 01/03/2023]
Abstract
The generation of the B cell response upon vaccination is characterized by the induction of different functional and phenotypic subpopulations and is strongly dependent on the vaccine formulation, including the adjuvant used. Here, we have profiled the different B cell subsets elicited upon vaccination, using machine learning methods for interpreting high-dimensional flow cytometry data sets. The B cell response elicited by an adjuvanted vaccine formulation, compared to the antigen alone, was characterized using two automated methods based on clustering (FlowSOM) and dimensional reduction (t-SNE) approaches. The clustering method identified, based on multiple marker expression, different B cell populations, including plasmablasts, plasma cells, germinal center B cells and their subsets, while this profiling was more difficult with t-SNE analysis. When undefined phenotypes were detected, their characterization could be improved by integrating the t-SNE spatial visualization of cells with the FlowSOM clusters. The frequency of some cellular subsets, in particular plasma cells, was significantly higher in lymph nodes of mice primed with the adjuvanted formulation compared to antigen alone. Thanks to this automatic data analysis, it was possible to identify, in an unbiased way, different B cell populations and also intermediate stages of cell differentiation elicited by immunization, thus providing a signature of B cell recall response that can hardly be obtained with the classical bidimensional gating analysis. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
Affiliation(s)
- Simone Lucchesi
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Emanuele Nolfi
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Elena Pettini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Gabiria Pastore
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Fabio Fiorino
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Gianni Pozzi
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Donata Medaglini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| | - Annalisa Ciabattini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, Siena, Italy
| |
|