1
Moeller HV, L'Etoile‐Goga A, Vincenzi L, Norlin A, Barbaglia GS, Runte GC, Kaare‐Rasmussen JT, Johnson MD. Retention of blue-green cryptophyte organelles by Mesodinium rubrum and their effects on photophysiology and growth. J Eukaryot Microbiol 2025; 72:e13066. [PMID: 39584600 PMCID: PMC11822877 DOI: 10.1111/jeu.13066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 10/18/2024] [Accepted: 10/18/2024] [Indexed: 11/26/2024]
Abstract
As chloroplast-stealing or "kleptoplastidic" lineages become more reliant on stolen machinery, they also tend to become more specialized on the prey from which they acquire this machinery. For example, the ciliate Mesodinium rubrum obtains > 95% of its carbon from photosynthesis, and specializes on plastids from the Teleaulax clade of cryptophytes. However, M. rubrum is sometimes observed in nature containing plastids from other cryptophyte species. Here, we report on substantial ingestion of the blue-green cryptophyte Hemiselmis pacifica by M. rubrum, leading to organelle retention and transient increases in M. rubrum's growth rate. However, microscopy data suggest that H. pacifica organelles do not experience the same rearrangement and integration as Teleaulax amphioxeia's. We measured M. rubrum's functional response, quantified the magnitude and duration of growth benefits, and estimated kleptoplastid photosynthetic rates. Our results suggest that a lack of discrimination between H. pacifica and the preferred prey T. amphioxeia (perhaps due to similarities in cryptophyte size and swimming behavior) may result in H. pacifica ingestion. Thus, while blue-green cryptophytes may represent a negligible prey source in natural environments, they may help M. rubrum survive when Teleaulax are unavailable. Furthermore, these results represent a useful tool for manipulating M. rubrum's cell biology and photophysiology.
Affiliation(s)
- Holly V. Moeller
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Amelie L'Etoile‐Goga
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Lucas Vincenzi
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Andreas Norlin
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
  - College of Marine Sciences, University of South Florida, St. Petersburg, Florida, USA
- Gina S. Barbaglia
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Gabriel C. Runte
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Jonatan T. Kaare‐Rasmussen
  - Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, Santa Barbara, California, USA
- Matthew D. Johnson
  - Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
2
Spies NC, Rangel A, English P, Morrison M, O’Fallon B, Ng DP. Machine Learning Methods in Clinical Flow Cytometry. Cancers (Basel) 2025; 17:483. [PMID: 39941850 PMCID: PMC11816335 DOI: 10.3390/cancers17030483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2024] [Revised: 01/27/2025] [Accepted: 01/28/2025] [Indexed: 02/16/2025] Open
Abstract
This review will explore the integration of machine learning (ML) techniques to enhance the analysis of increasingly complex and voluminous flow cytometry data, as traditional manual methods are insufficient for handling this data. We attempt to provide a comprehensive introduction to ML in flow cytometry, detailing the transition from manual gating to computational methods and emphasizing the importance of data quality. Key ML techniques are discussed, including supervised learning methods like logistic regression, support vector machines, and neural networks, which rely on labeled data to classify disease states. Unsupervised methods, such as k-means clustering, FlowSOM, UMAP, and t-SNE, are highlighted for their ability to identify novel cell populations without predefined labels. We also delve into newer semi-supervised and weakly supervised methods, which leverage partial labeling to improve model performance. Practical aspects of implementing ML in clinical settings are addressed, including regulatory considerations, data preprocessing, model training, validation, and the importance of generalizability, and we underscore the collaborative effort required among pathologists, data scientists, and laboratory professionals to ensure robust model development and deployment. Finally, we show the transformative potential of ML in flow cytometry in uncovering new biological insights through advanced computational techniques.
Affiliation(s)
- Nicholas C. Spies
  - Department of Pathology, University of Utah, Salt Lake City, UT 84112, USA
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
- Alexandra Rangel
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
- Paul English
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
- Muir Morrison
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
- Brendan O’Fallon
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
- David P. Ng
  - Department of Pathology, University of Utah, Salt Lake City, UT 84112, USA
  - ARUP Laboratories, Division of Applied Artificial Intelligence, Institute for Research and Innovation, Salt Lake City, UT 84108, USA
3
Liu P, Pan Y, Chang HC, Wang W, Fang Y, Xue X, Zou J, Toothaker JM, Olaloye O, Santiago EG, McCourt B, Mitsialis V, Presicce P, Kallapur SG, Snapper SB, Liu JJ, Tseng GC, Konnikova L, Liu S. Comprehensive evaluation and practical guideline of gating methods for high-dimensional cytometry data: manual gating, unsupervised clustering, and auto-gating. Brief Bioinform 2024; 26:bbae633. [PMID: 39656848 DOI: 10.1093/bib/bbae633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2024] [Revised: 11/13/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024] Open
Abstract
Cytometry is an advanced technique for simultaneously identifying and quantifying many cell surface and intracellular proteins at a single-cell resolution. Analyzing high-dimensional cytometry data involves identifying and quantifying cell populations based on their marker expressions. This study provided a quantitative review and comparison of various ways to phenotype cellular populations within the cytometry data, including manual gating, unsupervised clustering, and supervised auto-gating. Six datasets from diverse species and sample types were included in the study, and manual gating with two hierarchical layers was used as the truth for evaluation. For manual gating, results from five researchers were compared to illustrate the gating consistency among different raters. For unsupervised clustering, 23 tools were quantitatively compared in terms of accuracy with the truth and computing cost. While no method outperformed all others, several tools, including PAC-MAN, CCAST, FlowSOM, flowClust, and DEPECHE, generally demonstrated strong performance. For supervised auto-gating methods, four algorithms were evaluated, where DeepCyTOF and CyTOF Linear Classifier performed the best. We further provided practical recommendations on prioritizing gating methods based on different application scenarios. This study offers comprehensive insights for biologists to understand diverse gating methods and choose the best-suited ones for their applications.
Affiliation(s)
- Peng Liu
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yuchen Pan
  - Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, 1400 Pressler St., Houston, TX 77030, US
- Hung-Ching Chang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Wenjia Wang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Yusi Fang
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Xiangning Xue
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jian Zou
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
- Jessica M Toothaker
  - Department of Immunology, University of Pittsburgh, 5051 Centre Avenue, Pittsburgh, PA 15213, US
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Oluwabunmi Olaloye
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Black McCourt
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
- Vanessa Mitsialis
  - Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
  - Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women's Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Pietro Presicce
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Suhas G Kallapur
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
- Scott B Snapper
  - Department of Pediatrics, Division of Gastroenterology, Hepatology, and Nutrition, Boston Children's Hospital and Department of Pediatrics, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
  - Department of Medicine, Division of Gastroenterology, Hepatology, and Endoscopy, Brigham & Women's Hospital and Department of Medicine, Harvard Medical School, 300 Longwood Ave., Boston, MA 02115, US
- Jia-Jun Liu
  - Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
  - Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
- George C Tseng
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, 130 De Soto St., Pittsburgh, PA 15261, US
  - Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
- Liza Konnikova
  - Department of Pediatrics, Yale University, 15 York Street New Haven, CT 06510, US
  - Division of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California Los Angeles, 757 Westwood Plaza, Los Angeles, CA 90095, US
  - Department of Obstetrics, Gynecology and Reproductive Sciences, Yale University, 333 Cedar Street, New Haven, CT 06510, US
  - Department of Immunobiology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Program in Human and Translational Immunology, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Program in Translational Biomedicine, Yale University, 300 Cedar Street, New Haven, CT 06520, US
  - Center for Systems and Engineering Immunology, Yale University, 100 College St., New Haven, CT 06510, US
- Silvia Liu
  - Drug Discovery Institute, School of Medicine, University of Pittsburgh, 700 Technology Dr, Pittsburgh, PA 15219, US
  - Pittsburgh Liver Research Center, School of Medicine, University of Pittsburgh, 200 Lothrop Street, Pittsburgh, PA 15261, US
  - Computational and Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, PA 15213, US
  - Department of Pharmacology and Chemical Biology, School of Medicine, University of Pittsburgh, 200 Lothrop St., Pittsburgh, PA 15261, US
  - Hillman Cancer Center, University of Pittsburgh, 5150 Centre Ave., Pittsburgh, PA 15232, US
4
Eslami M, Moseley RC, Eramian H, Bryce D, Haase SB. AutoGater: a weakly supervised neural network model to gate cells in flow cytometric analyses. Sci Rep 2024; 14:23581. [PMID: 39384769 PMCID: PMC11479614 DOI: 10.1038/s41598-024-66936-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 07/05/2024] [Indexed: 10/11/2024] Open
Abstract
Flow cytometry is a useful and efficient method for the rapid characterization of a cell population based on the optical and fluorescence properties of individual cells. Ideally, the cell population would consist of only healthy viable cells as dead cells can confound the analysis. Thus, separating out healthy cells from dying and dead cells, and any potential debris, is an important first step in analysis of flow cytometry data. While gating of debris can be conducted using measured optical properties, identifying dead and dying cells often requires utilizing fluorescent stains (e.g. Sytox, a nucleic acid stain that stains cells with compromised cell membranes) to identify cells that should be excluded from downstream analyses. These stains prolong the experimental preparation process and use a flow cytometer's fluorescence channels that could otherwise be used to measure additional fluorescent markers within the cells (e.g. reporter proteins). Here we outline a stain-free method for identifying viable cells for downstream processing by gating cells that are dying or dead. AutoGater is a weakly supervised deep learning model that can separate healthy populations from unhealthy and dead populations using only light-scatter channels. In addition, AutoGater harmonizes different measurements of dead cells such as Sytox and CFUs.
Affiliation(s)
- Robert C Moseley
  - Department of Biology, Duke University, Durham, NC, USA
  - Cymantix, LLC, Chapel Hill, NC, USA
- Daniel Bryce
  - Smart Information Flow Technologies, LLC, St. Paul, USA
- Steven B Haase
  - Departments of Biology and Medicine, Duke University, Durham, NC, USA
5
Dinalankara W, Ng DP, Marchionni L, Simonson PD. Comparison of three machine learning algorithms for classification of B-cell neoplasms using clinical flow cytometry data. Cytometry B Clin Cytom 2024; 106:282-293. [PMID: 38721890 DOI: 10.1002/cyto.b.22177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/22/2024] [Accepted: 04/12/2024] [Indexed: 05/18/2024]
Abstract
Multiparameter flow cytometry data is visually inspected by expert personnel as part of standard clinical disease diagnosis practice. This is a demanding and costly process, and recent research has demonstrated that it is possible to utilize artificial intelligence (AI) algorithms to assist in the interpretive process. Here we report our examination of three previously published machine learning methods for classification of flow cytometry data and apply these to a B-cell neoplasm dataset to obtain predicted disease subtypes. Each of the examined methods classifies samples according to specific disease categories using ungated flow cytometry data. We compare and contrast the three algorithms with respect to their architectures, and we report the multiclass classification accuracies and relative required computation times. Despite different architectures, two of the methods, flowCat and EnsembleCNN, had similarly good accuracies with relatively fast computational times. We note a speed advantage for EnsembleCNN, particularly in the case of addition of training data and retraining of the classifier.
Affiliation(s)
- Wikum Dinalankara
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
- David P Ng
  - Department of Pathology, University of Utah, Salt Lake City, Utah, USA
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
- Paul D Simonson
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, New York, USA
6
Chen Y, De Spiegelaere W, Trypsteen W, Gleerup D, Vandesompele J, Lievens A, Vynck M, Thas O. Benchmarking digital PCR partition classification methods with empirical and simulated duplex data. Brief Bioinform 2024; 25:bbae120. [PMID: 38555473 PMCID: PMC10981767 DOI: 10.1093/bib/bbae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/09/2024] [Accepted: 02/26/2024] [Indexed: 04/02/2024] Open
Abstract
Digital PCR (dPCR) is a highly accurate technique for the quantification of target nucleic acid(s). It has shown great potential in clinical applications, like tumor liquid biopsy and validation of biomarkers. Accurate classification of partitions based on end-point fluorescence intensities is crucial to avoid biased estimators of the concentration of the target molecules. We have evaluated many clustering methods, from general-purpose methods to specific methods for dPCR and flow cytometry, on both simulated and real-life data. Clustering method performance was evaluated by simulating various scenarios. Based on our extensive comparison of clustering methods, we describe the limits of these methods, and formulate guidelines for choosing an appropriate method. In addition, we have developed a novel method for simulating realistic dPCR data. The method is based on a mixture distribution of a Poisson point process and a skew-$t$ distribution, which enables the generation of irregularities of cluster shapes and randomness of partitions between clusters ('rain') as commonly observed in dPCR data. Users can fine-tune the model parameters and generate labeled datasets, using their own data as a template. Besides, the database of experimental dPCR data augmented with the labeled simulated data can serve as training and testing data for new clustering methods. The simulation method is available as an R Shiny app.
Affiliation(s)
- Yao Chen
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium
  - Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
- Ward De Spiegelaere
  - Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
- Wim Trypsteen
  - Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
  - Department of Internal Medicine, Ghent University and University Hospital, Belgium
- David Gleerup
  - Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
- Jo Vandesompele
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
  - Department of Biomolecular Medicine, Ghent University and University Hospital, Belgium
  - Cancer Research Institute Ghent (CRIG), Ghent University and University Hospital, Belgium
  - Pxlence, Belgium
- Antoon Lievens
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
- Matthijs Vynck
  - Department of Morphology, Imaging, Orthopedics, Rehabilitation and Nutrition, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
- Olivier Thas
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium
  - Ghent University Digital PCR Consortium, Ghent University, Belgium
  - I-BioStat, Data Science Institute, Hasselt University, Belgium
  - National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong, Australia
7
Dutta S, Box AC, Li Y, Sardiu ME. Identifying dynamical persistent biomarker structures for rare events using modern integrative machine learning approach. Proteomics 2023; 23:e2200290. [PMID: 36852539 PMCID: PMC11503472 DOI: 10.1002/pmic.202200290] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2022] [Revised: 01/30/2023] [Accepted: 02/17/2023] [Indexed: 03/01/2023]
Abstract
The evolution of omics and computational competency has accelerated discoveries of the underlying biological processes in an unprecedented way. High throughput methodologies, such as flow cytometry, can reveal deeper insights into cell processes, thereby allowing opportunities for scientific discoveries related to health and diseases. However, working with cytometry data often imposes complex computational challenges due to high-dimensionality, large size, and nonlinearity of the data structure. In addition, cytometry data frequently exhibit diverse patterns across biomarkers and suffer from substantial class imbalances which can further complicate the problem. The existing methods of cytometry data analysis either predict cell population or perform feature selection. Through this study, we propose a "wisdom of the crowd" approach to simultaneously predict rare cell populations and perform feature selection by integrating a pool of modern machine learning (ML) algorithms. Given that our approach integrates superior performing ML models across different normalization techniques based on entropy and rank, our method can detect diverse patterns existing across the model features. Furthermore, the method identifies a dynamic biomarker structure that divides the features into persistently selected, unselected, and fluctuating assemblies indicating the role of each biomarker in rare cell prediction, which can subsequently aid in studies of disease progression.
Affiliation(s)
- Sreejata Dutta
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
- Andrew C. Box
  - Stowers Institute for Medical Research, Kansas City, Missouri, USA
- Yanming Li
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, Kansas, USA
- Mihaela E. Sardiu
  - Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, Kansas, USA
  - University of Kansas Cancer Center, Kansas City, Kansas, USA
  - Kansas Institute for Precision Medicine, University of Kansas Medical Center, Kansas City, Kansas, USA
8
Robles EE, Jin Y, Smyth P, Scheuermann RH, Bui JD, Wang HY, Oak J, Qian Y. A cell-level discriminative neural network model for diagnosis of blood cancers. Bioinformatics 2023; 39:btad585. [PMID: 37756695 PMCID: PMC10563151 DOI: 10.1093/bioinformatics/btad585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Revised: 09/12/2023] [Accepted: 09/22/2023] [Indexed: 09/29/2023] Open
Abstract
MOTIVATION: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. RESULTS: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes sample-level training data and predicts the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. AVAILABILITY AND IMPLEMENTATION: The source code of CSNN and datasets used in the experiments are publicly available on GitHub (http://github.com/erobl/csnn). Raw FCS files can be downloaded from FlowRepository (ID: FR-FCM-Z6YK).
Affiliation(s)
- Edgar E Robles
  - Department of Computer Science, University of California, Irvine, CA 92697, United States
- Ye Jin
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, United States
- Padhraic Smyth
  - Department of Computer Science, University of California, Irvine, CA 92697, United States
- Richard H Scheuermann
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, United States
  - Department of Pathology, University of California, San Diego, CA 92093, United States
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA 92037, United States
- Jack D Bui
  - Department of Pathology, University of California, San Diego, CA 92093, United States
- Huan-You Wang
  - Department of Pathology, University of California, San Diego, CA 92093, United States
- Jean Oak
  - Department of Pathology, Stanford University, Stanford, CA 94305, United States
- Yu Qian
  - Department of Informatics, J. Craig Venter Institute, La Jolla, CA 92037, United States
9
Wallace ML, Tallarida N, Schubert WW, Lambert J. Life Detection on Icy Moons Using Flow Cytometry and Exogenous Fluorescent Stains. Astrobiology 2023; 23:1071-1082. [PMID: 37672625 DOI: 10.1089/ast.2023.0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]
Abstract
Flow cytometry is a potential technology for in situ life detection on icy moons (such as Enceladus and Europa) and on the polar ice caps of Mars. We developed a method for using flow cytometry to positively identify four classes of biomarkers using exogenous fluorescent stains: nucleic acids, proteins, carbohydrates, and lipids. We demonstrated the effectiveness of exogenous stains with six known organisms and known abiotic material and showed that the cytometer is easily able to distinguish between the known organisms and the known abiotic material using the exogenous stains. To simulate a life-detection experiment on an icy world lander, we used six natural samples with unknown biotic and abiotic content. We showed that flow cytometry can identify all four biomarkers using the exogenous stains and can separate the biotic material from the known abiotic material on scatter plots. Exogenous staining techniques would likely be used in conjunction with intrinsic fluorescence, clustering, and sorting for a more complete and capable life-detection instrument on an icy moon lander.
Affiliation(s)
- Matthew L Wallace
  - Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
- Nicholas Tallarida
  - Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
- Wayne W Schubert
  - Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
- James Lambert
  - Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
10
Puccio S, Grillo G, Alvisi G, Scirgolea C, Galletti G, Mazza EMC, Consiglio A, De Simone G, Licciulli F, Lugli E. CRUSTY: a versatile web platform for the rapid analysis and visualization of high-dimensional flow cytometry data. Nat Commun 2023; 14:5102. [PMID: 37666818 PMCID: PMC10477295 DOI: 10.1038/s41467-023-40790-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/16/2022] [Accepted: 08/10/2023] [Indexed: 09/06/2023] Open
Abstract
Flow cytometry (FCM) can investigate dozens of parameters from millions of cells and hundreds of specimens in a short time and at a reasonable cost, but the amount of data that is generated is considerable. Computational approaches are useful to identify novel subpopulations and molecular biomarkers, but generally require deep expertise in bioinformatics and the use of different platforms. To overcome these limitations, we introduce CRUSTY, an interactive, user-friendly webtool incorporating the most popular algorithms for FCM data analysis, and capable of visualizing graphical and tabular results and automatically generating publication-quality figures within minutes. CRUSTY also hosts an interactive interface for the exploration of results in real time. Thus, CRUSTY enables a large number of users to mine complex datasets and reduce the time required for data exploration and interpretation. CRUSTY is accessible at https://crusty.humanitas.it/.
Affiliation(s)
- Simone Puccio
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
  - Institute of Genetic and Biomedical Research, UoS Milan, National Research Council, via Manzoni 56, 20089, Rozzano, Milan, Italy
- Giorgio Grillo
  - Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
- Giorgia Alvisi
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- Caterina Scirgolea
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- Giovanni Galletti
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
  - School of Biological Sciences, Department of Molecular Biology, University of California San Diego, San Diego, CA, USA
- Emilia Maria Cristina Mazza
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- Arianna Consiglio
  - Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
- Gabriele De Simone
  - Flow Cytometry Core, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
- Flavio Licciulli
  - Institute for Biomedical Technologies, National Research Council, via Amendola 122/D, 70126, Bari, Italy
- Enrico Lugli
  - Laboratory of Translational Immunology, IRCCS Humanitas Research Hospital, via Manzoni 56, 20089, Rozzano, Milan, Italy
11
|
Zhang J, Li J, Lin L. Statistical and machine learning methods for immunoprofiling based on single-cell data. Hum Vaccin Immunother 2023:2234792. [PMID: 37485833 PMCID: PMC10373621 DOI: 10.1080/21645515.2023.2234792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2023] [Revised: 06/30/2023] [Accepted: 07/04/2023] [Indexed: 07/25/2023]
Abstract
Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships and potentially developing personalized intervention strategies. Single-cell data have emerged as a promising source for identifying immune response biomarkers. In this review, we discuss the current state-of-the-art methods for immunoprofiling, including those for reducing the dimensionality of high-dimensional single-cell data and methods for clustering, classification, and prediction. We also draw attention to recent developments in data integration.
Affiliation(s)
- Jingxuan Zhang
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Jia Li
  - Department of Statistics, Pennsylvania State University, University Park, PA, USA
- Lin Lin
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
|
12
|
Hoang MD, Riessner S, Oropeza Vargas JE, von den Eichen N, Heins AL. Influence of Varying Pre-Culture Conditions on the Level of Population Heterogeneity in Batch Cultures with an Escherichia coli Triple Reporter Strain. Microorganisms 2023; 11:1763. [PMID: 37512936 PMCID: PMC10384452 DOI: 10.3390/microorganisms11071763] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/25/2023] [Revised: 06/26/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023]
Abstract
When targeting robust, high-yielding bioprocesses, phenomena such as population heterogeneity have to be considered. Therefore, the influence of the conditions which the cells experience prior to the main culture should also be evaluated. Here, the influence of a pre-culture medium (complex vs. minimal medium), optical density for inoculation of the main culture (0.005, 0.02 and 0.0125) and harvest time points of the pre-culture in exponential growth phase (early, mid and late) on the level of population heterogeneity in batch cultures of the Escherichia coli triple reporter strain G7BL21(DE3) in stirred-tank bioreactors was studied. This strain allows monitoring the growth (rrnB-EmGFP), general stress response (rpoS-mStrawberry) and oxygen limitation (nar-TagRFP657) of single cells through the expression of fluorescent proteins. Data from batch cultivations with varying pre-culture conditions were analysed with principal component analysis. According to fluorescence data, the pre-culture medium had the largest impact on population heterogeneities during the bioprocess. While a minimal medium as a pre-culture medium elevated the differences in cellular growth behaviour in the subsequent batch process, a complex medium increased the general stress response and led to a higher population heterogeneity. The latter was promoted by an early harvest of the cells with low inoculation density. Seemingly, nar-operon expression acted independently of the pre-culture conditions.
Affiliation(s)
- Manh Dat Hoang
  - Chair of Biochemical Engineering, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany
- Sophi Riessner
  - Chair of Biochemical Engineering, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany
- Jose Enrique Oropeza Vargas
  - Chair of Biochemical Engineering, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany
- Nikolas von den Eichen
  - Chair of Biochemical Engineering, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany
- Anna-Lena Heins
  - Chair of Biochemical Engineering, TUM School of Engineering and Design, Technical University of Munich, 85748 Garching, Germany
|
13
|
Baldzhieva A, Burnusuzov HA, Murdjeva MA, Dimcheva TD, Taskov HB. A concise review of flow cytometric methods for minimal residual disease assessment in childhood B-cell precursor acute lymphoblastic leukemia. Folia Med (Plovdiv) 2023; 65:355-361. [PMID: 38351809 DOI: 10.3897/folmed.65.e96440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 01/04/2023] [Indexed: 02/16/2024]
Abstract
Minimal residual disease refers to a leukemia cell population that is resistant to chemotherapy or radiotherapy and leads to disease relapse. The assessment of MRD is crucial for making an accurate prognosis of the disease and for the choice of optimal treatment strategy. Here, we review the advantages and disadvantages of the available genetic and phenotypic methods and focus on the multiparametric flow cytometry as a promising method with greater sensitivity, speed, and standardization options. In addition, we discuss how the application of automated data analysis outweighs the use of complex combinations of windows and gates in classical analysis, thus eliminating subjective evaluation.
|
14
|
Fuda F, Chen M, Chen W, Cox A. Artificial intelligence in clinical multiparameter flow cytometry and mass cytometry-key tools and progress. Semin Diagn Pathol 2023; 40:120-128. [PMID: 36894355 DOI: 10.1053/j.semdp.2023.02.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/01/2022] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/07/2023]
Abstract
There are many research studies and emerging tools using artificial intelligence (AI) and machine learning to augment flow and mass cytometry workflows. Emerging AI tools can quickly identify common cell populations with continuous improvement of accuracy, uncover patterns in high-dimensional cytometric data that are undetectable by human analysis, facilitate the discovery of cell subpopulations, perform semi-automated immune cell profiling, and demonstrate potential to automate aspects of clinical multiparameter flow cytometric (MFC) diagnostic workflow. Utilizing AI in the analysis of cytometry samples can reduce subjective variability and assist in breakthroughs in understanding diseases. Here we review the diverse types of AI that are being applied to clinical cytometry data and how AI is driving advances in data analysis to improve diagnostic sensitivity and accuracy. We review supervised and unsupervised clustering algorithms for cell population identification, various dimensionality reduction techniques, and their utilities in visualization and machine learning pipelines, and supervised learning approaches for classifying entire cytometry samples. Understanding the AI landscape will enable pathologists to better utilize open source and commercially available tools, plan exploratory research projects to characterize diseases, and work with machine learning and data scientists to implement clinical data analysis pipelines.
Affiliation(s)
- Franklin Fuda
  - Department of Pathology and Laboratory Medicine, University of Texas, Southwestern Medical Center, Dallas, Texas, USA
- Mingyi Chen
  - Department of Pathology and Laboratory Medicine, University of Texas, Southwestern Medical Center, Dallas, Texas, USA
- Weina Chen
  - Department of Pathology and Laboratory Medicine, University of Texas, Southwestern Medical Center, Dallas, Texas, USA
- Andrew Cox
  - Lyda Hill Department of Bioinformatics, University of Texas, Southwestern Medical Center, Dallas, Texas, USA; Department of Cell and Molecular Biology, University of Texas, Southwestern Medical Center, Dallas, Texas, USA
|
15
|
Robles EE, Jin Y, Smyth P, Scheuermann RH, Bui JD, Wang HY, Oak J, Qian Y. A cell-level discriminative neural network model for diagnosis of blood cancers. medRxiv 2023:2023.02.07.23285606. [PMID: 36798344 PMCID: PMC9934808 DOI: 10.1101/2023.02.07.23285606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/12/2023]
Abstract
Motivation: Precise identification of cancer cells in patient samples is essential for accurate diagnosis and clinical monitoring but has been a significant challenge in machine learning approaches for cancer precision medicine. In most scenarios, training data are only available with disease annotation at the subject or sample level. Traditional approaches separate the classification process into multiple steps that are optimized independently. Recent methods either focus on predicting sample-level diagnosis without identifying individual pathologic cells or are less effective for identifying heterogeneous cancer cell phenotypes. Results: We developed a generalized end-to-end differentiable model, the Cell Scoring Neural Network (CSNN), which takes the available sample-level training data and predicts both the diagnosis of the testing samples and the identity of the diagnostic cells in the sample, simultaneously. The cell-level density differences between samples are linked to the sample diagnosis, which allows the probabilities of individual cells being diagnostic to be calculated using backpropagation. We applied CSNN to two independent clinical flow cytometry datasets for leukemia diagnosis. In both qualitative and quantitative assessments, CSNN outperformed preexisting neural network modeling approaches for both cancer diagnosis and cell-level classification. Post hoc decision trees and 2D dot plots were generated for interpretation of the identified cancer cells, showing that the identified cell phenotypes match the cancer endotypes observed clinically in patient cohorts. Independent data clustering analysis confirmed the identified cancer cell populations. Availability: The source code of CSNN and datasets used in the experiments are publicly available on GitHub and FlowRepository. Contact: Edgar E. Robles: roblesee@uci.edu and Yu Qian: mqian@jcvi.org. Supplementary information: Supplementary data are available on GitHub and at Bioinformatics online.
|
16
|
Rao NS, Ermann Lundberg L, Tomasson J, Tullberg C, Brink DP, Palmkron SB, van Niel EWJ, Håkansson S, Carlquist M. Non-inhibitory levels of oxygen during cultivation increase freeze-drying stress tolerance in Limosilactobacillus reuteri DSM 17938. Front Microbiol 2023; 14:1152389. [PMID: 37125176 PMCID: PMC10140318 DOI: 10.3389/fmicb.2023.1152389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 03/22/2023] [Indexed: 05/02/2023]
Abstract
The physiological effects of oxygen on Limosilactobacillus reuteri DSM 17938 during cultivation and the ensuing properties of the freeze-dried probiotic product were investigated. On-line flow cytometry and k-means clustering gating were used to follow growth and viability in real time during cultivation. The bacterium tolerated aeration at 500 mL/min, with a growth rate of 0.74 ± 0.13 h⁻¹, which demonstrated that low levels of oxygen did not influence the growth kinetics of the bacterium. Modulation of the redox metabolism was, however, already seen at non-inhibitory oxygen levels, with 1.5-fold higher acetate production and 1.5-fold lower ethanol production. A significantly higher survival rate in the freeze-dried product was observed for cells cultivated in the presence of oxygen compared to its absence (61.8% ± 2.4% vs. 11.5% ± 4.3%), coinciding with a higher degree of unsaturated fatty acids (UFA:SFA ratio of 10 for air-sparged vs. 3.59 for N2-sparged conditions). Oxygen also resulted in improved bile tolerance and boosted 5'-nucleotidase activity (370 U/L vs. 240 U/L in N2-sparged conditions) but lower tolerance to acidic conditions compared with bacteria grown under completely anaerobic conditions, which survived up to 90 min of exposure at pH 2. Overall, our results indicate that the controlled supply of oxygen during production may be used as a means of optimizing the probiotic activity of L. reuteri DSM 17938.
Affiliation(s)
- Nikhil Seshagiri Rao
  - Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
  - *Correspondence: Nikhil Seshagiri Rao,
- Ludwig Ermann Lundberg
  - The Department of Molecular Sciences, Uppsala BioCenter, Swedish University of Agricultural Sciences, Uppsala, Sweden
  - BioGaia, SE-103 64, Stockholm, Sweden
- Cecilia Tullberg
  - Division of Biotechnology, Department of Chemistry, Lund University, Lund, Sweden
- Daniel P. Brink
  - Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
- Shuai Bai Palmkron
  - Department of Food Technology, Engineering and Nutrition, Department of Chemistry, Lund University, Lund, Sweden
- Ed W. J. van Niel
  - Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
- Sebastian Håkansson
  - Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
  - BioGaia, SE-241 38, Eslöv, Sweden
- Magnus Carlquist
  - Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
  - Magnus Carlquist,
|
17
|
Verhoeff J, Abeln S, Garcia-Vallejo JJ. INFLECT: an R-package for cytometry cluster evaluation using marker modality. BMC Bioinformatics 2022; 23:487. [DOI: 10.1186/s12859-022-05018-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 10/28/2022] [Indexed: 11/17/2022]
Abstract
Background
Current methods of high-dimensional unsupervised clustering of mass cytometry data lack means to monitor and evaluate clustering results. Whether unsupervised clustering is correct is typically evaluated by agreement with dimensionality reduction techniques or based on benchmarking with manually classified cells. The ambiguity and lack of reproducibility of sequential gating has been replaced with ambiguity in interpretation of clustering results. On the other hand, spurious overclustering of data leads to loss of statistical power. We have developed INFLECT, an R-package designed to give insight into clustering results and provide an optimal number of clusters. In our approach, a mass cytometry dataset is overclustered intentionally using FlowSOM to ensure the smallest phenotypically different subsets are captured. A range of metacluster number endpoints are generated and evaluated using marker interquartile range and distribution unimodality checks. The fraction of marker distributions that pass these checks is taken as a measure of clustering success. The fraction of unimodal distributions within metaclusters is plotted against the number of generated metaclusters and reaches a plateau of diminishing returns. The inflection point at which this occurs gives an optimal point of capturing cellular heterogeneity versus statistical power.
Results
We applied INFLECT to four publicly available mass cytometry datasets of different size and number of markers. The unimodality score consistently reached a plateau, with an inflection point dependent on dataset size and number of dimensions. We tested both ConsensusClusterPlus metaclustering and hierarchical clustering. While hierarchical clustering is less computationally expensive and thus faster, it achieved similar results to ConsensusClusterPlus. The four datasets consisted of labeled data and we compared INFLECT metaclustering to published results. INFLECT identified a higher optimal number of metaclusters for all datasets. We illustrated the underlying heterogeneity within labels, showing that these labels encompass distinct types of cells.
Conclusion
INFLECT addresses a knowledge gap in high-dimensional cytometry analysis, namely assessing clustering results. This is done through monitoring marker distributions for interquartile range and unimodality across a range of metacluster numbers. The inflection point is the optimal trade-off between cellular heterogeneity and statistical power, applied in this work for FlowSOM clustering on mass cytometry datasets.
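The plateau idea described in this abstract can be illustrated with a minimal Python sketch. It is not the INFLECT implementation (INFLECT is an R package, and its actual inflection-point rule is not given here). The function name `plateau_point`, the `min_gain` threshold, and the toy scores are all hypothetical, chosen only to show how a diminishing-returns point might be read off a curve of unimodality fraction versus metacluster number.

```python
def plateau_point(ks, fractions, min_gain=0.01):
    """Return the first metacluster count after which the score stops improving.

    ks         -- increasing metacluster counts that were evaluated
    fractions  -- fraction of marker distributions passing the checks at each k
    min_gain   -- assumed threshold: minimum gain per added metacluster
    """
    for i in range(len(ks) - 1):
        # Gain in the unimodality fraction per additional metacluster.
        gain = (fractions[i + 1] - fractions[i]) / (ks[i + 1] - ks[i])
        if gain < min_gain:
            return ks[i]
    # No plateau found within the evaluated range.
    return ks[-1]


# Toy values for illustration only; they are not results from the paper.
ks = [5, 10, 15, 20, 25, 30]
fractions = [0.40, 0.62, 0.78, 0.86, 0.88, 0.89]
print(plateau_point(ks, fractions))
```

With these toy values the gain per metacluster first drops below the assumed threshold between 20 and 25 metaclusters, so the sketch reports 20. A real analysis would need to choose the threshold, or a different elbow criterion, with the dataset in mind.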
|
18
|
Clustering by measuring local direction centrality for data with heterogeneous density and weak connectivity. Nat Commun 2022; 13:5455. [PMID: 36114209 PMCID: PMC9481560 DOI: 10.1038/s41467-022-33136-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 09/05/2022] [Indexed: 11/30/2022]
Abstract
Clustering is a powerful machine learning method for discovering similar patterns according to the proximity of elements in feature space. It is widely used in computer science, bioscience, geoscience, and economics. Although state-of-the-art partition-based and connectivity-based clustering methods have been developed, weak connectivity and heterogeneous density in data impede their effectiveness. In this work, we propose a boundary-seeking Clustering algorithm using the local Direction Centrality (CDC). It adopts a density-independent metric based on the distribution of K-nearest neighbors (KNNs) to distinguish between internal and boundary points. The boundary points generate enclosed cages to bind the connections of internal points, thereby preventing cross-cluster connections and separating weakly-connected clusters. We demonstrate the validity of CDC by detecting complex structured clusters in challenging synthetic datasets, identifying cell types from single-cell RNA sequencing (scRNA-seq) and mass cytometry (CyTOF) data, recognizing speakers on voice corpuses, and testing on various types of real-world benchmarks. Here the authors propose a local direction centrality clustering algorithm that copes with heterogeneous density and weak connectivity issues.
|
19
|
Heins A, Hoang MD, Weuster‐Botz D. Advances in automated real-time flow cytometry for monitoring of bioreactor processes. Eng Life Sci 2022; 22:260-278. [PMID: 35382548 PMCID: PMC8961054 DOI: 10.1002/elsc.202100082] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/09/2021] [Revised: 10/22/2021] [Accepted: 10/27/2021] [Indexed: 12/18/2022]
Abstract
Flow cytometry and its technological possibilities have greatly advanced in the past decade as an analysis tool for single cell properties and population distributions of different cell types in bioreactors. Along the way, some solutions for automated real-time flow cytometry (ART-FCM) were developed for monitoring of bioreactor processes without operator interference over extended periods with variable sampling frequency. However, there is still great potential for ART-FCM to evolve and possibly become a standard application in bioprocess monitoring and process control. This review first addresses different components of an ART-FCM, including the sampling device, the sample-processing unit, the unit for sample delivery to the flow cytometer and the settings for measurement of pre-processed samples. Also, available algorithms are presented for automated data analysis of multi-parameter fluorescence datasets derived from ART-FCM experiments. Furthermore, challenges are discussed for integration of fluorescence-activated cell sorting into an ART-FCM setup for isolation and separation of interesting subpopulations that can be further characterized by, for instance, omics methods. As the application of ART-FCM is especially of interest for bioreactor process monitoring, including investigation of population heterogeneity and automated process control, a summary of already existing setups for these purposes is given. Additionally, the general future potential of ART-FCM is addressed.
Affiliation(s)
- Anna‐Lena Heins
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
- Manh Dat Hoang
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
- Dirk Weuster‐Botz
  - Institute of Biochemical Engineering, Technical University of Munich, Garching, Germany
|
20
|
Tinnevelt GH, Wouters K, Postma GJ, Folcarelli R, Jansen JJ. High-throughput single cell data analysis - A tutorial. Anal Chim Acta 2021; 1185:338872. [PMID: 34711307 DOI: 10.1016/j.aca.2021.338872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2020] [Revised: 06/28/2021] [Accepted: 07/21/2021] [Indexed: 11/30/2022]
Abstract
White blood cells protect the body against disease but may also cause chronic inflammation, auto-immune diseases or leukemia. There are many different white blood cell types whose identity and function can be studied by measuring their protein expression. Therefore, high-throughput analytical instruments were developed to measure multiple proteins on millions of single cells. The information-rich biochemical data may only be fully exploited using multivariate statistics. Here we give an overview of the most essential steps for multivariate data analysis of single cell data. We used white blood cells (immunology) as a case study, but a similar approach may be used in environmental or biotech research. The first step is analyzing the study design and subsequently formulating a research question. The three main designs are immunophenotyping (finding different cell types), cell activation and rare cell discovery. When preparing the data it is essential to consider the design and focus on the cell type of interest by removing all unwanted events. After pre-processing, the tens of thousands to millions of single cells per sample need to be converted into a cellular distribution. For immunophenotyping a clustering method such as Self-Organizing Maps is useful, and for cell activation a model that describes the covariance, such as Principal Component Analysis, is useful. In rare cell discovery it is useful to first model all common cells and remove them to find the rare cells. Finally, discriminant analysis based on the cellular distribution may highlight which cell (sub)types are different between groups.
Affiliation(s)
- Gerjen H Tinnevelt
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
- Kristiaan Wouters
  - Department of Internal Medicine, Laboratory of Metabolism and Vascular Medicine, P.O. Box 616 (UNS50/14), 6200 MD Maastricht, the Netherlands
- Geert J Postma
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
- Rita Folcarelli
  - Corbion, Arkelsedijk 46, 4206 AC Gorinchem, the Netherlands
- Jeroen J Jansen
  - Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, the Netherlands
|
21
|
Abstract
Cell cycle involves a series of changes that lead to cell growth and division. Cell cycle analysis is crucial to understand cellular responses to changing environmental conditions. Since its inception, flow cytometry has been particularly useful for cell cycle analysis at single cell level due to its speed and precision. Previously, flow cytometric cell cycle analysis relied solely on the measurement of cellular DNA content. Later, methods were developed for multiparametric analysis. This review explains the journey of flow cytometry to understand different molecular and cellular events underlying cell cycle using various protocols. Recent advances in the field that overcome the shortcomings of traditional flow cytometry and expand its scope for cell cycle studies are also discussed.
|
22
|
Ralph AP, Webb R, Moreland NJ, McGregor R, Bosco A, Broadhurst D, Lassmann T, Barnett TC, Benothman R, Yan J, Remenyi B, Bennett J, Wilson N, Mayo M, Pearson G, Kollmann T, Carapetis JR. Searching for a technology-driven acute rheumatic fever test: the START study protocol. BMJ Open 2021; 11:e053720. [PMID: 34526345 PMCID: PMC8444258 DOI: 10.1136/bmjopen-2021-053720] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/17/2022]
Abstract
INTRODUCTION: The absence of a diagnostic test for acute rheumatic fever (ARF) is a major impediment in managing this serious childhood condition. ARF is an autoimmune condition triggered by infection with group A Streptococcus. It is the precursor to rheumatic heart disease (RHD), a leading cause of health inequity and premature mortality for Indigenous peoples of Australia, New Zealand and internationally. METHODS AND ANALYSIS: 'Searching for a Technology-Driven Acute Rheumatic Fever Test' (START) is a biomarker discovery study that aims to detect and test a biomarker signature that distinguishes ARF cases from non-ARF, and use systems biology and serology to better understand ARF pathogenesis. Eligible participants with ARF diagnosed by an expert clinical panel according to the 2015 Revised Jones Criteria, aged 5-30 years, will be recruited from three hospitals in Australia and New Zealand. Age, sex and ethnicity-matched individuals who are healthy or have non-ARF acute diagnoses or RHD, will be recruited as controls. In the discovery cohort, blood samples collected at baseline, and during convalescence in a subset, will be interrogated by comprehensive profiling to generate possible diagnostic biomarker signatures. A biomarker validation cohort will subsequently be used to test promising combinations of biomarkers. By defining the first biomarker signatures able to discriminate between ARF and other clinical conditions, the START study has the potential to transform the approach to ARF diagnosis and RHD prevention. ETHICS AND DISSEMINATION: The study has approval from the Northern Territory Department of Health and Menzies School of Health Research ethics committee and the New Zealand Health and Disability Ethics Committee. It will be conducted according to ethical standards for research involving Indigenous Australians and New Zealand Māori and Pacific Peoples. Indigenous investigators and governance groups will provide oversight of study processes and advise on cultural matters.
Affiliation(s)
- Anna P Ralph
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Rachel Webb
  - KidzFirst Hospital, Counties Manukau District Health Board, Auckland, New Zealand
  - Starship Children's Hospital, Auckland, New Zealand
  - Department of Paediatrics; Child and Youth Health, University of Auckland, Auckland, New Zealand
- Nicole J Moreland
  - School of Medical Sciences and Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Reuben McGregor
  - School of Medical Sciences and Maurice Wilkins Centre, The University of Auckland, Auckland, New Zealand
- Anthony Bosco
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- David Broadhurst
  - Centre for Integrative Metabolomics and Computational Biology, Edith Cowan University, Perth, Western Australia, Australia
- Timo Lassmann
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Timothy C Barnett
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Rym Benothman
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Jennifer Yan
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Bo Remenyi
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
  - Royal Darwin Hospital, Darwin, Northern Territory, Australia
- Julie Bennett
  - Department of Public Health, University of Otago, Wellington, New Zealand
- Nigel Wilson
  - Starship Children's Hospital, Auckland, New Zealand
- Mark Mayo
  - Global and Tropical Health, Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia
- Glenn Pearson
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Tobias Kollmann
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
- Jonathan R Carapetis
  - Wesfarmers Centre for Vaccines and Infectious Diseases, Telethon Kids Institute, Perth, Western Australia, Australia
  - Department of Infectious Diseases, Perth Children's Hospital, Perth, Western Australia, Australia
  - School of Medicine, University of Western Australia, Perth, Western Australia, Australia
|
23
|
Reisman BJ, Barone SM, Bachmann BO, Irish JM. DebarcodeR increases fluorescent cell barcoding capacity and accuracy. Cytometry A 2021; 99:946-953. [PMID: 33960644 PMCID: PMC8410645 DOI: 10.1002/cyto.a.24363] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/16/2020] [Revised: 03/09/2021] [Accepted: 04/20/2021] [Indexed: 12/25/2022]
Abstract
Fluorescent cell barcoding (FCB) enables efficient collection of tens to hundreds of flow cytometry samples by covalently marking cells with varying concentrations of spectrally distinct dyes. A key consideration in FCB is to balance the density of dye barcodes, the complexity of cells in the sample, and the desired accuracy of the debarcoding. Unfortunately, barcoding bench and computational methods have not benefited from the high dimensional revolution in cytometry due to a lack of automated computational tools that effectively balance these common cytometry needs. DebarcodeR addresses these unmet needs by providing a framework for computational debarcoding augmented by improvements to experimental methods. Adaptive regression modeling accounted for differential dye uptake between different cell types and Gaussian mixture modeling provided a robust method to probabilistically assign cells to samples. Assignment tolerance parameters are available to allow users to balance high cell recovery with accurate assignments. Improvements to experimental methods include: (1) inclusion of an "external standard" control where a pool of all cells was stained with a single level of each barcoding dye and (2) an "internal standard" where each cell is stained with a single level of a separate dye. DebarcodeR significantly improved speed, accuracy, and reproducibility of FCB while avoiding selective loss of unusual cell subsets when debarcoding microtiter plates of cell lines and heterogeneous mixtures of primary cells. DebarcodeR is available on GitHub as an R package that works with flowCore and Cytoverse packages at github.com/cytolab/DebarcodeR.
Affiliation(s)
- Sierra M. Barone
- Department of Cell & Developmental Biology, Vanderbilt University, Nashville, TN, USA
- Department of Pathology, Microbiology & Immunology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Jonathan M. Irish
- Department of Cell & Developmental Biology, Vanderbilt University, Nashville, TN, USA
- Department of Pathology, Microbiology & Immunology, Vanderbilt University, Nashville, TN, USA
- Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
24
Wong N, Kim D, Robinson Z, Huang C, Conboy IM. K-means quantization for a web-based open-source flow cytometry analysis platform. Sci Rep 2021; 11:6735. [PMID: 33762594 PMCID: PMC7991430 DOI: 10.1038/s41598-021-86015-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2020] [Accepted: 03/03/2021] [Indexed: 11/20/2022] Open
Abstract
Flow cytometry (FCM) is an analytic technique that is capable of detecting and recording the emission of fluorescence and light scattering of cells or particles (that are collectively called “events”) in a population. A typical FCM experiment can produce a large array of data, making the analysis computationally intensive. Current FCM data analysis platforms (FlowJo, etc.), while very useful, do not allow interactive data processing online because of data size limitations. Here we report a more effective way to analyze FCM data on the web. Freecyto is a free and intuitive Python Flask-based web application that uses a weighted k-means clustering algorithm to facilitate the interactive analysis of flow cytometry data. A key limitation of web browsers is their inability to interactively display large amounts of data. Freecyto addresses this bottleneck through the use of the k-means algorithm to quantize the data, allowing the user to access a representative set of data points for interactive visualization of complex datasets. Moreover, Freecyto enables the interactive analyses of large complex datasets while preserving the standard FCM visualization features, such as the generation of scatterplots (dotplots), histograms, heatmaps, boxplots, as well as a SQL-based sub-population gating feature. We also show that Freecyto can be applied to the analysis of various experimental setups that frequently require the use of FCM. Finally, we demonstrate that the data accuracy is preserved when Freecyto is compared to conventional FCM software.
Affiliation(s)
- Nathan Wong
- Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA, 94720, USA.
- Daehwan Kim
- Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA, 94720, USA
- Zachery Robinson
- Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA, 94720, USA
- Connie Huang
- Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA, 94720, USA
- Irina M Conboy
- Department of Bioengineering and QB3, UC Berkeley, Berkeley, CA, 94720, USA.
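The k-means quantization idea described in the Freecyto abstract above can be illustrated with a minimal sketch. It is not Freecyto's code: it uses plain Lloyd's k-means on 2-D points and assumes that "weighted" means each representative point carries the number of events it stands for. The function name `kmeans_quantize` and its parameters are hypothetical.

```python
import random


def kmeans_quantize(points, k, iterations=20, seed=0):
    """Reduce a list of (x, y) events to at most k weighted representatives.

    Returns a list of ((cx, cy), weight) pairs, where weight is the number
    of original events assigned to that centroid (an assumed meaning of
    "weighted"; the abstract does not spell it out).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires k <= len(points)
    clusters = [[] for _ in range(k)]

    for _ in range(iterations):
        # Assignment step: attach every event to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                n = len(members)
                new_centroids.append(
                    (sum(m[0] for m in members) / n, sum(m[1] for m in members) / n)
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids

    # Drop empty clusters; the weights of the rest sum to len(points).
    return [(c, len(m)) for c, m in zip(centroids, clusters) if m]


if __name__ == "__main__":
    rng = random.Random(1)
    events = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    events += [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(500)]
    for centroid, weight in kmeans_quantize(events, k=10):
        print(centroid, weight)
```

A browser would then plot only the representatives, using each weight for marker size or density, instead of the full event list.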
25
Chulián S, Martínez-Rubio Á, Marciniak-Czochra A, Stiehl T, Goñi CB, Rodríguez Gutiérrez JF, Ramírez Orellana M, Castillo Robleda A, Pérez-García VM, Rosa M. Dynamical properties of feedback signalling in B lymphopoiesis: A mathematical modelling approach. J Theor Biol 2021; 522:110685. [PMID: 33745905 DOI: 10.1016/j.jtbi.2021.110685] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Revised: 12/09/2020] [Accepted: 03/15/2021] [Indexed: 12/11/2022]
Abstract
Haematopoiesis is the process of generation of blood cells. Lymphopoiesis generates lymphocytes, the cells in charge of the adaptive immune response. Disruptions of this process are associated with diseases like leukaemia, which is especially common in children. The self-regulating characteristics of this process make it suitable for a mathematical study. In this paper we develop mathematical models of lymphopoiesis using currently available data. We do this by drawing inspiration from existing structured models of cell lineage development and integrating them with paediatric bone marrow data, with special focus on regulatory mechanisms. A formal analysis of the models is carried out, giving steady states and their stability conditions. We use this analysis to obtain biologically relevant regions of the parameter space and to understand the dynamical behaviour of B-cell renovation. Finally, we use numerical simulations to obtain further insight into the influence of proliferation and maturation rates on the reconstitution of the cells in the B line. We conclude that a model including feedback regulation of cell proliferation represents a biologically plausible depiction for B-cell reconstitution in bone marrow. Research into haematological disorders could benefit from a precise dynamical description of B lymphopoiesis.
Affiliation(s)
- Salvador Chulián
- Department of Mathematics, Universidad de Cádiz, Puerto Real, Cádiz, Spain; Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain.
- Álvaro Martínez-Rubio
- Department of Mathematics, Universidad de Cádiz, Puerto Real, Cádiz, Spain; Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
- Anna Marciniak-Czochra
- Institute of Applied Mathematics, BioQuant and Interdisciplinary Center of Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Thomas Stiehl
- Institute of Applied Mathematics, BioQuant and Interdisciplinary Center of Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Manuel Ramírez Orellana
- Department of Paediatric Haematology and Oncology, Hospital Infantil Universitario Niño Jesús, Instituto Investigación Sanitaria La Princesa, Madrid, Spain
- Ana Castillo Robleda
- Department of Paediatric Haematology and Oncology, Hospital Infantil Universitario Niño Jesús, Instituto Investigación Sanitaria La Princesa, Madrid, Spain
- Víctor M Pérez-García
- Department of Mathematics, Mathematical Oncology Laboratory (MOLAB), Universidad de Castilla-La Mancha, Ciudad Real, Spain; Instituto de Matemática Aplicada a la Ciencia y la Ingeniería (IMACI), Universidad de Castilla-La Mancha, Ciudad Real, Spain; ETSI Industriales, Universidad de Castilla-La Mancha, Ciudad Real, Spain
- María Rosa
- Department of Mathematics, Universidad de Cádiz, Puerto Real, Cádiz, Spain; Biomedical Research and Innovation Institute of Cádiz (INiBICA), Hospital Universitario Puerta del Mar, Cádiz, Spain
26
Ex vivo characterization of Breg cells in patients with chronic Chagas disease. Sci Rep 2021; 11:5511. [PMID: 33750870 PMCID: PMC7943772 DOI: 10.1038/s41598-021-84765-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/06/2020] [Accepted: 02/18/2021] [Indexed: 02/07/2023] Open
Abstract
Despite the growing importance of the regulatory function of B cells in many infectious diseases, their immunosuppressive role remains elusive in chronic Chagas disease (CCD). Here, we studied the proportion of different B cell subsets and their capacity to secrete IL-10 ex vivo in peripheral blood from patients with or without CCD cardiomyopathy. First, we immunophenotyped peripheral blood mononuclear cells from patients according to the expression of markers CD19, CD24, CD38 and CD27 and we showed an expansion of total B cell and transitional CD24highCD38high B cell subsets in CCD patients with cardiac involvement compared to non-infected donors. Although no differences were observed in the frequency of total IL-10 producing B cells (B10) among the groups, CCD patients with cardiac involvement showed an increased proportion of naïve B10 cells and a tendency to a higher frequency of transitional B10 cells compared to non-infected donors. Our research demonstrates that transitional B cells are greatly expanded in patients with the cardiac form of CCD and these cells retain the ability to secrete IL-10. These findings provide insight into the phenotypic distribution of regulatory B cells in CCD, an important step towards new strategies to prevent cardiomyopathy associated with T. cruzi infection.
27
Cheung M, Campbell JJ, Whitby L, Thomas RJ, Braybrook J, Petzing J. Current trends in flow cytometry automated data analysis software. Cytometry A 2021; 99:1007-1021. [PMID: 33606354 DOI: 10.1002/cyto.a.24320] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 11/26/2020] [Revised: 01/21/2021] [Accepted: 01/28/2021] [Indexed: 12/16/2022]
Abstract
Automated flow cytometry (FC) data analysis tools for cell population identification and characterization are increasingly being used in academic, biotechnology, pharmaceutical, and clinical laboratories. The development of these computational methods is designed to overcome reproducibility and process bottleneck issues in manual gating; however, the take-up of these tools remains (anecdotally) low. Here, we performed a comprehensive literature survey of state-of-the-art computational tools typically published by research, clinical, and biomanufacturing laboratories for automated FC data analysis and identified popular tools based on literature citation counts. Dimensionality reduction methods ranked highly, such as generic t-distributed stochastic neighbor embedding (t-SNE) and its initial Matlab-based implementation for cytometry data viSNE. Software with graphical user interfaces also ranked highly, including PhenoGraph, SPADE1, FlowSOM, and Citrus, with unsupervised learning methods outnumbering supervised learning methods, and algorithm type popularity spread across K-Means, hierarchical, density-based, model-based, and other classes of clustering algorithms. Additionally, to illustrate the actual use typically within clinical spaces alongside frequent citations, a survey issued by UK NEQAS Leucocyte Immunophenotyping to identify software usage trends among clinical laboratories was completed. The survey revealed 53% of laboratories have not yet taken up automated cell population identification methods, though among those that have, Infinicyt software is the most frequently identified. Survey respondents considered data output quality to be the most important factor when using automated FC data analysis software, followed by software speed and level of technical support.
This review found differences in software usage between biomedical institutions, with tools for discovery, data exploration, and visualization more popular in academia, whereas automated tools for specialized targeted analysis that apply supervised learning methods were more used in clinical settings.
Affiliation(s)
- Melissa Cheung
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
- Liam Whitby
- UK NEQAS for Leucocyte Immunophenotyping, Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, United Kingdom
- Robert J Thomas
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
- Julian Braybrook
- National Measurement Laboratory, LGC, Teddington, United Kingdom
- Jon Petzing
- Centre for Biological Engineering, Loughborough University, Loughborough, Leicestershire, United Kingdom
28
Abstract
Flow cytometry is an important technology for the study of microbial communities. It grants the ability to rapidly generate phenotypic single-cell data that are quantitative, multivariate, and of high temporal resolution. The complexity and amount of data necessitate an objective and streamlined data processing workflow that extends beyond commercial instrument software. No full overview of the necessary steps regarding the computational analysis of microbial flow cytometry data currently exists. In this review, we provide an overview of the full data analysis pipeline, ranging from measurement to data interpretation, tailored toward studies in microbial ecology. At every step, we highlight computational methods that are potentially useful, for which we provide a short nontechnical description. We place this overview in the context of a number of open challenges to the field and offer further motivation for the use of standardized flow cytometry in microbial ecology research.
Affiliation(s)
- Ruben Props
- Center for Microbial Ecology & Technology (CMET), Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
29
Ben-Othman R, Cai B, Liu AC, Varankovich N, He D, Blimkie TM, Lee AH, Gill EE, Novotny M, Aevermann B, Drissler S, Shannon CP, McCann S, Marty K, Bjornson G, Edgar RD, Lin DTS, Gladish N, Maclsaac J, Amenyogbe N, Chan Q, Llibre A, Collin J, Landais E, Le K, Reiss SM, Koff WC, Havenar-Daughton C, Heran M, Sangha B, Walt D, Krajden M, Crotty S, Sok D, Briney B, Burton DR, Duffy D, Foster LJ, Mohn WW, Kobor MS, Tebbutt SJ, Brinkman RR, Scheuermann RH, Hancock REW, Kollmann TR, Sadarangani M. Systems Biology Methods Applied to Blood and Tissue for a Comprehensive Analysis of Immune Response to Hepatitis B Vaccine in Adults. Front Immunol 2020; 11:580373. [PMID: 33250895 PMCID: PMC7672042 DOI: 10.3389/fimmu.2020.580373] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/05/2020] [Accepted: 09/24/2020] [Indexed: 12/26/2022] Open
Abstract
Conventional vaccine design has been based on trial-and-error approaches, which have been generally successful. However, there have been some major failures in vaccine development and we still do not have highly effective licensed vaccines for tuberculosis, HIV, respiratory syncytial virus, and other major infections of global significance. Approaches to rational vaccine design have been limited by our understanding of the immune response to vaccination at the molecular level. Tools now exist to undertake in-depth analysis using systems biology approaches, but to be fully realized, studies are required in humans with intensive blood and tissue sampling. Methods that support this intensive sampling need to be developed and validated as feasible. To this end, we describe here a detailed approach that was applied in a study of 15 healthy adults, who were immunized with hepatitis B vaccine. Sampling included ~350 mL of blood, 12 microbiome samples, and lymph node fine needle aspirates obtained over a ~7-month period, enabling comprehensive analysis of the immune response at the molecular level, including single cell and tissue sample analysis. Samples were collected for analysis of immune phenotyping, whole blood and single cell gene expression, proteomics, lipidomics, epigenetics, whole blood response to key immune stimuli, cytokine responses, in vitro T cell responses, antibody repertoire analysis and the microbiome. Data integration was undertaken using two different approaches: NetworkAnalyst and DIABLO. Our results demonstrate that such intensive sampling studies are feasible in healthy adults, and data integration tools exist to analyze the vast amount of data generated from a multi-omics systems biology approach. This will provide the basis for a better understanding of vaccine-induced immunity and accelerate future rational vaccine design.
Affiliation(s)
- Rym Ben-Othman
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada; Telethon Kids Institute, University of Western Australia, Nedlands, WA, Australia
- Bing Cai
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Aaron C Liu
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Natallia Varankovich
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Daniel He
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Travis M Blimkie
- Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, BC, Canada
- Amy H Lee
- Simon Fraser University, Burnaby, BC, Canada
- Erin E Gill
- Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, BC, Canada
- Mark Novotny
- Department of Informatics, J. Craig Venter Institute (La Jolla), La Jolla, CA, United States
- Brian Aevermann
- Department of Informatics, J. Craig Venter Institute (La Jolla), La Jolla, CA, United States
- Casey P Shannon
- Prevention of Organ Failure (PROOF) Centre of Excellence and Centre for Heart Lung Innovation, St. Paul's Hospital, Vancouver, BC, Canada
- Sarah McCann
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Kim Marty
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Gordean Bjornson
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
- Rachel D Edgar
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- David Tse Shen Lin
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Nicole Gladish
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Julia Maclsaac
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Nelly Amenyogbe
- Telethon Kids Institute, University of Western Australia, Nedlands, WA, Australia
- Queenie Chan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Alba Llibre
- Translational Immunology Lab, Institut Pasteur, Paris, France
- Joyce Collin
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States
- Elise Landais
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, United States
- Khoa Le
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, United States
- Samantha M Reiss
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA, United States
- Wayne C Koff
- Human Vaccines Project, New York, NY, United States
- Colin Havenar-Daughton
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA, United States
- Manraj Heran
- Department of Radiology, BC Children's Hospital, Vancouver, BC, Canada
- Bippan Sangha
- Department of Radiology, BC Children's Hospital, Vancouver, BC, Canada
- David Walt
- Wyss Institute at Harvard University, Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, United States
- Mel Krajden
- British Columbia Centre for Disease Control, Vancouver, BC, Canada
- Shane Crotty
- Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA, United States
- Devin Sok
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States; IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA, United States
- Bryan Briney
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States
- Dennis R Burton
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, United States
- Darragh Duffy
- Translational Immunology Lab, Institut Pasteur, Paris, France
- Leonard J Foster
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- William W Mohn
- Department of Microbiology and Immunology, Life Sciences Institute, University of British Columbia, Vancouver, BC, Canada
- Michael S Kobor
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Scott J Tebbutt
- Prevention of Organ Failure (PROOF) Centre of Excellence and Centre for Heart Lung Innovation, St. Paul's Hospital, Vancouver, BC, Canada; Department of Medicine, Division of Respiratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Ryan R Brinkman
- Terry Fox Laboratory, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Richard H Scheuermann
- Department of Informatics, J. Craig Venter Institute (La Jolla), La Jolla, CA, United States; Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology (LJI), La Jolla, CA, United States
- Robert E W Hancock
- Centre for Microbial Diseases and Immunity Research, University of British Columbia, Vancouver, BC, Canada
- Tobias R Kollmann
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada; Telethon Kids Institute, University of Western Australia, Nedlands, WA, Australia
- Manish Sadarangani
- Vaccine Evaluation Center, BC Children's Hospital Research Institute, Vancouver, BC, Canada
30
Del Barrio E, Inouzhe H, Loubes JM, Matrán C, Mayo-Íscar A. optimalFlow: optimal transport approach to flow cytometry gating and population matching. BMC Bioinformatics 2020; 21:479. [PMID: 33109072 PMCID: PMC7590740 DOI: 10.1186/s12859-020-03795-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/05/2020] [Accepted: 10/01/2020] [Indexed: 11/12/2022] Open
Abstract
Background: Data obtained from flow cytometry present pronounced variability due to biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals, with different characteristics such as illness, age, sex, etc. The use of different settings for measurement, the variation of the conditions during experiments and the different types of flow cytometers are some of the technical causes of variability. This mixture of sources of variability makes the use of supervised machine learning for identification of cell populations difficult. The present work is conceived as a combination of strategies to facilitate the task of supervised gating. Results: We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces prototype cytometries for the different groups. We show that supervised learning, restricted to the new groups, performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state of the art techniques in the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package at https://bioconductor.org/packages/optimalFlow. Conclusions: optimalFlowTemplates + optimalFlowClassification addresses the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovation is the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.
Affiliation(s)
- Eustasio Del Barrio
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Hristo Inouzhe
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Jean-Michel Loubes
- Université Paul Sabatier, Route de Narbonne, Toulouse, France; IMT, Route de Narbonne, Toulouse, France
- Carlos Matrán
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
- Agustín Mayo-Íscar
- Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain; IMUVA, Calle Paseo de Belén, Valladolid, Spain
31
Wang Y, Wang D, Pang W, Miao C, Tan AH, Zhou Y. A systematic density-based clustering method using anchor points. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.02.119] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
32
Ochi Y, Kon A, Sakata T, Nakagawa MM, Nakazawa N, Kakuta M, Kataoka K, Koseki H, Nakayama M, Morishita D, Tsuruyama T, Saiki R, Yoda A, Okuda R, Yoshizato T, Yoshida K, Shiozawa Y, Nannya Y, Kotani S, Kogure Y, Kakiuchi N, Nishimura T, Makishima H, Malcovati L, Yokoyama A, Takeuchi K, Sugihara E, Sato TA, Sanada M, Takaori-Kondo A, Cazzola M, Kengaku M, Miyano S, Shirahige K, Suzuki HI, Ogawa S. Combined Cohesin-RUNX1 Deficiency Synergistically Perturbs Chromatin Looping and Causes Myelodysplastic Syndromes. Cancer Discov 2020; 10:836-853. [PMID: 32249213 PMCID: PMC7269820 DOI: 10.1158/2159-8290.cd-19-0982] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 08/20/2019] [Revised: 02/05/2020] [Accepted: 03/16/2020] [Indexed: 12/27/2022]
Abstract
STAG2 encodes a cohesin component and is frequently mutated in myeloid neoplasms, showing highly significant comutation patterns with other drivers, including RUNX1. However, the molecular basis of cohesin-mutated leukemogenesis remains poorly understood. Here we show a critical role of an interplay between STAG2 and RUNX1 in the regulation of enhancer-promoter looping and transcription in hematopoiesis. Combined loss of STAG2 and RUNX1, which colocalize at enhancer-rich, CTCF-deficient sites, synergistically attenuates enhancer-promoter loops, particularly at sites enriched for RNA polymerase II and Mediator, and deregulates gene expression, leading to myeloid-skewed expansion of hematopoietic stem/progenitor cells (HSPC) and myelodysplastic syndromes (MDS) in mice. Attenuated enhancer-promoter loops in STAG2/RUNX1-deficient cells are associated with downregulation of genes with high basal transcriptional pausing, which are important for regulation of HSPCs. Downregulation of high-pausing genes is also confirmed in STAG2-cohesin-mutated primary leukemia samples. Our results highlight a unique STAG2-RUNX1 interplay in gene regulation and provide insights into cohesin-mutated leukemogenesis. SIGNIFICANCE: We demonstrate a critical role of an interplay between STAG2 and a master transcription factor of hematopoiesis, RUNX1, in MDS development, and further reveal their contribution to regulation of high-order chromatin structures, particularly enhancer-promoter looping, and the link between transcriptional pausing and selective gene dysregulation caused by cohesin deficiency. This article is highlighted in the In This Issue feature, p. 747.
Affiliation(s)
- Yotaro Ochi
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Department of Hematology and Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ayana Kon
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Toyonori Sakata
- Laboratory of Genome Structure and Function, Research Division for Quantitative Life Sciences, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
- Masahiro M Nakagawa
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Naotaka Nakazawa
- Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Kyoto, Japan
- Masanori Kakuta
- Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Keisuke Kataoka
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Haruhiko Koseki
- Laboratory for Developmental Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Manabu Nakayama
- Laboratory of Medical Omics Research, Department of Frontier Research and Development, Kazusa DNA Research Institute, Kisarazu, Japan
- Tatsuaki Tsuruyama
- Department of Drug and Discovery Medicine, Pathology Division, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ryunosuke Saiki
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Akinori Yoda
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Rurika Okuda
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Tetsuichi Yoshizato
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Kenichi Yoshida
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yusuke Shiozawa
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yasuhito Nannya
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Shinichi Kotani
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Department of Hematology and Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yasunori Kogure
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Nobuyuki Kakiuchi
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Tomomi Nishimura
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Hideki Makishima
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Luca Malcovati
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Department of Hematology Oncology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Akihiko Yokoyama
- Tsuruoka Metabolomics Laboratory, National Cancer Center, Yamagata, Japan
- Kengo Takeuchi
- Pathology Project for Molecular Targets, Cancer Institute, Japanese Foundation for Cancer Research, Tokyo, Japan
- Eiji Sugihara
- Research and Development Center for Precision Medicine, University of Tsukuba, Ibaraki, Japan
| | - Taka-Aki Sato
- Research and Development Center for Precision Medicine, University of Tsukuba, Ibaraki, Japan
| | - Masashi Sanada
- Department of Advanced Diagnosis, Clinical Research Center, National Hospital Organization Nagoya Medical Center, Nagoya, Japan
| | - Akifumi Takaori-Kondo
- Department of Hematology and Oncology, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Mario Cazzola
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
- Department of Hematology Oncology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
| | - Mineko Kengaku
- Institute for Integrated Cell-Material Sciences (WPI-iCeMS), Kyoto University, Kyoto, Japan
- Graduate School of Biostudies, Kyoto University, Kyoto, Japan
| | - Satoru Miyano
- Laboratory of DNA Information Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Katsuhiko Shirahige
- Laboratory of Genome Structure and Function, Research Division for Quantitative Life Sciences, Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
| | - Hiroshi I Suzuki
- David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts.
| | - Seishi Ogawa
- Department of Pathology and Tumor Biology, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Department of Medicine, Centre for Haematology and Regenerative Medicine, Karolinska Institute, Stockholm, Sweden
| |
|
33
|
Stassen SV, Siu DMD, Lee KCM, Ho JWK, So HKH, Tsia KK. PARC: ultrafast and accurate clustering of phenotypic data of millions of single cells. Bioinformatics 2020; 36:2778-2786. [PMID: 31971583 PMCID: PMC7203756 DOI: 10.1093/bioinformatics/btaa042] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 09/03/2019] [Revised: 11/24/2019] [Accepted: 01/16/2020] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION: New single-cell technologies continue to fuel the explosive growth in the scale of heterogeneous single-cell data. However, existing computational methods are inadequately scalable to large datasets and therefore cannot uncover the complex cellular heterogeneity. RESULTS: We introduce a highly scalable graph-based clustering algorithm, PARC (Phenotyping by Accelerated Refined Community-partitioning), for large-scale, high-dimensional single-cell data (>1 million cells). Using large single-cell flow and mass cytometry, RNA-seq and imaging-based biophysical data, we demonstrate that PARC consistently outperforms state-of-the-art clustering algorithms without subsampling of cells, including Phenograph, FlowSOM and Flock, in terms of both speed and ability to robustly detect rare cell populations. For example, PARC can cluster a single-cell dataset of 1.1 million cells within 13 min, compared with >2 h for the next fastest graph-clustering algorithm. Our work presents a scalable algorithm to cope with increasingly large-scale single-cell analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/ShobiStassen/PARC. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | | | - Joshua W K Ho
- School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | | | - Kevin K Tsia
- Department of Electrical and Electronic Engineering
| |
|
34
|
Liu P, Liu S, Fang Y, Xue X, Zou J, Tseng G, Konnikova L. Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data. Front Cell Dev Biol 2020; 8:234. [PMID: 32411698 PMCID: PMC7198724 DOI: 10.3389/fcell.2020.00234] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/13/2019] [Accepted: 03/20/2020] [Indexed: 11/13/2022] Open
Abstract
The progress in the field of high-dimensional cytometry has greatly increased the number of markers that can be simultaneously analyzed, producing datasets with large numbers of parameters. Traditional biaxial manual gating might not be optimal for such datasets. To overcome this, a large number of automated tools have been developed to aid with cellular clustering of multi-dimensional datasets. Here we review two large categories of such tools: unsupervised and supervised clustering tools. After a thorough review of the popularity and use of each of the available unsupervised clustering tools, we focus on the top six tools to discuss their advantages and limitations. Furthermore, we employ a publicly available dataset to directly compare the usability, speed, and relative effectiveness of the available unsupervised and supervised tools. Finally, we discuss the current challenges for existing methods and future direction for the new generation of cell type identification approaches.
Affiliation(s)
- Peng Liu
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Silvia Liu
- Department of Pathology, University of Pittsburgh, Pittsburgh, PA, United States
| | - Yusi Fang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Xiangning Xue
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Jian Zou
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - George Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, United States
| | - Liza Konnikova
- Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, United States
- Department of Developmental Biology, University of Pittsburgh, Pittsburgh, PA, United States
| |
|
35
|
Lucchesi S, Furini S, Medaglini D, Ciabattini A. From Bivariate to Multivariate Analysis of Cytometric Data: Overview of Computational Methods and Their Application in Vaccination Studies. Vaccines (Basel) 2020; 8:E138. [PMID: 32244919 PMCID: PMC7157606 DOI: 10.3390/vaccines8010138] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/27/2020] [Revised: 03/17/2020] [Accepted: 03/18/2020] [Indexed: 12/15/2022] Open
Abstract
Flow and mass cytometry are used to quantify the expression of multiple extracellular or intracellular molecules on single cells, allowing the phenotypic and functional characterization of complex cell populations. Multiparametric flow cytometry is particularly suitable for deep analysis of immune responses after vaccination, as it allows measurement of the frequency, the phenotype, and the functional features of antigen-specific cells. When many parameters are investigated simultaneously, it is not feasible to analyze all the possible bi-dimensional combinations of marker expression with classical manual analysis, and the adoption of advanced automated tools to process and analyze high-dimensional data sets becomes necessary. In recent years, the development of many tools for the automated analysis of multiparametric cytometry data has been reported, with an increasing record of publications starting from 2014. However, the use of these tools has been preferentially restricted to bioinformaticians, while few of them are routinely employed by the biomedical community. Filling the gap between algorithm developers and final users is fundamental for exploiting the advantages of computational tools in the analysis of cytometry data. The potentialities of automated analyses range from the improvement of the data quality in the pre-processing steps up to the unbiased, data-driven examination of complex datasets using a variety of algorithms based on different approaches. In this review, an overview of the automated analysis pipeline is provided, spanning from the pre-processing phase to the automated population analysis. Analysis based on computational tools might overcome both the subjectivity of manual gating and the operator-biased exploration of expected populations. Examples of applications of automated tools that have successfully improved the characterization of different cell populations in vaccination studies are also presented.
Affiliation(s)
- Simone Lucchesi
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Simone Furini
- Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Donata Medaglini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| | - Annalisa Ciabattini
- Laboratory of Molecular Microbiology and Biotechnology (LA.M.M.B.), Department of Medical Biotechnologies, University of Siena, 53100 Siena, Italy
| |
|
36
|
Garg SK, Ott MJ, Mostofa AGM, Chen Z, Chen YA, Kroeger J, Cao B, Mailloux AW, Agrawal A, Schaible BJ, Sarnaik A, Weber JS, Berglund AE, Mulé JJ, Markowitz J. Multi-Dimensional Flow Cytometry Analyses Reveal a Dichotomous Role for Nitric Oxide in Melanoma Patients Receiving Immunotherapy. Front Immunol 2020; 11:164. [PMID: 32161584 PMCID: PMC7052497 DOI: 10.3389/fimmu.2020.00164] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/05/2019] [Accepted: 01/21/2020] [Indexed: 11/13/2022] Open
Abstract
Phenotyping of immune cell subsets in clinical trials is limited to well-defined phenotypes, due to technological limitations of reporting flow cytometry multi-dimensional phenotyping data. We developed a multi-dimensional phenotyping analysis tool and applied it to detect nitric oxide (NO) levels in peripheral blood immune cells before and after adjuvant ipilimumab co-administration with a peptide vaccine in melanoma patients. We analyzed inhibitory and stimulatory markers for immune cell phenotypes that were considered important in the NO analysis. The pipeline allows visualization of immune cell phenotypes without knowledge of clustering techniques and categorization of cells by association with relapse-free survival (RFS). Using this analysis, we uncovered the potential for a dichotomous role of NO as a pro- and anti-melanoma factor. NO was found in subsets of immune-suppressor cells associated with shorter-term (≤ 1 year) RFS, whereas NO was also present in immune-stimulatory effector cells obtained from patients with significant longer-term (> 1 year) RFS. These studies provide insights into the cell-specific immunomodulatory role of NO. The methods presented herein can be applied to monitor the pro- and anti-tumor effects of a variety of immune-based therapeutics in cancer patients. Clinical Trial Registration Number: NCT00084656 (https://clinicaltrials.gov/ct2/show/NCT00084656).
Affiliation(s)
- Saurabh K Garg
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Matthew J Ott
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - A G M Mostofa
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Zhihua Chen
- Cancer Informatics Core, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Y Ann Chen
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Jodi Kroeger
- Flow Cytometry Core, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Biwei Cao
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Adam W Mailloux
- Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Alisha Agrawal
- Department of Oncologic Sciences, USF Health Morsani College of Medicine, University of South Florida, Tampa, FL, United States
| | - Braydon J Schaible
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Amod Sarnaik
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Department of Oncologic Sciences, USF Health Morsani College of Medicine, University of South Florida, Tampa, FL, United States
| | - Jeffrey S Weber
- Department of Medicine, Laura and Isaac Perlmutter Cancer Center, NYU Langone Health, New York, NY, United States
| | - Anders E Berglund
- Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - James J Mulé
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
| | - Joseph Markowitz
- Department of Cutaneous Oncology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Department of Immunology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States
- Department of Oncologic Sciences, USF Health Morsani College of Medicine, University of South Florida, Tampa, FL, United States
| |
|
37
|
Qi Y, Fang Y, Sinclair DR, Guo S, Alberich-Jorda M, Lu J, Tenen DG, Kharas MG, Pyne S. High-speed automatic characterization of rare events in flow cytometric data. PLoS One 2020; 15:e0228651. [PMID: 32045462 PMCID: PMC7012421 DOI: 10.1371/journal.pone.0228651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/07/2019] [Accepted: 01/21/2020] [Indexed: 11/19/2022] Open
Abstract
A new computational framework for FLow cytometric Analysis of Rare Events (FLARE) has been developed specifically for fast and automatic identification of rare cell populations in very large samples generated by platforms like multi-parametric flow cytometry. Using a hierarchical Bayesian model and information-sharing via parallel computation, FLARE rapidly explores the high-dimensional marker-space to detect highly rare populations that are consistent across multiple samples. Further it can focus within specified regions of interest in marker-space to detect subpopulations with desired precision.
Affiliation(s)
- Yuan Qi
- Department of Computer Science, Purdue University, West Lafayette, IN, United States of America
- Department of Statistics, Purdue University, West Lafayette, IN, United States of America
| | - Youhan Fang
- Department of Computer Science, Purdue University, West Lafayette, IN, United States of America
| | - David R. Sinclair
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
- Department of Health Policy and Management, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
| | - Shangqin Guo
- Department of Cell Biology, Yale University School of Medicine, New Haven, CT, United States of America
| | | | - Jun Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, United States of America
- Yale Stem Cell Center, Yale University School of Medicine, New Haven, CT, United States of America
| | - Daniel G. Tenen
- Center for Life Sciences, Harvard Medical School, Boston, MA, United States of America
- Harvard Stem Cell Institute, Harvard Medical School, Boston, MA, United States of America
- Cancer Science Institute, National University of Singapore, Singapore, Singapore
| | - Michael G. Kharas
- Molecular Pharmacology Program, Memorial Sloan Kettering Cancer Center, New York, NY, United States of America
| | - Saumyadipta Pyne
- Public Health Dynamics Laboratory, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, United States of America
| |
|
38
|
Brink BG, Meskas J, Brinkman RR. ddPCRclust: an R package and Shiny app for automated analysis of multiplexed ddPCR data. Bioinformatics 2019. [PMID: 29534153 PMCID: PMC6061851 DOI: 10.1093/bioinformatics/bty136] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 01/16/2023] Open
Abstract
Motivation: Droplet digital PCR (ddPCR) is an emerging technology for quantifying DNA. By partitioning the target DNA into ∼20 000 droplets, each serving as its own PCR reaction compartment, a very high sensitivity of DNA quantification can be achieved. However, manual analysis of the data is time-consuming and algorithms for automated analysis of non-orthogonal, multiplexed ddPCR data are unavailable, presenting a major bottleneck for the advancement of ddPCR transitioning from low-throughput to high-throughput. Results: ddPCRclust is an R package for automated analysis of data from Bio-Rad’s droplet digital PCR systems (QX100 and QX200). It can automatically analyze and visualize multiplexed ddPCR experiments with up to four targets per reaction. Results are on par with manual analysis, but only take minutes to compute instead of hours. The accompanying Shiny app ddPCRvis provides easy access to the functionalities of ddPCRclust through a web-browser based GUI. Availability and implementation: R package: https://github.com/bgbrink/ddPCRclust; Interface: https://github.com/bgbrink/ddPCRvis/; Web: https://bibiserv.cebitec.uni-bielefeld.de/ddPCRvis/. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Benedikt G Brink
- International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Biodata Mining Group, Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Justin Meskas
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, Canada
| | - Ryan R Brinkman
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, Canada
- Department Medical Genetics, University of British Columbia, Vancouver, Canada
| |
|
39
|
Ram Y, Dellus-Gur E, Bibi M, Karkare K, Obolski U, Feldman MW, Cooper TF, Berman J, Hadany L. Predicting microbial growth in a mixed culture from growth curve data. Proc Natl Acad Sci U S A 2019; 116:14698-14707. [PMID: 31253703 PMCID: PMC6642348 DOI: 10.1073/pnas.1902217116] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Indexed: 11/18/2022] Open
Abstract
Determining the fitness of specific microbial genotypes has extensive application in microbial genetics, evolution, and biotechnology. While estimates from growth curves are simple and allow high throughput, they are inaccurate and do not account for interactions between costs and benefits accruing over different parts of a growth cycle. For this reason, pairwise competition experiments are the current "gold standard" for accurate estimation of fitness. However, competition experiments require distinct markers, making them difficult to perform between isolates derived from a common ancestor or between isolates of nonmodel organisms. In addition, competition experiments require that competing strains be grown in the same environment, so they cannot be used to infer the fitness consequence of different environmental perturbations on the same genotype. Finally, competition experiments typically consider only the end-points of a period of competition so that they do not readily provide information on the growth differences that underlie competitive ability. Here, we describe a computational approach for predicting density-dependent microbial growth in a mixed culture utilizing data from monoculture and mixed-culture growth curves. We validate this approach using 2 different experiments with Escherichia coli and demonstrate its application for estimating relative fitness. Our approach provides an effective way to predict growth and infer relative fitness in mixed cultures.
Affiliation(s)
- Yoav Ram
- School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv 6997801, Israel
- Department of Biology, Stanford University, Stanford, CA 94305
- School of Computer Science, Interdisciplinary Center Herzliya, Herzliya 4610101, Israel
| | - Eynat Dellus-Gur
- School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Maayan Bibi
- School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Kedar Karkare
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004
| | - Uri Obolski
- School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv 6997801, Israel
- School of Public Health, Tel Aviv University, Tel Aviv 6997801, Israel
- Porter School of the Environment and Earth Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
| | | | - Tim F Cooper
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77004
- Institute of Natural and Mathematical Sciences, Massey University, Palmerston North, 4442, New Zealand
| | - Judith Berman
- School of Molecular Cell Biology and Biotechnology, Tel Aviv University, Tel Aviv 6997801, Israel
| | - Lilach Hadany
- School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv 6997801, Israel
| |
|
40
|
Nguyen B, Rubbens P, Kerckhof FM, Boon N, De Baets B, Waegeman W. Learning Single-Cell Distances from Cytometry Data. Cytometry A 2019; 95:782-791. [PMID: 31099963 DOI: 10.1002/cyto.a.23792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/31/2019] [Revised: 03/31/2019] [Accepted: 04/23/2019] [Indexed: 12/27/2022]
Abstract
Recent years have seen an increased interest in employing data analysis techniques for the automated identification of cell populations in the field of cytometry. These techniques highly depend on the use of a distance metric, a function that quantifies the distances between single-cell measurements. In most cases, researchers simply use the Euclidean distance metric. In this article, we exploit the availability of single-cell labels to find an optimal Mahalanobis distance metric derived from the data. We show that such a Mahalanobis distance metric results in an improved identification of cell populations compared with the Euclidean distance metric. Once determined, it can be used for the analysis of multiple samples that were measured under the same experimental setup. We illustrate this approach for cytometry data from two different origins, that is, flow cytometry applied to microbial cells and mass cytometry for the analysis of human blood cells. We also illustrate that such a distance metric results in an improved identification of cell populations when clustering methods are employed. Generally, these results imply that the performance of data analysis techniques can be improved by using a more advanced distance metric. © 2019 International Society for Advancement of Cytometry.
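To make the contrast between the two metrics in this abstract concrete, the following is a minimal sketch of a Mahalanobis-type distance, d(x, y) = sqrt((x - y)^T M (x - y)), which reduces to the Euclidean distance when M is the identity matrix. The function name, the two example measurements and the matrix `M_learned` are hypothetical illustrations; the sketch does not reproduce how the authors learn M from labelled cells.

```python
import math

def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    # Quadratic form diff^T M diff, written out explicitly.
    quad = sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

# Two hypothetical single-cell measurements on two markers.
a = [1.0, 2.0]
b = [4.0, 6.0]

# With the identity matrix, the distance is the ordinary Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean = mahalanobis_distance(a, b, identity)

# A hypothetical metric (not taken from the paper) that down-weights the second marker.
M_learned = [[1.0, 0.0], [0.0, 0.25]]
weighted = mahalanobis_distance(a, b, M_learned)
```

Under the hypothetical `M_learned`, differences along the second marker contribute less to the distance, which is the kind of reweighting a learned metric can express and the Euclidean metric cannot.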
Affiliation(s)
- Bac Nguyen
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Peter Rubbens
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Frederiek-Maarten Kerckhof
- Center for Microbial Ecology and Technology, Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Nico Boon
- Center for Microbial Ecology and Technology, Department of Biotechnology, Ghent University, 9000 Ghent, Belgium
| | - Bernard De Baets
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| | - Willem Waegeman
- KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, 9000 Ghent, Belgium
| |
|
41
|
Tinnevelt GH, van Staveren S, Wouters K, Wijnands E, Verboven K, Folcarelli R, Koenderman L, Buydens LMC, Jansen JJ. A novel data fusion method for the effective analysis of multiple panels of flow cytometry data. Sci Rep 2019; 9:6777. [PMID: 31043667 PMCID: PMC6494873 DOI: 10.1038/s41598-019-43166-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 07/20/2018] [Accepted: 04/17/2019] [Indexed: 11/09/2022] Open
Abstract
Multicolour flow cytometry (MFC) is used to measure multiple cellular markers at the single-cell level. Cellular markers may be coloured with different panels of fluorescently-labelled antibodies to enable cell identification or the detection of activated cells in pre-defined, ‘gated’ specific cell subsets. The number of markers that can be used per measurement is technologically limited, however, requiring every panel to be analysed in a separate aliquot measurement. The combined analyses of these dedicated panels may enhance the predictive ability of these measurements and could enrich the interpretation of the immunological information. Here we introduce a fusion method for MFC data, based on DAMACY (Discriminant Analysis of Multi-Aspect Cytometry data), which can combine information from complementary panels. This approach leads to both enhanced predictions and clearer interpretations in comparison with the analysis of separate measurements. We illustrate this method using two datasets: the response of neutrophils evoked by a systemic endotoxin challenge and the activated immune status of the innate cells, T cells and B cells in obese versus lean individuals. The data fusion approach was able to detect cells that do not individually show a difference between clinical phenotypes but do play a role in combination with other cells.
Affiliation(s)
- Gerjen H Tinnevelt
- Radboud University, Institute for Molecules and Materials (Analytical Chemistry), postvak 61, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
- TI-COAST, Science Park 904, 1098 XH, Amsterdam, The Netherlands
| | - Selma van Staveren
- TI-COAST, Science Park 904, 1098 XH, Amsterdam, The Netherlands
- Department of Respiratory Medicine and laboratory of translational immunology (LTI), University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
| | - Kristiaan Wouters
- Department of Internal Medicine Laboratory of Metabolism and Vascular Medicine, P.O. Box 616 (UNS50/14), 6200 MD, Maastricht, The Netherlands
| | - Erwin Wijnands
- Experimental Vascular Pathology group, P.O. Box 5800, 6202 MZ, Maastricht, The Netherlands
| | - Kenneth Verboven
- REVAL - Rehabilitation Research Center, Faculty of Rehabilitation Sciences, Hasselt University, Diepenbeek, Belgium
- BIOMED - Biomedical Research Institute, Faculty of Medicine and Life Sciences, Hasselt University, Diepenbeek, Belgium
| | - Rita Folcarelli
- Radboud University, Institute for Molecules and Materials (Analytical Chemistry), postvak 61, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
| | - Leo Koenderman
- Department of Respiratory Medicine and laboratory of translational immunology (LTI), University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
| | - Lutgarde M C Buydens
- Radboud University, Institute for Molecules and Materials (Analytical Chemistry), postvak 61, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
| | - Jeroen J Jansen
- Radboud University, Institute for Molecules and Materials (Analytical Chemistry), postvak 61, P.O. Box 9010, 6500 GL, Nijmegen, The Netherlands
| |
|
42
|
Abstract
Background: Flow cytometry is a popular technology for quantitative single-cell profiling of cell surface markers. It enables expression measurement of tens of cell surface protein markers in millions of single cells. It is a powerful tool for discovering cell sub-populations and quantifying cell population heterogeneity. Traditionally, scientists use manual gating to identify cell types, but the process is subjective and is not effective for large multidimensional data. Many clustering algorithms have been developed to analyse these data, but most of them are not scalable to very large data sets with more than ten million cells. Results: Here, we present a new clustering algorithm that combines the advantages of the density-based clustering algorithm DBSCAN with the scalability of grid-based clustering. This new clustering algorithm is implemented in Python as an open source package, FlowGrid. FlowGrid is memory efficient and scales linearly with respect to the number of cells. We have evaluated the performance of FlowGrid against other state-of-the-art clustering programs and found that FlowGrid produces similar clustering results in substantially less time. For example, FlowGrid is able to complete a clustering task on a data set of 23.6 million cells in less than 12 seconds, while other algorithms take more than 500 seconds or fail with an error. Conclusions: FlowGrid is an ultrafast clustering algorithm for large single-cell flow cytometry data. The source code is available at https://github.com/VCCRI/FlowGrid.
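The general idea of pairing a grid with a density criterion can be sketched as follows. This is an illustrative toy under stated assumptions, not the FlowGrid implementation: the function name `grid_cluster`, the parameters `bin_width` and `min_count`, and the rule that touching dense cells are merged are all hypothetical choices made for the example.

```python
from collections import Counter, deque
from itertools import product

def grid_cluster(points, bin_width, min_count):
    """Label each point with a cluster id, or -1 if its grid cell is sparse."""
    # Map every point to the integer coordinates of its grid cell.
    cells = [tuple(int(v // bin_width) for v in p) for p in points]
    counts = Counter(cells)
    # A cell is "dense" when it holds at least min_count points (assumed rule).
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Offsets to every cell touching a given cell, excluding the cell itself.
    dim = len(points[0])
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Breadth-first search merges touching dense cells into one cluster.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in offsets:
                neighbour = tuple(c + d for c, d in zip(cell, offset))
                if neighbour in dense and neighbour not in cell_label:
                    cell_label[neighbour] = next_label
                    queue.append(neighbour)
        next_label += 1

    # Points in sparse cells are treated as noise.
    return [cell_label.get(cell, -1) for cell in cells]

# Hypothetical two-marker data: two groups of points and one isolated point.
points = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.5), (1.4, 0.6),
          (8.1, 8.2), (8.3, 8.4), (4.5, 4.5)]
labels = grid_cluster(points, bin_width=1.0, min_count=2)
```

Because the neighbour search runs over occupied grid cells rather than over pairs of points, the work after binning depends on the number of non-empty cells, which is one plausible reason a grid-based design can scale well with the number of cells measured.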
Affiliation(s)
- Xiaoxin Ye
- Victor Chang Cardiac Research Institute, Sydney, Australia; University of New South Wales, Sydney, Australia
- Joshua W K Ho
- Victor Chang Cardiac Research Institute, Sydney, Australia; University of New South Wales, Sydney, Australia; School of Biomedical Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
43
Theorell A, Bryceson YT, Theorell J. Determination of essential phenotypic elements of clusters in high-dimensional entities-DEPECHE. PLoS One 2019; 14:e0203247. [PMID: 30845234 PMCID: PMC6405191 DOI: 10.1371/journal.pone.0203247] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 08/12/2018] [Accepted: 01/11/2019] [Indexed: 01/17/2023]
Abstract
Technological advances have facilitated an exponential increase in the amount of information that can be derived from single cells, necessitating new computational tools that can make such highly complex data interpretable. Here, we introduce DEPECHE, a rapid, parameter-free, sparse k-means-based algorithm for clustering of multi- and megavariate single-cell data. In a number of computational benchmarks aimed at evaluating the capacity to form biologically relevant clusters, including flow/mass-cytometry and single-cell RNA sequencing data sets with manually curated gold standard solutions, DEPECHE clusters as well as or better than the best-performing clustering algorithms currently available. However, the main advantage of DEPECHE, compared to the state-of-the-art, is its unique ability to enhance interpretability of the formed clusters, in that it only retains variables relevant for cluster separation, thereby facilitating computationally efficient analyses as well as understanding of complex datasets. DEPECHE is implemented in the open source R package DepecheR, currently available at github.com/Theorell/DepecheR.
Affiliation(s)
- Axel Theorell
- IBG-1: Biotechnology, Institute of Bio- and Geosciences, Forschungszentrum Jülich GmbH, Jülich, North Rhine-Westphalia, Germany
- Yenan Troi Bryceson
- Center for Hematology and Regenerative Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
- Broegelmann Research Laboratory, Department of Clinical Medicine, University of Bergen, Bergen, Norway
- Jakob Theorell
- Center for Hematology and Regenerative Medicine, Department of Medicine Huddinge, Karolinska Institutet, Stockholm, Sweden
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, United Kingdom
44
Hejblum BP, Alkhassim C, Gottardo R, Caron F, Thiébaut R. Sequential Dirichlet process mixtures of multivariate skew $t$-distributions for model-based clustering of flow cytometry data. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1209] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
45
CytoBinning: Immunological insights from multi-dimensional data. PLoS One 2018; 13:e0205291. [PMID: 30379838 PMCID: PMC6209166 DOI: 10.1371/journal.pone.0205291] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/12/2018] [Accepted: 09/22/2018] [Indexed: 01/25/2023]
Abstract
New cytometric techniques continue to push the boundaries of multi-parameter quantitative data acquisition at the single-cell level, particularly in immunology and medicine. Sophisticated analysis methods for such ever higher dimensional datasets are rapidly emerging, with advanced data representations and dimensional reduction approaches. However, these are not yet standardized, and clinical scientists and cell biologists are not yet experienced in their interpretation. More fundamentally, their range of statistical validity is not yet fully established. We therefore propose a new method for the automated and unbiased analysis of high-dimensional single-cell datasets that is simple and robust, with the goal of reducing this complex information into a familiar 2D scatter plot representation that is of immediate utility in a range of biomedical and clinical settings. Using publicly available flow cytometry and mass cytometry datasets, we demonstrate that this method (termed CytoBinning) recapitulates the results of traditional manual cytometric analyses and leads to new and testable hypotheses.
46
Manohar S, Shah P, Biswas S, Mukadam A, Joshi M, Viswanathan G. Combining fluorescent cell barcoding and flow cytometry‐based phospho‐ERK1/2 detection at short time scales in adherent cells. Cytometry A 2018; 95:192-200. [DOI: 10.1002/cyto.a.23602] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/21/2018] [Revised: 08/09/2018] [Accepted: 08/20/2018] [Indexed: 12/23/2022]
Affiliation(s)
- Sonal Manohar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Prachi Shah
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Sharmila Biswas
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Anam Mukadam
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Madhura Joshi
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Ganesh Viswanathan
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
47
Yang X, Qiu P. Automatically generate two-dimensional gating hierarchy from clustered cytometry data. Cytometry A 2018; 93:1039-1050. [PMID: 30176185 DOI: 10.1002/cyto.a.23577] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/24/2017] [Revised: 07/17/2018] [Accepted: 07/18/2018] [Indexed: 12/29/2022]
Abstract
Cytometry is an important technique widely used in medicine and biological research. Biologists traditionally analyze single-cell cytometry data by manual gating, which can be subjective and labor intensive. To address this issue, many automated and semiautomated methods have been developed. These advanced methods are designed to speed up and standardize the analysis of cytometry data, but their popularity is limited by their visualizations, which are not intuitive to biologists who are accustomed to the conventional biaxial gating plots. In this article, we present a new method called Cluster-to-Gate (C2G) that takes clustering results as input and automatically generates a nested two-dimensional gating hierarchy, a visual representation that biologists are familiar with. This method can generate gating sequences for multiple target populations simultaneously and summarize them in one hierarchical tree that represents the gating hierarchy. We have tested this method on target populations defined by manual gating, automated clustering algorithms (k-means, for example), and visualization-assisted methods (SPADE and tSNE). We have demonstrated that C2G is able to generate gating sequences that capture cell populations defined by the various clustering strategies, and that it is robust to over-clustered and overlapping target populations. © 2018 International Society for Advancement of Cytometry.
Affiliation(s)
- Xingyu Yang
- Department of Biology, Georgia Institute of Technology, Atlanta, Georgia
- Peng Qiu
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia
48
Cheng Y, Dundar M, Mohler G. A coupled ETAS-I2GMM point process with applications to seismic fault detection. Ann Appl Stat 2018. [DOI: 10.1214/18-aoas1134] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
49
Lee AJ, Chang I, Burel JG, Lindestam Arlehamn CS, Mandava A, Weiskopf D, Peters B, Sette A, Scheuermann RH, Qian Y. DAFi: A directed recursive data filtering and clustering approach for improving and interpreting data clustering identification of cell populations from polychromatic flow cytometry data. Cytometry A 2018; 93:597-610. [PMID: 29665244 PMCID: PMC6030426 DOI: 10.1002/cyto.a.23371] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/25/2017] [Revised: 02/05/2018] [Accepted: 03/15/2018] [Indexed: 11/10/2022]
Abstract
Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks that must be cleared before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user's manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists by mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters that are otherwise difficult to resolve in a single run of the data clustering method because of statistical interference from irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while being consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and from nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided abrupt cut-offs at the boundaries.
DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and the ClusterR package. For cell population identification, DAFi supports multiple options including clustering, bisecting, slope-based gating, and reversed filtering to meet various autogating needs from different scientific use cases. © 2018 International Society for Advancement of Cytometry.
Affiliation(s)
- Ivan Chang
- J. Craig Venter Institute, La Jolla, California
- Julie G. Burel
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Daniela Weiskopf
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Bjoern Peters
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Alessandro Sette
- La Jolla Institute for Allergy and Immunology, La Jolla, California
- Department of Medicine, University of California, San Diego, California
- Richard H. Scheuermann
- J. Craig Venter Institute, La Jolla, California
- Department of Pathology, University of California, San Diego, California
- Yu Qian
- J. Craig Venter Institute, La Jolla, California
50
Quantifying cell densities and biovolumes of phytoplankton communities and functional groups using scanning flow cytometry, machine learning and unsupervised clustering. PLoS One 2018; 13:e0196225. [PMID: 29746500 PMCID: PMC5945019 DOI: 10.1371/journal.pone.0196225] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 10/02/2017] [Accepted: 04/09/2018] [Indexed: 12/04/2022]
Abstract
Scanning flow cytometry (SFCM) is characterized by the measurement of time-resolved pulses of fluorescence and scattering, enabling the high-throughput quantification of phytoplankton morphology and pigmentation. Quantifying variation at the single cell and colony level improves our ability to understand dynamics in natural communities. Automated high-frequency monitoring of these communities is presently limited by the absence of repeatable, rapid protocols to analyse SFCM datasets, where images of individual particles are not available. Here we demonstrate a repeatable, semi-automated method to (1) rapidly clean SFCM data from a phytoplankton community by removing signals that do not belong to live phytoplankton cells, (2) classify individual cells into trait clusters that correspond to functional groups, and (3) quantify the biovolumes of individual cells, the total biovolume of the whole community and the total biovolumes of the major functional groups. Our method involves the development of training datasets using lab cultures, the use of an unsupervised clustering algorithm to identify trait clusters, and machine learning tools (random forests) to (1) evaluate variable importance, (2) classify data points, and (3) estimate biovolumes of individual cells. We provide example datasets and R code for our analytical approach that can be adapted for analysis of datasets from other flow cytometers or scanning flow cytometers.