101
Sasidharan K, Tomita M, Aon M, Lloyd D, Murray DB. Time-structure of the yeast metabolism in vivo. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2012; 736:359-79. [PMID: 22161340 DOI: 10.1007/978-1-4419-7210-1_21] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 02/03/2023]
Abstract
All previous studies on the yeast metabolome have yielded a plethora of information on the components, function and organisation of low molecular mass and macromolecular components involved in the cellular metabolic network. Here we emphasise that an understanding of the global dynamics of the metabolome in vivo requires elucidation of the temporal dynamics of metabolic processes on many time-scales. We illustrate this using the 40 min oscillation in respiratory activity displayed in auto-synchronous continuously grown cultures of Saccharomyces cerevisiae, where respiration cycles between a phase of increased respiration (oxidative phase) and decreased respiration (reductive phase). Thereby an ultradian clock, i.e. a timekeeping device that runs through many cycles during one day, is involved in the co-ordination of the vast majority of events and processes in yeast. Through continuous online measurements, we first show that mitochondrial and redox physiology are intertwined to produce the temporal landscape on which cellular events occur. Next we look at the higher order processes of DNA duplication and mitochondrial structure to reveal that both events are choreographed during the respiratory cycles. Furthermore, spectral analysis using the discrete Fourier transformation of high-resolution (10 Hz) time-series of NAD(P)H confirms the existence of higher frequency components of biological origin and that these follow a scale-free architecture even in stable oscillating modes. A different signal-processing approach using discrete wavelet transformations (DWT) indicates that there is a significant contribution to the overall signal from ~5, ~10 and ~20-minute cycles and that the amplitudes of these cycles are phase-dependent. Further investigation (derivative of Gaussian continuous wavelet transformation) reveals that the observed 20-minute cycles are actually confined to the reductive phase and consist of two ~15-minute cycles. Moreover, the 5 and 10-minute cycles are restricted to the oxidative phase of the cycle. The mitochondrial origin of these signals was confirmed by pulse-injection of the cytochrome c oxidase inhibitor H2S. We next discuss how these multi-oscillatory states can impinge on the apparently complex reactome (represented as a phase diagram of 1,650 chemical species that show oscillatory behaviour). We conclude that biological processes can be considerably more comprehensible when dynamic in vivo time-structure is taken into account.
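The spectral-analysis step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `dft_power` and `dominant_period`, the one-minute sampling interval, and the synthetic signal are all assumptions made for illustration. The sketch computes a plain discrete Fourier transform of a mean-centred series and reports the period of the strongest component.

```python
import cmath
import math


def dft_power(signal):
    """Power spectrum (bins 0..n/2) of a mean-centred real-valued series."""
    n = len(signal)
    mean_val = sum(signal) / n
    centred = [s - mean_val for s in signal]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        power.append(abs(coeff) ** 2)
    return power


def dominant_period(signal, dt):
    """Period (in units of dt) of the strongest non-zero frequency bin."""
    power = dft_power(signal)
    k = max(range(1, len(power)), key=power.__getitem__)
    return len(signal) * dt / k


if __name__ == "__main__":
    # Hypothetical trace: a 40-minute cycle plus a weaker 10-minute cycle,
    # sampled once per minute for 240 minutes.
    trace = [
        math.sin(2 * math.pi * t / 40) + 0.3 * math.sin(2 * math.pi * t / 10)
        for t in range(240)
    ]
    print(dominant_period(trace, dt=1.0))
```

A direct DFT like this scales quadratically with series length; for a real 10 Hz recording an FFT routine would be the practical choice.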
Affiliation(s)
- Kalesh Sasidharan
- Institute for Advanced Biosciences, Keio University, Nipponkoku 403-1, Daihouji, Tsuruoka City, Yamagata 997-0017, Japan.
102
Lo K, Gottardo R. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution. STATISTICS AND COMPUTING 2012; 22:33-52. [PMID: 22125375 PMCID: PMC3223965 DOI: 10.1007/s11222-010-9204-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 05/07/2023]
Abstract
Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.
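As a rough illustration of the transformation this abstract builds on, the sketch below implements the textbook one-parameter Box-Cox transform for positive values. It is an assumption that this standard form is close enough to convey the idea; the paper's exact variant and its estimation within the EM algorithm are not reproduced here, and the function name `box_cox` is hypothetical.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y.

    lam = 1 leaves the shape of the data unchanged (up to a shift),
    lam = 0 is the log transform, and values in between reduce right skew.
    """
    if y <= 0:
        raise ValueError("the standard Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


if __name__ == "__main__":
    # A right-skewed set of values becomes more evenly spaced as lam shrinks.
    values = [1.0, 10.0, 100.0, 1000.0]
    for lam in (1.0, 0.5, 0.0):
        print(lam, [round(box_cox(v, lam), 3) for v in values])
```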
Affiliation(s)
- Kenneth Lo
- Department of Microbiology, University of Washington, Seattle, WA, USA
- Raphael Gottardo
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
103
Ribalet F, Schruth DM, Armbrust EV. flowPhyto: enabling automated analysis of microscopic algae from continuous flow cytometric data. Bioinformatics 2011; 27:732-3. [PMID: 21208987 DOI: 10.1093/bioinformatics/btr003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
MOTIVATION Flow cytometry is a widely used technique among biologists to study the abundances of populations of microscopic algae living in aquatic environments. A new generation of high-frequency flow cytometers collects up to several hundred samples per day and can run continuously for several weeks. Automated computational methods are needed to analyze the different phytoplankton populations present in each sample. Software packages in the programming environment R provide powerful tools for conducting such analyses. RESULTS We introduce flowPhyto, an R package that performs aggregate statistics on virtually unlimited collections of raw flow cytometry files and provides a memory efficient, parallelized solution for analyzing high-throughput flow cytometric data. AVAILABILITY Freely accessible at http://www.bioconductor.org.
Affiliation(s)
- Francois Ribalet
- School of Oceanography, University of Washington, Seattle, WA 98195, USA
104
Finak G, Perez JM, Weng A, Gottardo R. Optimizing transformations for automated, high throughput analysis of flow cytometry data. BMC Bioinformatics 2010; 11:546. [PMID: 21050468 PMCID: PMC3243046 DOI: 10.1186/1471-2105-11-546] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 05/28/2010] [Accepted: 11/04/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND In a high throughput setting, effective flow cytometry data analysis depends heavily on proper data preprocessing. While usual preprocessing steps of quality assessment, outlier removal, normalization, and gating have received considerable scrutiny from the community, the influence of data transformation on the output of high throughput analysis has been largely overlooked. Flow cytometry measurements can vary over several orders of magnitude, cell populations can have variances that depend on their mean fluorescence intensities, and may exhibit heavily-skewed distributions. Consequently, the choice of data transformation can influence the output of automated gating. An appropriate data transformation aids in data visualization and gating of cell populations across the range of data. Experience shows that the choice of transformation is data specific. Our goal here is to compare the performance of different transformations applied to flow cytometry data in the context of automated gating in a high throughput, fully automated setting. We examine the most common transformations used in flow cytometry, including the generalized hyperbolic arcsine, biexponential, linlog, and generalized Box-Cox, all within the BioConductor flowCore framework that is widely used in high throughput, automated flow cytometry data analysis. All of these transformations have adjustable parameters whose effects upon the data are non-intuitive for most users. By making some modelling assumptions about the transformed data, we develop maximum likelihood criteria to optimize parameter choice for these different transformations. RESULTS We compare the performance of parameter-optimized and default-parameter (in flowCore) data transformations on real and simulated data by measuring the variation in the locations of cell populations across samples, discovered via automated gating in both the scatter and fluorescence channels. 
We find that parameter-optimized transformations improve visualization, reduce variability in the location of discovered cell populations across samples, and decrease the misclassification (mis-gating) of individual events when compared to default-parameter counterparts. CONCLUSIONS Our results indicate that the preferred transformation for fluorescence channels is a parameter-optimized biexponential or generalized Box-Cox, in accordance with current best practices. Interestingly, for populations in the scatter channels, we find that the optimized hyperbolic arcsine may be a better choice in a high-throughput setting than the current standard practice of no transformation. However, generally speaking, the choice of transformation remains data-dependent. We have implemented our algorithm in the BioConductor package, flowTrans, which is publicly available.
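To make the idea of an adjustable transformation concrete, here is a minimal sketch of a generalized hyperbolic arcsine with three parameters. The parameter names `a`, `b`, `c` and the form `asinh(a + b*x) + c` are assumptions chosen for illustration and should be checked against the flowCore documentation; the maximum likelihood optimization described in the abstract is not implemented.

```python
import math


def arcsinh_transform(x, a=0.0, b=1.0, c=0.0):
    """Generalized hyperbolic arcsine: asinh(a + b * x) + c.

    Roughly linear near zero and logarithmic for large |x|, so it
    compresses several orders of magnitude while still accepting
    zero and negative values.
    """
    return math.asinh(a + b * x) + c


if __name__ == "__main__":
    # The scale parameter b controls where the linear region ends.
    intensities = [-50.0, 0.0, 50.0, 1e3, 1e5]
    for b in (1.0, 1.0 / 150.0):
        print(b, [round(arcsinh_transform(v, b=b), 3) for v in intensities])
```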
Affiliation(s)
- Greg Finak
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA.
105
Sugár IP, Sealfon SC. Misty Mountain clustering: application to fast unsupervised flow cytometry gating. BMC Bioinformatics 2010; 11:502. [PMID: 20932336 PMCID: PMC2967560 DOI: 10.1186/1471-2105-11-502] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 08/26/2009] [Accepted: 10/09/2010] [Indexed: 11/26/2022]
Abstract
Background There are many important clustering questions in computational biology for which no satisfactory method exists. Automated clustering algorithms, when applied to large, multidimensional datasets, such as flow cytometry data, prove unsatisfactory in terms of speed, problems with local minima or cluster shape bias. Model-based approaches are restricted by the assumptions of the fitting functions. Furthermore, model-based clustering requires serial clustering for all cluster numbers within a user-defined interval. The final cluster number is then selected by various criteria. These supervised serial clustering methods are time consuming, and different criteria frequently result in different optimal cluster numbers. Various unsupervised heuristic approaches that have been developed, such as affinity propagation, are too expensive to be applied to datasets on the order of 10^6 points that are often generated by high throughput experiments. Results To circumvent these limitations, we developed a new, unsupervised density contour clustering algorithm, called Misty Mountain, that is based on percolation theory and that efficiently analyzes large data sets. The approach can be envisioned as a progressive top-down removal of clouds covering a data histogram relief map to identify clusters by the appearance of statistically distinct peaks and ridges. This is a parallel clustering method that finds every cluster after analyzing the cross sections of the histogram only once. The overall run time for the composite steps of the algorithm increases linearly with the number of data points. The clustering of 10^6 data points in 2D data space takes place within about 15 seconds on a standard laptop PC. Comparison of the performance of this algorithm with other state of the art automated flow cytometry gating methods indicates that Misty Mountain provides substantial improvements in both run time and in the accuracy of cluster assignment. 
Conclusions Misty Mountain is fast, unbiased for cluster shape, identifies stable clusters and is robust to noise. It provides a useful, general solution for multidimensional clustering problems. We demonstrate its suitability for automated gating of flow cytometry data.
Affiliation(s)
- István P Sugár
- Department of Neurology and Center for Translational Systems Biology, Mount Sinai School of Medicine, New York, NY, USA.
106
Hahne F, Khodabakhshi AH, Bashashati A, Wong CJ, Gascoyne RD, Weng AP, Seyfert-Margolis V, Bourcier K, Asare A, Lumley T, Gentleman R, Brinkman RR. Per-channel basis normalization methods for flow cytometry data. Cytometry A 2010; 77:121-31. [PMID: 19899135 DOI: 10.1002/cyto.a.20823] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 11/06/2022]
Abstract
Between-sample variation in high-throughput flow cytometry data poses a significant challenge for analysis of large-scale data sets, such as those derived from multicenter clinical trials. It is often hard to match biologically relevant cell populations across samples because of technical variation in sample acquisition and instrumentation differences. Thus, normalization of data is a critical step before analysis, particularly in large-scale data sets from clinical trials, where group-specific differences may be subtle and patient-to-patient variation common. We have developed two normalization methods that remove technical between-sample variation by aligning prominent features (landmarks) in the raw data on a per-channel basis. These algorithms were tested on two independent flow cytometry data sets by comparing manually gated data, either individually for each sample or using static gating templates, before and after normalization. Our results show a marked improvement in the overlap between manual and static gating when the data are normalized, thereby facilitating the use of automated analyses on large flow cytometry data sets. Such automated analyses are essential for high-throughput flow cytometry.
Affiliation(s)
- Florian Hahne
- Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
107
Zare H, Shooshtari P, Gupta A, Brinkman RR. Data reduction for spectral clustering to analyze high throughput flow cytometry data. BMC Bioinformatics 2010; 11:403. [PMID: 20667133 PMCID: PMC2923634 DOI: 10.1186/1471-2105-11-403] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Received: 12/21/2009] [Accepted: 07/28/2010] [Indexed: 02/08/2023]
Abstract
Background Recent biological discoveries have shown that clustering large datasets is essential for better understanding biology in many areas. Spectral clustering in particular has proven to be a powerful tool amenable for many applications. However, it cannot be directly applied to large datasets due to time and memory limitations. To address this issue, we have modified spectral clustering by adding an information preserving sampling procedure and applying a post-processing stage. We call this entire algorithm SamSPECTRAL. Results We tested our algorithm on flow cytometry data as an example of large, multidimensional data containing potentially hundreds of thousands of data points (i.e., "events" in flow cytometry, typically corresponding to cells). Compared to two state of the art model-based flow cytometry clustering methods, SamSPECTRAL demonstrates significant advantages in proper identification of populations with non-elliptical shapes, low density populations close to dense ones, minor subpopulations of a major population and rare populations. Conclusions This work is the first successful attempt to apply spectral methodology on flow cytometry data. An implementation of our algorithm as an R package is freely available through BioConductor.
Affiliation(s)
- Habil Zare
- Terry Fox Laboratory, BC Cancer Agency, 675 W 10th Ave, Vancouver, BC, Canada
108
Naumann U, Luta G, Wand MP. The curvHDR method for gating flow cytometry samples. BMC Bioinformatics 2010; 11:44. [PMID: 20096119 PMCID: PMC2832899 DOI: 10.1186/1471-2105-11-44] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/23/2009] [Accepted: 01/22/2010] [Indexed: 11/16/2022]
Abstract
Background High-throughput flow cytometry experiments produce hundreds of large multivariate samples of cellular characteristics. These samples require specialized processing to obtain clinically meaningful measurements. A major component of this processing is a form of cell subsetting known as gating. Manual gating is time-consuming and subjective. Good automatic and semi-automatic gating algorithms are very beneficial to high-throughput flow cytometry. Results We develop a statistical procedure, named curvHDR, for automatic and semi-automatic gating. The method combines the notions of significant high negative curvature regions and highest density regions and has the ability to adapt well to human-perceived gates. The underlying principles apply to dimensions of arbitrary size, although we focus on dimensions up to three. Accompanying software, compatible with contemporary flow cytometry informatics, is developed. Conclusion The method is seen to adapt well to nuances in the data and, to a reasonable extent, match human perception of useful gates. It offers big savings in human labour when processing high-throughput flow cytometry data whilst retaining a good degree of efficacy.
Affiliation(s)
- Ulrike Naumann
- School of Mathematics and Applied Statistics, The University of New South Wales, Sydney, Australia
109
Stanton RA, Escobar S, Elliott GS. A software framework enabling analysis of plate-based flow cytometry data for high-throughput screening. Assay Drug Dev Technol 2009; 8:228-37. [PMID: 20035617 DOI: 10.1089/adt.2009.0227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Flow cytometry (FCM) is an important technology with a broad spectrum of applications ranging from basic research to clinical diagnostics. In a typical FCM experiment, thousands of cells are queried with respect to size, shape, and abundance of multiple cell surface antigens. Recent advances in FCM techniques and instrumentation have enabled researchers to raise the throughput of experimentation dramatically. However, data analysis has remained a time-consuming activity requiring significant manual intervention for gating as well as for overall data reduction and interpretation. Presented in this article is a novel, algorithmically flexible, internally developed, software framework for the analysis of plate-based FCM data for high-throughput screening (HTS). Utilizing a post-treatment pooling strategy, >87,000 individual wells representing over 240,000 compounds were automatically gated, percent of control (POC) calculated, results assembled, deconvolved, and sorted, allowing researchers to visually assess wells of interest in minutes.
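The percent of control (POC) calculation named in this abstract can be sketched as follows. The abstract does not define the formula, so this assumes the common convention of dividing a well's readout by the mean of the control wells on the same plate; the function name `percent_of_control` and the example numbers are hypothetical.

```python
from statistics import mean


def percent_of_control(well_value, control_values):
    """Express a well's readout as a percentage of the mean control readout."""
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("mean control value is zero; POC is undefined")
    return 100.0 * well_value / control_mean


if __name__ == "__main__":
    # Hypothetical gated-event fractions for three control wells and one test well.
    controls = [0.82, 0.78, 0.80]
    print(round(percent_of_control(0.40, controls), 1))
```

Some screening groups subtract a background or negative-control value before taking the ratio; that variant is not shown here.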
Affiliation(s)
- Rick A Stanton
- Chemistry Research and Discovery, Amgen Inc., Thousand Oaks, California 91320, USA.
110
Analysis of High-Throughput Flow Cytometry Data Using plateCore. Adv Bioinformatics 2009:356141. [PMID: 19956418 PMCID: PMC2777006 DOI: 10.1155/2009/356141] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/02/2009] [Accepted: 07/21/2009] [Indexed: 11/18/2022]
Abstract
Flow cytometry (FCM) software packages from R/Bioconductor, such as flowCore and flowViz, serve as an open platform for development of new analysis tools and methods. We created plateCore, a new package that extends the functionality in these core packages to enable automated negative control-based gating and make the processing and analysis of plate-based data sets from high-throughput FCM screening experiments easier. plateCore was used to analyze data from a BD FACS CAP screening experiment where five Peripheral Blood Mononucleocyte Cell (PBMC) samples were assayed for 189 different human cell surface markers. This same data set was also manually analyzed by a cytometry expert using the FlowJo data analysis software package (TreeStar, USA). We show that the expression values for markers characterized using the automated approach in plateCore are in good agreement with those from FlowJo, and that using plateCore allows for more reproducible analyses of FCM screening data.
111
Bridging the Divide between Manual Gating and Bioinformatics with the Bioconductor Package flowFlowJo. Adv Bioinformatics 2009:809469. [PMID: 19956421 PMCID: PMC2775689 DOI: 10.1155/2009/809469] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/31/2009] [Revised: 06/19/2009] [Accepted: 07/19/2009] [Indexed: 11/30/2022]
Abstract
In flow cytometry, different cell types are usually selected or “gated” by a series of 1- or 2-dimensional geometric subsets of the measurements made on each cell. This is easily accomplished in commercial flow cytometry packages but it is difficult to work computationally with the results of this process. The ability to retrieve the results and work with both them and the raw data is critical; our experience points to the importance of bioinformatics tools that will allow us to examine gating robustness, combine manual and automated gating, and perform exploratory data analysis. To provide this capability, we have developed a Bioconductor package called flowFlowJo that can import gates defined by the commercial package FlowJo and work with them in a manner consistent with the other flow packages in Bioconductor. We present this package and illustrate some of the ways in which it can be used.