1
Song Z, Shen W, Vannucci M, Baldizon A, Cinciripini PM, Versace F, Guindani M. Clustering computer mouse tracking data with informed hierarchical shrinkage partition priors. Biometrics 2024; 80:ujae124. [PMID: 39475297 PMCID: PMC11523067 DOI: 10.1093/biomtc/ujae124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 08/29/2024] [Accepted: 10/02/2024] [Indexed: 11/02/2024]
Abstract
Mouse-tracking data, which record computer mouse trajectories while participants perform an experimental task, provide valuable insights into subjects' underlying cognitive processes. Neuroscientists are interested in clustering the subjects' responses during computer mouse-tracking tasks to reveal patterns of individual decision-making behaviors and identify population subgroups with similar neurobehavioral responses. These data can be combined with neuroimaging data to provide additional information for personalized interventions. In this article, we develop a novel hierarchical shrinkage partition (HSP) prior for clustering summary statistics derived from the trajectories of mouse-tracking data. The HSP model defines a subjects' cluster as a set of subjects that gives rise to more similar (rather than identical) nested partitions of the conditions. The proposed model can incorporate prior information about the partitioning of either subjects or conditions to facilitate clustering, and it allows for deviations of the nested partitions within each subject group. These features distinguish the HSP model from other bi-clustering methods that typically create identical nested partitions of conditions within a subject group. Furthermore, it differs from existing nested clustering methods, which define clusters based on common parameters in the sampling model and identify subject groups by different distributions. We illustrate the unique features of the HSP model on a mouse tracking dataset from a pilot study and in simulation studies. Our results show the ability and effectiveness of the proposed exploratory framework in clustering and revealing possible different behavioral patterns across subject groups.
Affiliation(s)
- Ziyi Song
  - Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697, United States
- Weining Shen
  - Department of Statistics, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697, United States
- Marina Vannucci
  - Department of Statistics, Rice University, Houston, TX 77005, United States
- Alexandria Baldizon
  - Department of Behavioral Science, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Paul M Cinciripini
  - Department of Behavioral Science, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Francesco Versace
  - Department of Behavioral Science, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, United States
- Michele Guindani
  - Department of Biostatistics, UCLA Fielding School of Public Health, University of California, Los Angeles, CA 90095, United States
2
Garcia NS, Du M, Guindani M, McIlvin MR, Moran DM, Saito MA, Martiny AC. Proteome trait regulation of marine Synechococcus elemental stoichiometry under global change. The ISME Journal 2024; 18:wrae046. [PMID: 38513256 PMCID: PMC11020310 DOI: 10.1093/ismejo/wrae046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 02/27/2024] [Accepted: 03/19/2024] [Indexed: 03/23/2024]
Abstract
Recent studies have demonstrated regional differences in marine ecosystem C:N:P with implications for carbon and nutrient cycles. Due to strong co-variance, temperature and nutrient stress explain variability in C:N:P equally well. A reductionistic approach can link changes in individual environmental drivers with changes in biochemical traits and cell C:N:P. Thus, we quantified effects of temperature and nutrient stress on Synechococcus chemistry using laboratory chemostats, chemical analyses, and data-independent acquisition mass spectrometry proteomics. Nutrient supply accounted for most cell C:N:P variability and induced tradeoffs between nutrient acquisition and ribosomal proteins. High temperature prompted heat-shock, whereas thermal effects via the "translation-compensation hypothesis" were only seen under P-stress. A Nonparametric Bayesian Local Clustering algorithm suggested that changes in lipopolysaccharides, peptidoglycans, and C-rich compatible solutes may also contribute to C:N:P regulation. Physiological responses match field-based trends in ecosystem stoichiometry and suggest a hierarchical environmental regulation of current and future ocean C:N:P.
Affiliation(s)
- Nathan S Garcia
  - Department of Earth System Science, University of California, Irvine, Irvine, CA 92697, United States
- Mingyu Du
  - Department of Statistics, University of California, Irvine, Irvine, CA 92697, United States
- Michele Guindani
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, United States
- Matthew R McIlvin
  - Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, United States
- Dawn M Moran
  - Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, United States
- Mak A Saito
  - Marine Chemistry and Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, United States
- Adam C Martiny
  - Department of Earth System Science, University of California, Irvine, Irvine, CA 92697, United States
  - Department of Ecology and Evolutionary Biology, University of California, Irvine, Irvine, CA 92697, United States
3
Li Y, Bandyopadhyay D, Xie F, Xu Y. BAREB: A Bayesian repulsive biclustering model for periodontal data. Stat Med 2020; 39:2139-2151. [PMID: 32246534 PMCID: PMC7272289 DOI: 10.1002/sim.8536] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/22/2019] [Revised: 02/12/2020] [Accepted: 03/07/2020] [Indexed: 11/11/2022]
Abstract
Preventing periodontal diseases (PD) and maintaining the structure and function of teeth are important goals for personal oral care. To understand the heterogeneity in patients with diverse PD patterns, we develop a Bayesian repulsive biclustering method that can simultaneously cluster the PD patients and their tooth sites after taking the patient- and site-level covariates into consideration. BAREB uses the determinantal point process prior to induce diversity among different biclusters to facilitate parsimony and interpretability. Since PD progression is hypothesized to be spatially referenced, BAREB factors in the spatial dependence among tooth sites. In addition, since PD is the leading cause for tooth loss, the missing data mechanism is nonignorable. Such nonrandom missingness is incorporated into BAREB. For the posterior inference, we design an efficient reversible jump Markov chain Monte Carlo sampler. Simulation studies show that BAREB is able to accurately estimate the biclusters, and compares favorably to alternatives. For real world application, we apply BAREB to a dataset from a clinical PD study, and obtain desirable and interpretable results. A major contribution of this article is the Rcpp implementation of our methodology, available in the R package BAREB.
Affiliation(s)
- Yuliang Li
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, MD, U.S.A
- Fangzheng Xie
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, MD, U.S.A
- Yanxun Xu
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, MD, U.S.A
4
Biclustering of medical monitoring data using a nonparametric hierarchical Bayesian model. Stat (Int Stat Inst) 2020; 9. [DOI: 10.1002/sta4.279] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
5
Han S, Zhang H, Sheng W, Arshad H. The nested joint clustering via Dirichlet process mixture model. J Stat Comput Sim 2019; 89:815-830. [DOI: 10.1080/00949655.2019.1572756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Affiliation(s)
- Shengtong Han
  - Joseph J. Zilber School of Public Health, University of Wisconsin, Milwaukee, WI, USA
- Hongmei Zhang
  - School of Public Health, University of Memphis, Memphis, TN, USA
- Wenhui Sheng
  - Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI, USA
- Hasan Arshad
  - Allergy and Clinical Immunology, Clinical and Experimental Sciences, University of Southampton, Southampton, UK
6
Burgette LF, Escarce JJ, Paddock SM, Ridgely MS, Wilder WG, Yanagihara D, Damberg CL. Sample selection in the face of design constraints: Use of clustering to define sample strata for qualitative research. Health Serv Res 2018; 54:509-517. [PMID: 30548243 DOI: 10.1111/1475-6773.13100] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
OBJECTIVE: To sample 40 physician organizations stratified on the basis of longitudinal cost of care measures for qualitative interviews in order to describe the range of care delivery structures and processes that are being deployed to influence the total costs of caring for patients. DATA SOURCES: Three years of physician organization-level total cost of care data (n = 156 in California) from the Integrated Healthcare Association's value-based pay-for-performance program. STUDY DESIGN: We fit total cost of care data using mixture and K-means clustering algorithms to segment the population of physician organizations into sampling strata based on 3-year cost trajectories (ie, cost curves). PRINCIPAL FINDINGS: A mixture of multivariate normal distributions can classify physician organization cost curves into clusters defined by total cost level, shape, and within-cluster variation. K-means clustering does not accommodate differing levels of within-cluster variation and resulted in more clusters being allocated to unstable cost curves. A mixture of regressions approach focuses overly on anomalous trajectories and is sensitive to model coding. CONCLUSIONS: Statistical clustering can be used to form sampling strata when longitudinal measures are of primary interest. Many clustering algorithms are available; the choice of the clustering algorithm can strongly impact the resulting strata because various algorithms focus on different aspects of the observed data.
Affiliation(s)
- José J Escarce
  - University of California at Los Angeles, Los Angeles, California
7
Zhang H, Zou Y, Terry W, Karmaus W, Arshad H. Joint clustering with correlated variables. Am Stat 2018; 73:296-306. [PMID: 32863387 DOI: 10.1080/00031305.2018.1424033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/18/2022]
Abstract
Traditional clustering methods focus on grouping subjects or (dependent) variables assuming independence between the variables. Clusters formed through these approaches can potentially lack homogeneity. This article proposes a joint clustering method by which both variables and subjects are clustered. In each joint cluster (in general composed of a subset of variables and a subset of subjects), there exists a unique association between dependent variables and covariates of interest. To this end, a Bayesian method is designed, in which a semi-parametric model is used to evaluate any unknown relationships between possibly correlated variables and covariates of interest, and a Dirichlet process is utilized to cluster subjects. Compared to existing clustering techniques, the major novelty of the method exists in its ability to improve the homogeneity of clusters, along with the ability to take the correlations between variables into account. Via simulations, we examine the performance and efficiency of the proposed method. Applying the method to cluster allergens and subjects based on the association of wheal size in reaction to allergens with age, we found that a certain pattern of allergic sensitization to a set of allergens has a potential to reduce the occurrence of asthma.
Affiliation(s)
- Hongmei Zhang
  - School of Public Health, The University of Memphis, Memphis, TN
- Yubo Zou
  - Blue Cross Blue Shield of South Carolina, Columbia, SC
- Will Terry
  - School of Public Health, The University of Memphis, Memphis, TN
- Hasan Arshad
  - University of Southampton Faculty of Medicine, Southampton, UK
8
Zuanetti DA, Müller P, Zhu Y, Yang S, Ji Y. Clustering distributions with the marginalized nested Dirichlet process. Biometrics 2017; 74:584-594. [PMID: 28960246 DOI: 10.1111/biom.12778] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/01/2016] [Revised: 08/01/2017] [Accepted: 08/01/2017] [Indexed: 11/30/2022]
Abstract
We introduce a marginal version of the nested Dirichlet process to cluster distributions or histograms. We apply the model to cluster genes by patterns of gene-gene interaction. The proposed approach is based on the nested partition that is implied in the original construction of the nested Dirichlet process. It allows simulation exact inference, as opposed to a truncated Dirichlet process approximation. More importantly, the construction highlights the nature of the nested Dirichlet process as a nested partition of experimental units. We apply the proposed model to inference on clustering genes related to DNA mismatch repair (DMR) by the distribution of gene-gene interactions with other genes. Gene-gene interactions are recorded as coefficients in an auto-logistic model for the co-expression of two genes, adjusting for copy number variation, methylation and protein activation. These coefficients are extracted from an online database, called Zodiac, computed based on The Cancer Genome Atlas (TCGA) data. We compare results with a variation of k-means clustering that is set up to cluster distributions, truncated NDP and a hierarchical clustering method. The proposed inference shows favorable performance, under simulated conditions and also in the real data sets.
Affiliation(s)
- Peter Müller
  - Department of Mathematics, University of Texas, Austin, Texas, U.S.A
- Yitan Zhu
  - NorthShore University HealthSystem, Evanston, Illinois, U.S.A
- Shengjie Yang
  - NorthShore University HealthSystem, Evanston, Illinois, U.S.A
- Yuan Ji
  - NorthShore University HealthSystem, Evanston and University of Chicago, U.S.A
9
Tamminga CA, Pearlson GD, Stan AD, Gibbons RD, Padmanabhan J, Keshavan M, Clementz BA. Strategies for Advancing Disease Definition Using Biomarkers and Genetics: The Bipolar and Schizophrenia Network for Intermediate Phenotypes. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging 2016; 2:20-27. [PMID: 29560884 DOI: 10.1016/j.bpsc.2016.07.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 02/01/2016] [Revised: 05/22/2016] [Accepted: 07/01/2016] [Indexed: 10/21/2022]
Abstract
It is critical for psychiatry as a field to develop approaches to define the molecular, cellular, and circuit basis of its brain diseases, especially for serious mental illnesses, and then to use these definitions to generate biologically based disease categories, as well as to explore disease mechanisms and illness etiologies. Our current reliance on phenomenology is inadequate to support exploration of molecular treatment targets and disease formulations, and the leap directly from phenomenology to disease biology has been limiting because of broad heterogeneity within conventional diagnoses. The questions addressed in this review are formulated around how we can use brain biomarkers to achieve disease categories that are biologically based. We have grouped together a series of vignettes as examples of early approaches, all using the Bipolar and Schizophrenia Network on Intermediate Phenotypes (BSNIP) biomarker database and collaborators, starting off with describing the foundational statistical methods for these goals. We use primarily criterion-free statistics to identify pertinent groups of involved genes related to psychosis as well as symptoms, and finally, to create new biologically based disease cohorts within the psychopathological dimension of psychosis. Although we do not put these results forward as final formulations, they represent a novel effort to rely minimally on phenomenology as a diagnostic tool and to fully embrace brain characteristics of structure, as well as molecular and cellular characteristics and function, to support disease definition in psychosis.
Affiliation(s)
- Carol A Tamminga
  - Department of Psychiatry, UT Southwestern Medical School, Dallas, Texas
- Ana D Stan
  - Department of Psychiatry, UT Southwestern Medical School, Dallas, Texas
- Robert D Gibbons
  - Center for Health Statistics, University of Chicago School of Medicine, Chicago, Illinois
- Jaya Padmanabhan
  - Department of Psychiatry, Beth Israel and Women's Hospital, Harvard University, Boston, Massachusetts
- Matcheri Keshavan
  - Department of Psychiatry, Beth Israel and Women's Hospital, Harvard University, Boston, Massachusetts
- Brett A Clementz
  - Department of Psychology, University of Georgia, Athens, Georgia
10
Lee J, Müller P, Zhu Y, Ji Y. A Nonparametric Bayesian Model for Nested Clustering. Methods Mol Biol 2016; 1362:129-41. [PMID: 26519174 DOI: 10.1007/978-1-4939-3106-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]
Abstract
We propose a nonparametric Bayesian model for clustering where clusters of experimental units are determined by a shared pattern of clustering another set of experimental units. The proposed model is motivated by the analysis of protein activation data, where we cluster proteins such that all proteins in one cluster give rise to the same clustering of patients. That is, we define clusters of proteins by the way that patients group with respect to the corresponding protein activations. This is in contrast to (almost) all currently available models that use shared parameters in the sampling model to define clusters. This includes in particular model based clustering, Dirichlet process mixtures, product partition models, and more. We show results for two typical biostatistical inference problems that give rise to clustering.
Affiliation(s)
- Juhee Lee
  - Department of Applied Mathematics and Statistics, UC Santa Cruz, Santa Cruz, CA, USA
- Peter Müller
  - Department of Mathematics, UT Austin, Austin, TX, USA
- Yitan Zhu
  - Program for Computational Genomics and Medicine Research Institute, NorthShore University HealthSystem, Evanston, IL, USA
- Yuan Ji
  - Department of Health Studies, The University of Chicago, Chicago, IL, USA
11
Guha S, Baladandayuthapani V. A nonparametric Bayesian technique for high-dimensional regression. Electron J Stat 2016. [DOI: 10.1214/16-ejs1184] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
12
Xu Y, Lee J, Yuan Y, Mitra R, Liang S, Müller P, Ji Y. Nonparametric Bayesian Bi-Clustering for Next Generation Sequencing Count Data. Bayesian Analysis 2013; 8:759-780. [PMID: 26246865 PMCID: PMC4523245 DOI: 10.1214/13-ba822] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 05/08/2023]
Abstract
Histone modifications (HMs) play important roles in transcription through post-translational modifications. Combinations of HMs, known as chromatin signatures, encode specific messages for gene regulation. We therefore expect that inference on possible clustering of HMs and an annotation of genomic locations on the basis of such clustering can contribute new insights about the functions of regulatory elements and their relationships to combinations of HMs. We propose a nonparametric Bayesian local clustering Poisson model (NoB-LCP) to facilitate posterior inference on two-dimensional clustering of HMs and genomic locations. The NoB-LCP clusters HMs into HM sets and lets each HM set define its own clustering of genomic locations. Furthermore, it probabilistically excludes HMs and genomic locations that are irrelevant to clustering. By doing so, the proposed model effectively identifies important sets of HMs and groups regulatory elements with similar functionality based on HM patterns.
Affiliation(s)
- Yanxun Xu
  - Department of Statistics, Rice University, Houston, TX, U.S.A
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, U.S.A
- Juhee Lee
  - Department of Statistics, The Ohio State University, Columbus, Ohio, U.S.A
- Yuan Yuan
  - Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, TX, U.S.A
- Riten Mitra
  - Department of Mathematics, University of Texas Austin, Austin, TX, U.S.A
- Shoudan Liang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, U.S.A
- Peter Müller
  - Department of Mathematics, University of Texas Austin, Austin, TX, U.S.A
- Yuan Ji
  - NorthShore University HealthSystem, Chicago, IL, U.S.A