1
|
Ito T, Sugasawa S. Grouped generalized estimating equations for longitudinal data analysis. Biometrics 2023; 79:1868-1879. [PMID: 35819419 DOI: 10.1111/biom.13718] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/01/2021] [Accepted: 07/05/2022] [Indexed: 11/29/2022]
Abstract
The generalized estimating equation (GEE) is widely adopted for regression modeling of longitudinal data because it accounts for potential correlations within the same subject. Although the standard GEE assumes regression coefficients common to all subjects, this assumption may not be realistic when the coefficients are potentially heterogeneous across subjects. In this paper, we develop a flexible and interpretable approach, called grouped GEE analysis, for modeling longitudinal data while allowing heterogeneity in regression coefficients. The proposed method assumes that the subjects are divided into a finite number of groups and that subjects within the same group share the same regression coefficients. We provide a simple algorithm for grouping subjects and estimating the regression coefficients simultaneously, and show the asymptotic properties of the proposed estimator. The number of groups can be determined by the cross-validation with averaging method. We demonstrate the proposed method through simulation studies and an application to a real data set.
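The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of grouping subjects and estimating coefficients at the same time. It assumes an identity link, an independence working correlation (so each group's fit reduces to ordinary least squares), a single covariate, and a deterministic round-robin starting assignment. The names `fit_line`, `rss`, and `grouped_fit` are hypothetical and not taken from the paper.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x over a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def rss(points, coef):
    """Residual sum of squares of one subject's points under coef = (a, b)."""
    a, b = coef
    return sum((y - a - b * x) ** 2 for x, y in points)


def grouped_fit(subjects, n_groups, n_iter=20):
    """Alternate between per-group estimation and subject regrouping.

    subjects: list of subjects, each a list of (x, y) repeated measurements.
    Assumption: round-robin initial labels; a real analysis would try
    several starting assignments.
    """
    labels = [i % n_groups for i in range(len(subjects))]
    coefs = [(0.0, 0.0)] * n_groups
    for _ in range(n_iter):
        # Estimation step: pool the subjects of each group and refit.
        for g in range(n_groups):
            pooled = [p for s, lab in zip(subjects, labels) if lab == g for p in s]
            if pooled:  # an empty group keeps its previous coefficients
                coefs[g] = fit_line(pooled)
        # Grouping step: move each subject to its best-fitting group.
        new_labels = [
            min(range(n_groups), key=lambda g: rss(s, coefs[g])) for s in subjects
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, coefs
```

Like k-means, an alternation of this kind can stop at a local optimum, which is why the starting assignment is flagged as an assumption.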
Affiliation(s)
- Tsubasa Ito
  - Faculty of Economics and Business, Hokkaido University, Hokkaido, Japan
- Shonosuke Sugasawa
  - Center for Spatial Information Science, University of Tokyo, Tokyo, Japan
|
2
|
Zhou J, Zhang Y, Tu W. clusterMLD: An Efficient Hierarchical Clustering Method for Multivariate Longitudinal Data. J Comput Graph Stat 2023; 32:1131-1144. [PMID: 37859643 PMCID: PMC10584088 DOI: 10.1080/10618600.2022.2149540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Accepted: 11/11/2022] [Indexed: 11/24/2022]
Abstract
Longitudinal data clustering is challenging because the grouping has to account for the similarity of individual trajectories in the presence of sparse and irregular times of observation. This paper puts forward a hierarchical agglomerative clustering method based on a dissimilarity metric that quantifies the cost of merging two distinct groups of curves, which are depicted by B-splines for the repeatedly measured data. Extensive simulations show that the proposed method has superior performance in determining the number of clusters, classifying individuals into the correct clusters, and in computational efficiency. Importantly, the method is not only suitable for clustering multivariate longitudinal data with sparse and irregular measurements but also for intensely measured functional data. Towards this end, we provide an R package for the implementation of such analyses. To illustrate the use of the proposed clustering method, two large clinical data sets from real-world clinical studies are analyzed.
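The merge-cost idea in this abstract can be illustrated with a small sketch. This is not the clusterMLD implementation: as a stated simplification, each group of curves is summarized by a pooled least-squares straight line instead of B-splines, and the cost of merging two groups is taken as the increase in residual sum of squares. The function names are hypothetical.

```python
def line_rss(points):
    """Residual sum of squares of the least-squares line through (t, y) points."""
    n = len(points)
    mt = sum(t for t, _ in points) / n
    my = sum(y for _, y in points) / n
    stt = sum((t - mt) ** 2 for t, _ in points)
    sty = sum((t - mt) * (y - my) for t, y in points)
    slope = sty / stt if stt > 0 else 0.0
    intercept = my - slope * mt
    return sum((y - intercept - slope * t) ** 2 for t, y in points)


def merge_cost(a, b):
    """Extra lack of fit incurred by describing groups a and b with one line."""
    return line_rss(a + b) - line_rss(a) - line_rss(b)


def agglomerate(curves, n_clusters):
    """Greedy agglomerative clustering of curves observed at irregular times.

    curves: list of curves, each a list of (time, value) pairs.
    Returns the clusters as sorted lists of curve indices.
    """
    # Each cluster holds (member indices, pooled observations).
    clusters = [([i], list(c)) for i, c in enumerate(curves)]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Merge the pair of clusters whose union is cheapest to describe.
        i, j = min(
            pairs, key=lambda p: merge_cost(clusters[p[0]][1], clusters[p[1]][1])
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(members) for members, _ in clusters)
```

Because the cost is computed from pooled observations, the curves do not need to share observation times, which is the property the abstract emphasizes for sparse and irregular measurements.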
Affiliation(s)
- Junyi Zhou
  - Department of Biostatistics and Health Data Science, Indiana University
- Ying Zhang
  - Department of Biostatistics, University of Nebraska Medical Center
- Wanzhu Tu
  - Department of Biostatistics and Health Data Science, Indiana University
|
3
|
Yu S, Liu J. Ensemble calibration model of near-infrared spectroscopy based on functional data analysis. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 2022; 280:121569. [PMID: 35780759 DOI: 10.1016/j.saa.2022.121569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 05/26/2022] [Accepted: 06/25/2022] [Indexed: 06/15/2023]
Abstract
As a nondestructive detection technology, near-infrared spectroscopy has been widely applied in various fields, and research on processing its data has attracted increasing attention. In contrast to existing discrete data models, this paper proposes an ensemble calibration model for near-infrared spectroscopy based on functional data analysis, FDA-EM-PLS (functional data analysis-ensemble learning-partial least squares). First, the near-infrared spectrum of each sample is divided into several intervals, and functional data analysis is carried out on each interval. Then, the samples are clustered according to the generated functions, which not only reduces the influence of noise but also provides a theoretical basis for selecting variables. Further, Monte Carlo sampling is used to generate training subsets from the clustered samples for ensemble learning, which not only addresses the problem of small samples but also improves the robustness of the model. The experimental results show that the absolute relative errors of FDA-EM-PLS for the corn and soil data are both less than 10%.
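The ensemble step described here (Monte Carlo sampling of training subsets, then combining the fitted models) can be sketched as follows. This is an illustration under stated assumptions, not the FDA-EM-PLS procedure: a one-variable least-squares line stands in for partial least squares, the clustering and functional-data steps are omitted, and the names and default sizes are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; stands in for a PLS base learner."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def monte_carlo_ensemble(xs, ys, n_models=25, subset_size=5, seed=0):
    """Fit one base model per randomly drawn training subset."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    indices = list(range(len(xs)))
    models = []
    for _ in range(n_models):
        chosen = rng.sample(indices, subset_size)  # sampling without replacement
        models.append(fit_line([xs[i] for i in chosen], [ys[i] for i in chosen]))
    return models


def predict(models, x):
    """Ensemble prediction: the average of the base-model predictions."""
    return sum(a + b * x for a, b in models) / len(models)
```

Averaging over many subset fits is what gives the ensemble its robustness to individual noisy samples; the choice of 25 models and subsets of size 5 is arbitrary here.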
Affiliation(s)
- Shaohui Yu
  - School of Mathematics and Statistics, Hefei Normal University, Hefei 230061, China
- Jing Liu
  - Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei 230031, China
|
4
|
Iorio C, Frasso G, D’Ambrosio A, Siciliano R. Boosted-oriented probabilistic smoothing-spline clustering of series. STAT METHOD APPL-GER 2022. [DOI: 10.1007/s10260-022-00665-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Fuzzy clustering methods allow objects to belong to several clusters simultaneously, with different degrees of membership. However, the performance of fuzzy algorithms is influenced by the value of the fuzzifier parameter. In this paper, we propose a fuzzy clustering procedure for data (time) series that does not depend on the definition of a fuzzifier parameter. It draws on two approaches, theoretically motivated for the unsupervised and supervised classification cases, respectively. The first is the Probabilistic Distance clustering procedure. The second is the well-known Boosting philosophy. Our idea is to adopt a boosting perspective for unsupervised learning problems; in particular, we address non-hierarchical clustering problems. The global performance of the proposed method is investigated through various experiments.
|
5
|
Wang T, Yu L, Leurgans SE, Wilson RS, Bennett DA, Boyle PA. Conditional functional clustering for longitudinal data with heterogeneous nonlinear patterns. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Tianhao Wang
  - Rush Alzheimer’s Disease Center, Rush University Medical Center
- Lei Yu
  - Rush Alzheimer’s Disease Center, Rush University Medical Center
- Sue E. Leurgans
  - Rush Alzheimer’s Disease Center, Rush University Medical Center
|
6
|
Fang K, Chen Y, Ma S, Zhang Q. Biclustering analysis of functionals via penalized fusion. J MULTIVARIATE ANAL 2022; 189:104874. [PMID: 36817965 PMCID: PMC9937451 DOI: 10.1016/j.jmva.2021.104874] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
Clustering is commonly conducted in biomedical data analysis. Biclustering analysis clusters in both the sample and covariate dimensions and can describe data heterogeneity more comprehensively. Most existing biclustering analyses consider scalar measurements. In this study, motivated by time-course gene expression data and other examples, we take the "natural next step" and consider the biclustering analysis of functionals, under which, for each covariate of each sample, a function (to be exact, its values at discrete measurement points) is present. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, giving the proposed approach a solid theoretical grounding. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression data sets, demonstrates the practical effectiveness of the proposed approach.
Affiliation(s)
- Kuangnan Fang
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Yuanxing Chen
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, United States of America
- Qingzhao Zhang (corresponding author)
  - MOE Key Laboratory of Econometrics, Department of Statistics and Data Science, School of Economics, Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University, China
|
7
|
Exploring the longitudinal dynamics of herd BVD antibody test results using model-based clustering. Sci Rep 2019; 9:11353. [PMID: 31388019 PMCID: PMC6684638 DOI: 10.1038/s41598-019-47339-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/22/2019] [Accepted: 07/15/2019] [Indexed: 11/08/2022] Open
Abstract
Determining the Bovine Viral Diarrhoea (BVD) infection status of cattle herds is a challenge for control and eradication schemes. Given the changing dynamics of BVD virus (BVDV) antibody responses in cattle, classifying herds based on longitudinal changes in the results of BVDV antibody tests could offer a novel, complementary approach to categorising herds that is less likely than the present system to result in a herd's status changing from year to year, as it is more likely to capture the true exposure dynamics of the farms. This paper describes the dynamics of BVDV antibody test values (measured as percentage positivity (PP)) obtained from 15,500 bovines between 2007 and 2010 from thirty-nine cattle herds located in Scotland and Northern England. It explores approaches to classifying herds based on the trend, magnitude and shape of their antibody PP trajectories and investigates the epidemiological similarities between farms within the same cluster. Gaussian mixture models were used for the magnitude and shape clustering. Epidemiologically meaningful clusters were obtained. Farm cluster membership depends on the clustering approach used. Moderate concordance was found between the shape and magnitude clusters. These methods hold potential for application to enhance control efforts for BVD and other infectious livestock diseases.
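As an illustration of the Gaussian mixture modelling mentioned in this abstract, the sketch below fits a two-component, one-dimensional mixture with the EM algorithm. It assumes each herd has already been reduced to a single summary number (for example a mean PP value); the study itself clustered on trajectory magnitude and shape, and its exact model is not given here. The function names, the starting values, and the variance floor are assumptions of this sketch.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Starts the component means at the data extremes and both variances
    at the overall variance; min_var guards against collapsing components.
    """
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]
    variances = [max(overall_var, min_var)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk,
                min_var,
            )
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels
```

The returned labels assign each value to its more probable component, which corresponds to the hard cluster membership discussed in the abstract.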
|
8
|
Meng Y, Liang J, Cao F, He Y. A new distance with derivative information for functional k-means clustering algorithm. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.06.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
|
9
|
Libbrecht R, Oxley PR, Kronauer DJC. Clonal raider ant brain transcriptomics identifies candidate molecular mechanisms for reproductive division of labor. BMC Biol 2018; 16:89. [PMID: 30103762 PMCID: PMC6090591 DOI: 10.1186/s12915-018-0558-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 05/30/2018] [Accepted: 07/31/2018] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND: Division of labor between reproductive queens and workers that perform brood care is a hallmark of insect societies. However, studies of the molecular basis of this fundamental dichotomy are limited by the fact that the caste of an individual cannot typically be experimentally manipulated at the adult stage. Here we take advantage of the unique biology of the clonal raider ant, Ooceraea biroi, to study brain gene expression dynamics during experimentally induced transitions between reproductive and brood care behavior.
RESULTS: Introducing larvae that inhibit reproduction and induce brood care behavior causes much faster changes in adult gene expression than removing larvae. In addition, the general patterns of gene expression differ depending on whether ants transition from reproduction to brood care or vice versa, indicating that gene expression changes between phases are cyclic rather than pendular. Finally, we identify genes that could play upstream roles in regulating reproduction and behavior because they show large and early expression changes in one or both transitions.
CONCLUSIONS: Our analyses reveal that the nature and timing of gene expression changes differ substantially depending on the direction of the transition, and identify a suite of promising candidate molecular regulators of reproductive division of labor that can now be characterized further in both social and solitary animal models. This study contributes to understanding the molecular regulation of reproduction and behavior, as well as the organization and evolution of insect societies.
Affiliation(s)
- Romain Libbrecht
  - Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
  - Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Johannes-von-Müller-Weg 6, 55128, Mainz, Germany
- Peter R Oxley
  - Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
  - Samuel J. Wood Library, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
- Daniel J C Kronauer
  - Laboratory of Social Evolution and Behavior, The Rockefeller University, 1230 York Avenue, New York, NY, 10065, USA
|
10
|
Dawson M, Müller HG. Dynamic Modeling of Conditional Quantile Trajectories, With Application to Longitudinal Snippet Data. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1356321] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]
Affiliation(s)
- Matthew Dawson
  - Graduate Group in Biostatistics, University of California, Davis, Davis, CA
- Hans-Georg Müller
  - Department of Statistics, University of California, Davis, Davis, CA
|
11
|
Trevisani M, Tuzzi A. Learning the evolution of disciplines from scientific literature: A functional clustering approach to normalized keyword count trajectories. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.01.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
12
|
|
13
|
Abstract
In the present paper, we introduce k-centres functional clustering (k-centres FC), a person-centered method that clusters people with similar patterns of complex, highly nonlinear change over time. We review fundamentals of the methodology and argue how it addresses some of the limitations of the traditional approaches to modeling repeated measures data. The usefulness of k-centres FC is demonstrated by applying the method to weekly measured commitment data from 109 participants who reported psychological contract breach events. The k-centres FC analysis shows two substantively meaningful clusters, the first cluster showing reaction patterns with general growth in commitment after breach and the second cluster showing general decline in commitment after breach. Further, the reaction patterns in the second cluster appear to be the result of a combination of two interesting reaction logics: immediate and delayed reactions. We conclude by outlining how future organizational research can incorporate this methodology.
|
14
|
Hamel S, Yoccoz NG, Gaillard JM. Assessing variation in life-history tactics within a population using mixture regression models: a practical guide for evolutionary ecologists. Biol Rev Camb Philos Soc 2016; 92:754-775. [PMID: 26932678 DOI: 10.1111/brv.12254] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/12/2015] [Revised: 12/21/2015] [Accepted: 01/08/2016] [Indexed: 02/06/2023]
Abstract
Mixed models are now well-established methods in ecology and evolution because they allow accounting for and quantifying within- and between-individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi-modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life-history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life-history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long-term studies of large mammals to illustrate the potential of using mixture models for assessing within-population variation in life-history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. 
We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life-history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users.
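Two of the selection criteria named in this abstract, AIC and BIC, have standard closed forms, so choosing the number of clusters can be sketched as below. The sketch assumes that the maximized log-likelihood and the parameter count of each candidate mixture model are already available from a fitting routine; the function `select_n_clusters` and its input layout are hypothetical, and the bootstrap criteria mentioned in the abstract are not covered.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_n_clusters(candidates, n_obs, criterion="bic"):
    """Pick the number of clusters with the smallest criterion value.

    candidates: dict mapping n_clusters -> (log_likelihood, n_params),
    as produced by whatever routine fitted each mixture model.
    """
    def score(item):
        log_likelihood, n_params = item[1]
        if criterion == "bic":
            return bic(log_likelihood, n_params, n_obs)
        return aic(log_likelihood, n_params)

    return min(candidates.items(), key=score)[0]
```

BIC penalizes each extra parameter by ln n rather than 2, so for more than about seven observations it favours fewer clusters than AIC whenever the two disagree.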
Affiliation(s)
- Sandra Hamel
  - Faculty of Biosciences, Fisheries and Economics, Department of Arctic and Marine Biology, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Nigel G Yoccoz
  - Faculty of Biosciences, Fisheries and Economics, Department of Arctic and Marine Biology, UiT The Arctic University of Norway, 9037 Tromsø, Norway
- Jean-Michel Gaillard
  - CNRS, UMR 5558 'Biométrie et Biologie Evolutive', Université de Lyon, Université Lyon 1, F-69622, Villeurbanne, France
|
15
|
Böhning D, Hennig C, McLachlan GJ, McNicholas PD. The 2nd special issue on advances in mixture models. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.10.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
|