1. Offorha BC, Walters SJ, Jacques RM. Analysing cluster randomised controlled trials using GLMM, GEE1, GEE2, and QIF: results from four case studies. BMC Med Res Methodol 2023; 23:293. [PMID: 38093221 PMCID: PMC10717070 DOI: 10.1186/s12874-023-02107-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 11/17/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Using four case studies, we aim to provide practical guidance and recommendations for the analysis of cluster randomised controlled trials. METHODS Four modelling approaches for analysing correlated individual participant level outcomes in cluster randomised controlled trials were identified after we reviewed the literature: Generalized Linear Mixed Models with parameters estimated by maximum likelihood/restricted maximum likelihood, and Generalized Linear Models with parameters estimated by Generalized Estimating Equations (first order or second order) or the Quadratic Inference Function. We systematically searched the online bibliography databases of MEDLINE, EMBASE, PsycINFO (via OVID), CINAHL (via EBSCO), and SCOPUS. We applied the four approaches to four case studies of cluster randomised controlled trials with the number of clusters ranging from 10 to 100, and individual participants ranging from 748 to 9,207. Results were obtained for both continuous and binary outcomes using R and SAS statistical packages. RESULTS The intracluster correlation coefficient (ICC) estimates for the case studies were less than 0.05 and are consistent with the observed ICC values commonly reported in primary care and community-based cluster randomised controlled trials. In most cases, the four methods produced similar results. However, in a few analyses, the quadratic inference function produced different results compared to the generalized linear mixed model, first-order generalized estimating equations, and second-order generalized estimating equations, especially in trials with small to moderate numbers of clusters. CONCLUSION This paper demonstrates the analysis of cluster randomised controlled trials with four modelling approaches. The results obtained were similar in most cases; however, for trials with few clusters we recommend that the quadratic inference function be used with caution and that, where possible, a small sample correction be used. The generalisability of our results is limited to studies with similar features to our case studies, for example, studies with a similar-sized ICC. It is important to conduct simulation studies to comprehensively evaluate the performance of the four modelling approaches.
Affiliation(s)
- Bright C Offorha
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK
- Stephen J Walters
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK
- Richard M Jacques
  - Division of Population Health, School of Medicine & Population Health, University of Sheffield, Sheffield, UK
2. Hector EC, Song PXK. Joint integrative analysis of multiple data sources with correlated vector outcomes. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
3. Wang H, Zhang J, Klump KL, Alexandra Burt S, Cui Y. Multivariate partial linear varying coefficients model for gene-environment interactions with multiple longitudinal traits. Stat Med 2022; 41:3643-3660. [PMID: 35582816 PMCID: PMC9308731 DOI: 10.1002/sim.9440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Revised: 04/26/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022]
Abstract
Correlated phenotypes often share common genetic determinants. Thus, a multi-trait analysis can potentially increase association power and help in understanding pleiotropic effects. When multiple traits are jointly measured over time, the correlation information between multivariate longitudinal responses can help to gain power in association analysis, and the longitudinal traits can provide insights on the dynamic gene effect over time. In this work, we propose a multivariate partially linear varying coefficients model to identify genetic variants with their effects potentially modified by environmental factors. We derive a testing framework to jointly test the association of genetic factors and illustrate it with a bivariate phenotypic trait, while taking the time-varying genetic effects into account. We extend the quadratic inference functions to deal with the longitudinal correlations and use penalized splines for the approximation of nonparametric coefficient functions. Theoretical results such as consistency and asymptotic normality of the estimates are established. The performance of the testing procedure is evaluated through Monte Carlo simulation studies. The utility of the method is demonstrated with a real data set from the Twin Study of Hormones and Behavior across the menstrual cycle project, in which single nucleotide polymorphisms associated with emotional eating behavior are identified.
Affiliation(s)
- Honglang Wang
  - Department of Mathematical Sciences, Indiana University-Purdue University Indianapolis, Indianapolis, Indiana, USA
- Jingyi Zhang
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
  - Amazon Lab126, Sunnyvale, California, USA
- Kelly L Klump
  - Department of Psychology, Michigan State University, East Lansing, Michigan, USA
- Sybil Alexandra Burt
  - Department of Psychology, Michigan State University, East Lansing, Michigan, USA
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
4. Yu H, Tong G, Li F. A note on the estimation and inference with quadratic inference functions for correlated outcomes. COMMUN STAT-SIMUL C 2022; 51:6525-6536. [PMID: 36568127 PMCID: PMC9782733 DOI: 10.1080/03610918.2020.1805463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
Abstract
The quadratic inference function approach is a popular method in the analysis of correlated data. The quadratic inference function is formulated based on multiple sets of score equations (or extended score equations) that over-identify the regression parameters of interest, and improves efficiency over the generalized estimating equations under correlation misspecification. In this note, we provide an alternative solution to the quadratic inference function by separately solving each set of score equations and combining the solutions. We provide an insight that an optimally weighted combination of estimators obtained separately from the distinct sets of score equations is asymptotically equivalent to the estimator obtained via the quadratic inference function. We further establish results on inference for the optimally weighted estimator and extend these insights to the general setting with over-identified estimating equations. A simulation study is carried out to confirm the analytical insights and connections in finite samples.
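The idea of solving each set of score equations separately and then combining the solutions with optimal weights can be sketched numerically. The block below is only an illustration and is not the authors' code: it assumes a linear model with identity link and no intercept, an exchangeable working correlation represented by two basis matrices (identity and off-diagonal ones), simulated data, and a minimum-distance (generalized least squares) combination of the two separately solved estimators. All variable names and simulation settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated clustered data (all settings are illustrative assumptions).
n_clusters, m = 300, 4               # number of clusters, equal cluster size
beta_true = np.array([1.0, -0.5])    # hypothetical true coefficients, no intercept

X, y = [], []
for _ in range(n_clusters):
    # Covariates carry a cluster-level component plus individual-level variation.
    Xi = rng.normal(size=(1, 2)) + rng.normal(size=(m, 2))
    # Exchangeable errors: shared cluster effect plus individual noise.
    eps = rng.normal() + rng.normal(size=m)
    X.append(Xi)
    y.append(Xi @ beta_true + eps)

# Basis matrices assumed for an exchangeable working correlation.
bases = [np.eye(m), np.ones((m, m)) - np.eye(m)]
p = beta_true.size

# Step 1: solve each set of (linear) score equations separately.
A = [sum(Xi.T @ M @ Xi for Xi in X) for M in bases]
b = [sum(Xi.T @ M @ yi for Xi, yi in zip(X, y)) for M in bases]
betas = [np.linalg.solve(Ak, bk) for Ak, bk in zip(A, b)]

# Step 2: joint covariance of the stacked estimators, estimated from
# cluster-level influence terms (a sandwich-type estimate).
V = np.zeros((2 * p, 2 * p))
for Xi, yi in zip(X, y):
    psi = np.concatenate([
        np.linalg.solve(Ak, Xi.T @ M @ (yi - Xi @ bk))
        for Ak, M, bk in zip(A, bases, betas)
    ])
    V += np.outer(psi, psi)

# Step 3: optimally weighted combination, using the fact that both
# estimators target the same beta (J stacks two identity blocks).
J = np.vstack([np.eye(p), np.eye(p)])
theta = np.concatenate(betas)
Vinv_J = np.linalg.solve(V, J)                 # V^{-1} J
cov_combined = np.linalg.inv(J.T @ Vinv_J)     # (J' V^{-1} J)^{-1}
beta_combined = cov_combined @ (Vinv_J.T @ theta)

print("separate estimates:", betas)
print("combined estimate: ", beta_combined)
```

The sketch omits an intercept on purpose: with equal cluster sizes, the intercept rows of the two score sets are proportional, which would make the estimated joint covariance close to singular.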
Affiliation(s)
- Hengshi Yu
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A.
- Guangyu Tong
  - Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, U.S.A.
- Fan Li
  - Department of Biostatistics, Yale University, New Haven, Connecticut, U.S.A.
5. Lai P, Liang W, Wang F, Zhang Q. Feature screening of quadratic inference functions for ultrahigh dimensional longitudinal data. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1783666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
Affiliation(s)
- Peng Lai
  - School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing, People's Republic of China
- Weijuan Liang
  - School of Statistics, Renmin University of China, Beijing, People's Republic of China
- Fangjian Wang
  - School of Mathematics and Statistics, Nanjing University of Information Science & Technology, Nanjing, People's Republic of China
- Qingzhao Zhang
  - Department of Statistics, School of Economics, Xiamen University, Xiamen, People's Republic of China
  - Key Laboratory of Econometrics, Ministry of Education, Xiamen University, Xiamen, People's Republic of China
  - The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, People's Republic of China
6. Zhao M, Gao Y, Cui Y. Variable selection for longitudinal varying coefficient errors-in-variables models. COMMUN STAT-THEOR M 2020. [DOI: 10.1080/03610926.2020.1801738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- Mingtao Zhao
  - School of Statistics and Applied Mathematics, Anhui University of Finance & Economics, Bengbu, Anhui, China
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
- Yuzhao Gao
  - School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Yuehua Cui
  - Department of Statistics and Probability, Michigan State University, East Lansing, Michigan, USA
7. Yu H, Li F, Turner EL. An evaluation of quadratic inference functions for estimating intervention effects in cluster randomized trials. Contemp Clin Trials Commun 2020; 19:100605. [PMID: 32728648 PMCID: PMC7381491 DOI: 10.1016/j.conctc.2020.100605] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/22/2020] [Revised: 06/15/2020] [Accepted: 06/28/2020] [Indexed: 01/02/2023] Open
Abstract
Cluster randomized trials (CRTs) usually randomize groups of individuals to interventions, and outcomes are typically measured at the individual level. Marginal intervention effects are frequently of interest in CRTs due to their population-averaged interpretations. Such effects are estimated using generalized estimating equations (GEE), or a recent alternative called the quadratic inference function (QIF). However, the performance of QIF relative to GEE has not been extensively evaluated in the CRT context, especially when the marginal mean model includes additional covariates. Motivated by the HALI trial, we conduct simulation studies to compare the finite-sample operating characteristics of QIF and GEE. We demonstrate that QIF and GEE are equivalent under some conditions. When the marginal mean model includes individual-level covariates, QIF shows an efficiency improvement over GEE with overall larger power, but its test size may be more liberal than that of GEE, and GEE achieves better coverage than QIF. The test size inflation may not be fully addressed by using finite-sample bias corrections. The estimates of QIF tend to be closer to GEE in the HALI data, although the former presents a small standard error. Overall, we confirm that the QIF approach generally has potentially better efficiency than GEE in our simulation studies but might be more cautiously used as a viable approach for the analysis of CRTs. More research is needed, however, to address the finite-sample bias in the variance estimation of the QIF to better control its test size.
Affiliation(s)
- Hengshi Yu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Corresponding author.
- Fan Li
  - Department of Biostatistics, Yale University, New Haven, CT, 06510, USA
  - Center for Methods in Implementation and Prevention Science, Yale University, New Haven, CT, 06510, USA
- Elizabeth L. Turner
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, 27710, USA
  - Duke Global Health Institute, Durham, NC, 27710, USA
8. Zhang X, Li L, Zhou H, Zhou Y, Shen D. Tensor generalized estimating equations for longitudinal imaging analysis. Stat Sin 2019; 29:1977-2005. [PMID: 32523321 DOI: 10.5705/ss.202017.0153] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
Abstract
Longitudinal neuroimaging studies are becoming increasingly prevalent, where brain images are collected on multiple subjects at multiple time points. Analyses of such data are scientifically important, but also challenging. Brain images are in the form of multidimensional arrays, or tensors, which are characterized by both ultrahigh dimensionality and a complex structure. Longitudinally repeated images and induced temporal correlations add a further layer of complexity. Despite some recent efforts, there exist very few solutions for longitudinal imaging analyses. In response to the increasing need to analyze longitudinal imaging data, we propose several tensor generalized estimating equations (GEEs). The proposed GEE approach accounts for intra-subject correlation, and an imposed low-rank structure on the coefficient tensor effectively reduces the dimensionality. We also propose a scalable estimation algorithm, establish the asymptotic properties of the solution to the tensor GEEs, and investigate sparsity regularization for the purpose of region selection. We demonstrate the proposed method using simulations and by analyzing a real data set from the Alzheimer's Disease Neuroimaging Initiative.
Affiliation(s)
- Xiang Zhang
  - Department of Statistics, North Carolina State University, Raleigh, NC 27697, USA
- Lexin Li
  - Division of Biostatistics, University of California, Berkeley, CA 94720, USA
- Hua Zhou
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Yeqing Zhou
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, 200433, China
- Dinggang Shen
  - Department of Radiology, University of North Carolina, Chapel Hill, NC 27599, USA
9. Strickland JC, Chen IC, Wang C, Fardo DW. Longitudinal data methods for evaluating genome-by-epigenome interactions in families. BMC Genet 2018; 19:82. [PMID: 30255767 PMCID: PMC6156905 DOI: 10.1186/s12863-018-0642-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Longitudinal measurement is commonly employed in health research and provides numerous benefits for understanding disease and trait progression over time. More broadly, it allows for proper treatment of correlated responses within clusters. We evaluated 3 methods for analyzing genome-by-epigenome interactions with longitudinal outcomes from family data. RESULTS Linear mixed-effect models, generalized estimating equations, and quadratic inference functions were used to test a pharmacoepigenetic effect in 200 simulated posttreatment replicates. Adjustment for baseline outcome provided greater power and more accurate control of Type I error rates than computation of a pre-to-post change score. CONCLUSIONS Comparison of all modeling approaches indicated a need for bias correction in marginal models and similar power for each method, with quadratic inference functions providing a minor decrement in power compared to generalized estimating equations and linear mixed-effects models.
Affiliation(s)
- Justin C. Strickland
  - Department of Psychology, College of Arts and Sciences, University of Kentucky, 171 Funkhouser Drive, Lexington, KY 40506 USA
- I-Chen Chen
  - Department of Biostatistics, College of Public Health, University of Kentucky, 725 Rose St, Lexington, KY 40536 USA
- Chanung Wang
  - Department of Biology, College of Arts and Sciences, University of Kentucky, 334 T.H. Morgan Building, Lexington, KY 40506 USA
- David W. Fardo
  - Department of Biostatistics, College of Public Health, University of Kentucky, 725 Rose St, Lexington, KY 40536 USA
10. Geraili-Afra Z, Abadi A, Yazdani-Charati J, Gooraji SA, Zarghami M, Saadat S. Comparison of Efficiency GEE and QIF Methods for Predicting Factors Affecting on Bipolar I Disorder Under Complete-case in a Longitudinal Studies. Acta Inform Med 2018; 26:111-114. [PMID: 30061782 PMCID: PMC6029899 DOI: 10.5455/aim.2018.26.111-114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2018] [Accepted: 05/16/2018] [Indexed: 11/03/2022] Open
Abstract
BACKGROUND Mood variation between manic and depressive phases over time is common in bipolar I disorder. Analyzing recurrence requires appropriate statistical methods. In this paper, we compare two estimation methods, GEE and QIF, on recurrence data. METHODS This study used data from 255 patients with bipolar I disorder hospitalized during 2007-2011. Recurrence of bipolar I disorder was the outcome. Patient characteristics were gender, age of onset, recurrence history in first-degree family, and economic status. In the simulation, the percentage of missing data was varied and missingness was handled with a complete-case (CC) strategy. Data were analyzed using the GEE and QIF methods. Performance of the methods was assessed using relative efficiency. RESULTS The QIF method was more efficient than the GEE method in data both with and without missing values. The odds of recurrence for patients with a first-degree family history were 30% higher than for those without a family history (p=0.009). Also, the odds of recurrence at a high/moderate level of economic status were 23% higher than at a low level (p=0.014). CONCLUSION The QIF method was more appropriate for modeling recurrence over time in data with stronger correlation and a low dropout rate. Family history and economic status were the factors that most affected recurrence in bipolar I disorder.
Affiliation(s)
- Zahra Geraili-Afra
  - Department of Biostatistics & Epidemiology, Babol University of Medical Sciences, Babol, Iran
- Alireza Abadi
  - Department of Social Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Jamshid Yazdani-Charati
  - Department of Biostatistics, Psychiatric Research Center, School of Health, Mazandaran University of Medical Sciences, Sari, Iran
- Somayeh Ahmadi Gooraji
  - Department of Social Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehran Zarghami
  - Department of Psychiatry & Behavioral Sciences Research Center, School of Medicine, Mazandaran University of Medical Sciences, Sari, Iran
- Samaneh Saadat
  - Research Committee, School of Health, Mazandaran University of Medical Sciences, Sari, Iran
11. Zhao W, Lian H, Bandyopadhyay D. A partially linear additive model for clustered proportion data. Stat Med 2018; 37:1009-1030. [PMID: 29243338 DOI: 10.1002/sim.7573] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/28/2017] [Revised: 08/29/2017] [Accepted: 10/29/2017] [Indexed: 11/09/2022]
Abstract
Proportion data with support lying in the interval [0,1] are commonplace in various domains of medicine and public health. When these data are available as clusters, it is important to correctly incorporate the within-cluster correlation to improve the estimation efficiency while conducting regression-based risk evaluation. Furthermore, covariates may exhibit a nonlinear relationship with the (proportion) responses while quantifying disease status. As an alternative to various existing classical methods for modeling proportion data (such as augmented Beta regression) that use maximum likelihood or generalized estimating equations, we develop a partially linear additive model based on the quadratic inference function. Relying on quasi-likelihood estimation techniques and polynomial spline approximation for unknown nonparametric functions, we obtain the estimators for both the parametric part and the nonparametric part of our model and study their large-sample theoretical properties. We illustrate the advantages and usefulness of our proposition over other alternatives via extensive simulation studies, and application to a real dataset from a clinical periodontal study.
Affiliation(s)
- Weihua Zhao
  - School of Science, Nantong University, Nantong, P. R. China
- Heng Lian
  - Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong
12. Turner EL, Prague M, Gallis JA, Li F, Murray DM. Review of Recent Methodological Developments in Group-Randomized Trials: Part 2-Analysis. Am J Public Health 2017; 107:1078-1086. [PMID: 28520480 PMCID: PMC5463203 DOI: 10.2105/ajph.2017.303707] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Accepted: 02/05/2017] [Indexed: 12/13/2022]
Abstract
In 2004, Murray et al. reviewed methodological developments in the design and analysis of group-randomized trials (GRTs). We have updated that review with developments in analysis of the past 13 years, with a companion article to focus on developments in design. We discuss developments in the topics of the earlier review (e.g., methods for parallel-arm GRTs, individually randomized group-treatment trials, and missing data) and in new topics, including methods to account for multiple-level clustering and alternative estimation methods (e.g., augmented generalized estimating equations, targeted maximum likelihood, and quadratic inference functions). In addition, we describe developments in analysis of alternative group designs (including stepped-wedge GRTs, network-randomized trials, and pseudocluster randomized trials), which require clustering to be accounted for in their design and analysis.
Affiliation(s)
- Elizabeth L Turner
- Melanie Prague
- John A Gallis
- Fan Li
- David M Murray

Elizabeth L. Turner and John A. Gallis are with the Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, and the Duke Global Health Institute, Duke University. Melanie Prague is with the Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, and Inria, project team SISTM, Bordeaux, France. Fan Li is with the Department of Biostatistics and Bioinformatics, Duke University. David M. Murray is with the Office of Disease Prevention, Division of Program Coordination and Strategic Planning, and the Office of the Director, National Institutes of Health, Rockville, MD.
13
|
|
14. Yang W, Liao S. A study of quadratic inference functions with alternative weighting matrices. COMMUN STAT-SIMUL C 2017. [DOI: 10.1080/03610918.2014.988255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Weiming Yang
  - College of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing, China
- Shu Liao
  - College of Mathematics and Statistics, Chongqing Technology and Business University, Chongqing, China
15. Kwon Y, Choi YG, Park T, Ziegler A, Paik MC. Generalized estimating equations with stabilized working correlation structure. Comput Stat Data Anal 2017. [DOI: 10.1016/j.csda.2016.08.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
16. Changes in Obesity Odds Ratio among Iranian Adults, since 2000: Quadratic Inference Functions Method. Computational and Mathematical Methods in Medicine 2016; 2016:7101343. [PMID: 27803729 PMCID: PMC5075634 DOI: 10.1155/2016/7101343] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/19/2016] [Accepted: 09/14/2016] [Indexed: 11/18/2022]
Abstract
Background. Monitoring changes in obesity prevalence by risk factors is relevant to public health programs that focus on reducing or preventing obesity. The purpose of this paper was to study trends in obesity odds ratios (ORs) for individuals aged 20 years and older in Iran by using a new statistical methodology. Methods. Data were collected by the National Surveys in Iran from 2000 through 2011. Since responses of the members of each cluster are correlated, the quadratic inference functions (QIF) method was used to model the relationship between the odds of obesity and risk factors. Results. During the study period, the prevalence rate of obesity increased from 12% to 22%. By using the QIF method and a model selection criterion for performing stepwise regression analysis, we found that while obesity prevalence generally increased in both sexes and across all ages and all employment, residence, and smoking levels, obesity ORs appear to have changed since 2000. Conclusions. Because obesity is one of the main risk factors for many diseases, awareness of the differences by factors allows development of targets for prevention and early intervention.
17
|
|
18. Morphometry Predicts Early GFR Change in Primary Proteinuric Glomerulopathies: A Longitudinal Cohort Study Using Generalized Estimating Equations. PLoS One 2016; 11:e0157148. [PMID: 27285824 PMCID: PMC4902229 DOI: 10.1371/journal.pone.0157148] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/25/2016] [Accepted: 05/25/2016] [Indexed: 12/17/2022] Open
Abstract
OBJECTIVE Most predictive models of kidney disease progression have not incorporated structural data. If structural variables have been used in models, they have generally been only semi-quantitative. METHODS We examined the predictive utility of quantitative structural parameters measured on the digital images of baseline kidney biopsies from the NEPTUNE study of primary proteinuric glomerulopathies. These variables were included in longitudinal statistical models predicting the change in estimated glomerular filtration rate (eGFR) over up to 55 months of follow-up. RESULTS The participants were fifty-six pediatric and adult subjects from the NEPTUNE longitudinal cohort study who had measurements made on their digital biopsy images; 25% were African-American, 70% were male and 39% were children; 25 had focal segmental glomerular sclerosis, 19 had minimal change disease, and 12 had membranous nephropathy. We considered four different sets of candidate predictors, each including four quantitative structural variables (for example, mean glomerular tuft area, cortical density of patent glomeruli and two of the principal components from the correlation matrix of six fractional cortical areas: interstitium, atrophic tubule, intact tubule, blood vessel, sclerotic glomerulus, and patent glomerulus) along with 13 potentially confounding demographic and clinical variables (such as race, age, diagnosis, baseline eGFR, quantitative proteinuria, and BMI). We used longitudinal linear models based on these 17 variables to predict the change in eGFR over up to 55 months. All 4 models had a leave-one-out cross-validated R² of about 62%. CONCLUSIONS Several combinations of quantitative structural variables were significantly and strongly associated with changes in eGFR. The structural variables were generally stronger than any of the confounding variables, other than baseline eGFR. Our findings suggest that quantitative assessment of diagnostic renal biopsies may play a role in estimating the baseline risk of subsequent loss of renal function in future clinical studies, and possibly in clinical practice.
19. Wang F, Wang L, Song PXK. Fused lasso with the adaptation of parameter ordering in combining multiple studies with repeated measurements. Biometrics 2016; 72:1184-1193. [PMID: 26909642 DOI: 10.1111/biom.12496] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/01/2014] [Revised: 11/01/2015] [Accepted: 12/01/2015] [Indexed: 12/01/2022]
Abstract
Combining multiple studies is frequently undertaken in biomedical research to increase sample sizes for statistical power improvement. We consider the marginal model for the regression analysis of repeated measurements collected in several similar studies with potentially different variances and correlation structures. It is of great importance to examine whether there exist common parameters across study-specific marginal models so that simpler models, sensible interpretations, and meaningful efficiency gain can be obtained. Combining multiple studies via the classical means of hypothesis testing involves a large number of simultaneous tests for all possible subsets of common regression parameters, which results in unduly large degrees of freedom and low statistical power. We develop a new method of fused lasso with the adaptation of parameter ordering (FLAPO) to scrutinize only adjacent-pair parameter differences, leading to a substantial reduction in the number of involved constraints. Our method enjoys the oracle properties as does the full fused lasso based on all pairwise parameter differences. We show that FLAPO gives estimators with smaller error bounds and better finite sample performance than the full fused lasso. We also establish a regularized inference procedure based on bias-corrected FLAPO. We illustrate our method through both simulation studies and an analysis of HIV surveillance data collected over five geographic regions in China, in which the presence or absence of common covariate effects is reflective of the relative effectiveness of regional policies on HIV control and prevention.
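The reduction in constraints described in the abstract above can be illustrated with a small counting sketch. It assumes that the studies are ordered by some initial estimate of a single regression coefficient and that only neighbours in that ordering are compared. The ordering rule, the function names, and the numbers are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def adjacent_pairs(estimates):
    """Order study labels by their initial estimate and pair each with its neighbour."""
    ordered = sorted(estimates, key=estimates.get)
    return list(zip(ordered, ordered[1:]))


def all_pairs(estimates):
    """Every pairwise comparison, as a full fused lasso penalty would use."""
    return list(combinations(sorted(estimates), 2))


# Hypothetical initial estimates of one coefficient in five studies.
initial = {"A": 0.42, "B": 0.10, "C": 0.45, "D": 0.12, "E": 0.80}

adj = adjacent_pairs(initial)
full = all_pairs(initial)
print(len(adj), len(full))  # 4 adjacent-pair differences versus 10 pairwise differences
```

For K studies the adjacent-pair scheme involves K - 1 differences per coefficient, whereas all pairwise differences number K(K - 1)/2, so the gap widens quickly as K grows.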
Affiliation(s)
- Fei Wang
- Global Analytics, Ford Motor Credit, Dearborn, Michigan, U.S.A. 48126
- Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A. 48109
- Peter X-K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, U.S.A. 48109
|
20
|
Wang F, Song PXK, Wang L. Merging multiple longitudinal studies with study-specific missing covariates: A joint estimating function approach. Biometrics 2015; 71:929-40. [PMID: 26193911 DOI: 10.1111/biom.12356] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 04/01/2014] [Revised: 04/01/2015] [Accepted: 05/01/2015] [Indexed: 11/28/2022]
Abstract
Merging multiple datasets collected from studies with identical or similar scientific objectives is often undertaken in practice to increase statistical power. This article concerns the development of an effective statistical method for merging multiple longitudinal datasets subject to various heterogeneous characteristics, such as different follow-up schedules and study-specific missing covariates (e.g., covariates observed in some studies but missing in other studies). The presence of study-specific missing covariates presents a great statistical methodology challenge in data merging and analysis. We propose a joint estimating function approach to address this challenge, in which a novel nonparametric estimating function constructed via splines-based sieve approximation is utilized to bridge estimating equations from studies with missing covariates to those with fully observed covariates. Under mild regularity conditions, we show that the proposed estimator is consistent and asymptotically normal. We evaluate the finite-sample performance of the proposed method through simulation studies. In comparison to the conventional multiple imputation approach, our method exhibits smaller estimation bias. We provide an illustrative data analysis using longitudinal cohorts collected in Mexico City to assess the effect of lead exposures on children's somatic growth.
Affiliation(s)
- Fei Wang
- Global Analytics, Ford Motor Credit, Dearborn, Michigan 48126, U.S.A
- Peter X-K Song
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
- Lu Wang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A
|
21
|
Westgate PM. A Comparison of Utilized and Theoretical Covariance Weighting Matrices on the Estimation Performance of Quadratic Inference Functions. COMMUN STAT-SIMUL C 2014. [DOI: 10.1080/03610918.2012.752839] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
|
22
|
Westgate PM. Improving the correlation structure selection approach for generalized estimating equations and balanced longitudinal data. Stat Med 2014; 33:2222-37. [DOI: 10.1002/sim.6106] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/18/2013] [Revised: 01/07/2014] [Accepted: 01/16/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Philip M. Westgate
- Department of Biostatistics, College of Public Health; University of Kentucky; Lexington KY 40536 U.S.A
|
23
|
Westgate PM. Criterion for the simultaneous selection of a working correlation structure and either generalized estimating equations or the quadratic inference function approach. Biom J 2014; 56:461-76. [DOI: 10.1002/bimj.201300098] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 05/31/2013] [Revised: 09/30/2013] [Accepted: 10/27/2013] [Indexed: 11/07/2022]
Affiliation(s)
- Philip M. Westgate
- Department of Biostatistics, College of Public Health; University of Kentucky; Lexington KY 40536 USA
|
24
|
Asgari F, Biglarian A, Seifi B, Bakhshi A, Miri HH, Bakhshi E. Using quadratic inference functions to determine the factors associated with obesity: findings from the STEPS Survey in Iran. Ann Epidemiol 2013; 23:534-8. [DOI: 10.1016/j.annepidem.2013.07.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/12/2013] [Revised: 06/27/2013] [Accepted: 07/06/2013] [Indexed: 10/26/2022]
|
25
|
Ghahroodi ZR, Ganjali M. A Bayesian approach for analysing longitudinal nominal outcomes using random coefficients transitional generalized logit model: an application to the labour force survey data. J Appl Stat 2013. [DOI: 10.1080/02664763.2013.785653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
26
|
A bias-corrected covariance estimator for improved inference when using an unstructured correlation with quadratic inference functions. Stat Probab Lett 2013. [DOI: 10.1016/j.spl.2013.02.021] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022]
|
27
|
Westgate PM, Braun TM. An improved quadratic inference function for parameter estimation in the analysis of correlated data. Stat Med 2012; 32:3260-73. [DOI: 10.1002/sim.5715] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/10/2012] [Accepted: 12/03/2012] [Indexed: 11/07/2022]
Affiliation(s)
- Philip M. Westgate
- Department of Biostatistics, College of Public Health; University of Kentucky; Lexington KY 40536 U.S.A
- Thomas M. Braun
- Department of Biostatistics, School of Public Health; University of Michigan; Ann Arbor MI 48109 U.S.A
|
28
|
Westgate PM. A bias correction for covariance estimators to improve inference with generalized estimating equations that use an unstructured correlation matrix. Stat Med 2012; 32:2850-8. [DOI: 10.1002/sim.5709] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 07/18/2012] [Accepted: 11/27/2012] [Indexed: 11/09/2022]
Affiliation(s)
- Philip M. Westgate
- Department of Biostatistics; College of Public Health, University of Kentucky; Lexington KY 40536 U.S.A
|
29
|
Westgate PM. A bias-corrected covariance estimate for improved inference with quadratic inference functions. Stat Med 2012; 31:4003-22. [PMID: 22807168 DOI: 10.1002/sim.5479] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 11/16/2011] [Accepted: 05/14/2012] [Indexed: 11/12/2022]
Abstract
The method of quadratic inference functions (QIF) is an increasingly popular method for the analysis of correlated data because of its multiple advantages over generalized estimating equations (GEE). One advantage is that it is more efficient for parameter estimation when the working covariance structure for the data is misspecified. In the QIF literature, the asymptotic covariance formula is used to obtain standard errors. We show that in small to moderately sized samples, these standard error estimates can be severely biased downward, therefore inflating test size and decreasing coverage probability. We propose adjustments to the asymptotic covariance formula that eliminate finite-sample biases and, as shown via simulation, lead to substantial improvements in standard error estimates, inference, and coverage. The proposed method is illustrated in application to a cluster randomized trial and a longitudinal study. Furthermore, QIF and GEE are contrasted via simulation and these applications.
Affiliation(s)
- Philip M Westgate
- Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, KY 40536, USA.
|
30
|
Hu Y, Song PXK. Sample size determination for quadratic inference functions in longitudinal design with dichotomous outcomes. Stat Med 2012; 31:787-800. [PMID: 22362611 DOI: 10.1002/sim.4458] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/13/2010] [Accepted: 09/16/2011] [Indexed: 11/06/2022]
Abstract
Quadratic inference functions (QIF) methodology is an important alternative to the generalized estimating equations (GEE) method in the longitudinal marginal model, as it offers higher estimation efficiency than the GEE when correlation structure is misspecified. The focus of this paper is on sample size determination and power calculation for QIF based on the Wald test in a marginal logistic model with covariates of treatment, time, and treatment-time interaction. We have made three contributions in this paper: (i) we derived formulas of sample size and power for QIF and compared their performance with those given by the GEE; (ii) we proposed an optimal scheme of sample size determination to overcome the difficulty of unknown true correlation matrix in the sense of minimal average risk; and (iii) we studied properties of both QIF and GEE sample size formulas in relation to the number of follow-up visits and found that the QIF gave more robust sample sizes than the GEE. Using numerical examples, we illustrated that without sacrificing statistical power, the QIF design leads to sample size saving and hence lower study cost in comparison with the GEE analysis. We conclude that the QIF analysis is appealing for longitudinal studies.
Affiliation(s)
- Youna Hu
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
|
31
|
Westgate PM, Braun TM. The effect of cluster size imbalance and covariates on the estimation performance of quadratic inference functions. Stat Med 2012; 31:2209-22. [PMID: 22415948 DOI: 10.1002/sim.5329] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 05/31/2011] [Accepted: 01/10/2012] [Indexed: 11/08/2022]
Abstract
Generalized estimating equations (GEE) are commonly used for the analysis of correlated data. However, use of quadratic inference functions (QIFs) is becoming popular because it increases efficiency relative to GEE when the working covariance structure is misspecified. Although shown to be advantageous in the literature, the impacts of covariates and imbalanced cluster sizes on the estimation performance of the QIF method in finite samples have not been studied. This cluster size variation causes QIF's estimating equations and GEE to be in separate classes when an exchangeable correlation structure is implemented, causing QIF and GEE to be incomparable in terms of efficiency. When utilizing this structure and the number of clusters is not large, we discuss how covariates and cluster size imbalance can cause QIF, rather than GEE, to produce estimates with the larger variability. This occurrence is mainly due to the empirical nature of weighting QIF employs, rather than differences in estimating equations classes. We demonstrate QIF's lost estimation precision through simulation studies covering a variety of general cluster randomized trial scenarios and compare QIF and GEE in the analysis of data from a cluster randomized trial.
Affiliation(s)
- Philip M Westgate
- Department of Biostatistics, College of Public Health, University of Kentucky, Lexington, KY 40536, USA.
|
32
|
Guerra MW, Shults J, Amsterdam J, Ten-Have T. The analysis of binary longitudinal data with time-dependent covariates. Stat Med 2012; 31:931-48. [PMID: 22246815 DOI: 10.1002/sim.4465] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 01/20/2011] [Accepted: 10/03/2011] [Indexed: 11/06/2022]
Abstract
We consider longitudinal studies with binary outcomes that are measured repeatedly on subjects over time. The goal of our analysis was to fit a logistic model that relates the expected value of the outcomes with explanatory variables that are measured on each subject. However, additional care must be taken to adjust for the association between the repeated measurements on each subject. We propose a new maximum likelihood method for covariates that may be fixed or time varying. We also implement and make comparisons with two other approaches: generalized estimating equations, which may be more robust to misspecification of the true correlation structure, and alternating logistic regression, which models association via odds ratios that are subject to less restrictive constraints than are correlations. The proposed estimation procedure will yield consistent and asymptotically normal estimates of the regression and correlation parameters if the correlation on consecutive measurements on a subject is correctly specified. Simulations demonstrate that our approach can yield improved efficiency in estimation of the regression parameter; for equally spaced and complete data, the gains in efficiency were greatest for the parameter associated with a time-by-group interaction term and for stronger values of the correlation. For unequally spaced data and with dropout according to a missing-at-random mechanism, MARK1ML with correctly specified consecutive correlations yielded substantial improvements in terms of both bias and efficiency. We present an analysis to demonstrate application of the methods we consider. We also offer an R function for easy implementation of our approach.
Affiliation(s)
- Matthew W Guerra
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
|