101
|
Niu F, Zhou J, Le TH, Ma JZ. Testing the trajectory difference in a semi-parametric longitudinal model. Stat Methods Med Res 2017; 26:1519-1531. [PMID: 25972495 PMCID: PMC4644124 DOI: 10.1177/0962280215584109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/13/2023]
Abstract
Motivated by a genetic investigation on the progressive decline in renal function in a clinical trial study of kidney disease, we develop a practical test for evaluating the group difference in trajectories under a semi-parametric modeling framework. For the temporal patterns or trajectories of longitudinal data, B-splines are used to approximate the function non-parametrically. Such approximation asymptotically converts the problem of testing trajectory difference into the significance test of regression coefficients that can be simply estimated by generalized estimating equations. To select the optimal number of inner knots for B-splines, a cross-validation procedure is performed using the criterion of the generalized residual sum of squares. The new proposed test successfully detects a significant difference of underlying genetic impact on the progression of renal disease, which is not captured by the parametric approach.
|
102
|
Ying GS, Maguire MG, Glynn R, Rosner B. Tutorial on Biostatistics: Statistical Analysis for Correlated Binary Eye Data. Ophthalmic Epidemiol 2017; 25:1-12. [PMID: 28532207 DOI: 10.1080/09286586.2017.1320413] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 10/19/2022]
Abstract
PURPOSE To describe and demonstrate methods for analyzing correlated binary eye data. METHODS We describe non-model based (McNemar's test, Cochran-Mantel-Haenszel test) and model-based methods (generalized linear mixed effects model, marginal model) for analyses involving both eyes. These methods were applied to: (1) CAPT (Complications of Age-related Macular Degeneration Prevention Trial) where one eye was treated and the other observed (paired design); (2) ETROP (Early Treatment for Retinopathy of Prematurity) where bilaterally affected infants had one eye treated conventionally and the other treated early and unilaterally affected infants had treatment assigned randomly; and (3) AREDS (Age-Related Eye Disease Study) where treatment was systemic and outcome was eye-specific (both eyes in the same treatment group). RESULTS In the CAPT (n = 80), treatment group (30% vision loss in treated vs. 44% in observed eyes) was not statistically significant (p = 0.07) when inter-eye correlation was ignored, but was significant (p = 0.01) with McNemar's test and the marginal model. Using standard logistic regression for unfavorable vision in ETROP, standard errors and p-values were larger for person-level covariates and were smaller for ocular covariates than using models accounting for inter-eye correlation. For risk factors of geographic atrophy in AREDS, two-eye analyses accounting for inter-eye correlation yielded more power than one-eye analyses and provided larger standard errors and p-values than invalid two-eye analyses ignoring inter-eye correlation. CONCLUSION Ignoring inter-eye correlation can lead to larger p-values for paired designs and smaller p-values when both eyes are in the same group. Marginal models or mixed effects models using the eye as the unit of analysis provide valid inference.
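For a paired design such as the one described above (one eye treated, the fellow eye observed), McNemar's test uses only the discordant pairs. The following is a minimal sketch of the exact two-sided version in Python; the function name and the example counts are hypothetical illustrations and are not taken from the CAPT data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs where only the first member (e.g. treated eye) has the event
    c: pairs where only the second member (e.g. observed eye) has the event
    Concordant pairs carry no information about the difference and are ignored.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair falls either way with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: 1 pair with the event only in the treated eye,
# 9 pairs with the event only in the observed eye.
p_value = mcnemar_exact(1, 9)
```

Because the test conditions on the discordant pairs, it accounts for inter-eye correlation without having to estimate it.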
|
103
|
Chen IC, Westgate PM. Improved methods for the marginal analysis of longitudinal data in the presence of time-dependent covariates. Stat Med 2017; 36:2533-2546. [PMID: 28436045 DOI: 10.1002/sim.7307] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/10/2016] [Revised: 01/27/2017] [Accepted: 03/15/2017] [Indexed: 11/06/2022]
Abstract
Generalized estimating equations (GEEs) are commonly used for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, in the presence of certain types of time-dependent covariates, these equations can be biased unless they incorporate the independence working correlation structure. Moreover, in this case, regression parameter estimation can be very inefficient because not all valid moment conditions are incorporated within the corresponding estimating equations. Therefore, approaches based on the generalized method of moments or quadratic inference functions have been proposed in order to utilize all valid moment conditions. However, we have found in previous studies, as well as the current study, that such methods will not always provide valid inference and can also be improved upon in terms of finite-sample regression parameter estimation. Therefore, we propose both a modified GEE approach and a method selection strategy in order to ensure valid inference with the goal of improving regression parameter estimation. In a simulation study and application example, we compare existing and proposed methods and demonstrate that our modified GEE approach performs well, and the correlation information criterion has good accuracy with respect to selecting the best approach in terms of regression parameter estimation. Copyright © 2017 John Wiley & Sons, Ltd.
|
104
|
Westgate PM, Burchett WW. On the analysis of very small samples of Gaussian repeated measurements: an alternative approach. Stat Med 2017; 36:958-970. [PMID: 28064473 PMCID: PMC5291809 DOI: 10.1002/sim.7199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/08/2016] [Revised: 10/13/2016] [Accepted: 11/21/2016] [Indexed: 11/06/2022]
Abstract
The analysis of very small samples of Gaussian repeated measurements can be challenging. First, due to a very small number of independent subjects contributing outcomes over time, statistical power can be quite small. Second, nuisance covariance parameters must be appropriately accounted for in the analysis in order to maintain the nominal test size. However, available statistical strategies that ensure valid statistical inference may lack power, whereas more powerful methods may have the potential for inflated test sizes. Therefore, we explore an alternative approach to the analysis of very small samples of Gaussian repeated measurements, with the goal of maintaining valid inference while also improving statistical power relative to other valid methods. This approach uses generalized estimating equations with a bias-corrected empirical covariance matrix that accounts for all small-sample aspects of nuisance correlation parameter estimation in order to maintain valid inference. Furthermore, the approach utilizes correlation selection strategies with the goal of choosing the working structure that will result in the greatest power. In our study, we show that when accurate modeling of the nuisance correlation structure impacts the efficiency of regression parameter estimation, this method can improve power relative to existing methods that yield valid inference. Copyright © 2017 John Wiley & Sons, Ltd.
|
105
|
Ying GS, Maguire MG, Glynn R, Rosner B. Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data. Ophthalmic Epidemiol 2017; 24:130-140. [PMID: 28102741 DOI: 10.1080/09286586.2016.1259636] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Indexed: 10/20/2022]
Abstract
PURPOSE To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. METHODS We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. RESULTS When refractive error from both eyes was analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with a narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. CONCLUSION In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
|
106
|
Daza EJ, Hudgens MG, Herring AH. Estimating inverse-probability weights for longitudinal data with dropout or truncation: The xtrccipw command. THE STATA JOURNAL 2017; 17:253-278. [PMID: 29755297 PMCID: PMC5947963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Individuals may drop out of a longitudinal study, rendering their outcomes unobserved but still well defined. However, they may also undergo truncation (for example, death), beyond which their outcomes are no longer meaningful. Kurland and Heagerty (2005, Biostatistics 6: 241-258) developed a method to conduct regression conditioning on nontruncation, that is, regression conditioning on continuation (RCC), for longitudinal outcomes that are monotonically missing at random (for example, because of dropout). This method first estimates the probability of dropout among continuing individuals to construct inverse-probability weights (IPWs), then fits generalized estimating equations (GEE) with these IPWs. In this article, we present the xtrccipw command, which can both estimate the IPWs required by RCC and then use these IPWs in a GEE estimator by calling the glm command from within xtrccipw. In the absence of truncation, the xtrccipw command can also be used to run a weighted GEE analysis. We demonstrate the xtrccipw command by analyzing an example dataset and the original Kurland and Heagerty (2005) data. We also use xtrccipw to illustrate some empirical properties of RCC through a simulation study.
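The weighting step described above can be illustrated with a short Python sketch. It assumes the per-visit dropout probabilities have already been estimated (for example by a logistic model) and uses the common construction in which the weight at a visit is the inverse of the cumulative probability of still being in the study. This is a generic illustration under those assumptions; it is not the internals of the Stata xtrccipw command, and the function name is hypothetical.

```python
def ipw_from_dropout_probs(dropout_probs):
    """Cumulative inverse-probability weights for one subject.

    dropout_probs[j] is the estimated probability of dropping out at visit j,
    given that the subject was still under observation just before visit j.
    Returns one weight per visit: 1 / P(still observed through visit j).
    """
    weights = []
    p_remain = 1.0
    for p in dropout_probs:
        if not 0.0 <= p < 1.0:
            raise ValueError("dropout probabilities must lie in [0, 1)")
        p_remain *= 1.0 - p          # probability of remaining through this visit
        weights.append(1.0 / p_remain)
    return weights


# Hypothetical subject with 20% and then 50% conditional dropout risk.
w = ipw_from_dropout_probs([0.2, 0.5])
```

Observations from subjects who were unlikely to remain in the study receive larger weights, so the weighted sample stands in for those who dropped out.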
|
107
|
Yang K, Tao L, Mahara G, Yan Y, Cao K, Liu X, Chen S, Xu Q, Liu L, Wang C, Huang F, Zhang J, Yan A, Ping Z, Guo X. An association of platelet indices with blood pressure in Beijing adults: Applying quadratic inference function for a longitudinal study. Medicine (Baltimore) 2016; 95:e4964. [PMID: 27684843 PMCID: PMC5265936 DOI: 10.1097/md.0000000000004964] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/27/2022] Open
Abstract
The quadratic inference function (QIF) method is becoming more widely accepted for correlated data because of its advantages over generalized estimating equations (GEE). This study aimed to evaluate the relationship between platelet indices and blood pressure using the QIF method, which has not been studied extensively in real data settings. A population-based longitudinal study was conducted in Beijing from 2007 to 2012, and the median of follow-up was 6 years. A total of 6515 cases, who were aged between 20 and 65 years at baseline and underwent routine physical examinations every year at 3 Beijing hospitals, were enrolled to explore the association between platelet indices and blood pressure by the QIF method. The original continuous platelet indices were categorized into 4 levels (Q1-Q4) using the 3 quartiles P25, P50, and P75 as critical values. GEE was performed to make a comparison with QIF. After adjusting for age, usage of drugs, and other confounding factors, mean platelet volume was negatively associated with diastolic blood pressure (DBP) (Equation is included in full-text article.) in males and positively linked with systolic blood pressure (SBP) (Equation is included in full-text article.). Platelet distribution width was negatively associated with SBP (Equation is included in full-text article.). Blood platelet count was associated with DBP (Equation is included in full-text article.) in males. Adults in Beijing with prolonged exposure to extreme values of platelet indices have an elevated risk of future hypertension, and evidence was provided suggesting the use of some platelet indices for early diagnosis of high blood pressure.
|
108
|
Vujkovic M, Aplenc R, Alonzo TA, Gamis AS, Li Y. Comparing Analytic Methods for Longitudinal GWAS and a Case-Study Evaluating Chemotherapy Course Length in Pediatric AML. A Report from the Children's Oncology Group. Front Genet 2016; 7:139. [PMID: 27547214 PMCID: PMC4974249 DOI: 10.3389/fgene.2016.00139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/11/2016] [Accepted: 07/19/2016] [Indexed: 12/11/2022] Open
Abstract
Regression analysis is commonly used in genome-wide association studies (GWAS) to test genotype-phenotype associations but restricts the phenotype to a single observation for each individual. There is an increasing need for analytic methods for longitudinally collected phenotype data. Several methods have been proposed to perform longitudinal GWAS for family-based studies but few methods are described for unrelated populations. We compared the performance of three statistical approaches for longitudinal GWAS in unrelated subjects: (1) principal component-based generalized estimating equations (PC-GEE); (2) principal component-based linear mixed effects model (PC-LMEM); (3) kinship coefficient matrix-based linear mixed effects model (KIN-LMEM), in a study of single-nucleotide polymorphisms (SNPs) on the duration of 4 courses of chemotherapy in 624 unrelated children with de novo acute myeloid leukemia (AML) genotyped on the Illumina 2.5 M OmniQuad from the COG studies AAML0531 and AAML1031. In this study we observed an exaggerated type I error with PC-GEE in SNPs with minor allele frequencies < 0.05, whereas KIN-LMEM produced more type II errors than expected. PC-LMEM showed balanced type I and type II errors for the observed vs. expected P-values in comparison to competing approaches. In general, a strong concordance was observed between the P-values with the different approaches, in particular among P < 0.01 where the between-method AUCs exceed 99%. PC-LMEM accounts for genetic relatedness and correlations among repeated phenotype measures, shows minimal genome-wide inflation of type I errors, and yields high power. We therefore recommend PC-LMEM as a robust analytic approach for GWAS of longitudinal data in unrelated populations.
|
109
|
Gelman R, Smith RT, Tsang SH. DIAGNOSTIC ACCURACY EVALUATION OF VISUAL ACUITY AND FUNDUS AUTOFLUORESCENCE MACULAR GEOGRAPHIC ATROPHY AREA FOR THE DISCRIMINATION OF STARGARDT GROUPS. Retina 2016; 36:1596-601. [PMID: 26818478 PMCID: PMC4961576 DOI: 10.1097/iae.0000000000000960] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
Abstract
PURPOSE To evaluate diagnostic accuracy of visual acuity and fundus autofluorescence (FAF) macular geographic atrophy (GA) area for the discrimination of autosomal recessive Stargardt groups. METHODS Subjects aged <50 years old with confirmed molecular diagnoses were classified to Groups 1, 2, or 3 according to a full-field electroretinogram reference standard. Diagnostic accuracy of visual acuity and the FAF macular GA area was assessed with generalized estimating equations, receiver operating characteristic curve area under the curve, and support vector machines. RESULTS Ten eyes were classified as Group 1 and 7 as Group 2. The mean log minimum angle resolution (Snellen equivalent) was 0.64 (20/87) for Group 1 and 0.96 (20/182) for Group 2. Mean FAF macular GA area was 0.96 mm² for Group 1 and 3.23 mm² for Group 2. The generalized estimating equation analysis showed an 8.3% increase in odds of Group 2 classification with each 0.1-unit increase in log minimum angle resolution and a 24% increase with each 1-mm² increase in FAF macular GA area. Multivariate generalized estimating equation analysis showed that only the FAF macular GA area was significant. Area under the curve was 0.79 for log minimum angle resolution and 0.89 for FAF macular GA area. The support vector machine classification accuracy was 71% for log minimum angle resolution and 82% for FAF macular GA area. CONCLUSION Visual acuity and FAF macular GA area had good independent accuracy for the discrimination of Groups 1 and 2, indicating that they may serve as useful diagnostic parameters.
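The reported effect sizes are odds ratios per increment of the predictor, and it can help to see how such a figure rescales to a different increment. The sketch below assumes a model that is linear on the log-odds scale (as in a logistic-link GEE), in which case the odds ratio for a new step size is the original odds ratio raised to the ratio of the step sizes. The function name is hypothetical.

```python
def rescale_odds_ratio(odds_ratio: float, old_step: float, new_step: float) -> float:
    """Convert an odds ratio per `old_step` units into one per `new_step` units.

    Assumes log-odds are linear in the predictor, so log(OR) scales with step size.
    """
    if odds_ratio <= 0 or old_step <= 0:
        raise ValueError("odds_ratio and old_step must be positive")
    return odds_ratio ** (new_step / old_step)


# An 8.3% increase in odds per 0.1 unit corresponds to OR = 1.083 per 0.1 unit.
# Per full unit this is 1.083 ** 10, roughly a 2.2-fold increase in odds.
or_per_unit = rescale_odds_ratio(1.083, 0.1, 1.0)
```

The conversion only changes the scale of reporting; it carries no additional information about significance or model fit.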
|
110
|
Muth C, Bales KL, Hinde K, Maninger N, Mendoza SP, Ferrer E. Alternative Models for Small Samples in Psychological Research: Applying Linear Mixed Effects Models and Generalized Estimating Equations to Repeated Measures Data. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 2016; 76:64-87. [PMID: 29795857 PMCID: PMC5965574 DOI: 10.1177/0013164415580432] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Indexed: 06/08/2023]
Abstract
Unavoidable sample size issues beset psychological research that involves scarce populations or costly laboratory procedures. When incorporating longitudinal designs these samples are further reduced by traditional modeling techniques, which perform listwise deletion for any instance of missing data. Moreover, these techniques are limited in their capacity to accommodate alternative correlation structures that are common in repeated measures studies. Researchers require sound quantitative methods to work with limited but valuable measures without degrading their data sets. This article provides a brief tutorial and exploration of two alternative longitudinal modeling techniques, linear mixed effects models and generalized estimating equations, as applied to a repeated measures study (n = 12) of pairmate attachment and social stress in primates. Both techniques provide comparable results, but each model offers unique information that can be helpful when deciding the right analytic tool.
|
111
|
Chien LC, Hsu FC, Bowden DW, Chiu YF. Generalization of Rare Variant Association Tests for Longitudinal Family Studies. Genet Epidemiol 2016; 40:101-12. [PMID: 26783077 DOI: 10.1002/gepi.21951] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/12/2015] [Revised: 11/19/2015] [Accepted: 11/19/2015] [Indexed: 11/06/2022]
Abstract
Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree-based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree-based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed-effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found exome variants of POMGNT1 and JAK1 genes were associated with type 2 diabetes.
|
112
|
Klisz M, Koprowski M, Ukalska J, Nabais C. Does the Genotype Have a Significant Effect on the Formation of Intra-Annual Density Fluctuations? A Case Study Using Larix decidua from Northern Poland. FRONTIERS IN PLANT SCIENCE 2016; 7:691. [PMID: 27242883 PMCID: PMC4873497 DOI: 10.3389/fpls.2016.00691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/25/2016] [Accepted: 05/05/2016] [Indexed: 05/09/2023]
Abstract
Intra-annual density fluctuations (IADFs) can imprint environmental conditions within the growing season and most of the research on IADFs has been focused on their climatic signal. However, to our knowledge, the genetic influence on the frequency and type of IADFs has not been evaluated. To understand whether the genotype can affect the formation of IADFs, we used a common garden experiment with eight families of Larix decidua established in two neighboring forest stands in northern Poland. Four types of IADFs were identified using X-ray density profiles: latewood-like cells within earlywood (IADF-type E), latewood-like cells in the transition from early- to latewood (IADF-type E+), earlywood-like cells within latewood (IADF-type L), and earlywood-like cells in the border zone between the previous and present annual ring (IADF-type L+). The influence of the explanatory variables (families, sites, and years) on the identified density fluctuations was analyzed using generalized estimating equations (GEE). We hypothesized that trees from different families would differ in the frequency and type of IADFs because each family would react to precipitation and temperature in a different way, depending on the origin of those trees. The most frequent fluctuations were types E+ and L at both sites. The most important factors in the formation of IADFs were the site and the year, the latter reflecting the variable climatic conditions, with no significant effect of the family. However, the relation between the formation of IADFs and selected climate parameters differed between families. Although our results did not show a significant effect of the genotype on the formation of IADFs, the different sensitivity to climatic parameters among families indicates that there is a genetic influence.
|
113
|
Wang M, Kong L, Li Z, Zhang L. Covariance estimators for generalized estimating equations (GEE) in longitudinal analysis with small samples. Stat Med 2015; 35:1706-21. [PMID: 26585756 DOI: 10.1002/sim.6817] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 03/14/2015] [Revised: 09/09/2015] [Accepted: 10/28/2015] [Indexed: 11/07/2022]
Abstract
Generalized estimating equations (GEE) is a general statistical method to fit marginal models for longitudinal data in biomedical studies. The variance-covariance matrix of the regression parameter coefficients is usually estimated by a robust "sandwich" variance estimator, which does not perform satisfactorily when the sample size is small. To reduce the downward bias and improve the efficiency, several modified variance estimators have been proposed for bias-correction or efficiency improvement. In this paper, we provide a comprehensive review on recent developments of modified variance estimators and compare their small-sample performance theoretically and numerically through simulation and real data examples. In particular, Wald tests and t-tests based on different variance estimators are used for hypothesis testing, and the guideline on appropriate sample sizes for each estimator is provided for preserving type I error in general cases based on numerical results. Moreover, we develop a user-friendly R package "geesmv" incorporating all of these variance estimators for public usage in practice.
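To make the robust "sandwich" estimator concrete, here is a minimal Python sketch for the simplest possible case: a one-parameter linear model E[y] = beta * x with an independence working correlation. It shows the uncorrected sandwich form (bread, meat, bread), not any of the small-sample corrections reviewed in the paper, and it is not the API of the geesmv package. The function name and the toy data are hypothetical.

```python
def gee_independence_slope(clusters):
    """Fit E[y] = beta * x by GEE with independence working correlation.

    clusters: list of clusters, each a list of (x, y) pairs.
    Returns (beta_hat, naive_var, sandwich_var).
    """
    # "Bread": sum of x^2 over all observations (the model-based information).
    bread = sum(x * x for cluster in clusters for x, _ in cluster)
    beta = sum(x * y for cluster in clusters for x, y in cluster) / bread

    meat = 0.0   # sum over clusters of the squared cluster-level score
    rss = 0.0    # residual sum of squares, used for the naive variance
    n_obs = 0
    for cluster in clusters:
        score = 0.0
        for x, y in cluster:
            resid = y - beta * x
            score += x * resid
            rss += resid * resid
            n_obs += 1
        meat += score * score

    naive_var = (rss / (n_obs - 1)) / bread      # assumes independent observations
    sandwich_var = meat / (bread * bread)        # bread^-1 * meat * bread^-1
    return beta, naive_var, sandwich_var


# Toy data: two clusters whose members are perfectly correlated.
beta, naive_var, sandwich_var = gee_independence_slope(
    [[(1, 1.0), (1, 1.0)], [(1, 3.0), (1, 3.0)]]
)
```

In this toy example the sandwich variance (0.5) exceeds the naive variance (about 0.33) because the within-cluster correlation is positive. With only two clusters the estimate itself is unreliable, which is the small-sample problem the modified estimators aim to address.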
|
114
|
Burkett KM, Roy-Gagnon MH, Lefebvre JF, Wang C, Fontaine-Bisson B, Dubois L. A Comparison of Statistical Methods for the Discovery of Genetic Risk Factors Using Longitudinal Family Study Designs. Front Immunol 2015; 6:589. [PMID: 26635803 PMCID: PMC4652172 DOI: 10.3389/fimmu.2015.00589] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/02/2015] [Accepted: 11/02/2015] [Indexed: 01/04/2023] Open
Abstract
The etiology of immune-related diseases or traits is often complex, involving many genetic and environmental factors and their interactions. While methodological approaches focusing on an outcome measured at one time point have succeeded in identifying genetic factors involved in immune-related traits, they fail to capture complex disease mechanisms that fluctuate over time. It is increasingly recognized that longitudinal studies, where an outcome is measured at multiple time points, have great potential to shed light on complex disease mechanisms involving genetic factors. However, longitudinal data require specialized statistical methods, especially in family studies where multiple sources of correlation in the data must be modeled. Using simulated data with known genetic effects, we examined the performance of different analytical methods for investigating associations between genetic factors and longitudinal phenotypes in twin data. The simulations were modeled on data from the Québec Newborn Twin Study, an ongoing population-based longitudinal study of twin births with multiple phenotypes, such as cortisol levels and body mass index, collected multiple times in infancy and early childhood and with sequencing data on immune-related genes and pathways. We compared approaches that we classify as (1) family-based methods applied to summaries of the observations over time, (2) longitudinal-based methods with simplifications of the familial correlation, and (3) Bayesian family-based method with simplifications of the temporal correlation. We found that for estimation of the genetic main and interaction effects, all methods gave estimates close to the true values and had similar power. If heritability estimation is desired, approaches of type (1) also provide heritability estimates close to the true value. Our work shows that the simpler approaches are likely adequate to detect genetic effects; however, interpretation of these effects is more challenging.
|
115
|
Westgate PM. A covariance correction that accounts for correlation estimation to improve finite-sample inference with generalized estimating equations: A study on its applicability with structured correlation matrices. J STAT COMPUT SIM 2015; 86:1891-1900. [PMID: 27818539 DOI: 10.1080/00949655.2015.1089873] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 10/23/2022]
Abstract
When generalized estimating equations (GEE) incorporate an unstructured working correlation matrix, the variances of regression parameter estimates can inflate due to the estimation of the correlation parameters. In previous work, an approximation for this inflation that results in a corrected version of the sandwich formula for the covariance matrix of regression parameter estimates was derived. Use of this correction for correlation structure selection also reduces the over-selection of the unstructured working correlation matrix. In this manuscript, we conduct a simulation study to demonstrate that an increase in variances of regression parameter estimates can occur when GEE incorporates structured working correlation matrices as well. Correspondingly, we show the ability of the corrected version of the sandwich formula to improve the validity of inference and correlation structure selection. We also study the relative influences of two popular corrections to a different source of bias in the empirical sandwich covariance estimator.
|
116
|
Bureau A, Croteau J, Couture C, Vohl MC, Bouchard C, Pérusse L. Estimating genetic effect sizes under joint disease-endophenotype models in presence of gene-environment interactions. Front Genet 2015; 6:248. [PMID: 26284107 PMCID: PMC4516976 DOI: 10.3389/fgene.2015.00248] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/02/2015] [Accepted: 07/08/2015] [Indexed: 12/18/2022] Open
Abstract
Effects of genetic variants on the risk of complex diseases estimated from association studies are typically small. Nonetheless, variants may have important effects in presence of specific levels of environmental exposures, and when a trait related to the disease (endophenotype) is either normal or impaired. We propose polytomous and transition models to represent the relationship between disease, endophenotype, genotype and environmental exposure in family studies. Model coefficients were estimated using generalized estimating equations and were used to derive gene-environment interaction effects and genotype effects at specific levels of exposure. In a simulation study, estimates of the effect of a genetic variant were substantially higher when both an endophenotype and an environmental exposure modifying the variant effect were taken into account, particularly under transition models, compared to the alternative of ignoring the endophenotype. Illustration of the proposed modeling with the metabolic syndrome, abdominal obesity, physical activity and polymorphisms in the NOX3 gene in the Quebec Family Study revealed that the positive association of the A allele of rs1375713 with the metabolic syndrome at high levels of physical activity was only detectable in subjects without abdominal obesity, illustrating the importance of taking into account the abdominal obesity endophenotype in this analysis.
|
117
|
Tang W, Lu N, Chen T, Wang W, Gunzler DD, Han Y, Tu XM. On performance of parametric and distribution-free models for zero-inflated and over-dispersed count responses. Stat Med 2015; 34:3235-45. [PMID: 26078035 DOI: 10.1002/sim.6560] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 02/25/2014] [Revised: 02/15/2015] [Accepted: 05/26/2015] [Indexed: 11/07/2022]
Abstract
Zero-inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) models a two-component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models too are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data.
|
118
|
Associations Between Fast-Food Consumption and Body Mass Index: A Cross-Sectional Study in Adult Twins. Twin Res Hum Genet 2015; 18:375-82. [PMID: 26005202 DOI: 10.1017/thg.2015.33] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/19/2022]
Abstract
Obesity is a substantial health problem in the United States, and is associated with many chronic diseases. Previous studies have linked poor dietary habits to obesity. This cross-sectional study aimed to identify the association between body mass index (BMI) and fast-food consumption among 669 same-sex adult twin pairs residing in the Puget Sound region around Seattle, Washington. We calculated twin-pair correlations for BMI and fast-food consumption. We next regressed BMI on fast-food consumption using generalized estimating equations (GEE), and finally estimated the within-pair difference in BMI associated with a difference in fast-food consumption, which controls for all potential genetic and environment characteristics shared between twins within a pair. Twin-pair correlations for fast-food consumption were similar for identical (monozygotic; MZ) and fraternal (dizygotic; DZ) twins, but were substantially higher in MZ than DZ twins for BMI. In the unadjusted GEE model, greater fast-food consumption was associated with larger BMI. For twin pairs overall, and for MZ twins, there was no association between within-pair differences in fast-food consumption and BMI in any model. In contrast, there was a significant association between within-pair differences in fast-food consumption and BMI among DZ twins, suggesting that genetic factors play a role in the observed association. Thus, although variance in fast-food consumption itself is largely driven by environmental factors, the overall association between this specific eating behavior and BMI is largely due to genetic factors.
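The co-twin control logic described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis: the data-generating model, effect sizes, and variable names are all assumptions chosen for illustration. It generates twin pairs in which a shared familial factor drives both fast-food consumption and BMI, while fast food itself has no effect on BMI, and then compares a pooled regression slope with the within-pair difference slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 669  # number of twin pairs reported in the abstract

# Hypothetical model (an assumption, not taken from the paper): a factor shared
# by both twins raises fast-food consumption and BMI. Fast food has NO direct
# effect on BMI in this simulation.
shared = rng.normal(size=(n_pairs, 1))
fast_food = shared + rng.normal(size=(n_pairs, 2))
bmi = 25.0 + 2.0 * shared + rng.normal(size=(n_pairs, 2))


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    x = x - x.mean()
    y = y - y.mean()
    return float(x @ y / (x @ x))


# Pooled analysis: treats all 2 * n_pairs twins as one sample, so the shared
# factor confounds the fast-food/BMI association.
pooled_slope = ols_slope(fast_food.ravel(), bmi.ravel())

# Within-pair analysis: differencing the two twins cancels everything they share.
within_slope = ols_slope(fast_food[:, 0] - fast_food[:, 1],
                         bmi[:, 0] - bmi[:, 1])

print(f"pooled slope:      {pooled_slope:.2f}")
print(f"within-pair slope: {within_slope:.2f}")
```

Under these assumptions the pooled slope is clearly positive and the within-pair slope is close to zero, which is the pattern expected when an association is explained by factors shared within pairs.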
|
119
|
Fulton KA, Liu D, Haynie DL, Albert PS. Mixed model and estimating equation approaches for zero inflation in clustered binary response data with application to a dating violence study. Ann Appl Stat 2015; 9:275-299. [PMID: 26937263 DOI: 10.1214/14-aoas791] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/19/2022]
Abstract
The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored.
|
120
|
Liu D, Liu R, Xie M. Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness. J Am Stat Assoc 2015; 110:326-340. [PMID: 26190875 DOI: 10.1080/01621459.2014.899235] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 10/25/2022]
Abstract
Meta-analysis has been widely used to synthesize evidence from multiple studies for common hypotheses or parameters of interest. However, it has not yet been fully developed for incorporating heterogeneous studies, which arise often in applications due to different study designs, populations or outcomes. For heterogeneous studies, the parameter of interest may not be estimable for certain studies, and in such a case, these studies are typically excluded from conventional meta-analysis. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a meta-analysis for heterogeneous studies by combining the confidence density functions derived from the summary statistics of individual studies, hence referred to as the CD approach. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it requires only summary statistics, not individual-level data; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. Besides its own theoretical significance, the last property also substantially broadens the applicability of the CD approach. All the properties of the CD approach are further confirmed by data simulated from a randomized clinical trials setting as well as by real data on aircraft landing performance. Overall, one obtains a unifying approach for combining summary statistics, subsuming many of the existing meta-analysis methods as special cases.
|
121
|
Smith SS, Fiore MC, Baker TB. Smoking cessation in smokers who smoke menthol and non-menthol cigarettes. Addiction 2014; 109:2107-17. [PMID: 24938369 PMCID: PMC4443703 DOI: 10.1111/add.12661] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 01/15/2014] [Revised: 04/29/2014] [Accepted: 06/06/2014] [Indexed: 11/30/2022]
Abstract
AIMS To assess the relations of menthol cigarette use with measures of cessation success in a large comparative effectiveness trial (CET). DESIGN Participants were randomized to one of six medication treatment conditions in a randomized double-blind, placebo-controlled clinical trial. All participants received six individual counseling sessions. SETTING Community-based smokers in two communities in Wisconsin, USA. PARTICIPANTS A total of 1504 adult smokers who smoked at least 10 cigarettes per day during the past 6 months and reported being motivated to quit smoking. The analysis sample comprised 1439 participants: 814 white non-menthol smokers, 439 white menthol smokers and 186 African American (AA) menthol smokers. There were too few AA non-menthol smokers (n = 16) to be included in the analyses. INTERVENTIONS Nicotine lozenge, nicotine patch, bupropion sustained release, nicotine patch + nicotine lozenge, bupropion + nicotine lozenge and placebo. MEASUREMENTS Biochemically confirmed 7-day point-prevalence abstinence assessed at 4, 8 and 26 weeks post-quit. FINDINGS In longitudinal abstinence analyses (generalized estimating equations) controlling for cessation treatment, menthol smoking was associated with reduced likelihood of smoking cessation success relative to non-menthol smoking [model-based estimates of abstinence = 31 versus 38%, respectively; odds ratio (OR) = 0.71, 95% confidence interval (CI) = 0.59, 0.86]. In addition, among menthol smokers, AA women were at especially high risk of cessation failure relative to white women (estimated abstinence = 17 versus 35%, respectively; OR = 2.63, 95% CI = 1.75, 3.96; estimated abstinence rates for AA males and white males were both 30%, OR = 1.06, 95% CI = 0.60, 1.66). CONCLUSION In the United States, smoking menthol cigarettes appears to be associated with reduced cessation success compared with non-menthol smoking, especially in African American females.
|
122
|
Leffondre K, Boucquemont J, Tripepi G, Stel VS, Heinze G, Dunkler D. Analysis of risk factors associated with renal function trajectory over time: a comparison of different statistical approaches. Nephrol Dial Transplant 2014; 30:1237-43. [PMID: 25326471 DOI: 10.1093/ndt/gfu320] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 07/07/2014] [Accepted: 08/30/2014] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND The most commonly used methods to investigate risk factors associated with renal function trajectory over time include linear regression on individual glomerular filtration rate (GFR) slopes, linear mixed models and generalized estimating equations (GEEs). The objective of this study was to explain the principles of these three methods and to discuss their advantages and limitations, in particular when renal function trajectories are not completely observable due to dropout. METHODS We generated data from a hypothetical cohort of 200 patients with chronic kidney disease at inclusion and seven subsequent annual measurements of GFR. The data were generated such that both baseline level and slope of GFR over time were associated with baseline albuminuria status. In a second version of the dataset, we assumed that patients systematically dropped out after a GFR measurement of <15 mL/min/1.73 m². Each dataset was analysed with the three methods. RESULTS The estimated effects of baseline albuminuria status on GFR slope were similar among the three methods when no patient dropped out. When 32.7% dropped out, standard GEE provided biased estimates of the mean GFR slope in normo-, micro- and macroalbuminuric patients. Linear regression on individual slopes and linear mixed models provided slope estimates of the same magnitude, likely because most patients had at least three GFR measurements. However, the linear mixed model was the only method to provide effect estimates on both slope and baseline level of GFR unaffected by dropout. CONCLUSION This study illustrates that the linear mixed model is the preferred method to investigate risk factors associated with renal function trajectories in studies where patients may drop out during the study period because of initiation of renal replacement therapy.
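The dropout mechanism described in this abstract can be sketched in a few lines. The cohort size (200), the eight visits (inclusion plus seven annual measurements), and the dropout threshold (15) come from the abstract. The baseline, slope, and noise distributions below are illustrative assumptions, not the authors' settings, and albuminuria groups are omitted for brevity. The sketch shows why naive averages of the observed data drift upward once patients with low GFR leave the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients, n_visits = 200, 8  # inclusion plus seven annual measurements
years = np.arange(n_visits)

# Assumed (hypothetical) trajectory model: random baseline and random slope.
baseline = rng.normal(45.0, 10.0, size=n_patients)  # GFR at inclusion
slope = rng.normal(-3.0, 1.5, size=n_patients)      # GFR change per year
noise = rng.normal(0.0, 3.0, size=(n_patients, n_visits))
gfr = baseline[:, None] + slope[:, None] * years + noise

# A patient drops out AFTER the first measurement below 15, so that low
# measurement is still observed but all later visits are missing.
below = (gfr < 15.0).astype(int)
low_before_visit = np.cumsum(below, axis=1) - below
observed = low_before_visit == 0

complete_mean = gfr.mean(axis=0)
observed_mean = np.array(
    [gfr[observed[:, t], t].mean() for t in range(n_visits)]
)
dropout_fraction = 1.0 - observed[:, -1].mean()

print(f"fraction missing the last visit: {dropout_fraction:.2f}")
for t in range(n_visits):
    print(f"year {t}: complete {complete_mean[t]:5.1f}  "
          f"observed {observed_mean[t]:5.1f}")
```

Because dropout depends on the previously observed GFR values, the visit-wise means of the observed data overstate the population mean at later visits. This is the kind of setting in which the abstract reports biased slope estimates from standard GEE.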
|
123
|
Moineddin R, Meaney C, Grunfeld E. On the analysis of composite measures of quality in medical research. Stat Methods Med Res 2014; 26:633-660. [PMID: 25296866 DOI: 10.1177/0962280214553330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Composite endpoints are commonplace in biomedical research. The complex nature of many health conditions and medical interventions demands that composite endpoints be employed. Different approaches exist for the analysis of composite endpoints. A Monte Carlo simulation study was employed to assess the statistical properties of various regression methods for analyzing binary composite endpoints. We also applied these methods to data from the BETTER trial, which employed a binary composite endpoint. We demonstrated that type 1 error rates are poor for the Negative Binomial regression model and the logistic generalized linear mixed model (GLMM). Bias was minimal and power was highest in the binomial logistic regression model, the linear regression model, the Poisson (corrected for over-dispersion) regression model and the common effect logistic generalized estimating equation (GEE) model. Convergence was poor in the distinct effect GEE models, the logistic GLMM and some of the zero-one inflated beta regression models. Considering the BETTER trial data, the distinct effect GEE model struggled with convergence and the collapsed composite method estimated an effect that was greatly attenuated compared to other models. All remaining models suggested an intervention effect of similar magnitude. In our simulation study, the binomial logistic regression model (corrected for possible over/under-dispersion), the linear regression model, the Poisson regression model (corrected for over-dispersion) and the common effect logistic GEE model appeared to be unbiased, with good type 1 error rates, power and convergence properties.
|
124
|
Aloisio KM, Swanson SA, Micali N, Field A, Horton NJ. Analysis of partially observed clustered data using generalized estimating equations and multiple imputation. The Stata Journal 2014; 14:863-883. [PMID: 25642154 PMCID: PMC4306281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Clustered data arise in many settings, particularly within the social and biomedical sciences. As an example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (e.g. parent and adolescent) to provide a holistic view of a subject's symptomatology. Fitzmaurice et al. (1995) have described estimation of multiple source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data due to additional stages of consent and assent required. The usual GEE is unbiased when missingness is Missing Completely at Random (MCAR) in the sense of Little and Rubin (2002). This is a strong assumption that may not be tenable. Other options such as weighted generalized estimating equations (WEEs) are computationally challenging when missingness is non-monotone. Multiple imputation is an attractive method to fit incomplete data models while only requiring the less restrictive Missing at Random (MAR) assumption. Previously, estimation with partially observed clustered data was computationally challenging; however, recent developments in Stata have facilitated its use in practice. We demonstrate how to utilize multiple imputation in conjunction with a GEE to investigate the prevalence of disordered eating symptoms in adolescents reported by parents and adolescents as well as factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children (ALSPAC), a cohort study that enrolled more than 14,000 pregnant mothers in 1991-92 and has followed the health and development of their children at regular intervals. While point estimates were fairly similar to the GEE under MCAR, the MAR model had smaller standard errors, while requiring less stringent assumptions regarding missingness.
|
125
|
Scott JM, deCamp A, Juraska M, Fay MP, Gilbert PB. Finite-sample corrected generalized estimating equation of population average treatment effects in stepped wedge cluster randomized trials. Stat Methods Med Res 2014; 26:583-597. [PMID: 25267551 DOI: 10.1177/0962280214552092] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/15/2022]
Abstract
Stepped wedge designs are increasingly commonplace and advantageous for cluster randomized trials when it is both unethical to assign placebo, and it is logistically difficult to allocate an intervention simultaneously to many clusters. We study marginal mean models fit with generalized estimating equations for assessing treatment effectiveness in stepped wedge cluster randomized trials. This approach has advantages over the more commonly used mixed models that (1) the population-average parameters have an important interpretation for public health applications and (2) they avoid untestable assumptions on latent variable distributions and avoid parametric assumptions about error distributions, therefore, providing more robust evidence on treatment effects. However, cluster randomized trials typically have a small number of clusters, rendering the standard generalized estimating equation sandwich variance estimator biased and highly variable and hence yielding incorrect inferences. We study the usual asymptotic generalized estimating equation inferences (i.e., using sandwich variance estimators and asymptotic normality) and four small-sample corrections to generalized estimating equation for stepped wedge cluster randomized trials and for parallel cluster randomized trials as a comparison. We show by simulation that the small-sample corrections provide improvement, with one correction appearing to provide at least nominal coverage even with only 10 clusters per group. These results demonstrate the viability of the marginal mean approach for both stepped wedge and parallel cluster randomized trials. We also study the comparative performance of the corrected methods for stepped wedge and parallel designs, and describe how the methods can accommodate interval censoring of individual failure times and incorporate semiparametric efficient estimators.
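For readers unfamiliar with the sandwich variance estimator mentioned in this abstract, the following is a minimal sketch for the simplest case: an identity link with an independence working correlation, where the GEE point estimate coincides with ordinary least squares. The function name, the simulated parallel-arm data, and the degrees-of-freedom inflation K / (K - p) are assumptions made for illustration. That inflation is one simple small-sample adjustment and is not claimed to be one of the four corrections the authors study.

```python
import numpy as np


def sandwich_covariance(X, y, cluster):
    """Cluster-robust (sandwich) covariance for a linear marginal model.

    Hypothetical helper: identity link, independence working correlation.
    Returns the coefficient estimate, the uncorrected sandwich covariance,
    and a version inflated by K / (K - p) for K clusters and p coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)

    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    p = X.shape[1]
    meat = np.zeros((p, p))
    ids = np.unique(cluster)
    for c in ids:
        rows = cluster == c
        score = X[rows].T @ resid[rows]  # summed score contribution of cluster c
        meat += np.outer(score, score)

    robust = bread @ meat @ bread
    n_clusters = len(ids)
    corrected = robust * n_clusters / (n_clusters - p)
    return beta, robust, corrected


# Simulated parallel cluster randomized trial with 10 clusters per arm.
rng = np.random.default_rng(2)
n_clusters, cluster_size = 20, 20
cluster = np.repeat(np.arange(n_clusters), cluster_size)
treated = (cluster % 2).astype(float)
cluster_effect = rng.normal(0.0, 1.0, n_clusters)[cluster]
y = 1.0 + 0.5 * treated + cluster_effect + rng.normal(0.0, 1.0, cluster.size)
X = np.column_stack([np.ones_like(treated), treated])

beta, robust, corrected = sandwich_covariance(X, y, cluster)
print("estimate:", beta)
print("robust SE:   ", np.sqrt(np.diag(robust)))
print("corrected SE:", np.sqrt(np.diag(corrected)))
```

With few clusters the uncorrected sandwich estimator tends to understate variability, which is the motivation the abstract gives for small-sample corrections. The inflation factor here only widens the standard errors by a fixed ratio and does not address the estimator's variability.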
|