76
|
Mitani AA, Kaye EK, Nelson KP. Marginal analysis of ordinal clustered longitudinal data with informative cluster size. Biometrics 2019; 75:938-949. [PMID: 30859544 DOI: 10.1111/biom.13050] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/22/2019] [Accepted: 02/26/2019] [Indexed: 11/30/2022]
Abstract
The issue of informative cluster size (ICS) often arises in the analysis of dental data. ICS describes a situation where the outcome of interest is related to cluster size. Much of the work on modeling marginal inference in longitudinal studies with potential ICS has focused on continuous outcomes. However, periodontal disease outcomes, including clinical attachment loss, are often assessed using ordinal scoring systems. In addition, participants may lose teeth over the course of the study due to advancing disease status. Here we develop longitudinal cluster-weighted generalized estimating equations (CWGEE) to model the association of ordinal clustered longitudinal outcomes with participant-level health-related covariates, including metabolic syndrome and smoking status, and potentially decreasing cluster size due to tooth-loss, by fitting a proportional odds logistic regression model. The within-teeth correlation coefficient over time is estimated using the two-stage quasi-least squares method. The motivation for our work stems from the Department of Veterans Affairs Dental Longitudinal Study in which participants regularly received general and oral health examinations. In an extensive simulation study, we compare results obtained from CWGEE with various working correlation structures to those obtained from conventional GEE which does not account for ICS. Our proposed method yields results with very low bias and excellent coverage probability in contrast to a conventional generalized estimating equations approach.
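The weighting idea behind cluster-weighted estimating equations can be illustrated with a minimal sketch. It assumes the common convention of weighting each observation by the inverse of its cluster size; the abstract does not spell out the exact longitudinal weights, and the function names and toy scores below are hypothetical. It does not implement the proportional odds model or the quasi-least squares step.

```python
from collections import defaultdict

def cluster_weights(cluster_ids):
    """Return one weight per observation: 1 / (size of that observation's cluster)."""
    sizes = defaultdict(int)
    for cid in cluster_ids:
        sizes[cid] += 1
    return [1.0 / sizes[cid] for cid in cluster_ids]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

# Hypothetical toy data: participant "a" retains 4 teeth with low scores,
# participant "b" retains 1 tooth with a high score (informative cluster size).
ids = ["a", "a", "a", "a", "b"]
scores = [1, 1, 1, 1, 3]

w = cluster_weights(ids)
pooled = sum(scores) / len(scores)            # dominated by the larger cluster
cluster_weighted = weighted_mean(scores, w)   # each participant counts once
print(pooled, cluster_weighted)
```

With these toy values the pooled mean is pulled toward the participant with more teeth, while the cluster-weighted mean equals the average of the two participant-level means.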
|
77
|
Bounthavong M, Lau MK, Popish SJ, Kay CL, Wells DL, Himstreet JE, Harvey MA, Christopher MLD. Impact of academic detailing on benzodiazepine use among veterans with posttraumatic stress disorder. Subst Abus 2019; 41:101-109. [PMID: 30870137 DOI: 10.1080/08897077.2019.1573777] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/03/2023]
Abstract
Background: Benzodiazepine use in the US Veterans Administration (VA) has been decreasing; however, a small number of veterans with posttraumatic stress disorder (PTSD) continue to receive benzodiazepines. Academic detailing, a targeted educational outreach intervention, was implemented at VA to help reduce the disparity between existing and evidence-based practices, including the reduction in benzodiazepine use in veterans with PTSD. Since evidence to support the national implementation of academic detailing in this clinical scenario was scarce, we performed a quality improvement evaluation of academic detailing's impact on benzodiazepine use in veterans with PTSD. Methods: A retrospective cohort design was used to evaluate the impact of academic detailing on benzodiazepine prescribing in veterans with PTSD from January 1, 2016, to December 31, 2016. Providers exposed to academic detailing (AD-exposed) were compared with providers unexposed to academic detailing (AD-unexposed) using generalized estimating equations (GEEs) controlling for baseline covariates. Secondary aims evaluated academic detailing's impact on average lorazepam equivalent daily dose (LEDD), total LEDD, and benzodiazepine day supply. Results: Overall, there was a decrease in the prevalence of benzodiazepine use in veterans with PTSD from 115.5 to 103.3 per 1000 population (P < .001). However, the decrease was greater in AD-exposed providers (18.37%; P < .001) compared with AD-unexposed providers (8.74%; P < .001). In the GEE models, AD-exposed providers had a greater reduction in the monthly prevalence of veterans with PTSD and a benzodiazepine prescription compared with AD-unexposed providers, by -1.30 veterans per 1000 population (95% confidence interval [CI]: -2.14, -0.46). Similar findings were reported for the benzodiazepine day supply; however, no significant differences were reported for total and average LEDD.
Conclusions: Although benzodiazepine use has been decreasing in veterans with PTSD, opportunities to improve prescribing continue to exist at the VA. In this quality improvement evaluation, AD-exposed providers were associated with a greater reduction in the prevalence of veterans with PTSD and a benzodiazepine prescription compared with AD-unexposed providers.
|
78
|
Spiess M, Jordan P, Wendt M. Simplified Estimation and Testing in Unbalanced Repeated Measures Designs. PSYCHOMETRIKA 2019; 84:212-235. [PMID: 29736784 DOI: 10.1007/s11336-018-9620-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/31/2017] [Revised: 04/20/2018] [Indexed: 06/08/2023]
Abstract
In this paper we propose a simple estimator for unbalanced repeated measures design models where each unit is observed at least once in each cell of the experimental design. The estimator does not require a model of the error covariance structure. Thus, circularity of the error covariance matrix and estimation of correlation parameters and variances are not necessary. Together with a weak assumption about the reason for the varying number of observations, the proposed estimator and its variance estimator are unbiased. As an alternative to confidence intervals based on the normality assumption, a bias-corrected and accelerated bootstrap technique is considered. We also propose the naive percentile bootstrap for Wald-type tests where the standard Wald test may break down when the number of observations is small relative to the number of parameters to be estimated. In a simulation study we illustrate the properties of the estimator and the bootstrap techniques to calculate confidence intervals and conduct hypothesis tests in small and large samples under normality and non-normality of the errors. The results imply that the simple estimator is only slightly less efficient than an estimator that correctly assumes a block structure of the error correlation matrix, a special case of which is an equi-correlation matrix. Application of the estimator and the bootstrap technique is illustrated using data from a task switch experiment based on an experimental within design with 32 cells and 33 participants.
|
79
|
Hoover DR, Shi Q, Burstyn I, Anastos K. Repeated Measures Regression in Laboratory, Clinical and Environmental Research: Common Misconceptions in the Matter of Different Within- and between-Subject Slopes. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:E504. [PMID: 30754731 PMCID: PMC6388388 DOI: 10.3390/ijerph16030504] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/21/2018] [Revised: 02/04/2019] [Accepted: 02/06/2019] [Indexed: 11/16/2022]
Abstract
When using repeated measures linear regression models to make causal inference in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome, (i) a 10-lb weight gain in a given adult implies a larger increase in that adult's cholesterol than (ii) the cholesterol difference implied by one adult weighing 10 lbs more than another. A 10-lb weight gain in the first adult more likely reflects a build-up of body fat in that person, while a second person being 10 lbs heavier than the first could be influenced by other factors, such as the second person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception commonly made using generalized estimating equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression) is that the working correlation structure only influences variance of the parameter estimates. However, only independence working correlation guarantees that the modeled parameters have interpretability. We illustrate this with an example where changing working correlation from independence to equicorrelation qualitatively biases parameters of GEE models, and we show that this happens because within- and between-subject slopes for the outcomes regressed on the predictor variables differ.
We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare within- and between-subject slopes of predictors and when they do differ, investigate the causal reasons for this.
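One simple way to compare within- and between-subject slopes is to split each predictor value into the subject's mean and the deviation from that mean, then fit a slope to each part. The sketch below does this with ordinary least squares on hypothetical weight and cholesterol values; the function names and data are illustrative assumptions, not the authors' analysis.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def within_between_slopes(subject_ids, x, y):
    """Return (within-subject slope, between-subject slope).

    Within: slope of y deviations on x deviations from each subject's own mean.
    Between: slope of subject mean y on subject mean x (unweighted).
    """
    groups = {}
    for s, xi, yi in zip(subject_ids, x, y):
        groups.setdefault(s, []).append((xi, yi))
    mean_x, mean_y, dev_x, dev_y = [], [], [], []
    for obs in groups.values():
        mx = sum(o[0] for o in obs) / len(obs)
        my = sum(o[1] for o in obs) / len(obs)
        mean_x.append(mx)
        mean_y.append(my)
        for xi, yi in obs:
            dev_x.append(xi - mx)
            dev_y.append(yi - my)
    return ols_slope(dev_x, dev_y), ols_slope(mean_x, mean_y)

# Hypothetical data: three subjects measured twice (weight in lb, outcome in arbitrary units).
subjects = ["A", "A", "B", "B", "C", "C"]
weight = [150, 160, 170, 180, 190, 200]
outcome = [67.5, 87.5, 77.5, 97.5, 87.5, 107.5]

within, between = within_between_slopes(subjects, weight, outcome)
print(within, between)
```

In this toy data set the outcome rises steeply when a subject's own weight changes but only mildly across subjects of different average weight, so a single pooled slope would describe neither relationship.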
|
80
|
Friedel JE, DeHart WB, Foreman AM, Andrew ME. A Monte Carlo method for comparing generalized estimating equations to conventional statistical techniques for discounting data. J Exp Anal Behav 2019; 111:207-224. [PMID: 30677137 DOI: 10.1002/jeab.497] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/31/2018] [Accepted: 12/21/2018] [Indexed: 02/03/2023]
Abstract
Discounting is the process by which outcomes lose value. Much of discounting research has focused on differences in the degree of discounting across various groups. This research has relied heavily on conventional null hypothesis significance tests that are familiar to psychologists, such as t-tests and ANOVAs. As discounting research questions have become more complex by simultaneously focusing on within-subject and between-group differences, conventional statistical testing is often not appropriate for the obtained data. Generalized estimating equations (GEE) are one type of mixed-effects model that are designed to handle autocorrelated data, such as within-subject repeated-measures data, and are therefore more appropriate for discounting data. To determine if GEE provides results similar to those of conventional statistical tests, we compared the techniques across 2,000 simulated data sets. The data sets were created using a Monte Carlo method based on an existing data set. Across the simulated data sets, the GEE and the conventional statistical tests generally provided similar patterns of results. As the GEE and more conventional statistical tests provide the same pattern of results, we suggest researchers use the GEE because it was designed to handle data that have the structure that is typical of discounting data.
|
81
|
Westling T, Juraska M, Seaton KE, Tomaras GD, Gilbert PB, Janes H. Methods for comparing durability of immune responses between vaccine regimens in early-phase trials. Stat Methods Med Res 2019; 29:78-93. [PMID: 30623732 DOI: 10.1177/0962280218820881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The ability to produce a long-lasting, or durable, immune response is a crucial characteristic of many highly effective vaccines. A goal of early-phase vaccine trials is often to compare the immune response durability of multiple tested vaccine regimens. One parameter for measuring immune response durability is the area under the mean post-peak log immune response profile. In this paper, we compare immune response durability across vaccine regimens within and between two phase I trials of DNA-primed HIV vaccine regimens, HVTN 094 and HVTN 096. We compare four estimators of this durability parameter and the resulting statistical inferences for comparing vaccine regimens. Two of these estimators use the trapezoid rule as an empirical approximation of the area under the marginal log response curve, and the other two estimators are based on linear and nonlinear models for the marginal mean log response. We conduct a simulation study to compare the four estimators, provide guidance on estimator selection, and use the nonlinear marginal mean model to analyze immunogenicity data from the two HIV vaccine trials.
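The trapezoid-rule approximation mentioned above can be sketched in a few lines. The example assumes mean log responses are already available at a handful of post-peak visit times; the visit times and response values are hypothetical, and the function is a generic trapezoid rule rather than the authors' estimator.

```python
def trapezoid_auc(times, values):
    """Area under the piecewise-linear curve through (times[i], values[i])."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching time/value pairs")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        # Each interval contributes the area of one trapezoid.
        area += 0.5 * dt * (values[i] + values[i - 1])
    return area

# Hypothetical post-peak visit times (weeks) and mean log responses.
weeks = [0, 4, 12, 24]
mean_log_response = [4.0, 3.0, 2.0, 1.0]
auc = trapezoid_auc(weeks, mean_log_response)
print(auc)
```

A larger area indicates a response that stays high for longer after the peak, which is why this quantity can serve as a durability summary.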
|
82
|
Comelli NC, Romero OE, Diez PA, Marinho CF, Schliserman P, Carrizo A, Ortiz EV, Duchowicz PR. QSAR Study of Biologically Active Essential Oils against Beetles Infesting the Walnut in Catamarca, Argentina. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2018; 66:12855-12865. [PMID: 30418029 DOI: 10.1021/acs.jafc.8b04161] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/09/2023]
Abstract
Essential oils from six species of aromatic plants collected in the Catamarca Province of Argentina were evaluated for their chemical composition and repellent and insecticidal activities against beetles of the genus Carpophilus (Coleoptera: Nitidulidae) and Oryzaephilus (Coleoptera: Silvanidae) that infest the local walnut production. Experimental data were analyzed using generalized estimating equations, with normal distribution and the identity link function. From the spectral information from the tested essential oils, we worked their molecular modeling as mixtures by developing mixture descriptors ( Dmix) that combined the molecular descriptor of each component in the mixture ( d i) and its relative concentration ( x i), i.e., Dmix = f( d i, x i). The application of chemoinformatic approaches determined that a combination of mixture descriptors related to molecular size, branchedness, charge distribution, and electronegativity were useful to explain the bioactivity profile against Carpophilus spp. and Oryzaephilus spp. The reported models were rigorously validated using stringent statistical parameters and essential oils reported with repellent activity against other beetle species from the Nitidulidae and Silvanidae families. This model confirmed each essential oil as a repellent with a comparable performance to the experimental reports.
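The abstract leaves the combining function f in Dmix = f(d_i, x_i) unspecified. As an illustration only, the sketch below assumes the simplest plausible choice, a concentration-weighted sum of component descriptors; the function name and the descriptor and concentration values are hypothetical.

```python
def mixture_descriptor(descriptors, fractions):
    """Concentration-weighted sum of component descriptors.

    This is one assumed form of f(d_i, x_i); the study may use a different f.
    Fractions are normalized so they sum to 1 before weighting.
    """
    if len(descriptors) != len(fractions) or not descriptors:
        raise ValueError("need one fraction per descriptor")
    total = sum(fractions)
    if total <= 0:
        raise ValueError("fractions must sum to a positive value")
    return sum(d * x / total for d, x in zip(descriptors, fractions))

# Hypothetical three-component oil: a size-like descriptor and relative concentrations.
d = [136.0, 154.0, 150.0]
x = [0.5, 0.3, 0.2]
d_mix = mixture_descriptor(d, x)
print(d_mix)
```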
|
83
|
Xu C, Li Z, Xue Y, Zhang L, Wang M. An R package for model fitting, model selection and the simulation for longitudinal data with dropout missingness. COMMUN STAT-SIMUL C 2018; 48:2812-2829. [PMID: 32346220 PMCID: PMC7188076 DOI: 10.1080/03610918.2018.1468457] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/21/2017] [Revised: 03/08/2018] [Accepted: 04/15/2018] [Indexed: 01/10/2023]
Abstract
Missing data arise frequently in clinical and epidemiological fields, in particular in longitudinal studies. This paper describes the core features of an R package wgeesel, which implements marginal model fitting (i.e., weighted generalized estimating equations, WGEE; doubly robust GEE) for longitudinal data with dropouts under the assumption of missing at random. More importantly, this package comprehensively provides existing information criteria for WGEE model selection on marginal mean or correlation structures. Also, it can serve as a valuable tool for simulating longitudinal data with missing outcomes. Lastly, a real data example and simulations are presented to illustrate and validate our package.
|
84
|
Westgate PM. A readily available improvement over method of moments for intra-cluster correlation estimation in the context of cluster randomized trials and fitting a GEE-type marginal model for binary outcomes. Clin Trials 2018; 16:41-51. [PMID: 30295512 DOI: 10.1177/1740774518803635] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
Abstract
BACKGROUND/AIMS Cluster randomized trials are popular in health-related research due to the need or desire to randomize clusters of subjects to different trial arms as opposed to randomizing each subject individually. As outcomes from subjects within the same cluster tend to be more alike than outcomes from subjects within other clusters, an exchangeable correlation arises that is measured via the intra-cluster correlation coefficient. Intra-cluster correlation coefficient estimation is especially important due to the increasing awareness of the need to publish such values from studies in order to help guide the design of future cluster randomized trials. Therefore, numerous methods have been proposed to accurately estimate the intra-cluster correlation coefficient, with much attention given to binary outcomes. As marginal models are often of interest, we focus on intra-cluster correlation coefficient estimation in the context of fitting such a model with binary outcomes using generalized estimating equations. Traditionally, intra-cluster correlation coefficient estimation with generalized estimating equations has been based on the method of moments, although such estimators can be negatively biased. Furthermore, alternative estimators that work well, such as the analysis of variance estimator, are not as readily applicable in the context of practical data analyses with generalized estimating equations. Therefore, in this article we assess, in terms of bias, the readily available residual pseudo-likelihood approach to intra-cluster correlation coefficient estimation with the GLIMMIX procedure of SAS (SAS Institute, Cary, NC). Furthermore, we study a possible corresponding approach to confidence interval construction for the intra-cluster correlation coefficient. METHODS We utilize a simulation study and application example to assess bias in intra-cluster correlation coefficient estimates obtained from GLIMMIX using residual pseudo-likelihood. 
This estimator is contrasted with method of moments and analysis of variance estimators which are standards of comparison. The approach to confidence interval construction is assessed by examining coverage probabilities. RESULTS Overall, the residual pseudo-likelihood estimator performs very well. It has considerably less bias than moment estimators, which are its competitor for general generalized estimating equation-based analyses, and therefore, it is a major improvement in practice. Furthermore, it works almost as well as analysis of variance estimators when they are applicable. Confidence intervals have near-nominal coverage when the intra-cluster correlation coefficient estimate has negligible bias. CONCLUSION Our results show that the residual pseudo-likelihood estimator is a good option for intra-cluster correlation coefficient estimation when conducting a generalized estimating equation-based analysis of binary outcome data arising from cluster randomized trials. The estimator is practical in that it is simply a result from fitting a marginal model with GLIMMIX, and a confidence interval can be easily obtained. An additional advantage is that, unlike most other options for performing generalized estimating equation-based analyses, GLIMMIX provides analysts the option to utilize small-sample adjustments that ensure valid inference.
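For orientation, the analysis of variance estimator used here as a standard of comparison can be sketched directly from one-way ANOVA mean squares. The code below uses the usual textbook form with an adjusted average cluster size; it is not the residual pseudo-likelihood approach from GLIMMIX that the article evaluates, and the function name and toy clusters are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation coefficient.

    clusters: list of lists, each inner list holding the 0/1 outcomes of one cluster.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two clusters and more subjects than clusters")
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / n for c, n in zip(clusters, sizes)]
    # Between-cluster and within-cluster sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    # Adjusted average cluster size; equals the common size when clusters are balanced.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (msb - msw) / (msb + (n0 - 1) * msw)

# Hypothetical extremes: perfect agreement within clusters, then none.
print(anova_icc([[1, 1, 1, 1], [0, 0, 0, 0]]))
print(anova_icc([[1, 0], [1, 0]]))
```

The two toy inputs mark the extremes: identical outcomes within every cluster give the maximum estimate, while clusters that each contain one event and one non-event give a negative estimate.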
|
85
|
Tatsumi T, Ishida E, Tatsumi K, Okada Y, Saito T, Kubota T, Saito H. Advanced paternal age alone does not adversely affect pregnancy or live-birth rates or sperm parameters following intrauterine insemination. Reprod Med Biol 2018; 17:459-465. [PMID: 30377400 PMCID: PMC6194307 DOI: 10.1002/rmb2.12222] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 03/20/2018] [Accepted: 06/30/2018] [Indexed: 11/12/2022]
Abstract
PURPOSE This study aimed to evaluate the effect of advanced paternal age on pregnancy outcomes and sperm parameters following intrauterine insemination (IUI). We used IUI data rather than assisted reproductive technology data, which might mask the effects of sperm impairments. METHODS We retrospectively analyzed 1576 IUI cycles in women under 40 years old between April 2012 and May 2016 at the National Center for Child Health and Development in Japan. The main outcomes were clinical pregnancy and live birth. RESULTS The mean male age was significantly lower in cycles that resulted in pregnancy compared with those without pregnancy (38.0 vs 39.1 years; P < 0.001), with a similar trend for live-birth cycles. However, there was no relationship between advanced paternal age and pregnancy outcomes after adjusting for confounding factors and correlations within patients using generalized estimating equations, and the age of the female partner was the only factor affecting pregnancy rate. Furthermore, advanced paternal age had no effect on sperm parameters. CONCLUSIONS Advanced paternal age alone does not adversely affect pregnancy or live-birth rates or sperm parameters following IUI.
|
86
|
Chen IC, Westgate PM. A novel approach to selecting classification types for time-dependent covariates in the marginal analysis of longitudinal data. Stat Methods Med Res 2018; 28:3176-3186. [PMID: 30203725 DOI: 10.1177/0962280218799529] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Generalized estimating equations are routinely utilized for the marginal analysis of longitudinal data. In order to obtain consistent regression parameter estimates, these estimating equations must be unbiased. However, when certain types of time-dependent covariates are presented, these equations can be biased unless the working independence structure is used. Unfortunately, regression parameter estimation can be very inefficient with this structure because not all valid moment conditions are incorporated within the corresponding equations. Therefore, approaches have been proposed to utilize all valid moment conditions. However, these approaches assume that the data analyst knows the type of time-dependent covariate, although this likely is not the case in practice. Whereas hypothesis testing has been used to determine covariate type, we propose a novel strategy to select a working covariate type in order to avoid potentially high type II error rates with these hypothesis testing procedures. Parameter estimates resulting from our proposed method are consistent and have overall improved mean squared error relative to hypothesis testing approaches. Existing and proposed methods are compared in a simulation study and application example.
|
87
|
Sabriá E, Lequerica-Fernández P, Lafuente-Ganuza P, Eguia-Ángeles E, Escudero AI, Martínez-Morillo E, Barceló C, Álvarez FV. Addition of N-terminal pro-B natriuretic peptide to soluble fms-like tyrosine kinase-1/placental growth factor ratio > 38 improves prediction of pre-eclampsia requiring delivery within 1 week: a longitudinal cohort study. ULTRASOUND IN OBSTETRICS & GYNECOLOGY : THE OFFICIAL JOURNAL OF THE INTERNATIONAL SOCIETY OF ULTRASOUND IN OBSTETRICS AND GYNECOLOGY 2018; 51:758-767. [PMID: 29498431 DOI: 10.1002/uog.19040] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/09/2017] [Revised: 01/19/2018] [Accepted: 02/19/2018] [Indexed: 06/08/2023]
Abstract
OBJECTIVE Short-term prediction of pre-eclampsia (PE) using the soluble fms-like tyrosine kinase-1 (sFlt-1)/placental growth factor (PlGF) ratio is characterized by frequent false-positive results. As such, no treatment can be recommended to test-positive patients and multiple measurements are often required. The aim of this study was to evaluate the effectiveness of N-terminal pro-B natriuretic peptide (NT-proBNP), uric acid and the sFlt-1/PlGF ratio for prediction of delivery with PE within 1 week in singleton pregnancies with suspected PE and sFlt-1/PlGF ratio > 38. METHODS This was a longitudinal prospective cohort study of singleton pregnancies presenting at 24 + 0 to 36 + 6 weeks of gestation with clinically suspected PE and sFlt-1/PlGF ratio > 38, enrolled between January 2015 and June 2017. Multiple samples per patient were allowed but were restricted to one sample per gestational week. From 495 enrolled patients, 270 blood samples from 134 patients were ultimately analyzed. By using generalized estimating equations (GEE), the best-fit model was selected for prediction of delivery with PE within 1 week. The predictive value of this model was then assessed using area under the paired-ROC curve (AUC) analysis. RESULTS The best-fit model included the sFlt-1/PlGF ratio, NT-proBNP and the gestational week at the time of the measurement. This combined model was compared with the GEE model based on the sFlt-1/PlGF ratio and the gestational week at the time of the measurement (reduced model). The AUC for the combined model was 0.845 (95% CI, 0.787-0.896), which was significantly greater (P = 0.011) than that of the reduced model (0.786 (95% CI, 0.722-0.844)). CONCLUSION The addition of NT-proBNP assessment improves the short-term prediction of delivery as a result of PE compared with sFlt-1/PlGF ratio alone, when the sFlt-1/PlGF ratio is > 38. This finding should be considered in future research on the assessment of short-term risk of delivery as a result of PE. 
Copyright © 2018 ISUOG. Published by John Wiley & Sons Ltd.
|
88
|
Niu Y, Song L, Liu Y, Peng Y. Modeling clustered long-term survivors using marginal mixture cure model. Biom J 2018; 60:780-796. [PMID: 29733452 DOI: 10.1002/bimj.201700114] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/07/2017] [Revised: 10/26/2017] [Accepted: 01/02/2018] [Indexed: 12/29/2022]
Abstract
There has been a great deal of recent interest in modeling right-censored clustered survival time data with a possible fraction of cured subjects who are nonsusceptible to the event of interest using marginal mixture cure models. In this paper, we consider a semiparametric marginal mixture cure model for such data and propose to extend an existing generalized estimating equation approach by a new unbiased estimating equation for the regression parameters in the latency part of the model. The large sample properties of the regression effect estimators in both the incidence and the latency parts are established. The finite sample properties of the estimators are studied in simulation studies. The proposed method is illustrated with a bone marrow transplantation data set and a tonsil cancer data set.
|
89
|
Chen CS, Shen CW. Model selection based on resampling approaches for cluster longitudinal data with missingness in outcomes. Stat Med 2018; 37:2982-2997. [PMID: 29736918 DOI: 10.1002/sim.7801] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/14/2017] [Revised: 04/04/2018] [Accepted: 04/04/2018] [Indexed: 11/07/2022]
Abstract
In medical and health studies, longitudinal and cluster longitudinal data are often collected, where the response variable of interest is observed repeatedly over time along with a set of covariates. Model selection has become an active research topic but remains largely unexplored because of the complex correlation structure of such data. To address this important issue, in this paper, we concentrate on model selection for cluster longitudinal data, especially when data are subject to missingness. Motivated by the expected weighted quadratic loss of a given model, data perturbation and bootstrapping methods are used to estimate the loss, and the model that has the smallest expected loss is then selected as the best model. To justify the proposed model selection method, we provide various numerical assessments, and an asthma data set is also analyzed for illustration.
|
90
|
Davenport CA, Maity A, Sullivan PF, Tzeng JY. A Powerful Test for SNP Effects on Multivariate Binary Outcomes using Kernel Machine Regression. STATISTICS IN BIOSCIENCES 2018; 10:117-138. [PMID: 30420901 PMCID: PMC6226013 DOI: 10.1007/s12561-017-9189-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/05/2016] [Revised: 12/20/2016] [Accepted: 03/15/2017] [Indexed: 10/19/2022]
Abstract
Evaluating multiple binary outcomes is common in genetic studies of complex diseases. These outcomes are often correlated because they are collected from the same individual and they may share common marker effects. In this paper, we propose a procedure to test for the effect of a SNP-set on multiple, possibly correlated, binary responses. We develop a score-based test using a nonparametric modeling framework that jointly models the global effect of the marker set. We account for the nonlinear effects and potentially complicated interactions between markers using reproducing kernels. Our testing procedure only requires estimation under the null hypothesis, and we use multivariate generalized estimating equations (GEEs) to estimate the model components to account for the correlation among the outcomes. We evaluate the finite sample performance of our test via a simulation study and demonstrate our methods using the CATIE antibody study data and the CoLaus Study data.
|
91
|
Proudfoot J, Faig W, Natarajan L, Xu R. A joint marginal-conditional model for multivariate longitudinal data. Stat Med 2018; 37:813-828. [PMID: 29205414 PMCID: PMC5799029 DOI: 10.1002/sim.7552] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/08/2017] [Revised: 09/28/2017] [Accepted: 10/13/2017] [Indexed: 11/10/2022]
Abstract
Multivariate longitudinal data frequently arise in biomedical applications; however, their analyses are often performed one outcome at a time, or jointly using existing software in an ad hoc fashion. A main challenge in the proper analysis of such data is the fact that the different outcomes are measured on different unknown scales. Methodology for handling the scale problem has been previously proposed for cross-sectional data, and here we extend it to the longitudinal setting. We consider modeling the longitudinal data using random effects, while leaving the joint distribution of the multiple outcomes unspecified. We propose an estimating equation together with an expectation-maximization-type (expectation-substitution) algorithm. The consistency and the asymptotic distribution of the parameter estimates are established. The method is evaluated using extensive simulations and applied to a longitudinal nutrition data set from a large dietary intervention trial on breast cancer survivors, the Women's Healthy Eating and Living Study.
|
92
|
Abstract
In resource-limited settings, long-term evaluation of national antiretroviral treatment (ART) programs often relies on aggregated data, the analysis of which may be subject to ecological bias. As researchers and policy makers consider evaluating individual-level outcomes such as treatment adherence or mortality, the well-known case-control design is appealing in that it provides efficiency gains over random sampling. In the context that motivates this article, valid estimation and inference require acknowledging any clustering, although, to our knowledge, no statistical methods have been published for the analysis of case-control data for which the underlying population exhibits clustering. Furthermore, in the specific context of an ongoing collaboration in Malawi, rather than performing case-control sampling across all clinics, case-control sampling within clinics has been suggested as a more practical strategy. To our knowledge, although similar outcome-dependent sampling schemes have been described in the literature, a case-control design specific to correlated data settings is new. In this article, we describe this design, discuss balanced versus unbalanced sampling techniques, and provide a general approach to analyzing case-control studies in cluster-correlated settings based on inverse probability-weighted generalized estimating equations. Inference is based on a robust sandwich estimator with correlation parameters estimated to ensure appropriate accounting of the outcome-dependent sampling scheme. We conduct comprehensive simulations, based in part on real data on a sample of N = 78,155 program registrants in Malawi between 2005 and 2007, to evaluate small-sample operating characteristics and potential trade-offs associated with standard case-control sampling or when case-control sampling is performed within clusters.
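The inverse probability weighting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' estimator: it only shows how weighting each sampled subject by the inverse of a known within-clinic sampling probability undoes the over-representation of cases. The clinic labels, population counts, sampling probabilities, and function names below are all hypothetical.

```python
# Hypothetical within-clinic case-control sample.
# Clinic A: 100 registrants, 10 cases; all 10 cases and 10 of 90 controls sampled.
# Clinic B: 200 registrants, 40 cases; 20 of 40 cases and 20 of 160 controls sampled.
sampling_prob = {
    ("A", True): 10 / 10,
    ("A", False): 10 / 90,
    ("B", True): 20 / 40,
    ("B", False): 20 / 160,
}

# Each record is (clinic, is_case) for one sampled subject.
records = (
    [("A", True)] * 10 + [("A", False)] * 10
    + [("B", True)] * 20 + [("B", False)] * 20
)


def ipw_weights(records, sampling_prob):
    """Weight each sampled subject by 1 / P(sampled | clinic, case status)."""
    return [1.0 / sampling_prob[(clinic, is_case)] for clinic, is_case in records]


def weighted_prevalence(records, weights):
    """Weighted proportion of cases among the sampled subjects."""
    total = sum(weights)
    case_weight = sum(w for (_, is_case), w in zip(records, weights) if is_case)
    return case_weight / total


naive = sum(is_case for _, is_case in records) / len(records)
weights = ipw_weights(records, sampling_prob)
corrected = weighted_prevalence(records, weights)
```

In this toy setting the unweighted case fraction reflects the sampling design rather than the population, while the weighted fraction matches the population prevalence of 50 cases among 300 registrants. A weighted GEE would apply the same weights inside the estimating equations; the robust variance and correlation estimation described in the abstract are not sketched here.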
|
93
|
Schildcrout JS, Schisterman EF, Aldrich MC, Rathouz PJ. Outcome-related, Auxiliary Variable Sampling Designs for Longitudinal Binary Data. Epidemiology 2018; 29:58-66. [PMID: 29068841 PMCID: PMC5718926 DOI: 10.1097/ede.0000000000000765] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
BACKGROUND Epidemiologists have long used case-control and related study designs to enhance variability of response and information available to estimate exposure-disease associations. Less has been done for longitudinal data. METHODS We discuss an epidemiological study design and analysis approach for longitudinal binary response data. We seek to gain statistical efficiency by oversampling relatively informative subjects for inclusion into the sample. In this methodological demonstration, we develop this concept by sampling repeatedly from an existing cohort study to estimate the relationship of chronic obstructive pulmonary disease to past-year smoking in a panel of baseline smokers. To account for oversampling, we describe a sequential offsetted regressions approach for valid inferences in this setting. RESULTS Targeted sampling can lead to increased statistical efficiency when combined with sequential offsetted regressions. Efficiency gains are degraded with increased prevalence of the disease response variable, with decreased association between the sampling variable and the response, and with other design and analysis parameters, providing guidance to those wishing to use these types of designs in the future. CONCLUSIONS These designs hold promise for efficient use of resources in longitudinal cohort studies.
|
94
|
Jiang Z, Liu Y, Wahed AS, Molenberghs G. Joint modeling of multiple ordinal adherence outcomes via generalized estimating equations with flexible correlation structure. Stat Med 2017; 37:983-995. [PMID: 29235127 DOI: 10.1002/sim.7560] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2015] [Revised: 10/07/2017] [Accepted: 10/20/2017] [Indexed: 11/10/2022]
Abstract
Adherence to medication is critical in achieving effectiveness of many treatments. Factors that influence adherence behavior have been the subject of many clinical studies. Analyzing adherence is complicated because it is often measured on multiple drugs over a period, resulting in a multivariate longitudinal outcome. This paper is motivated by the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C study, where adherence is measured on two drugs as a bivariate ordinal longitudinal outcome. To analyze such an outcome, we propose a joint model assuming the multivariate ordinal outcome arose from a partitioned latent multivariate normal process. We also provide a flexible multilevel association structure covering both between- and within-outcome correlation. In simulation studies, we show that the joint model provides unbiased estimators for regression parameters, which are more efficient than those obtained through fitting a separate model for each outcome. The joint method also yields unbiased estimators for the correlation parameters when the correlation structure is correctly specified. Finally, we analyze the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C adherence data and discuss the findings.
|
95
|
Ferreira DS, Kaushik S, Smith CL, Dharmage SC, Benke GP, Thompson BR, Walters EH, Wolfe R, Abramson MJ. Associations of atopy and asthma during aging of an adult population over a 20-year follow-up. J Asthma 2017; 55:994-1001. [PMID: 28976229 DOI: 10.1080/02770903.2017.1386669] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
Abstract
OBJECTIVE Atopy is associated with asthma, but cross-sectional studies suggest this association may be weaker in older adults. It remains unclear if atopy predicts asthma later in adult life. We aimed to investigate whether atopy in young adults predicted asthma 20 years later and to quantify the contemporaneous relationship of atopy and asthma as adults age. METHODS Participants of the European Community Respiratory Health Survey (ECRHS) in Melbourne aged 20-44 years were followed for 20 years and completed questionnaires, skin prick tests (SPT) and allergen specific immunoglobulin E measurement at a baseline and two subsequent surveys. Using logistic regression and generalized estimating equations, we tested if atopy at baseline predicted current asthma later in life and estimated the association between current atopy measured at each survey and current asthma, while adjusting for potential confounders. RESULTS The analysis included 220 participants: 50.9% male. Mean (SD) age at baseline was 35.7 (5.7) years. Asthma and atopy prevalence remained stable over 20 years. Baseline atopy (SPT) was associated with current asthma (OR 9.74, 95%CI 4.22, 22.5) over 20 years, and current atopy (SPT) with concurrent asthma (3.12; 1.70, 5.74). CONCLUSIONS Atopy remains strongly associated with current asthma in 40 to 64 year-old adults, both prospectively and contemporaneously, but the prospective association is stronger.
|
96
|
Huang J, Huang J, Chen Y, Ying GS. Evaluation of Approaches to Analyzing Continuous Correlated Eye Data When Sample Size Is Small. Ophthalmic Epidemiol 2017; 25:45-54. [PMID: 28891730 DOI: 10.1080/09286586.2017.1339809] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 10/18/2022]
Abstract
PURPOSE To evaluate the performance of commonly used statistical methods for analyzing continuous correlated eye data when sample size is small. METHODS We simulated correlated continuous data from two designs: (1) two eyes of a subject in two comparison groups; (2) two eyes of a subject in the same comparison group, under various sample sizes (5-50), inter-eye correlations (0-0.75), and effect sizes (0-0.8). Simulated data were analyzed using the paired t-test, the two sample t-test, the Wald test and score test using generalized estimating equations (GEE), and the F-test using a linear mixed effects model (LMM). We compared type I error rates and statistical power, and demonstrated the analysis approaches by analyzing two real datasets. RESULTS In design 1, the paired t-test and LMM perform better than GEE, with nominal type I error rate and higher statistical power. In design 2, no test performs uniformly well: the two sample t-test (average of two eyes or a random eye) achieves better control of type I error but yields lower statistical power. In both designs, the GEE Wald test inflates the type I error rate and the GEE score test has lower power. CONCLUSION When sample size is small, some commonly used statistical methods do not perform well. The paired t-test and LMM perform best when two eyes of a subject are in two different comparison groups, and the t-test using the average of two eyes performs best when the two eyes are in the same comparison group. When selecting the appropriate analysis approach, the study design should be considered.
|
97
|
Movassagh EZ, Baxter-Jones ADG, Kontulainen S, Whiting SJ, Vatanparast H. Tracking Dietary Patterns over 20 Years from Childhood through Adolescence into Young Adulthood: The Saskatchewan Pediatric Bone Mineral Accrual Study. Nutrients 2017; 9:nu9090990. [PMID: 28885565 PMCID: PMC5622750 DOI: 10.3390/nu9090990] [Citation(s) in RCA: 158] [Impact Index Per Article: 22.6] [Received: 06/20/2017] [Revised: 08/31/2017] [Accepted: 09/05/2017] [Indexed: 02/06/2023] Open
Abstract
Dietary patterns established during adolescence might play a role in adulthood disease. We examined the stability of dietary patterns (DPs) from childhood through adolescence and into young adulthood (from age 8 to 34 years). Data from 130 participants (53 females) of the Saskatchewan Pediatric Bone Mineral Accrual Study (aged 8–15 years at baseline) were included. Multiple 24-h recalls were collected annually from 1991 to 1997, 2002 to 2005, and 2010 and 2011. Using principal component analysis, “Vegetarian-style”, “Western-like”, “High-fat, high-protein”, “Mixed”, and “Snack” DPs were derived at baseline. Applied DP scores for all annual measurements were calculated using factor loadings of baseline DPs and energy-adjusted food group intakes. We analyzed data using generalized estimating equations. The tracking coefficient represents the correlation between baseline dietary pattern scores and all other follow-up dietary pattern scores. We found moderate tracking for the “Vegetarian-style” (β = 0.44, p < 0.001) and “High-fat, high-protein” (β = 0.39, p < 0.001) DPs in females and the “Vegetarian-style” DP (β = 0.30, p < 0.001) in males. The remaining DPs showed poor-to-fair tracking in both sexes. No tracking for the “Western-like” DP in females was observed. Assessing overall change in DP scores from childhood to young adulthood showed an increasing trend in adherence to the “Vegetarian-style” DP and a decreasing trend in adherence to the “High-fat, high-protein” DP by age in both sexes (p < 0.001), while “Western-like” and “Mixed” DP scores increased only in males (p < 0.001). These findings suggest that healthy dietary habits established during childhood and adolescence moderately continue into adulthood.
|
98
|
King KM, Pedersen SL, Louie KT, Pelham WE, Molina BS. Between- and within-person associations between negative life events and alcohol outcomes in adolescents with ADHD. Psychology of Addictive Behaviors 2017; 31:699-711. [PMID: 28703610 PMCID: PMC5593772 DOI: 10.1037/adb0000295] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
Escalations in alcohol use during adolescence may be linked with exposure to negative life events, but most of this research has focused on between-person associations. Moreover, adolescents with attention-deficit hyperactivity disorder (ADHD) may be an especially vulnerable population, reporting more life events and alcohol involvement, and may even be more sensitive to the effects of life events on alcohol outcomes compared with those without ADHD. We tested the between- and within-person effects of the number and perceptions of negative life events on the development of alcohol use outcomes from age 14 to 17 years in 259 adolescents with and without ADHD using generalized estimating equations. Between-person differences in exposure to negative life events across adolescence, but not the perception of those events, were associated with a higher likelihood of alcohol use and drunkenness at age 17 years. Within-person differences in life events were associated with alcohol use above and beyond that predicted by an adolescent's typical trajectory over time. Parent- and teacher-reported ADHD symptoms were associated with more negative perceptions of life events and with greater alcohol use and drunkenness at age 17 years, but symptoms did not moderate the life event-alcohol association. Interventions should consider the variables that produce vulnerability to life events as well as the immediate impact of life events. That the accumulation of life events, rather than their perceived negativity, was associated with alcohol outcomes indicates that interventions targeting the reduction of negative events, rather than emotional response, may be more protective against alcohol use in adolescence.
|
99
|
Li F, Turner EL, Heagerty PJ, Murray DM, Vollmer WM, DeLong ER. An evaluation of constrained randomization for the design and analysis of group-randomized trials with binary outcomes. Stat Med 2017; 36:3791-3806. [PMID: 28786223 DOI: 10.1002/sim.7410] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 10/13/2016] [Revised: 06/07/2017] [Accepted: 06/20/2017] [Indexed: 01/18/2023]
Abstract
Group-randomized trials are randomized studies that allocate intact groups of individuals to different comparison arms. A frequent practical limitation to adopting such research designs is that only a limited number of groups may be available, and therefore, simple randomization is unable to adequately balance multiple group-level covariates between arms. Therefore, covariate-based constrained randomization was proposed as an allocation technique to achieve balance. Constrained randomization involves generating a large number of possible allocation schemes, calculating a balance score that assesses covariate imbalance, limiting the randomization space to a prespecified percentage of candidate allocations, and randomly selecting one scheme to implement. When the outcome is binary, a number of statistical issues arise regarding the potential advantages of such designs in making inference. In particular, properties found for continuous outcomes may not directly apply, and additional variations on statistical tests are available. Motivated by two recent trials, we conduct a series of Monte Carlo simulations to evaluate the statistical properties of model-based and randomization-based tests under both simple and constrained randomization designs, with varying degrees of analysis-based covariate adjustment. Our results indicate that constrained randomization improves the power of the linearization F-test, the KC-corrected GEE t-test (Kauermann and Carroll, 2001, Journal of the American Statistical Association 96, 1387-1396), and two permutation tests when the prognostic group-level variables are controlled for in the analysis and the size of randomization space is reasonably small. We also demonstrate that constrained randomization reduces power loss from redundant analysis-based adjustment for non-prognostic covariates. Design considerations such as the choice of the balance metric and the size of randomization space are discussed.
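The four steps of constrained randomization listed in this abstract (generate candidate allocations, score covariate imbalance, restrict to a prespecified fraction, randomly select one) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it enumerates every equal-split allocation rather than sampling a large number of them, and the balance metric (sum of squared standardized differences in arm means) and the function names are choices made here for the example.

```python
import itertools
import random
import statistics


def balance_score(allocation, covariates):
    """Sum over covariates of the squared standardized difference in arm means.

    `allocation` is the set of group indices assigned to the intervention arm;
    every other group is a control. Lower scores mean better balance.
    """
    score = 0.0
    for j in range(len(covariates[0])):
        column = [row[j] for row in covariates]
        sd = statistics.pstdev(column) or 1.0  # guard against a constant covariate
        treated = [column[i] for i in allocation]
        control = [column[i] for i in range(len(column)) if i not in allocation]
        diff = statistics.mean(treated) - statistics.mean(control)
        score += (diff / sd) ** 2
    return score


def constrained_randomization(covariates, fraction=0.1, seed=None):
    """Pick one allocation at random from the best-balanced `fraction` of schemes."""
    n_groups = len(covariates)
    # Step 1: generate candidate allocations (here: all equal splits).
    candidates = [
        frozenset(c) for c in itertools.combinations(range(n_groups), n_groups // 2)
    ]
    # Step 2: score each candidate for covariate imbalance.
    ranked = sorted(candidates, key=lambda a: balance_score(a, covariates))
    # Step 3: keep only the prespecified fraction with the lowest scores.
    keep = max(1, int(len(ranked) * fraction))
    constrained_space = ranked[:keep]
    # Step 4: randomly select one scheme from the constrained space.
    rng = random.Random(seed)
    return rng.choice(constrained_space), constrained_space


# Hypothetical group-level covariates for 8 groups: (mean age, baseline rate).
group_covariates = [
    [10, 0.2], [12, 0.4], [30, 0.9], [28, 0.8],
    [11, 0.3], [29, 0.7], [13, 0.1], [31, 0.6],
]
chosen, space = constrained_randomization(group_covariates, fraction=0.1, seed=1)
```

With 8 groups there are 70 equal-split allocations, so a 10% constraint leaves 7 candidate schemes. The abstract notes that the size of this randomization space is itself a design consideration, which corresponds to the `fraction` argument in the sketch.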
|
100
|
Riemer V, Frommel J, Layher G, Neumann H, Schrader C. Identifying Features of Bodily Expression As Indicators of Emotional Experience during Multimedia Learning. Front Psychol 2017; 8:1303. [PMID: 28798717 PMCID: PMC5529426 DOI: 10.3389/fpsyg.2017.01303] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/22/2017] [Accepted: 07/17/2017] [Indexed: 11/30/2022] Open
Abstract
The importance of emotions experienced by learners during their interaction with multimedia learning systems, such as serious games, underscores the need to identify sources of information that allow the recognition of learners’ emotional experience without interrupting the learning process. Bodily expression is gaining in attention as one of these sources of information. However, to date, the question of how bodily expression can convey different emotions has largely been addressed in research relying on acted emotion displays. Following a more contextualized approach, the present study aims to identify features of bodily expression (i.e., posture and activity of the upper body and the head) that relate to genuine emotional experience during interaction with a serious game. In a multimethod approach, 70 undergraduates played a serious game relating to financial education while their bodily expression was captured using an off-the-shelf depth-image sensor (Microsoft Kinect). In addition, self-reports of experienced enjoyment, boredom, and frustration were collected repeatedly during gameplay, to address the dynamic changes in emotions occurring in educational tasks. Results showed that, firstly, the intensities of all emotions indeed changed significantly over the course of the game. Secondly, by using generalized estimating equations, distinct features of bodily expression could be identified as significant indicators for each emotion under investigation. A participant keeping their head more turned to the right was positively related to frustration being experienced, whereas keeping their head more turned to the left was positively related to enjoyment. Furthermore, having their upper body positioned more closely to the gaming screen was also positively related to frustration. Finally, increased activity of a participant’s head emerged as a significant indicator of boredom being experienced. 
These results confirm the value of bodily expression as an indicator of emotional experience in multimedia learning systems. Furthermore, the findings may guide developers of emotion recognition procedures by focusing on the identified features of bodily expression.
|