1
Herzog JM, Verkade A, Sick V. Quantitative and Rapid In Vivo Imaging of Human Lenticular Fluorescence. Invest Ophthalmol Vis Sci 2024; 65:41. [PMID: 39565304 PMCID: PMC11585068 DOI: 10.1167/iovs.65.13.41] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2024] [Accepted: 09/19/2024] [Indexed: 11/21/2024] Open
Abstract
Purpose: To quantitatively investigate the chemical origins of near-UV excited fluorescence in the crystalline lens, and demonstrate the potential usefulness of a rapid and noninvasive diagnostic approach for screening and monitoring of lens damage. Methods: Anterior segment UV fluorescence imaging was applied to a population of 30 healthy adults, ages 18 to 64 years. Absolute fluorescence intensities and intensity ratios were compared across the population as a function of age. Fluorescence quantum yield (FQY) was calculated from imaging results based on a previous radiometric characterization. Results: Typical FQYs at 365 nm excitation are approximately 0.2% for healthy adults. Intensity and FQY were observed to increase significantly with age, consistent with ex vivo and confocal microscopy studies. The ratio of blue to green fluorescence is strongly correlated with FQY and age, suggesting that both increases in fluorophore concentration and changes in composition occur with age. Fluorescence data are quantitatively and qualitatively consistent with pyridine nucleotides in young adults, and changes with age are consistent with formation of β-carbolines or advanced glycation end products. Intralens variation is consistent with increased oxidation or glycation in the lens nucleus relative to the cortex. Conclusions: Lenticular fluorescence can be measured rapidly, accurately, and quantitatively in vivo, which provides a spatially resolved, quantitative measure of lens chemistry, including damage from oxidation and glycation. Careful interpretation of fluorescence intensities and intensity ratios can provide chemical insight into lens health. Anterior segment UV fluorescence imaging can thus serve as a useful tool for screening, monitoring, and research of lens damage and cataract formation.
Affiliation(s)
- Joshua M. Herzog
  - Department of Mechanical Engineering, University of Michigan, Michigan, United States
- Angela Verkade
  - Department of Ophthalmology and Visual Sciences, University of Michigan, Michigan, United States
- Volker Sick
  - Department of Mechanical Engineering, University of Michigan, Michigan, United States
2
Yaremych HE, Preacher KJ. Understanding the Consequences of Collinearity for Multilevel Models: The Importance of Disaggregation Across Levels. Multivariate Behavioral Research 2024; 59:693-715. [PMID: 38721945 DOI: 10.1080/00273171.2024.2315549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
In multilevel models, disaggregating predictors into level-specific parts (typically accomplished via centering) benefits parameter estimates and their interpretations. However, the importance of level-specificity has been sparsely addressed in multilevel literature concerning collinearity. In this study, we develop novel insights into the interactivity of centering and collinearity in multilevel models. After integrating the broad literatures on centering and collinearity, we review level-specific and conflated correlations in multilevel data. Next, by deriving formal relationships between predictor collinearity and multilevel model estimates, we demonstrate how the consequences of collinearity change across different centering specifications and identify data characteristics that may exacerbate or mitigate those consequences. We show that when all or some level-1 predictors are uncentered, slope estimates can be greatly biased by collinearity. Disaggregation of all predictors eliminates the possibility that fixed effect estimates will be biased due to collinearity alone; however, under some data conditions, collinearity is associated with biased standard errors and random effect (co)variance estimates. Finally, we illustrate the importance of disaggregation for diagnosing collinearity in multilevel data and provide recommendations for the use of level-specific collinearity diagnostics. Overall, the necessity of disaggregation for identifying and managing collinearity's consequences in multilevel models is clarified in novel ways.
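The distinction between conflated and level-specific correlations mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy data are made up for illustration, and cluster-mean centering is assumed as the disaggregation method.

```python
from collections import defaultdict
from math import sqrt

def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def cluster_means(x, cluster):
    """Map each cluster label to the mean of x within that cluster."""
    groups = defaultdict(list)
    for value, label in zip(x, cluster):
        groups[label].append(value)
    return {label: sum(v) / len(v) for label, v in groups.items()}

def level_specific_correlations(x1, x2, cluster):
    """Return (conflated, within-cluster, between-cluster) correlations."""
    m1 = cluster_means(x1, cluster)
    m2 = cluster_means(x2, cluster)
    # Within-cluster parts: deviations from the cluster mean.
    w1 = [v - m1[c] for v, c in zip(x1, cluster)]
    w2 = [v - m2[c] for v, c in zip(x2, cluster)]
    labels = sorted(m1)
    conflated = pearson(x1, x2)      # uncentered predictors
    within = pearson(w1, w2)         # level-1 parts only
    between = pearson([m1[c] for c in labels], [m2[c] for c in labels])
    return conflated, within, between

# Toy data (made up): 3 clusters of 3 observations each.
cluster = [0, 0, 0, 1, 1, 1, 2, 2, 2]
x1 = [-1.0, 0.0, 1.0, 9.0, 10.0, 11.0, 19.0, 20.0, 21.0]
x2 = [1.0, 0.0, -1.0, 11.0, 10.0, 9.0, 21.0, 20.0, 19.0]
conflated, within, between = level_specific_correlations(x1, x2, cluster)
```

In this toy example the two predictors are perfectly negatively correlated within clusters and perfectly positively correlated between clusters, yet the conflated correlation is strongly positive, so a diagnostic computed on uncentered predictors would hide the within-cluster relationship.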
Affiliation(s)
- Haley E Yaremych
  - Department of Psychology & Human Development, Vanderbilt University
3
Guo Y, Dhaliwal J, Rights JD. Disaggregating level-specific effects in cross-classified multilevel models. Behav Res Methods 2024; 56:3023-3057. [PMID: 37993674 DOI: 10.3758/s13428-023-02238-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/06/2023] [Indexed: 11/24/2023]
Abstract
In psychology and other fields, data often have a cross-classified structure, whereby observations are nested within multiple types of non-hierarchical clusters (e.g., repeated measures cross-classified by persons and stimuli). This paper discusses ways that, in cross-classified multilevel models, slopes of lower-level predictors can implicitly reflect an ambiguous blend of multiple effects (for instance, a purely observation-level effect as well as a unique between-cluster effect for each type of cluster). The possibility of conflating multiple effects of lower-level predictors is well recognized for non-cross-classified multilevel models, but has not been fully discussed or clarified for cross-classified contexts. Consequently, in published cross-classified modeling applications, this possibility is almost always ignored, and researchers routinely specify models that conflate multiple effects. In this paper, we show why this common practice can be problematic, and show how to disaggregate level-specific effects in cross-classified models. We provide a novel suite of options that include fully cluster-mean-centered, partially cluster-mean-centered, and contextual effect models, each of which provides a unique interpretation of model parameters. We further clarify how to avoid both fixed and random conflation, the latter of which is widely misunderstood even in non-cross-classified models. We provide simulation results showing the possible deleterious impact of such conflation in cross-classified models, and walk through pedagogical examples to illustrate the disaggregation of level-specific effects. We conclude by considering additional model complexities that can arise with cross-classification, providing guidance for researchers in choosing among model specifications, and describing newly available software to aid researchers who wish to disaggregate effects in practice.
Affiliation(s)
- Yingchi Guo
  - Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T1Z4, Canada
- Jeneesha Dhaliwal
  - Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T1Z4, Canada
- Jason D Rights
  - Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC, V6T1Z4, Canada
4
Rights JD, Sterba SK. On the Common but Problematic Specification of Conflated Random Slopes in Multilevel Models. Multivariate Behavioral Research 2023; 58:1106-1133. [PMID: 37038722 DOI: 10.1080/00273171.2023.2174490] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
For multilevel models (MLMs) with fixed slopes, it has been widely recognized that a level-1 variable can have distinct between-cluster and within-cluster fixed effects, and that failing to disaggregate these effects yields a conflated, uninterpretable fixed effect. For MLMs with random slopes, however, we clarify that two different types of slope conflation can occur: that of the fixed component (termed fixed conflation) and that of the random component (termed random conflation). The latter is rarely recognized and not well understood. Here we explain that a model commonly used to disaggregate the fixed component (the contextual effect model with random slopes) troublingly still yields a conflated random component. Negative consequences of such random conflation have not been demonstrated. Here we show that they include erroneous interpretation and inferences about the substantively important extent of between-cluster differences in slopes, including either underestimating or overestimating such slope heterogeneity. Furthermore, we show that this random conflation can yield inappropriate standard errors for fixed effects. To aid researchers in practice, we delineate which types of random slope specifications yield an unconflated random component. We demonstrate the advantages of these unconflated models in terms of estimating and testing random slope variance (i.e., improved power, Type I error, and bias) and in terms of standard error estimation for fixed effects (i.e., more accurate standard errors), and make recommendations for which specifications to use for particular research purposes.
5
Liu J, Liu L, James AS, Colditz GA. An overview of optimal designs under a given budget in cluster randomized trials with a binary outcome. Stat Methods Med Res 2023; 32:1420-1441. [PMID: 37284817 PMCID: PMC11020688 DOI: 10.1177/09622802231172026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Cluster randomized trial design may raise financial concerns because the cost to recruit an additional cluster is much higher than to enroll an additional subject in subject-level randomized trials. Therefore, it is desirable to develop an optimal design. For local optimal designs, optimization means the minimum variance of the estimated treatment effect under the total budget. The local optimal design derived from the variance needs the input of an association parameter ρ in terms of a "working" correlation structure R(ρ) in the generalized estimating equation models. When the range of ρ instead of an exact value is available, the parameter space is defined as the range of ρ and the design space is defined as enrollment feasibility, for example, the number of clusters or cluster size. For any value ρ within the range, the optimal design and the relative efficiency of each design in the design space are obtained. Then, for each design in the design space, the minimum relative efficiency within the parameter space is calculated. MaxiMin design is the optimal design that maximizes the minimum relative efficiency among all designs in the design space. Our contributions are threefold. First, for three common measures (risk difference, risk ratio, and odds ratio), we summarize all available local optimal designs and MaxiMin designs utilizing generalized estimating equation models when the group allocation proportion is predetermined for two-level and three-level parallel cluster randomized trials. We then propose the local optimal designs and MaxiMin designs using the same models when the group allocation proportion is undecided. Second, for partially nested designs, we develop the optimal designs for three common measures under the setting of equal number of subjects per cluster and exchangeable working correlation structure in the intervention group. Third, we create three new Statistical Analysis System (SAS) macros and update two existing SAS macros for all the optimal designs. We provide two examples to illustrate our methods.
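The MaxiMin selection rule described in this abstract can be sketched in a few lines. This is a simplified illustration, not the paper's SAS macros or its GEE-based variances for binary outcomes: it assumes a variance proportional to the usual design effect divided by total sample size, a linear cost per cluster and per subject, and hypothetical function names and budget figures.

```python
def clusters_affordable(budget, cost_cluster, cost_subject, m):
    """Largest number of clusters of size m that fits the budget."""
    return int(budget // (cost_cluster + cost_subject * m))

def variance(n, m, rho):
    """Assumed variance up to a constant: design effect / total sample size."""
    return (1 + (m - 1) * rho) / (n * m)

def maximin_design(budget, cost_cluster, cost_subject, sizes, rhos):
    """Return ((n, m), minimum relative efficiency) of the MaxiMin design."""
    # Design space: one (n, m) pair per candidate cluster size, keeping n >= 2.
    designs = []
    for m in sizes:
        n = clusters_affordable(budget, cost_cluster, cost_subject, m)
        if n >= 2:
            designs.append((n, m))
    # Local optimum for each rho: the smallest variance in the design space.
    best_var = {rho: min(variance(n, m, rho) for n, m in designs) for rho in rhos}

    def min_relative_efficiency(design):
        n, m = design
        # Relative efficiency at rho = optimal variance / this design's variance.
        return min(best_var[rho] / variance(n, m, rho) for rho in rhos)

    choice = max(designs, key=min_relative_efficiency)
    return choice, min_relative_efficiency(choice)

# Made-up inputs: total budget 10000, 500 per cluster, 50 per subject,
# cluster sizes 1..50, and rho only known to lie in {0.01, 0.05, 0.10}.
choice, worst_case_re = maximin_design(10000, 500, 50, range(1, 51), [0.01, 0.05, 0.10])
```

When the parameter space collapses to a single value of ρ, the MaxiMin design coincides with the local optimal design and its minimum relative efficiency is exactly 1.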
Affiliation(s)
- Jingxia Liu
  - Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine (WUSM), St Louis, Missouri, USA
  - Division of Biostatistics, Washington University School of Medicine (WUSM), St Louis, Missouri, USA
- Lei Liu
  - Division of Biostatistics, Washington University School of Medicine (WUSM), St Louis, Missouri, USA
- Aimee S James
  - Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine (WUSM), St Louis, Missouri, USA
- Graham A Colditz
  - Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine (WUSM), St Louis, Missouri, USA
6
Sanders EA, Konold TR. X matters too: How the blended slope problem manifests differently in unilevel vs. multilevel models. Methodology-European Journal of Research Methods for the Behavioral and Social Sciences 2023. [DOI: 10.5964/meth.9925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023] Open
Abstract
Aside from multilevel models (MLMs), several analytic approaches are available for handling cluster-induced dependencies. Nevertheless, the literature on MLM alternatives has called less explicit attention to the potential bias in level-1 (L1) slope coefficients resulting from the “blended slope” problem—a problem that arises when dependencies in predictors (Xs) exist and when L1 predictor-outcome (X-Y) relations differ from those at level-2 (L2). As such, applied researchers may be drawing incorrect inferences about their L1 predictor effects when they specify models without considering clustering in Xs. The present paper reviews this “blended slope” problem and uses Monte Carlo simulation to illustrate how the problem manifests more for unilevel models compared with MLMs. In short, analyses of clustered data should always: 1) report outcome and predictor ICCs, 2) cluster-mean center L1 predictors or incorporate L2 aggregate predictors, and 3) employ a model that takes clustered residuals into account.
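The first recommendation, reporting ICCs for both the outcome and the predictors, can be sketched as follows. This is a minimal illustration assuming equally sized clusters and the one-way ANOVA estimator of ICC(1); the function name and the toy values are hypothetical and do not come from the paper.

```python
def icc_oneway(groups):
    """One-way ANOVA estimator of ICC(1) for equally sized clusters."""
    g = len(groups)       # number of clusters
    k = len(groups[0])    # observations per cluster
    if any(len(grp) != k for grp in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    # Between-cluster and within-cluster mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((v - m) ** 2 for grp, m in zip(groups, means) for v in grp) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Made-up outcome (y) and predictor (x) values, one inner list per cluster.
y_by_cluster = [[10, 12, 11], [20, 19, 21], [15, 14, 16]]
x_by_cluster = [[1, 5, 3], [2, 4, 3], [3, 1, 5]]
icc_y = icc_oneway(y_by_cluster)
icc_x = icc_oneway(x_by_cluster)
```

Here the outcome is strongly clustered while the predictor is not. The estimator can be negative when clusters differ less than chance alone would suggest, which is commonly read as no detectable clustering.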
7
Sutradhar BC. Regression analysis for exponential family data in a finite population setup using two-stage cluster sample. Ann I Stat Math 2022. [DOI: 10.1007/s10463-022-00850-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
8
Baker DL, Cummings K, Smolkowski K. Diagnostic accuracy of Spanish and English screeners with Spanish and English criterion measures for bilingual students in Grades 1 and 2. J Sch Psychol 2022; 92:299-323. [PMID: 35618376 DOI: 10.1016/j.jsp.2022.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Revised: 11/13/2021] [Accepted: 04/05/2022] [Indexed: 01/04/2023]
Abstract
The purpose of this study was to examine the diagnostic accuracy of English and Spanish language screeners when predicting reading comprehension outcomes in both languages at the end of Grade 1 and Grade 2. Participants were 1221 Latino/a bilingual students in Grade 1 and 1004 in Grade 2 who were attending bilingual programs in the Pacific Northwest and in Texas. We used ROC curve analyses to calculate the area under the curve (AUC; A) for each measure. The decision thresholds we selected resulted in 71% of all comparisons having accuracy of at least 0.75. Letter naming, decoding, and oral reading fluency in Spanish were accurate in predicting reading risk on criterion measures in Spanish and in English in Grades 1 and 2 (A value of 0.75 or above). English screeners, however, only predicted reading risk on the English criterion measure, but not on the Spanish criterion measure, with a few exceptions. Implications for practice and future research are discussed.
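The AUC statistic (A) used in this abstract has a simple pairwise interpretation, sketched below. The scores are made up, the function name is hypothetical, and higher scores are assumed to indicate greater risk; for screeners where lower scores indicate risk, the scores would need to be negated first.

```python
def auc(scores_at_risk, scores_not_at_risk):
    """Probability that a random at-risk case outscores a random
    not-at-risk case, counting ties as one half."""
    wins = 0.0
    for s_pos in scores_at_risk:
        for s_neg in scores_not_at_risk:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_at_risk) * len(scores_not_at_risk))

# Made-up risk scores for 3 at-risk and 4 not-at-risk students.
at_risk = [0.9, 0.8, 0.4]
not_at_risk = [0.7, 0.3, 0.2, 0.1]
a_value = auc(at_risk, not_at_risk)   # 11 of 12 pairs are ordered correctly
meets_threshold = a_value >= 0.75     # the accuracy cut-off quoted in the abstract
```

The pairwise form is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently on large samples.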
9
Sutradhar BC. Cluster Correlations and Complexity in Binary Regression Analysis Using Two-stage Cluster Samples. Sankhya A 2022. [DOI: 10.1007/s13171-022-00281-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
10
Lyu Z, Welsh A. Increasing cluster size asymptotics for nested error regression models. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2021.07.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
11
Ntani G, Inskip H, Osmond C, Coggon D. Consequences of ignoring clustering in linear regression. BMC Med Res Methodol 2021; 21:139. [PMID: 34233609 PMCID: PMC8265092 DOI: 10.1186/s12874-021-01333-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/20/2020] [Accepted: 06/15/2021] [Indexed: 11/25/2022] Open
Abstract
Background: Clustering of observations is a common phenomenon in epidemiological and clinical research. Previous studies have highlighted the importance of using multilevel analysis to account for such clustering, but in practice, methods ignoring clustering are often employed. We used simulated data to explore the circumstances in which failure to account for clustering in linear regression could lead to importantly erroneous conclusions. Methods: We simulated data following the random-intercept model specification under different scenarios of clustering of a continuous outcome and a single continuous or binary explanatory variable. We fitted random-intercept (RI) and ordinary least squares (OLS) models and compared effect estimates with the “true” value that had been used in simulation. We also assessed the relative precision of effect estimates, and explored the extent to which coverage by 95% confidence intervals and Type I error rates were appropriate. Results: We found that effect estimates from both types of regression model were on average unbiased. However, deviations from the “true” value were greater when the outcome variable was more clustered. For a continuous explanatory variable, they tended also to be greater for the OLS than the RI model, and when the explanatory variable was less clustered. The precision of effect estimates from the OLS model was overestimated when the explanatory variable varied more between than within clusters, and was somewhat underestimated when the explanatory variable was less clustered. The cluster-unadjusted model gave poor coverage rates by 95% confidence intervals and high Type I error rates when the explanatory variable was continuous. With a binary explanatory variable, coverage rates by 95% confidence intervals and Type I error rates deviated from nominal values when the outcome variable was more clustered, but the direction of the deviation varied according to the overall prevalence of the explanatory variable, and the extent to which it was clustered. Conclusions: In this study we identified circumstances in which application of an OLS regression model to clustered data is more likely to mislead statistical inference. The potential for error is greatest when the explanatory variable is continuous, and the outcome variable more clustered (intraclass correlation coefficient is ≥ 0.01).
Affiliation(s)
- Georgia Ntani
  - Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
  - Medical Research Council Versus Arthritis Centre for Musculoskeletal Health and Work, Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
- Hazel Inskip
  - Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
  - NIHR Southampton Biomedical Research Centre, University of Southampton and University Hospital Southampton NHS Foundation Trust, Southampton, UK
- Clive Osmond
  - Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
- David Coggon
  - Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
  - Medical Research Council Versus Arthritis Centre for Musculoskeletal Health and Work, Medical Research Council Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
12
Olali AZ, Sharma A, Shi Q, Hoover DR, Weber KM, French AL, McKay HS, Tien PC, Al-Harthi L, Yin MT, Ross RD. Change in Circulating Undercarboxylated Osteocalcin (ucOCN) Is Associated With Fat Accumulation in HIV-Seropositive Women. J Acquir Immune Defic Syndr 2021; 86:e139-e145. [PMID: 33399313 PMCID: PMC7933097 DOI: 10.1097/qai.0000000000002617] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/02/2020] [Accepted: 12/22/2020] [Indexed: 11/25/2022]
Abstract
BACKGROUND Bone mineral density loss and fat accumulation are common in people living with HIV. The bone-derived hormone undercarboxylated osteocalcin (ucOCN) regulates fat metabolism. We investigated the relationship between ucOCN change and body fat change among perimenopausal/postmenopausal HIV-seronegative and HIV-seropositive women on long-term antiretrovirals. METHODS Perimenopausal and postmenopausal women enrolled in the Women's Interagency HIV Study MSK substudy underwent trunk and total fat assessment by dual energy x-ray absorptiometry (DXA) at study enrollment (index visit) and again 2 years later. Circulating ucOCN and cOCN were also measured at the index and 2-year visits. The correlation between the 2-year change in ucOCN and cOCN and change in trunk and total fat was assessed as a function of HIV serostatus using linear regression modeling. Multivariate linear regression assessed the association between ucOCN and cOCN change and total and trunk fat change after adjusting for sociodemographic variables. Linear regression models restricted to HIV-seropositive women were performed to examine the contributions of HIV-specific factors (index CD4 count, viral load, and combined antiretroviral therapy use) on the associations. RESULTS Increased ucOCN over the 2-year follow-up was associated with less trunk and total fat accumulation in models adjusting for HIV serostatus and participants' sociodemographics, whereas there was no association between cOCN and the fat parameters. None of the HIV-specific factors evaluated influenced the association between ucOCN and fat parameters. CONCLUSION The current study suggests that increases in ucOCN are associated with decreased fat accumulation in HIV-seronegative and HIV-seropositive postmenopausal women on long-term antiretroviral therapy.
Affiliation(s)
- Arnold Z. Olali
  - Department of Cell & Molecular Medicine, Rush University Medical Center, Chicago, IL
  - Department of Microbial Pathogens and Immunity, Rush University Medical Center, Chicago, IL
- Qiuhu Shi
  - New York Medical College, Valhalla, NY
- Donald R. Hoover
  - Department of Statistics and Institute for Health, Health Care Policy and Aging Research, Rutgers University, Piscataway, NJ
- Kathleen M. Weber
  - Cook County Health/CORE Center and Hektoen Institute of Medicine, Chicago, IL
- Audrey L. French
  - Department of Medicine, Stroger Hospital of Cook County/CORE Center, Rush University, Chicago, IL
- Heather S. McKay
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
- Phyllis C. Tien
  - Department of Medicine, University of California, San Francisco and Medical Service, Department of Veteran Affairs Medical Center, San Francisco, CA
- Lena Al-Harthi
  - Department of Microbial Pathogens and Immunity, Rush University Medical Center, Chicago, IL
- Ryan D. Ross
  - Department of Cell & Molecular Medicine, Rush University Medical Center, Chicago, IL
13
Ji B, He X, Zhai J, Zhang Y, Man VH, Wang J. Machine learning on ligand-residue interaction profiles to significantly improve binding affinity prediction. Brief Bioinform 2021; 22:6184410. [PMID: 33758923 DOI: 10.1093/bib/bbab054] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 11/08/2020] [Revised: 01/06/2021] [Accepted: 02/02/2021] [Indexed: 01/01/2023] Open
Abstract
Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule binding to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method, using ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models, to significantly improve the screening performance in SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets, each having hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38 and 36%, respectively. Second, the mean values of squared correlation coefficient and predictive index increased by about 225 and 73%, respectively. Third, more encouragingly, the average value of the areas under the curve of receiver operating characteristic for six targets by GBDT, 0.87, is significantly better than that by Glide, which is only 0.71. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
Affiliation(s)
- Beihong Ji
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Xibing He
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Jingchen Zhai
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yuzhao Zhang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Viet Hoang Man
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, PA 15261, USA
14
Rights JD, Preacher KJ, Cole DA. The danger of conflating level-specific effects of control variables when primary interest lies in level-2 effects. The British Journal of Mathematical and Statistical Psychology 2020; 73 Suppl 1:194-211. [PMID: 31853965 DOI: 10.1111/bmsp.12194] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/03/2018] [Revised: 10/21/2019] [Indexed: 06/10/2023]
Abstract
In the multilevel modelling literature, methodologists widely acknowledge that a level-1 variable can have distinct within-cluster and between-cluster effects, and that failing to disaggregate these can yield a slope estimate that is an uninterpretable, conflated blend of the two. Methodologists have stated, however, that including conflated slopes of level-1 variables in a model is not problematic if substantive interest lies only in effects of level-2 predictors. Researchers commonly follow this advice and use methods that do not disaggregate effects of level-1 control variables (e.g., grand mean centering) when examining effects of level-2 predictors. The primary purpose of this paper is to show that this is a dangerous practice. When level-specific effects of level-1 variables differ, failing to disaggregate them can severely bias estimation of level-2 predictor slopes. We show mathematically why this is the case and highlight factors that can exacerbate such bias. We corroborate these findings with simulations and present an empirical example, showing how such distortions can severely alter substantive conclusions. We ultimately recommend that simply including the cluster mean of the level-1 variable as a control will alleviate the problem.
Affiliation(s)
- Jason D Rights
  - University of British Columbia, Vancouver, British Columbia, Canada
- David A Cole
  - Vanderbilt University, Nashville, Tennessee, USA
15
Mukandwal PS, Cantor DE, Grimm CM, Elking I, Hofer C. Do Firms Spend More on Suppliers That Have Environmental Expertise? An Empirical Study of U.S. Manufacturers’ Procurement Spend. Journal of Business Logistics 2020. [DOI: 10.1111/jbl.12248] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
16
Dong N, Stuart EA, Lenis D, Quynh Nguyen T. Using Propensity Score Analysis of Survey Data to Estimate Population Average Treatment Effects: A Case Study Comparing Different Methods. Evaluation Review 2020; 44:84-108. [PMID: 32672113 DOI: 10.1177/0193841x20938497] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 06/11/2023]
Abstract
BACKGROUND Many studies in psychological and educational research aim to estimate population average treatment effects (PATE) using data from large complex survey samples, and many of these studies use propensity score methods. Recent advances have investigated how to incorporate survey weights with propensity score methods. However, to this point, that work had not been well summarized, and it was not clear how much difference the different PATE estimation methods would make empirically. PURPOSE The purpose of this study is to systematically summarize the appropriate use of survey weights in propensity score analysis of complex survey data and use a case study to empirically compare the PATE estimates using multiple analysis methods that include ordinary least squares regression, weighted least squares regression, and various propensity score applications. METHODS We first summarize various propensity score methods that handle survey weights. We then demonstrate the performance of various analysis methods using a nationally representative data set, the Early Childhood Longitudinal Study-Kindergarten, to estimate the effects of preschool on children's academic achievement. The correspondence of the results was evaluated using multiple criteria. RESULTS AND CONCLUSIONS It is important for researchers to think carefully about their estimand of interest and use methods appropriate for that estimand. If interest is in drawing inferences to the survey target population, it is important to take the survey weights into account, particularly in the outcome analysis stage for estimating the PATE. The case study, however, shows little difference among the various analysis methods in this one applied example.
Affiliation(s)
- Nianbo Dong
- School of Education, University of North Carolina at Chapel Hill, NC, USA
- Elizabeth A Stuart
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Trang Quynh Nguyen
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
|
17
|
Yoon HJ, Welsh AH. On the effect of ignoring correlation in the covariates when fitting linear mixed models. J Stat Plan Inference 2020. [DOI: 10.1016/j.jspi.2019.04.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
|
18
|
Hoover DR, Shi Q, Burstyn I, Anastos K. Repeated Measures Regression in Laboratory, Clinical and Environmental Research: Common Misconceptions in the Matter of Different Within- and between-Subject Slopes. International Journal of Environmental Research and Public Health 2019; 16:E504. [PMID: 30754731 PMCID: PMC6388388 DOI: 10.3390/ijerph16030504] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 10/21/2018] [Revised: 02/04/2019] [Accepted: 02/06/2019] [Indexed: 11/16/2022]
Abstract
When using repeated measures linear regression models to make causal inference in laboratory, clinical and environmental research, it is typically assumed that the within-subject association of differences (or changes) in predictor variable values across replicates is the same as the between-subject association of differences in those predictor variable values. However, this is often false. For example, with body weight as the predictor variable and blood cholesterol (which increases with higher body fat) as the outcome: (i) a 10-lb weight increase in the same adult corresponds to a larger rise in that adult's cholesterol than (ii) the cholesterol difference expected between two adults, one of whom weighs 10 lbs more than the other. A 10-lb weight gain in the first adult more likely reflects a build-up of body fat in that person, while a second person being 10 lbs heavier than the first could be influenced by other factors, such as the second person being taller. Hence, to make causal inferences, different within- and between-subject slopes should be separately modeled. A related misconception commonly made when using generalized estimating equations (GEE) and mixed models on repeated measures (i.e., for fitting cross-sectional regression) is that the working correlation structure only influences the variance of the parameter estimates. However, only an independence working correlation guarantees that the modeled parameters are interpretable. We illustrate this with an example where changing the working correlation from independence to equicorrelation qualitatively biases parameters of GEE models and show that this happens because within- and between-subject slopes for the outcomes regressed on the predictor variables differ. 
We then systematically describe several common mechanisms that cause within- and between-subject slopes to differ: change effects, lag/reverse-lag and spillover causality, shared within-subject measurement bias or confounding, and predictor variable measurement error. The misconceptions we describe should be better publicized. Repeated measures analyses should compare within- and between-subject slopes of predictors and when they do differ, investigate the causal reasons for this.
Affiliation(s)
- Donald R Hoover
- Department of Statistics and Biostatistics and Institute for Health, Health Care Policy and Aging Research, Rutgers University, Piscataway, NJ 08854, USA.
- Qiuhu Shi
- School of Health Sciences and Practice, New York Medical College, Valhalla, NY 10595, USA.
- Igor Burstyn
- Environmental and Occupational Health, Dornsife School of Public Health, Philadelphia, PA 19104, USA.
- Kathryn Anastos
- Albert Einstein College of Medicine, Montefiore Medical Center, Bronx, NY 10467, USA.
|
19
|
Bhuyan P, Biswas J, Ghosh P, Das K. A Bayesian two-stage regression approach of analysing longitudinal outcomes with endogeneity and incompleteness. Stat Model 2018. [DOI: 10.1177/1471082x17747806] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Two-stage regression methods are typically used for handling endogeneity in the simultaneous equations models in economics and other social sciences. However, the problem is challenging in the presence of incomplete response and/or incomplete endogenous covariate(s). We propose a Bayesian approach for the joint modelling of incomplete longitudinal continuous response and an incomplete count endogenous covariate, where the incompleteness is caused by the censorship through a selection mechanism. We define latent continuous variables which are left-censored at zero and develop a Gibbs sampling algorithm for the simultaneous estimation of the model parameters. We consider partially varying coefficients regression models containing covariates with fixed and time-varying effects on the response. Our work is motivated by a sample dataset from the Health and Retirement Study (HRS) for modelling the out-of-pocket medical cost, where the number of hospital admissions is considered as an endogenous covariate. Our analysis addresses some of the previously unanswered questions on the physical and financial health of the older population based on HRS data. Simulation studies are performed for assessing the usefulness of the proposed method compared to its competitors.
Affiliation(s)
- Prajamitra Bhuyan
- Applied Statistics Unit, Indian Statistical Institute, Kolkata, India
- Jayabrata Biswas
- Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
- Pulak Ghosh
- Department of Quantitative Methods and Information Sciences, Indian Institute of Management, Bangalore, India
- Kiranmoy Das
- Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
|
20
|
How multiple networks help in creating knowledge: evidence from alternative energy patents. Scientometrics 2018. [DOI: 10.1007/s11192-018-2638-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
|
21
|
Bakbergenuly I, Kulinskaya E. Beta-binomial model for meta-analysis of odds ratios. Stat Med 2017; 36:1715-1734. [PMID: 28124446 PMCID: PMC5434808 DOI: 10.1002/sim.7233] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 06/09/2016] [Revised: 11/11/2016] [Accepted: 01/03/2017] [Indexed: 11/08/2022]
Abstract
In meta-analysis of odds ratios (ORs), heterogeneity between the studies is usually modelled via the additive random effects model (REM). An alternative, multiplicative REM for ORs uses overdispersion. The multiplicative factor in this overdispersion model (ODM) can be interpreted as an intra-class correlation (ICC) parameter. This model naturally arises when the probabilities of an event in one or both arms of a comparative study are themselves beta-distributed, resulting in beta-binomial distributions. We propose two new estimators of the ICC for meta-analysis in this setting. One is based on the inverted Breslow-Day test, and the other on the improved gamma approximation by Kulinskaya and Dollinger (2015, p. 26) to the distribution of Cochran's Q. The performance of these and several other estimators of ICC on bias and coverage is studied by simulation. Additionally, the Mantel-Haenszel approach to estimation of ORs is extended to the beta-binomial model, and we study performance of various ICC estimators when used in the Mantel-Haenszel or the inverse-variance method to combine ORs in meta-analysis. The results of the simulations show that the improved gamma-based estimator of ICC is superior for small sample sizes, and the Breslow-Day-based estimator is the best for n⩾100. The Mantel-Haenszel-based estimator of OR is very biased and is not recommended. The inverse-variance approach is also somewhat biased for ORs≠1, but this bias is not very large in practical settings. Developed methods and R programs, provided in the Web Appendix, make the beta-binomial model a feasible alternative to the standard REM for meta-analysis of ORs.
Affiliation(s)
- Elena Kulinskaya
- School of Computing Sciences, University of East Anglia, Norwich, U.K.
|
22
|
Coelho CA, Roy A. Testing the hypothesis of a block compound symmetric covariance matrix for elliptically contoured distributions. TEST-SPAIN 2016. [DOI: 10.1007/s11749-016-0512-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
23
|
Liu J, Colditz GA. Optimal design of longitudinal data analysis using generalized estimating equation models. Biom J 2016; 59:315-330. [PMID: 27878852 DOI: 10.1002/bimj.201600107] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 05/06/2016] [Revised: 08/18/2016] [Accepted: 09/22/2016] [Indexed: 11/10/2022]
Abstract
Longitudinal studies are often applied in biomedical research and clinical trials to evaluate the treatment effect. The association pattern within the subject must be considered in both sample size calculation and the analysis. One of the most important approaches for analyzing such a study is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern within the subject depends on a vector of association parameters denoted by ρ. The explicit sample size formulas for two-group comparison in linear and logistic regression models are obtained based on the GEE method by Liu and Liang. For cluster randomized trials (CRTs), researchers proposed the optimal sample sizes at both the cluster and individual level as a function of sampling costs and the intracluster correlation coefficient (ICC). In these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for CRTs and multicenter trials. To overcome this shortcoming, Van Breukelen et al. consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, the optimal sample size and number of repeated measurements using GEE models with an exchangeable working correlation matrix are proposed under a fixed budget, where "optimal" refers to maximum power for a given sampling budget. The equations of sample size and number of repeated measurements for a known parameter value ρ are derived and a straightforward algorithm for unknown ρ is developed. Applications in practice are discussed. We also discuss the existence of the optimal design when an AR(1) working correlation matrix is assumed. Our proposed method can be extended to scenarios in which the true and working correlation matrices differ.
Affiliation(s)
- Jingxia Liu
- Division of Public Health Sciences, Department of Surgery, Washington University in Saint Louis (WUSTL), St Louis, MO, 63110, USA
- Graham A Colditz
- Department of Surgery, Washington University in Saint Louis (WUSTL), St Louis, MO, 63110, USA
|
24
|
Spittal MJ, Carlin JB, Currier D, Downes M, English DR, Gordon I, Pirkis J, Gurrin L. The Australian longitudinal study on male health sampling design and survey weighting: implications for analysis and interpretation of clustered data. BMC Public Health 2016; 16:1062. [PMID: 28185562 PMCID: PMC5103251 DOI: 10.1186/s12889-016-3699-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Australian Longitudinal Study on Male Health (Ten to Men) used a complex sampling scheme to identify potential participants for the baseline survey. This raises important questions about when and how to adjust for the sampling design when analyzing data from the baseline survey. METHODS We describe the sampling scheme used in Ten to Men focusing on four important elements: stratification, multi-stage sampling, clustering and sample weights. We discuss how these elements fit together when using baseline data to estimate a population parameter (e.g., population mean or prevalence) or to estimate the association between an exposure and an outcome (e.g., an odds ratio). We illustrate this with examples using a continuous outcome (weight in kilograms) and a binary outcome (smoking status). RESULTS Estimates of a population mean or disease prevalence using Ten to Men baseline data are influenced by the extent to which the sampling design is addressed in an analysis. Estimates of mean weight and smoking prevalence are larger in unweighted analyses than weighted analyses (e.g., mean = 83.9 kg vs. 81.4 kg; prevalence = 18.0 % vs. 16.7 %, for unweighted and weighted analyses respectively) and the standard error of the mean is 1.03 times larger in an analysis that acknowledges the hierarchical (clustered) structure of the data compared with one that does not. For smoking prevalence, the corresponding standard error is 1.07 times larger. Measures of association (mean group differences, odds ratios) are generally similar in unweighted or weighted analyses and whether or not adjustment is made for clustering. CONCLUSIONS The extent to which the Ten to Men sampling design is accounted for in any analysis of the baseline data will depend on the research question. 
When the goals of the analysis are to estimate the prevalence of a disease or risk factor in the population or the magnitude of a population-level exposure-outcome association, our advice is to adopt an analysis that respects the sampling design.
Affiliation(s)
- Matthew J Spittal
- Centre for Mental Health, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, 3010, Australia.
- John B Carlin
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, 3010, Australia; Murdoch Childrens Research Institute, Royal Children's Hospital, Parkville, 3052, Australia
- Dianne Currier
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, 3010, Australia
- Marnie Downes
- Murdoch Childrens Research Institute, Royal Children's Hospital, Parkville, 3052, Australia; Department of Paediatrics, Melbourne Medical School, The University of Melbourne, Melbourne, 3010, Australia
- Dallas R English
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, 3010, Australia
- Ian Gordon
- Statistical Consulting Centre, School of Mathematics and Statistics, The University of Melbourne, Melbourne, 3010, Australia
- Jane Pirkis
- Centre for Mental Health, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC, 3010, Australia
- Lyle Gurrin
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, 3010, Australia
|
25
|
McMorris T, Tallon M, Williams C, Sproule J, Draper S, Swain J, Potter J, Clayton N. Incremental Exercise, Plasma Concentrations of Catecholamines, Reaction Time, and Motor Time during Performance of a Noncompatible Choice Response Time Task. Percept Mot Skills 2016; 97:590-604. [PMID: 14620248 DOI: 10.2466/pms.2003.97.2.590] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
Abstract
The primary purpose was to examine the effect of incremental exercise on a noncompatible response time task. Participants (N = 9) undertook a 4-choice noncompatible response time task under 3 conditions, following rest and during exercise at 70% and 100% of their maximum power output. Reaction and movement times were the dependent variables. Maximum power output had been previously established on an incremental test to exhaustion. A repeated-measures multivariate analysis of variance yielded a significant effect of exercise intensity on the task, but observation of the separate univariate repeated-measures analyses of variance showed that only movement time was significantly affected. Post hoc Tukey tests indicated movement time during maximal intensity exercise was significantly faster than in the other two conditions. The secondary purpose of the study was to assess whether increases in plasma concentrations of adrenaline and nor-adrenaline during exercise and power output would act as predictor variables of reaction and movement times during exercise. Catecholamine concentrations were based on venous blood samples taken during the maximum power output test. None of the variables were significant predictors of reaction time. Only power output was a significant predictor of movement time (R² = .24). There was little support for the notion that peripheral concentrations of catecholamines directly induce a central nervous system response.
Affiliation(s)
- Terry McMorris
- Centre for Sports Science and Medicine, University College Chichester, West Sussex, UK.
|
26
|
Darlington GA, Donner A. Meta-analysis of community-based cluster randomization trials with binary outcomes. Clin Trials 2016; 4:491-8. [DOI: 10.1177/1740774507083389] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
Background Cluster randomization trials are widely used to test the effect of an intervention when individuals are naturally found in groups such as communities. For several separate studies of a similar intervention, it may be of interest to combine their results using meta-analysis procedures. However, this task requires consideration of both the likely dependencies among cluster members (intracluster correlation) and stratification based on the studies considered. Purpose In this article, several possible approaches for meta-analysis are considered for cluster randomization trials having a binary outcome. Methods It is first noted that the standard Mantel-Haenszel test is invalid in this context since it ignores dependencies among cluster members. Two modifications are therefore considered as well as a general inverse variance approach and a procedure based on the Woolf statistic which does not require the availability of trial-specific design effects. Empirical Type I errors and powers for the different procedures considered are evaluated using Monte Carlo simulation. To illustrate the techniques, data are used from trials performed in four countries to compare two antenatal care programs with respect to their effects on the risk of hypertension during pregnancy. Results For the simulation scenarios considered, an adjusted Mantel-Haenszel procedure provides a valid test with the greatest power, slightly outperforming the general inverse variance approach. Limitations The potential need to adjust for possible confounding was not considered. However, more detailed information on confounders would not likely be available for most meta-analyses. Conclusion Two procedures performed well. However, the choice of analysis approach also inevitably depends on the nature and extent of the available data. Clinical Trials 2007; 4: 491-498. http://ctj.sagepub.com
Affiliation(s)
- Gerarda A. Darlington
- Department of Mathematics and Statistics, University of Guelph, Guelph, Ontario, Canada
- Allan Donner
- Department of Epidemiology and Biostatistics and Robarts Clinical Trials, Robarts Research Institute, University of Western Ontario, London, Ontario, Canada
|
27
|
Bottomley C, Kirby MJ, Lindsay SW, Alexander N. Can the buck always be passed to the highest level of clustering? BMC Med Res Methodol 2016; 16:29. [PMID: 26956373 PMCID: PMC4784323 DOI: 10.1186/s12874-016-0127-1] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.6] [Received: 12/12/2015] [Accepted: 02/18/2016] [Indexed: 11/10/2022] Open
Abstract
Background Clustering commonly affects the uncertainty of parameter estimates in epidemiological studies. Cluster-robust variance estimates (CRVE) are used to construct confidence intervals that account for single-level clustering, and are easily implemented in standard software. When data are clustered at more than one level (e.g. village and household) the level for the CRVE must be chosen. CRVE are consistent when used at the higher level of clustering (village), but since there are fewer clusters at the higher level, and consistency is an asymptotic property, there may be circumstances under which coverage is better from lower- rather than higher-level CRVE. Here we assess the relative importance of adjusting for clustering at the higher and lower level in a logistic regression model. Methods We performed a simulation study in which the coverage of 95 % confidence intervals was compared between adjustments at the higher and lower levels. Results Confidence intervals adjusted for the higher level of clustering had coverage close to 95 %, even when there were few clusters, provided that the intra-cluster correlation of the predictor was less than 0.5 for models with a single predictor and less than 0.2 for models with multiple predictors. Conclusions When there are multiple levels of clustering it is generally preferable to use confidence intervals that account for the highest level of clustering. This only fails if there are few clusters at this level and the intra-cluster correlation of the predictor is high.
Affiliation(s)
- Christian Bottomley
- MRC Tropical Epidemiology Group, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
- Matthew J Kirby
- MRC Tropical Epidemiology Group, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
- Steve W Lindsay
- MRC Tropical Epidemiology Group, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
- Neal Alexander
- MRC Tropical Epidemiology Group, London School of Hygiene & Tropical Medicine, Keppel Street, London, UK.
|
28
|
Nandram B, Bhatta D, Sedransk J, Bhadra D. A Bayesian test of independence in a two-way contingency table using surrogate sampling. J Stat Plan Inference 2013. [DOI: 10.1016/j.jspi.2013.03.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
|
29
|
Bouwmeester W, Moons KGM, Kappen TH, van Klei WA, Twisk JWR, Eijkemans MJC, Vergouwe Y. Internal validation of risk models in clustered data: a comparison of bootstrap schemes. Am J Epidemiol 2013; 177:1209-17. [PMID: 23660796 DOI: 10.1093/aje/kws396] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022] Open
Abstract
Internal validity of a risk model can be studied efficiently with bootstrapping to assess possible optimism in model performance. Assumptions of the regular bootstrap are violated when the development data are clustered. We compared alternative resampling schemes in clustered data for the estimation of optimism in model performance. A simulation study was conducted to compare regular resampling on only the patient level with resampling on only the cluster level and with resampling sequentially on both the cluster and patient levels (2-step approach). Optimism for the concordance index and calibration slope was estimated. Resampling of only patients or only clusters showed accurate estimates of optimism in model performance. The 2-step approach overestimated the optimism in model performance. If the number of centers or intraclass correlation coefficient was high, resampling of clusters showed more accurate estimates than resampling of patients. The 3 bootstrap schemes also were applied to empirical data that were clustered. The results presented in this paper support the use of resampling on only the clusters for estimation of optimism in model performance when data are clustered.
Affiliation(s)
- W Bouwmeester
- Julius Center for Health Sciences and Primary Care, UMC Utrecht, P.O. Box 85500, 3508 GA Utrecht, The Netherlands
|
30
|
Tran NL, Barraj LM, Bi X, Schuda LC, Moya J. Estimated long-term fish and shellfish intake--national health and nutrition examination survey. Journal of Exposure Science & Environmental Epidemiology 2013; 23:128-136. [PMID: 23047318 DOI: 10.1038/jes.2012.96] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 08/23/2011] [Accepted: 08/09/2012] [Indexed: 06/01/2023]
Abstract
Usual intake estimates describe long-term average intake of food, nutrients, and food contaminants. The frequencies of fish and shellfish intake over a 30-day period from the National Health and Nutrition Examination Survey (NHANES 1999-2006) were combined with 24-h dietary recall data from NHANES 2003-2004 using a Monte Carlo procedure to estimate the usual intake of fish and shellfish in this study. Usual intakes were estimated for the US population including children 1 to <11 years, males/females 11 to <16 years, 16 to <21 years, and adults 21+ years. Estimated mean fish intake (consumers only) was highest among children 1 to <2 years and 2 to <3 years, at 0.37 g/kg-day for both age groups, and lowest for females 11 to <16 years, at 0.13 g/kg-day. In all age groups, daily intake estimates were highest for breaded fish, salmon, and mackerel. Among children and teenage consumers, tuna, salmon, and breaded fish were the most frequently consumed fish; shrimp, scallops, and crabs were the most frequently consumed shellfish. The intake estimates from this study reflect long-term average intake rates better than 2-day average estimates do, and are preferred for assessing long-term intake of nutrients and possible exposure to environmental contaminants from fish and shellfish sources.
|
31
|
Bouwmeester W, Twisk JWR, Kappen TH, van Klei WA, Moons KGM, Vergouwe Y. Prediction models for clustered data: comparison of a random intercept and standard regression model. BMC Med Res Methodol 2013; 13:19. [PMID: 23414436 PMCID: PMC3658967 DOI: 10.1186/1471-2288-13-19] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Received: 10/29/2011] [Accepted: 12/16/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. METHODS Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. RESULTS The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. 
CONCLUSION The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Affiliation(s)
- Walter Bouwmeester
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands.
|
32
|
|
33
|
Schildcrout JS, Mumford SL, Chen Z, Heagerty PJ, Rathouz PJ. Outcome-dependent sampling for longitudinal binary response data based on a time-varying auxiliary variable. Stat Med 2011; 31:2441-56. [PMID: 22086716 DOI: 10.1002/sim.4359] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/14/2011] [Accepted: 07/01/2011] [Indexed: 11/06/2022]
Abstract
Outcome-dependent sampling (ODS) study designs are commonly implemented with rare diseases or when prospective studies are infeasible. In longitudinal data settings, when a repeatedly measured binary response is rare, an ODS design can be highly efficient for maximizing statistical information subject to resource limitations that prohibit covariate ascertainment of all observations. This manuscript details an ODS design where individual observations are sampled with probabilities determined by an inexpensive, time-varying auxiliary variable that is related but is not equal to the response. With the goal of validly estimating marginal model parameters based on the resulting biased sample, we propose a semi-parametric, sequential offsetted logistic regressions (SOLR) approach. The SOLR strategy first estimates the relationship between the auxiliary variable and the response and covariate data by using an offsetted logistic regression analysis where the offset is used to adjust for the biased design. Results from the auxiliary variable model are then combined with the known or estimated sampling probabilities to formulate a second offset that is used to correct for the biased design in the ultimate target model relating the longitudinal binary response to covariates. Because the target model offset is estimated with SOLR, we detail asymptotic standard error estimates that account for uncertainty associated with the auxiliary variable model. Motivated by an analysis of the BioCycle Study (Gaskins et al., Effect of daily fiber intake on reproductive function: the BioCycle Study. American Journal of Clinical Nutrition 2009; 90(4): 1061-1069) that aims to describe the relationship between reproductive health (determined by luteinizing hormone levels) and fiber consumption, we examine properties of SOLR estimators and compare them with other common approaches.
Affiliation(s)
- Jonathan S Schildcrout
- Department of Biostatistics, Vanderbilt University School of Medicine, 1161 21st Ave South, S-2323 Medical Center North, Nashville, TN 37232-2158, USA.
|
34
|
Morris SS, Wright PM, Trevor J, Stiles P, Stahl GK, Snell S, Paauwe J, Farndale E. Global challenges to replicating HR: The role of people, processes, and systems. Human Resource Management 2009. [DOI: 10.1002/hrm.20325] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/08/2022]
|
35
|
Hunsberger S, Graubard BI, Korn EL. Testing logistic regression coefficients with clustered data and few positive outcomes. Stat Med 2008; 27:1305-24. [PMID: 17705348 DOI: 10.1002/sim.3011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
Applications frequently involve logistic regression analysis with clustered data where there are few positive outcomes in some of the independent variable categories. For example, an application is given here that analyzes the association of asthma with various demographic variables and risk factors using data from the third National Health and Nutrition Examination Survey, a weighted multistage cluster sample. Although there are 742 asthma cases in all (out of 18,395 individuals), for one of the categories of one of the independent variables there are only 25 asthma cases (out of 695 individuals). Generalized Wald and score hypothesis tests, which use appropriate cluster-level variance estimators, and a bootstrap hypothesis test have been proposed for testing logistic regression coefficients with cluster samples. When there are few positive outcomes, simulations presented in this paper show that these tests can sometimes have either inflated or very conservative levels. A simulation-based method is proposed for testing logistic regression coefficients with cluster samples when there are few positive outcomes. This testing methodology is shown to compare favorably with the generalized Wald and score tests and the bootstrap hypothesis test in terms of maintaining nominal levels. The proposed method is also useful when testing goodness-of-fit of logistic regression models using deciles-of-risk tables.
Affiliation(s)
- Sally Hunsberger
- Biometric Research Branch, National Cancer Institute, Bethesda, MD 20892, U.S.A.
|
36
|
McMorris T, Rayment T. Short-duration, high-intensity exercise and performance of a sports-specific skill: a preliminary study. Percept Mot Skills 2008; 105:523-30. [PMID: 18065073 DOI: 10.2466/pms.105.2.523-530] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
The purpose of this study was to examine the effect of one bout and three intermittent bouts of short-duration, high-intensity running on the performance of a sports-specific psychomotor skill. Participants (N=13) were male soccer players (M age=20.5 yr., SD=2.0) who had been playing semi-professionally for a mean of 2.1 years (SD=1.11) and trained twice a week. They undertook a soccer-passing test in three conditions: following rest, following a 100-m sprint, and following 3 × 100-m sprints, with 30-sec. rest intervals between sprints. Passing accuracy showed a significant linear deterioration, while number of passes showed a significant quadratic effect. Low to moderate linear regression correlations were found between posttest heart rate and absolute and variable errors on the test. It was concluded that short-duration, high-intensity exercise has a negative effect on accuracy in a sports-specific task that requires both perceptual judgment and motor control.
Affiliation(s)
- Terry McMorris
- Centre for Sports Science and Medicine, University of Chichester, College Lane, Chichester, West Sussex PO19 6PE, United Kingdom.
|
37
|
Donner A, Taljaard M, Klar N. The merits of breaking the matches: a cautionary tale. Stat Med 2007; 26:2036-51. [PMID: 16927437 DOI: 10.1002/sim.2662] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
Matched-pair cluster randomization trials are frequently adopted as the design of choice for evaluating an intervention offered at the community level. However, previous research has demonstrated that a strategy of breaking the matches and performing an unmatched analysis may be more efficient than performing a matched analysis on the resulting data, particularly when the total number of communities is small and the matching is judged as relatively ineffective. The research concerning this question has naturally focused on testing the effect of intervention. However, a secondary objective of many community intervention trials is to investigate the effect of individual-level risk factors on one or more outcome variables. Focusing on the case of a continuous outcome variable, we show that the practice of performing an unmatched analysis on data arising from a matched-pair design can lead to bias in the estimated regression coefficient, and a corresponding test of significance which is overly liberal. However, for large-scale community intervention trials, which typically recruit a relatively small number of large clusters, such an analysis will generally be both valid and efficient. We also consider other approaches to testing the effect of an individual-level risk factor in a matched-pair cluster randomization design, including a generalized linear model approach that preserves the matching, a two-stage cluster-level analysis, and an approach based on generalized estimating equations.
Affiliation(s)
- Allan Donner
- Department of Epidemiology and Biostatistics, Schulich School of Medicine, University of Western Ontario, London, Canada.
|
38
|
Wickwire EM, Whelan JP, Meyers AW, Murray DM. Environmental correlates of gambling behavior in urban adolescents. Journal of Abnormal Child Psychology 2007; 35:179-90. [PMID: 17219080 DOI: 10.1007/s10802-006-9065-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 10/23/2022]
Abstract
The present study considered the relation between adolescent gambling behavior and the perceived environment, the component of Jessor and Jessor's (1977) Problem Behavior Theory that assesses the ways that adolescents perceive the attitudes and behaviors of parents and peers. The predominantly African-American sample included 188 sophomores from two urban public high schools. Using the South Oaks Gambling Screen-Revised for Adolescents to assess gambling risk, rates of both at-risk (20.7%) and problem (12.8%) gambling were found to be high. Boys displayed more gambling problems than did girls. The perceived environment accounted for significant variance in gambling problems and frequency, with proximal components displaying stronger relationships than distal components. Perceiving parent gambling and friend models for problem behavior were positively correlated with gambling problems, and friend models were positively related to gambling frequency. Among girls, family support was positively related to gambling problems. Among boys, this relation was negative.
Affiliation(s)
- Emerson M Wickwire
- Institute for Gambling Education and Research, Department of Psychology, The University of Memphis, Memphis, TN 38152-3230, USA
|
39
|
McMorris T. Short-duration, high-intensity exercise and performance of a sports-specific skill: a preliminary study. Percept Mot Skills 2007. [DOI: 10.2466/pms.105.6.523-530] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
|
40
|
Schmiege SJ, Khoo ST, Sandler IN, Ayers TS, Wolchik SA. Symptoms of internalizing and externalizing problems: modeling recovery curves after the death of a parent. Am J Prev Med 2006; 31:S152-60. [PMID: 17175410 DOI: 10.1016/j.amepre.2006.07.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Received: 05/06/2005] [Revised: 06/29/2006] [Accepted: 07/17/2006] [Indexed: 11/21/2022]
Abstract
BACKGROUND: The death of a parent is a major family disruption that can place children at risk for later depression and other mental health problems. DESIGN: Theoretically based randomized controlled trial for parentally bereaved children. SETTING/PARTICIPANTS: Two hundred forty-four children and adolescents and their caregivers from 156 families were randomly assigned to the Family Bereavement Program (FBP) intervention condition (90 families; 135 children) or to a control condition (66 families; 109 children). Data collection occurred from 1996 to 1998. INTERVENTION: Children and caregivers in the intervention condition met separately for 12 two-hour weekly sessions. Skills targeted by the program for children included positive coping, stress appraisals, control beliefs, and self-esteem. The caregiver program targeted caregiver mental health, life stressors, and improved discipline in the home. Both child and caregiver programs focused on improved quality of the caregiver-child relationship. MAIN OUTCOME MEASURES: Child and caregiver reports of internalizing and externalizing symptoms. RESULTS: Longitudinal growth curve modeling was performed to model symptoms over time from the point of parental death. The rate of recovery for girls in the program condition was significantly different from that of girls in the control condition across all outcomes. Boys in both conditions showed reduced symptoms over time. CONCLUSIONS: The methodology offers a conceptually unique way of assessing recovery in terms of reduced mental health problems over time after an event and has contributed to further understanding of FBP intervention effects. The intervention program facilitated recovery among girls, who did not show reduction in behavior problems without the program, while boys demonstrated decreased symptoms even without intervention.
|
41
|
Wichman AL, Rodgers JL, MacCallum RC. A multilevel approach to the relationship between birth order and intelligence. Personality and Social Psychology Bulletin 2006; 32:117-27. [PMID: 16317193 DOI: 10.1177/0146167205279581] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.5] [Indexed: 11/16/2022]
Abstract
Many studies show relationships between birth order and intelligence but use cross-sectional designs or manifest other threats to internal validity. Multilevel analyses with a control variable show that when these threats are removed, two major results emerge: (a) birth order has no significant influence on children's intelligence and (b) earlier reported birth order effects on intelligence are attributable to factors that vary between, not within, families. Analyses on 7- to 8- and 13- to 14-year-old children from the National Longitudinal Survey of Youth support these conclusions. When hierarchical data structures, age variance of children, and within-family versus between-family variance sources are taken into account, previous research is seen in a new light.
|
42
|
Carlin JB, Gurrin LC, Sterne JA, Morley R, Dwyer T. Regression models for twin studies: a critical review. Int J Epidemiol 2005; 34:1089-99. [PMID: 16087687 DOI: 10.1093/ije/dyi153] [Citation(s) in RCA: 351] [Impact Index Per Article: 17.6] [Indexed: 01/16/2023] Open
Abstract
Twin studies have long been recognized for their value in learning about the aetiology of disease and specifically for their potential for separating genetic effects from environmental effects. The recent upsurge of interest in life-course epidemiology and the study of developmental influences on later health has provided a new impetus to study twins as a source of unique insights. Twins are of special interest because they provide naturally matched pairs where the confounding effects of a large number of potentially causal factors (such as maternal nutrition or gestation length) may be removed by comparisons between twins who share them. The traditional tool of epidemiological 'risk factor analysis' is the regression model, but it is not straightforward to transfer standard regression methods to twin data, because the analysis needs to reflect the paired structure of the data, which induces correlation between twins. This paper reviews the use of more specialized regression methods for twin data, based on generalized least squares or linear mixed models, and explains the relationship between these methods and the commonly used approach of analysing within-twin-pair difference values. Methods and issues of interpretation are illustrated using an example from a recent study of the association between birth weight and cord blood erythropoietin. We focus on the analysis of continuous outcome measures but review additional complexities that arise with binary outcomes. We recommend the use of a general model that includes separate regression coefficients for within-twin-pair and between-pair effects, and provide guidelines for the interpretation of estimates obtained under this model.
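The general model recommended in this abstract, with separate coefficients for within-twin-pair and between-pair effects, can be illustrated with a minimal sketch. It is not the authors' code: the simulated data, the function names `simulate_twin_pairs` and `within_between_fit`, and all parameter values are assumptions made for illustration. Ordinary least squares is used for point estimates only; the generalized least squares or mixed-model fitting discussed in the review would be needed for standard errors that respect the correlation within pairs.

```python
import numpy as np


def simulate_twin_pairs(n_pairs, beta_individual, beta_shared, seed=0):
    """Simulate hypothetical twin data where a shared factor affects x and y."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=(n_pairs, 1))        # factor common to both twins
    x = shared + rng.normal(size=(n_pairs, 2))    # exposure for each twin
    noise = rng.normal(size=(n_pairs, 2))
    y = 1.0 + beta_individual * x + beta_shared * shared + noise
    return x, y


def within_between_fit(x, y):
    """Fit y = b0 + bW * (x - pair mean) + bB * (pair mean) by least squares.

    x and y have shape (n_pairs, 2). Returns (b0, bW, bB).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pair_mean = x.mean(axis=1, keepdims=True)     # between-pair component
    within_dev = x - pair_mean                    # within-pair component
    design = np.column_stack([
        np.ones(x.size),
        within_dev.ravel(),
        np.broadcast_to(pair_mean, x.shape).ravel(),
    ])
    coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
    return float(coef[0]), float(coef[1]), float(coef[2])


x, y = simulate_twin_pairs(2000, beta_individual=0.5, beta_shared=2.0)
b0, beta_within, beta_between = within_between_fit(x, y)
print(f"within-pair: {beta_within:.2f}, between-pair: {beta_between:.2f}")
```

In this simulated setting the within-pair coefficient stays close to the individual-level effect, while the between-pair coefficient also absorbs the shared factor, which is the contrast the two-coefficient model is meant to expose.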
Affiliation(s)
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia.
|
43
|
Solis-Trapala IL, Farewell VT. Regression analysis of overdispersed correlated count data with subject specific covariates. Stat Med 2005; 24:2557-75. [PMID: 15977293 DOI: 10.1002/sim.2121] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
A robust likelihood approach for the analysis of overdispersed correlated count data that takes into account cluster-varying covariates is proposed. We emphasise two characteristics of the proposed method: that the correlation structure satisfies the constraints on the second moments, and that the estimation of the correlation structure guarantees consistent estimates of the regression coefficients. In addition, we extend the mean specification to include within- and between-cluster effects. The method is illustrated through the analysis of data from two studies. In the first study, cross-sectional count data from a randomised controlled trial are analysed to evaluate the efficacy of a communication skills training programme. The second study involves longitudinal count data which represent counts of damaged hand joints in patients with psoriatic arthritis. Motivated by this study, we generalize our model to accommodate a subpopulation of patients who are not susceptible to the development of damaged hand joints.
Affiliation(s)
- I L Solis-Trapala
- Departamento de Probabilidad y Estadística, Centro de Investigación en Matemáticas, Guanajuato, Gto., México
|
44
|
Dwyer T, Blizzard L. A discussion of some statistical methods for separating within-pair associations from associations among all twins in research on fetal origins of disease. Paediatr Perinat Epidemiol 2005; 19 Suppl 1:48-53. [PMID: 15670122 DOI: 10.1111/j.1365-3016.2005.00615.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]
Abstract
Twin data can be used to gain insights into the origin of associations between factors arising in fetal life and the risk of later disease. This is because twin data afford an opportunity to conduct paired analyses that take the influence of shared factors into account. When an association that is present in an unpaired analysis is present also in a paired analysis, there is evidence that the causal pathway linking the fetal factor and the disease may have a fetal origin. If the association disappears in the paired analysis, there is evidence that it may have arisen from a shared source such as the mother. The relevant factors include diet and socio-economic status. There are several statistical approaches to this. The simplest involves comparing, say, a coefficient from a regression of an outcome on a fetal factor for all subjects in a twin sample, with the coefficient obtained from regressing the within-pair difference in the outcome on the within-pair difference in the fetal factor. Alternative approaches involve simultaneously estimating regression parameters for between- and within-pair components. These approaches permit similar inferences about whether the association is due to individual (fetal) or shared (maternal) factors, and are valid in circumstances where non-shared factors missing from the regression model do not influence the regression estimates.
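The simplest approach described in this abstract, comparing a coefficient from all twins treated as individuals with the coefficient from regressing within-pair differences, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis: the simulated data, the helper names `ols_slope` and `unpaired_and_paired_slopes`, and the choice to fit the difference regression through the origin (because the ordering of twins within a pair is arbitrary) are all assumptions.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple linear regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def unpaired_and_paired_slopes(x, y):
    """x, y have shape (n_pairs, 2). Returns (unpaired slope, paired slope)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    unpaired = ols_slope(x, y)                    # all twins as individuals
    dx = x[:, 0] - x[:, 1]                        # within-pair difference, factor
    dy = y[:, 0] - y[:, 1]                        # within-pair difference, outcome
    paired = float(dx @ dy / (dx @ dx))           # regression through the origin
    return unpaired, paired


# Hypothetical data: the outcome depends only on a shared (maternal) factor,
# so the association should vanish in the paired analysis.
rng = np.random.default_rng(0)
shared = rng.normal(size=(2000, 1))
factor = shared + rng.normal(size=(2000, 2))
outcome = 1.5 * shared + rng.normal(size=(2000, 2))
unpaired, paired = unpaired_and_paired_slopes(factor, outcome)
print(f"unpaired slope: {unpaired:.2f}, paired slope: {paired:.2f}")
```

Because the shared factor cancels in the within-pair differences, the paired slope in this simulation is close to zero while the unpaired slope is not, matching the case where an association "disappears in the paired analysis".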
Affiliation(s)
- Terence Dwyer
- Menzies Research Institute, University of Tasmania, Hobart, Tasmania, Australia.
|
45
|
Foster EM, Fang GY. Alternative methods for handling attrition: an illustration using data from the Fast Track evaluation. Evaluation Review 2004; 28:434-64. [PMID: 15358906 PMCID: PMC2765229 DOI: 10.1177/0193841x04264662] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 05/12/2023]
Abstract
Using data from the evaluation of the Fast Track intervention, this article illustrates three methods for handling attrition. Multiple imputation and ignorable maximum likelihood estimation produce estimates that are similar to those based on listwise-deleted data. A panel selection model that allows for selective dropout reveals that highly aggressive boys accumulate in the treatment group over time and produces a larger estimate of treatment effect. In contrast, this model produces a smaller treatment effect for girls. The article's conclusion discusses the strengths and weaknesses of the alternative approaches and outlines ways in which researchers might improve their handling of attrition.
Affiliation(s)
- E Michael Foster
- Pennsylvania State University, University Park, PA 16802-6500, USA
|
46
|
McMorris T. Incremental exercise, plasma concentrations of catecholamines, reaction time, and motor time during performance of a noncompatible choice response time task. Percept Mot Skills 2003. [DOI: 10.2466/pms.97.6.590-604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/15/2022]
|
47
|
|
48
|
Abstract
A common objective in health care quality studies involves measuring and comparing the quality of care delivered to cohorts of patients by different health care providers. The data used for inference involve observations on units grouped within clusters, such as patients treated within hospitals. Unlike cluster randomization trials, in which clusters are often randomized to interventions to learn about individuals, the target of inference in health quality studies is the cluster. Furthermore, randomization is often not performed and the resulting biases may invalidate standard tests. In this paper, we discuss approaches to sample size determination in the design of observational health quality studies when the outcome is binary. Methods for calculating sample size using marginal models are briefly reviewed, but the focus is on hierarchical binomial models. Sample sizes for unbalanced clusters and stratified designs are characterized. We draw upon the experiences that have arisen from a study funded by the Agency for Healthcare Research and Quality involving assessment of quality of care for patients with cardiovascular disease. If researchers are interested in comparing clusters, hierarchical models are preferred.
|
49
|
Carvajal SC, Baumler E, Harrist RB, Parcel GS. Multilevel Models and Unbiased Tests for Group Based Interventions: Examples from the Safer Choices Study. Multivariate Behavioral Research 2001; 36:185-205. [PMID: 26822108 DOI: 10.1207/s15327906mbr3602_03] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 06/05/2023]
Abstract
For many large-scale behavioral interventions, random assignment to intervention condition occurs at the group level. Data analytic models that ignore potential non-independence of observations provide inefficient parameter estimates and often produce biased test statistics. For studies in which individuals are randomized by groups to treatment condition, multilevel models (MLMs) provide a flexible approach to statistically evaluating program effects. This article presents an explanation of the need for MLMs for such nested designs and uses data from the Safer Choices study to illustrate the application of MLMs for both continuous and dichotomous outcomes. When designing studies, researchers who are considering group-randomized interventions should also consider the features of the multilevel analytic models they might employ.
|
50
|
Krull JL, MacKinnon DP. Multilevel Modeling of Individual and Group Level Mediated Effects. Multivariate Behavioral Research 2001; 36:249-77. [PMID: 26822111 DOI: 10.1207/s15327906mbr3602_06] [Citation(s) in RCA: 606] [Impact Index Per Article: 25.3] [Indexed: 05/25/2023]
Abstract
This article combines procedures for single-level mediational analysis with multilevel modeling techniques in order to appropriately test mediational effects in clustered data. A simulation study compared the performance of these multilevel mediational models with that of single-level mediational models in clustered data with individual- or group-level initial independent variables, individual- or group-level mediators, and individual level outcomes. The standard errors of mediated effects from the multilevel solution were generally accurate, while those from the single-level procedure were downwardly biased, often by 20% or more. The multilevel advantage was greatest in those situations involving group-level variables, larger group sizes, and higher intraclass correlations in mediator and outcome variables. Multilevel mediational modeling methods were also applied to data from a preventive intervention designed to reduce intentions to use steroids among players on high school football teams. This example illustrates differences between single-level and multilevel mediational modeling in real-world clustered data and shows how the multilevel technique may lead to more accurate results.
|