76
Wakefield J, Haneuse S, Dobra A, Teeple E. Bayes computation for ecological inference. Stat Med 2011; 30:1381-96. [PMID: 21341304 PMCID: PMC3178414 DOI: 10.1002/sim.4214] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 05/25/2010] [Accepted: 01/03/2011] [Indexed: 11/08/2022]
Abstract
Ecological data are available at the level of the group, rather than at the level of the individual. The use of ecological data in spatial epidemiological investigations is particularly common. Although the computational methods described are more generally applicable, this paper concentrates on the situation in which the margins of 2 × 2 tables are observed in each of n geographical areas, with a Bayesian approach to inference. We consider auxiliary schemes that impute the missing data, and compare with a previously suggested normal approximation. The analysis of ecological data is subject to ecological bias, with the only reliable means of removing such bias being the addition of auxiliary individual-level information. Various schemes have been suggested for this supplementation, and we illustrate how the computational methods may be applied to the analysis of such enhanced data. The methods are illustrated using simulated data and two examples. In the first example, the ecological data are supplemented with a simple random sample of individual-level data, and in this example the normal approximation fails. In the second example case-control sampling provides the additional information.
77
Wolever RQ, Dreusicke M, Fikkan J, Hawkins TV, Yeung S, Wakefield J, Duda L, Flowers P, Cook C, Skinner E. Integrative health coaching for patients with type 2 diabetes: a randomized clinical trial. Diabetes Educator 2010; 36:629-39. [PMID: 20534872 DOI: 10.1177/0145721710371523] [Citation(s) in RCA: 200] [Impact Index Per Article: 14.3] [Indexed: 01/19/2023]
Abstract
PURPOSE The purpose of this study was to evaluate the effectiveness of integrative health (IH) coaching on psychosocial factors, behavior change, and glycemic control in patients with type 2 diabetes. METHODS Fifty-six patients with type 2 diabetes were randomized to either 6 months of IH coaching or usual care (control group). Coaching was conducted by telephone for fourteen 30-minute sessions. Patients were guided in creating an individualized vision of health, and goals were self-chosen to align with personal values. The coaching agenda, discussion topics, and goals were those of the patient, not the provider. Preintervention and postintervention assessments measured medication adherence, exercise frequency, patient engagement, psychosocial variables, and A1C. RESULTS Perceived barriers to medication adherence decreased, while patient activation, perceived social support, and benefit finding all increased in the IH coaching group compared with those in the control group. Improvements in the coaching group alone were also observed for self-reported adherence, exercise frequency, stress, and perceived health status. Coaching participants with elevated baseline A1C (≥7%) significantly reduced their A1C. CONCLUSIONS A coaching intervention focused on patients' values and sense of purpose may provide added benefit to traditional diabetes education programs. Fundamentals of IH coaching may be applied by diabetes educators to improve patient self-efficacy, accountability, and clinical outcomes.
78
Glynn A, Wakefield J. Ecological Inference in the Social Sciences. Statistical Methodology 2010; 7:307-322. [PMID: 20563299 DOI: 10.1016/j.stamet.2009.09.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
Abstract
Ecological inference is a problem of partial identification, and therefore reliable precise conclusions are rarely possible without the collection of individual level (identifying) data. Without such data, sensitivity analyses provide the only recourse. In this paper we review and critique approaches to ecological inference in the social sciences, and describe in detail hierarchical models, which allow both sensitivity analysis and the incorporation of individual level data into an ecological analysis. A crucial element of a sensitivity analysis in such models is prior specification, and we detail how this may be carried out. Furthermore, we demonstrate how the inclusion of a small amount of individual level data can dramatically improve the properties of such estimates.
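The partial identification described in this abstract can be illustrated with the deterministic bounds on an unobserved cell of a 2 × 2 table whose margins are known. The Python sketch below is illustrative only and is not taken from the paper; the function name `cell_bounds` and the example counts are hypothetical.

```python
def cell_bounds(n_exposed, n_unexposed, n_cases):
    """Bounds on the unobserved number of cases among the exposed.

    Only the margins of the 2 x 2 table are observed: the two row totals
    (exposed, unexposed) and one column total (cases). The interior cell
    is not identified, but the margins restrict it to an interval.
    """
    total = n_exposed + n_unexposed
    if min(n_exposed, n_unexposed, n_cases) < 0 or n_cases > total:
        raise ValueError("margins are inconsistent")
    # At most n_unexposed cases can be unexposed, so the rest must be exposed.
    lower = max(0, n_cases - n_unexposed)
    # The exposed cases cannot exceed either the exposed total or the case total.
    upper = min(n_exposed, n_cases)
    return lower, upper


# Hypothetical area: 30 exposed, 70 unexposed, 80 cases in total.
print(cell_bounds(30, 70, 80))
```

When the interval is wide, the aggregate data alone say little about the individual-level association, which is why prior specification or supplementary individual-level data carry so much weight.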
79
Wakefield J, De Vocht F, Hung RJ. Bayesian mixture modeling of gene-environment and gene-gene interactions. Genet Epidemiol 2010; 34:16-25. [PMID: 19492346 DOI: 10.1002/gepi.20429] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
With the advent of rapid and relatively cheap genotyping technologies there is now the opportunity to attempt to identify gene-environment and gene-gene interactions when the number of genes and environmental factors is potentially large. Unfortunately the dimensionality of the parameter space leads to a computational explosion in the number of possible interactions that may be investigated. The full model that includes all interactions and main effects can be unstable, with wide confidence intervals arising from the large number of estimated parameters. We describe a hierarchical mixture model that allows all interactions to be investigated simultaneously, but assumes the effects come from a mixture prior with two components, one that reflects small null effects and the second for epidemiologically significant effects. Effects from the former are effectively set to zero, hence increasing the power for the detection of real signals. The prior framework is very flexible, which allows substantive information to be incorporated into the analysis. We illustrate the methods first using simulation, and then on data from a case-control study of lung cancer in Central and Eastern Europe.
80
Fong Y, Rue H, Wakefield J. Bayesian inference for generalized linear mixed models. Biostatistics 2009; 11:397-412. [PMID: 19966070 DOI: 10.1093/biostatistics/kxp053] [Citation(s) in RCA: 125] [Impact Index Per Article: 8.3] [Indexed: 11/14/2022] Open
Abstract
Generalized linear mixed models (GLMMs) continue to grow in popularity due to their ability to directly acknowledge multiple levels of dependency and model different data types. For small sample sizes especially, likelihood-based inference can be unreliable, with variance components being particularly difficult to estimate. A Bayesian approach is appealing but has been hampered by the lack of a fast implementation and by the difficulty in specifying prior distributions, with variance components again being particularly problematic. Here, we briefly review previous approaches to computation in Bayesian implementations of GLMMs and illustrate in detail the use of integrated nested Laplace approximations in this context. We consider a number of examples, carefully specifying prior distributions on meaningful quantities in each case. The examples cover a wide range of data types including those requiring smoothing over time and a relatively complicated spline model for which we examine our prior specification in terms of the implied degrees of freedom. We conclude that Bayesian inference is now practically feasible for GLMMs and provides an attractive alternative to likelihood-based approaches such as penalized quasi-likelihood. As with likelihood-based approaches, great care is required in the analysis of clustered binary data since approximation strategies may be less accurate for such data.
81
Wakefield J. Comments on ‘The BUGS project: Evolution, critique and future directions’. Stat Med 2009; 28:3079-80. [DOI: 10.1002/sim.3674] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
82
Jackson T, Nghiem HS, Rowell D, Jorm C, Wakefield J. Setting economic priorities for patient safety programs and patient safety research using case mix costing data. BMC Health Serv Res 2009. [PMCID: PMC2773579 DOI: 10.1186/1472-6963-9-s1-a4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
83
Fong Y, Wakefield J, Rice K. Bayesian mixture modeling using a hybrid sampler with application to protein subfamily identification. Biostatistics 2009; 11:18-33. [PMID: 19696187 DOI: 10.1093/biostatistics/kxp033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Abstract
Predicting protein function is essential to advancing our knowledge of biological processes. This article is focused on discovering the functional diversification within a protein family. A Bayesian mixture approach is proposed to model a protein family as a mixture of profile hidden Markov models. For a given mixture size, a hybrid Markov chain Monte Carlo sampler comprising both Gibbs sampling steps and hierarchical clustering-based split/merge proposals is used to obtain posterior inference. Inference for mixture size concentrates on comparing the integrated likelihoods. The choice of priors is critical with respect to the performance of the procedure. Through simulation studies, we show that 2 priors that are based on independent data sets allow correct identification of the mixture size, both when the data are homogeneous and when the data are generated from a mixture. We illustrate our method using 2 sets of real protein sequences.
84
Wakefield J. Bayes factors for genome-wide association studies: comparison with P-values. Genet Epidemiol 2009; 33:79-86. [PMID: 18642345 DOI: 10.1002/gepi.20359] [Citation(s) in RCA: 256] [Impact Index Per Article: 17.1] [Indexed: 11/09/2022]
Abstract
The Bayes factor is a summary measure that provides an alternative to the P-value for the ranking of associations, or the flagging of associations as "significant". We describe an approximate Bayes factor that is straightforward to use and is appropriate when sample sizes are large. We consider various choices of the prior on the effect size, including those that allow effect size to vary with the minor allele frequency (MAF) of the marker. An important contribution is the description of a specific prior that gives identical rankings between Bayes factors and P-values, providing a link between the two approaches, and allowing the implications of the use of P-values to be more easily understood. As a summary measure of noteworthiness P-values are difficult to calibrate since their interpretation depends on MAF and, crucially, on sample size. A consequence is that a consistent decision-making procedure using P-values requires a threshold for significance that reduces with sample size, contrary to common practice.
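As a rough illustration of an approximate Bayes factor of the kind this abstract describes, the Python sketch below assumes that the effect estimate is approximately normal with known variance `V` and that the effect size has a normal N(0, `W`) prior under the alternative. The Bayes factor is then the ratio of two normal densities for the estimate. These modelling choices, the function name and the example numbers are assumptions made for illustration; the abstract itself does not state the formula.

```python
import math


def approx_bayes_factor(theta_hat, V, W):
    """Approximate Bayes factor for H0 (no effect) against H1.

    Assumes theta_hat ~ N(theta, V) and, under H1, theta ~ N(0, W), so the
    estimate is N(0, V) under H0 and N(0, V + W) under H1. Values below 1
    favour the alternative.
    """
    z_squared = theta_hat ** 2 / V      # squared Wald statistic
    shrinkage = W / (V + W)             # weight of the prior variance
    return math.sqrt((V + W) / V) * math.exp(-0.5 * z_squared * shrinkage)


# Hypothetical log odds ratio estimates, both with Z = 3 but different
# sampling variances (a smaller V corresponds to a larger study).
W = 0.04
for V in (0.01, 0.001):
    theta_hat = 3.0 * math.sqrt(V)
    print(V, approx_bayes_factor(theta_hat, V, W))
```

Under these assumptions the same Z statistic gives less evidence against the null in the larger study, which is the sample-size dependence of P-value interpretation noted in the abstract.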
85
Abstract
Testing for Hardy-Weinberg equilibrium is ubiquitous and has traditionally been carried out via frequentist approaches. However, the discreteness of the sample space means that uniformity of p-values under the null cannot be assumed, with enumeration of all possible counts, conditional on the minor allele count, offering a computationally expensive way of p-value calibration. In addition, the interpretation of the subsequent p-values, and choice of significance threshold depends critically on sample size, because equilibrium will always be rejected at conventional levels with large sample sizes. We argue for a Bayesian approach using both Bayes factors, and the examination of posterior distributions. We describe simple conjugate approaches, and methods based on importance sampling Monte Carlo. The former are convenient because they yield closed-form expressions for Bayes factors, which allow their application to a large number of single nucleotide polymorphisms (SNPs), in particular in genome-wide contexts. We also describe straightforward direct sampling methods for examining posterior distributions of parameters of interest. For large numbers of alleles at a locus we resort to Markov chain Monte Carlo. We discuss a number of possibilities for prior specification, and apply the suggested methods to a number of real datasets.
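A closed-form Bayes factor of the conjugate kind mentioned in this abstract can be sketched for a single biallelic marker. The Python code below assumes a Beta(`a`, `b`) prior on the allele frequency under equilibrium and a Dirichlet(`alpha`) prior on the three genotype probabilities under the alternative. These priors, the function names and the example counts are illustrative assumptions and are not necessarily the specifications used in the paper.

```python
import math


def log_beta(*alphas):
    """Log of the (multivariate) beta function B(alpha_1, ..., alpha_k)."""
    return sum(math.lgamma(x) for x in alphas) - math.lgamma(sum(alphas))


def hwe_bayes_factor(n_aa, n_ab, n_bb, a=1.0, b=1.0, alpha=(1.0, 1.0, 1.0)):
    """Bayes factor for Hardy-Weinberg equilibrium (H0) against a saturated model (H1).

    Under H0 the genotype probabilities are (p^2, 2p(1-p), (1-p)^2) with
    p ~ Beta(a, b); under H1 they follow a Dirichlet(alpha) prior. The
    multinomial coefficient is common to both models and cancels.
    """
    # Marginal likelihood under H0: integrate the allele frequency out.
    log_m0 = (n_ab * math.log(2.0)
              + log_beta(a + 2 * n_aa + n_ab, b + 2 * n_bb + n_ab)
              - log_beta(a, b))
    # Marginal likelihood under H1: Dirichlet-multinomial.
    log_m1 = (log_beta(alpha[0] + n_aa, alpha[1] + n_ab, alpha[2] + n_bb)
              - log_beta(*alpha))
    return math.exp(log_m0 - log_m1)


# Hypothetical genotype counts (AA, AB, BB).
print(hwe_bayes_factor(25, 50, 25))   # counts that match equilibrium proportions
print(hwe_bayes_factor(50, 0, 50))    # no heterozygotes at all
```

Because each evaluation needs only a handful of log-gamma calls, a calculation of this form can be repeated across a large number of SNPs at little cost.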
86
Islami F, Kamangar F, Nasrollahzadeh D, Aghcheli K, Sotoudeh M, Abedi-Ardekani B, Merat S, Nasseri-Moghaddam S, Semnani S, Sepehr A, Wakefield J, Møller H, Abnet CC, Dawsey SM, Boffetta P, Malekzadeh R. Socio-economic status and oesophageal cancer: results from a population-based case-control study in a high-risk area. Int J Epidemiol 2009; 38:978-88. [PMID: 19416955 DOI: 10.1093/ije/dyp195] [Citation(s) in RCA: 170] [Impact Index Per Article: 11.3] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Cancer registries in the 1970s showed that parts of Golestan Province in Iran had the highest rate of oesophageal squamous cell carcinoma (OSCC) in the world. More recent studies have shown that while rates are still high, they are approximately half of what they were before, which might be attributable to improved socio-economic status (SES) and living conditions in this area. We examined a wide range of SES indicators to investigate the association between different SES components and risk of OSCC in the region. METHODS Data were obtained from a population-based case-control study conducted between 2003 and 2007 with 300 histologically proven OSCC cases and 571 matched neighbourhood controls. We used conditional logistic regression to compare cases and controls for individual SES indicators, for a composite wealth score constructed using multiple correspondence analysis, and for factors obtained from factor analysis. RESULTS We found that various dimensions of SES, such as education, wealth and being married, were all inversely related to OSCC. The strongest inverse association was found with education. Compared with no education, the adjusted odds ratios (95% confidence intervals) for primary education and high school or beyond were 0.52 (0.27-0.98) and 0.20 (0.06-0.65), respectively. CONCLUSIONS The strong association of SES with OSCC after adjustment for known risk factors implies the presence of yet unidentified risk factors that are correlated with our SES measures; identification of these factors could be the target of future studies. Our results also emphasize the importance of using multiple SES measures in epidemiological studies.
87
Wakefield J. Multi-level modelling, the ecologic fallacy, and hybrid study designs. Int J Epidemiol 2009; 38:330-6. [PMID: 19339258 PMCID: PMC2663723 DOI: 10.1093/ije/dyp179] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Accepted: 03/10/2009] [Indexed: 11/12/2022] Open
88
Islami F, Malekshah AF, Kimiagar M, Pourshams A, Wakefield J, Goglani G, Rakhshani N, Nasrollahzadeh D, Salahi R, Semnani S, Saadatian-Elahi M, Abnet CC, Kamangar F, Dawsey SM, Brennan P, Boffetta P, Malekzadeh R. Patterns of food and nutrient consumption in northern Iran, a high-risk area for esophageal cancer. Nutr Cancer 2009; 61:475-83. [PMID: 19838919 PMCID: PMC2796961 DOI: 10.1080/01635580902803735] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 01/27/2023]
Abstract
Our objectives were to investigate patterns of food and nutrient consumption in Golestan province, a high-incidence area for esophageal cancer (EC) in northern Iran. Twelve 24-h dietary recalls were administered during a 1-yr period to 131 healthy participants in a pilot cohort study. We compare here nutrient intake in Golestan with recommended daily allowances (RDAs) and lowest threshold intakes (LTIs). We also compare the intake of 27 food groups and nutrients among several population subgroups using mean values from the 12 recalls. Rural women had a very low level of vitamin intake, which was even lower than LTIs (P < 0.01). Daily intake of vitamins A and C was lower than LTI in 67% and 73% of rural women, respectively. Among rural men, the vitamin intakes were not significantly different from LTIs. Among urban women, the vitamin intakes were significantly lower than RDAs but were significantly higher than LTIs. Among urban men, the intakes were not significantly different from RDAs. Compared to urban dwellers, intake of most food groups and nutrients, including vitamins, was significantly lower among rural dwellers. In terms of vitamin intake, no significant difference was observed between Turkmen and non-Turkmen ethnics. The severe deficiency in vitamin intake among women and rural dwellers and marked differences in nutrient intake between rural and urban dwellers may contribute to the observed epidemiological pattern of EC in Golestan, with high incidence rates among women and people with low socioeconomic status and the highest incidence rate among rural women.
89
Haneuse S, Wakefield J. Geographic-based ecological correlation studies using supplemental case-control data. Stat Med 2008; 27:864-87. [PMID: 17624917 DOI: 10.1002/sim.2979] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
It is well known that the ecological study design suffers from a variety of biases that render the interpretation of its results difficult. Despite its limitations, however, the ecological study design is still widely used in a range of disciplines. The only solution to the ecological inference problem is to supplement the aggregate data with individual-level data and, to this end, Haneuse and Wakefield (Biometrics 2007; 63:128-136) recently proposed a hybrid study design in which an ecological study is supplemented with a sample of case-control data. The latter provides the basis for the control of bias, while the former may provide efficiency gains. Building on that work, we illustrate the use of the hybrid design in the context of a geographical correlation study of lung cancer mortality from the state of Ohio. Focusing on epidemiological applications, we initially provide an overview of the use of ecological studies in scientific research, highlighting the breadth of current application as well as advantages and drawbacks of the design. We consider the interplay between the two sources of information in the design: ecological and case-control, and then provide details on a Bayesian spatial random effects model in the setting of the hybrid design. Issues of specification are addressed, as well as sensitivity to modeling assumptions. Further, an interesting feature of these data is that they provide an example of how the proposed design may be used to resolve the ecological fallacy.
90
Abstract
Ecologic (aggregate) data are widely available and widely utilized in epidemiologic studies. However, ecologic bias, which arises because aggregate data cannot characterize within-group variability in exposure and confounder variables, can only be removed by supplementing ecologic data with individual-level data. Here the authors describe the two-phase study design as a framework for achieving this objective. In phase 1, outcomes are stratified by any combination of area, confounders, and error-prone (or discretized) versions of exposures of interest. Phase 2 data, sampled within each phase 1 stratum, provide accurate measures of exposure and possibly of additional confounders. The phase 1 aggregate-level data provide a high level of statistical power and a cross-classification by which individuals may be efficiently sampled in phase 2. The phase 2 individual-level data then provide a control for ecologic bias by characterizing the within-area variability in exposures and confounders. In this paper, the authors illustrate the two-phase study design by estimating the association between infant mortality and birth weight in several regions of North Carolina for 2000-2004, controlling for gender and race. This example shows that the two-phase design removes ecologic bias and produces gains in efficiency over the use of case-control data alone. The authors discuss the advantages and disadvantages of the approach.
91
McKay JD, Hashibe M, Hung RJ, Wakefield J, Gaborieau V, Szeszenia-Dabrowska N, Zaridze D, Lissowska J, Rudnai P, Fabianova E, Mates D, Foretova L, Janout V, Bencko V, Chabrier A, Hall J, Boffetta P, Canzian F, Brennan P. Sequence variants of NAT1 and NAT2 and other xenometabolic genes and risk of lung and aerodigestive tract cancers in Central Europe. Cancer Epidemiol Biomarkers Prev 2008; 17:141-7. [PMID: 18199719 DOI: 10.1158/1055-9965.epi-07-0553] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/16/2022] Open
Abstract
Tobacco smoke contains an extensive cocktail of highly carcinogenic chemicals. Individuals with a slower elimination rate of the chemicals in tobacco smoke may have increased exposure to their carcinogenic properties compared with those with a faster rate. Polymorphisms that alter the function of the genes involved in the activation or the detoxification of the chemical carcinogens in tobacco smoke can potentially influence an individual's risk of developing a tobacco-related cancer. To test this hypothesis, we have genotyped polymorphisms in 16 genes involved in metabolism of chemical carcinogens in a Central and Eastern European case-control study comprising 2,250 lung cases, 811 upper aerodigestive cancer (UADT) cases, and 2,704 controls. The N-acetyltransferase (NAT) genes were the most implicated in risk, with the NAT1*10 haplotype showing an inverse association in lung cancer, in both heterozygote carriers [odds ratio (OR), 0.81; 95% confidence interval (95% CI), 0.70-0.93] and homozygote carriers (OR, 0.70; 95% CI, 0.48-1.01), suggesting a genotype dose response (P < 0.001). In UADT cancer, a similar inverse association was noted in NAT1*10 although only in heterozygotes (OR, 0.78; 95% CI, 0.65-0.95). In NAT2, when considering the individuals' inferred acetylator phenotypes based on their NAT2 diplotype, "slow" acetylators compared with intermediate or fast acetylators showed no association with risk. None of the other 14 genes provided robust evidence of an association for either lung or UADT cancer. We therefore conclude that, of the genetic variation studied, the NAT1 gene was the most likely candidate to influence the risk of developing a tobacco-related cancer.
92
Li SM, Wakefield J, Self S. A transdimensional Bayesian model for pattern recognition in DNA sequences. Biostatistics 2008; 9:668-85. [PMID: 18349034 DOI: 10.1093/biostatistics/kxm058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
Identification of transcription factor binding sites (TFBSs) is essential to elucidate gene regulatory networks. This article is focused on the recognition of overrepresented short patterns, called "motifs", that may correspond to regulatory binding sites in the DNA sequences upstream of genes. An integrated Bayesian model is proposed to incorporate all unknown characteristics in motif discovery, including the number of motifs, motif widths, motif compositions, the number of motif sites, and locations of motif sites. Reversible jump Markov chain Monte Carlo is used to obtain posterior inference in the transdimensional parameter space. We present a number of suggestions for graphical summarization of the posterior distribution over the complex parameter space. The basic model is extended using a third-order Markov structure for nonmotif bases and allowing positions within a motif to be switched between 2 types: "conserved" and "degenerate." We evaluate the prediction accuracy for the simulated data with 3 motifs and apply the model to upstream sequences in high signal-to-noise regions in a human ChIP-chip study. The performance of the Bayesian model is assessed using yeast data sets of various numbers of sequences and background structures, with and without true TFBSs. The performance is also compared to other computational methods, including 2 statistical approaches, AlignACE and multiple expectation maximization for motif elicitation, and 1 word enumeration-based approach, yeast motif finder (YMF).
93
Wakefield J. Reporting and interpretation in genome-wide association studies. Int J Epidemiol 2008; 37:641-53. [PMID: 18270206 DOI: 10.1093/ije/dym257] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND In the context of genome-wide association studies we critique a number of methods that have been suggested for flagging associations for further investigation. METHODS The P-value is by far the most commonly used measure, but requires careful calibration when the a priori probability of an association is small, and discards information by not considering the power associated with each test. The q-value is a frequentist method by which the false discovery rate (FDR) may be controlled. RESULTS We advocate the use of the Bayes factor as a summary of the information in the data with respect to the comparison of the null and alternative hypotheses, and describe a recently-proposed approach to the calculation of the Bayes factor that is easily implemented. The combination of data across studies is straightforward using the Bayes factor approach, as are power calculations. CONCLUSIONS The Bayes factor and the q-value provide complementary information and when used in addition to the P-value may be used to reduce the number of reported findings that are subsequently not reproduced.
94
Glynn A, Wakefield J, Handcock MS, Richardson TS. Alleviating Linear Ecological Bias and Optimal Design with Sub-sample Data. Journal of the Royal Statistical Society. Series A (Statistics in Society) 2008; 171:179-202. [PMID: 20052294 PMCID: PMC2801082 DOI: 10.1111/j.1467-985x.2007.00511.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/06/2023]
Abstract
In this paper, we illustrate that combining ecological data with subsample data in situations in which a linear model is appropriate provides three main benefits. First, by including the individual level subsample data, the biases associated with linear ecological inference can be eliminated. Second, by supplementing the subsample data with ecological data, the information about parameters will be increased. Third, we can use readily available ecological data to design optimal subsampling schemes, so as to further increase the information about parameters. We present an application of this methodology to the classic problem of estimating the effect of a college degree on wages. We show that combining ecological data with subsample data provides precise estimates of this value, and that optimal subsampling schemes (conditional on the ecological data) can provide good precision with only a fraction of the observations.
95
Abstract
This article considers the modeling of single-dose pharmacokinetic data. Traditionally, so-called compartmental models have been used to analyze such data. Unfortunately, the mean functions of such models are sums of exponentials for which inference and computation may not be straightforward. We present an alternative to these models based on generalized linear models, for which desirable statistical properties exist, with a logarithmic link and gamma distribution. The latter has a constant coefficient of variation, which is often appropriate for pharmacokinetic data. Inference is convenient from either a likelihood or a Bayesian perspective. We consider models for both single and multiple individuals, the latter via generalized linear mixed models. For single individuals, Bayesian computation may be carried out with recourse to simulation. We describe a rejection algorithm that, unlike Markov chain Monte Carlo, produces independent samples from the posterior and allows straightforward calculation of Bayes factors for model comparison. We also illustrate how prior distributions may be specified in terms of model-free pharmacokinetic parameters of interest. The methods are applied to data from 12 individuals following administration of the antiasthmatic agent theophylline.
96
Haneuse S, Wakefield J, Sheppard L. The interpretation of exposure effect estimates in chronic air pollution studies. Stat Med 2007; 26:3172-87. [PMID: 17225212 DOI: 10.1002/sim.2785] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
In this article we consider the interpretation of regression parameters used to represent 'chronic' or 'long-term' air pollution exposure effects. Although scientific interest typically lies in understanding such effects at the level of the individual, studies have generally employed a semi-ecological design; outcomes and confounder information are collected on individuals while exposure is only available at the aggregate- or group-level. A precise interpretation of results from a semi-ecological design must take into account the aggregated nature, both spatial and temporal, of the exposure measure. The most common analysis approach for assessing chronic exposure effects has been within the Cox proportional hazards model framework; specific analyses are tailored to accommodate the shortcomings of the available exposure information. We revisit the underlying assumptions of the Cox model and discuss the implications of two common aspects of chronic effects studies: time-dependent exposures and time-varying effects. Focusing on the consequences of temporal aggregation of exposure, we show that an estimate obtained from a time-aggregated semi-ecological design can correspond to very different underlying time-varying exposure and risk scenarios. Further, distinguishing which of these is correct is not possible from the semi-ecological data alone. Our goal is to highlight some statistical issues faced by existing studies of chronic air pollution effects, and aid in the development and planning of future studies.
97
Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 2007; 81:208-27. [PMID: 17668372 PMCID: PMC1950810 DOI: 10.1086/519024] [Citation(s) in RCA: 342] [Impact Index Per Article: 20.1] [Received: 01/17/2007] [Accepted: 04/23/2007] [Indexed: 11/04/2022] Open
Abstract
In light of the vast amounts of genomic data that are now being generated, we propose a new measure, the Bayesian false-discovery probability (BFDP), for assessing the noteworthiness of an observed association. BFDP shares the ease of calculation of the recently proposed false-positive report probability (FPRP) but uses more information, has a noteworthy threshold defined naturally in terms of the costs of false discovery and nondiscovery, and has a sound methodological foundation. In addition, in a multiple-testing situation, it is straightforward to estimate the expected numbers of false discoveries and false nondiscoveries. We provide an in-depth discussion of FPRP, including a comparison with the q value, and examine the empirical behavior of these measures, along with BFDP, via simulation. Finally, we use BFDP to assess the association between 131 single-nucleotide polymorphisms and lung cancer in a case-control study.
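The relationship between a Bayes factor, the prior probability of the null, and a cost-based noteworthiness threshold can be sketched as follows. The Python code treats the BFDP as the posterior probability of the null given a Bayes factor for the null against the alternative, and derives the threshold by comparing the expected costs of the two possible decisions. The function names, the argument names and the example numbers are hypothetical and chosen for illustration only.

```python
def bfdp(bayes_factor, prior_null):
    """Posterior probability of the null hypothesis.

    bayes_factor: evidence for H0 relative to H1 (values below 1 favour H1).
    prior_null:   prior probability that H0 is true, strictly between 0 and 1.
    """
    if not 0.0 < prior_null < 1.0:
        raise ValueError("prior_null must lie strictly between 0 and 1")
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def is_noteworthy(bfdp_value, cost_ratio):
    """Decision rule from expected costs.

    cost_ratio is the cost of a false nondiscovery divided by the cost of a
    false discovery. Calling an association noteworthy has the lower expected
    cost when the BFDP is below cost_ratio / (1 + cost_ratio).
    """
    return bfdp_value < cost_ratio / (1.0 + cost_ratio)


# Hypothetical SNP: Bayes factor of 0.01 for the null, 99% prior probability
# of no association, false nondiscovery taken to be 4 times as costly.
value = bfdp(0.01, 0.99)
print(value, is_noteworthy(value, 4.0))
```

Under this rule the threshold depends only on the relative costs of the two types of error, so it can be set before the data are examined.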
98
Abstract
A major drawback of epidemiological ecological studies, in which the association between area-level summaries of risk and exposure is used to make inference about individual risk, is the difficulty in characterizing within-area variability in exposure and confounder variables. To avoid ecological bias, samples of individual exposure/confounder data within each area are required. Unfortunately, these may be difficult or expensive to obtain, particularly if large samples are required. In this paper, we propose a new approach suitable for use with small samples. We combine a Bayesian nonparametric Dirichlet process prior with an estimating functions' approach and show that this model gives a compromise between 2 previously described methods. The method is investigated using simulated data, and a practical illustration is provided through an analysis of lung cancer mortality and residential radon exposure in counties of Minnesota. We conclude that we require good quality prior information about the exposure/confounder distributions and a large between- to within-area variability ratio for an ecological study to be feasible using only small samples of individual data.
99
Wakefield J. Statistical Analysis of Environmental Space-Time Processes edited by N. D. Le and J. V. Zidek. Biometrics 2007. [DOI: 10.1111/j.1541-0420.2006.00787_1.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
100
Abstract
In recent years there has been great interest in making inference for gene expression data collected over time. In this article, we describe a Bayesian hierarchical mixture model for partitioning such data. While conventional approaches cluster the observed data, we assume a nonparametric, random walk model, and partition on the basis of the parameters of this model. The model is flexible and can be tuned to the specific context, respects the order of observations within each curve, acknowledges measurement error, and allows prior knowledge on parameters to be incorporated. The number of partitions may also be treated as unknown, and inferred from the data, in which case computation is carried out via a birth-death Markov chain Monte Carlo algorithm. We first examine the behavior of the model on simulated data, along with a comparison with more conventional approaches, and then analyze meiotic expression data collected over time on fission yeast genes.