51
|
Schaafsma T, Wakefield J, Hanisch R, Bray F, Schüz J, Joy EJM, Watts MJ, McCormack V. Correction: Africa's Oesophageal Cancer Corridor: Geographic Variations in Incidence Correlate with Certain Micronutrient Deficiencies. PLoS One 2015; 10:e0142648. [PMID: 26565803 PMCID: PMC4643890 DOI: 10.1371/journal.pone.0142648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
|
52
|
Bauer C, Wakefield J, Rue H, Self S, Feng Z, Wang Y. Bayesian penalized spline models for the analysis of spatio-temporal count data. Stat Med 2015; 35:1848-65. [PMID: 26530705 DOI: 10.1002/sim.6785] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/15/2014] [Revised: 10/02/2015] [Accepted: 10/10/2015] [Indexed: 11/11/2022]
Abstract
In recent years, the availability of infectious disease counts in time and space has increased, and consequently, there has been renewed interest in model formulation for such data. In this paper, we describe a model that was motivated by the need to analyze hand, foot, and mouth disease surveillance data in China. The data are aggregated by geographical areas and by week, with the aims of the analysis being to gain insight into the space-time dynamics and to make short-term predictions, which will aid in the implementation of public health campaigns in those areas with a large predicted disease burden. The model we develop decomposes disease-risk into marginal spatial and temporal components and a space-time interaction piece. The latter is the crucial element, and we use a tensor product spline model with a Markov random field prior on the coefficients of the basis functions. The model can be formulated as a Gaussian Markov random field and so fast computation can be carried out using the integrated nested Laplace approximation approach. A simulation study shows that the model can pick up complex space-time structure and our analysis of hand, foot, and mouth disease data in the central north region of China provides new insights into the dynamics of the disease.
|
53
|
Schaafsma T, Wakefield J, Hanisch R, Bray F, Schüz J, Joy EJM, Watts MJ, McCormack V. Africa's Oesophageal Cancer Corridor: Geographic Variations in Incidence Correlate with Certain Micronutrient Deficiencies. PLoS One 2015; 10:e0140107. [PMID: 26448405 PMCID: PMC4598094 DOI: 10.1371/journal.pone.0140107] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/24/2015] [Accepted: 09/21/2015] [Indexed: 01/26/2023]
Abstract
BACKGROUND The aetiology of Africa's easterly-lying corridor of squamous cell oesophageal cancer is poorly understood. Micronutrient deficiencies have been implicated in this cancer in other areas of the world, but their role in Africa is unclear. Without prospective cohorts, timely insights can instead be gained through ecological studies. METHODS Across Africa we assessed associations between a country's oesophageal cancer incidence rate and food balance sheet-derived estimates of mean national dietary supplies of 7 nutrients: calcium (Ca), copper (Cu), iron (Fe), iodine (I), magnesium (Mg), selenium (Se) and zinc (Zn). We included 32 countries which had estimates of dietary nutrient supplies and of better-quality GLOBOCAN 2012 cancer incidence rates. Bayesian hierarchical Poisson lognormal models were used to estimate incidence rate ratios for oesophageal cancer associated with each nutrient, adjusted for age, gender, energy intake, phytate, smoking and alcohol consumption, as well as their 95% posterior credible intervals (CI). Adult dietary deficiencies were quantified using an estimated average requirements (EAR) cut-point approach. RESULTS Adjusted incidence rate ratios for oesophageal cancer associated with a doubling of mean nutrient supply were: for Fe 0.49 (95% CI: 0.29-0.82); Mg 0.58 (0.31-1.08); Se 0.40 (0.18-0.90); and Zn 0.29 (0.11-0.74). There were no associations with Ca, Cu and I. Mean national nutrient supplies exceeded adult EARs for Mg and Fe in most countries. For Se, mean supplies were less than EARs (both sexes) in 7 of the 10 highest oesophageal cancer ranking countries, compared with 23% of the remaining countries. For Zn, mean supplies were less than the male EARs in 8 of these 10 highest ranking countries, compared with 36% of other countries. CONCLUSIONS Ecological associations are consistent with the potential role of Se and/or Zn deficiencies in squamous cell oesophageal cancer in Africa. Individual-level analytical studies are needed to elucidate their causal role in this setting.
|
54
|
Ross M, Wakefield J. Bayesian hierarchical models for smoothing in two-phase studies, with application to small area estimation. JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A, (STATISTICS IN SOCIETY) 2015; 178:1009-1023. [PMID: 26705382 PMCID: PMC4687749 DOI: 10.1111/rssa.12103] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
Two-phase study designs are appealing since they allow for the oversampling of rare sub-populations which improves efficiency. In this paper we describe a Bayesian hierarchical model for the analysis of two-phase data. Such a model is particularly appealing in a spatial setting in which random effects are introduced to model between-area variability. In such a situation, one may be interested in estimating regression coefficients or, in the context of small area estimation, in reconstructing the population totals by strata. The efficiency gains of the two-phase sampling scheme are compared to standard approaches using 2011 birth data from the research triangle area of North Carolina. We show that the proposed method can overcome small sample difficulties and improve on existing techniques. We conclude that the two-phase design is an attractive approach for small area estimation.
|
55
|
Abstract
Gene expression levels are determined by the balance between rates of mRNA transcription and decay, and genetic variation in either of these processes can result in heritable differences in transcript abundance. Although the genetics of gene expression has been a subject of intense interest, the contribution of heritable variation in mRNA decay rates to gene expression variation has received far less attention. To this end, we developed a novel statistical framework and measured allele-specific differences in mRNA decay rates in a diploid yeast hybrid created by mating two genetically diverse parental strains. We estimate that 31% of genes exhibit allelic differences in mRNA decay rates, of which 350 can be identified at a false discovery rate of 10%. Genes with significant allele-specific differences in mRNA decay rates have higher levels of polymorphism compared to other genes, with all gene regions contributing to allelic differences in mRNA decay rates. Strikingly, we find widespread evidence for compensatory evolution, such that variants influencing transcriptional initiation and decay have opposite effects, suggesting that steady-state gene expression levels are subject to pervasive stabilizing selection. Our results demonstrate that heritable differences in mRNA decay rates are widespread and are an important target for natural selection to maintain or fine-tune steady-state gene expression levels.
|
56
|
Chen C, Wakefield J, Lumley T. The use of sampling weights in Bayesian hierarchical models for small area estimation. Spat Spatiotemporal Epidemiol 2014; 11:33-43. [PMID: 25457595 DOI: 10.1016/j.sste.2014.07.002] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 02/06/2014] [Revised: 05/22/2014] [Accepted: 07/12/2014] [Indexed: 10/24/2022]
Abstract
Hierarchical modeling has been used extensively for small area estimation. However, design weights that are required to reflect complex surveys are rarely considered in these models. We develop computationally efficient, Bayesian spatial smoothing models that acknowledge the design weights. Computation is carried out using the integrated nested Laplace approximation, which is fast. An extensive simulation study is presented that considers the effects of non-response and non-random selection of individuals, allowing examination of the impact of ignoring the design weights and the benefits of spatial smoothing. The results show that, when compared with standard approaches, mean squared error can be greatly reduced with the proposed methods. Bias reduction occurs through the inclusion of the design weights, with variance reduction being achieved through hierarchical smoothing. We analyze data from the Washington State 2006 Behavioral Risk Factor Surveillance System. The models are easily and quickly fitted within the R environment, using existing packages.
|
57
|
Connelly CF, Wakefield J, Akey JM. Evolution and genetic architecture of chromatin accessibility and function in yeast. PLoS Genet 2014; 10:e1004427. [PMID: 24992477 PMCID: PMC4081003 DOI: 10.1371/journal.pgen.1004427] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/06/2014] [Accepted: 04/23/2014] [Indexed: 01/23/2023]
Abstract
Chromatin accessibility is an important functional genomics phenotype that influences transcription factor binding and gene expression. Genome-scale technologies allow chromatin accessibility to be mapped with high resolution, facilitating detailed analyses of the genetic architecture and evolution of chromatin structure within and between species. We performed Formaldehyde-Assisted Isolation of Regulatory Elements sequencing (FAIRE-Seq) to map chromatin accessibility in two parental haploid yeast species, Saccharomyces cerevisiae and Saccharomyces paradoxus, and their diploid hybrid. We show that although broad-scale characteristics of the chromatin landscape are well conserved between these species, accessibility is significantly different for 947 regions upstream of genes that are enriched for GO terms such as intracellular transport and protein localization. We also develop new statistical methods to investigate the genetic architecture of variation in chromatin accessibility between species, and find that cis effects are more common and of greater magnitude than trans effects. Interestingly, we find that cis and trans effects at individual genes are often negatively correlated, suggesting widespread compensatory evolution to stabilize levels of chromatin accessibility. Finally, we demonstrate that the relationship between chromatin accessibility and gene expression levels is complex, and a significant proportion of differences in chromatin accessibility might be functionally benign.
|
58
|
Psoter KJ, Rosenfeld M, De Roos AJ, Mayer JD, Wakefield J. Differential geographical risk of initial Pseudomonas aeruginosa acquisition in young US children with cystic fibrosis. Am J Epidemiol 2014; 179:1503-13. [PMID: 24875373 DOI: 10.1093/aje/kwu077] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 01/13/2023]
Abstract
Pseudomonas aeruginosa is the sentinel respiratory pathogen in cystic fibrosis patients. We conducted a retrospective study to examine whether state of residence affected risk of P. aeruginosa acquisition among US children under 6 years of age with cystic fibrosis by using data from the Cystic Fibrosis Foundation National Patient Registry, 2003-2009. The outcome was time to first isolation of P. aeruginosa from a respiratory culture. We used a Bayesian hierarchical Weibull regression model with interval-censored outcomes. Spatial random effects, included at the state level and modeled using an intrinsic conditional autoregressive prior, allowed estimation of the residual spatial correlation. The regression portion of the model was adjusted for demographic and disease characteristics potentially affecting P. aeruginosa acquisition. A total of 3,608 children met the inclusion criteria and were followed for an average of 2.1 (standard deviation, 1.6) years. P. aeruginosa was cultured in 1,714 (48%) subjects. There was a moderately elevated spatial residual relative risk. An estimated 95% credible interval for the residual hazard ratio under one of the fitted models was 0.64-1.57; the strongest positive association was observed in the Southern states. The fact that risk for P. aeruginosa acquisition displayed spatial dependence suggests that regional factors, such as climate, may play an important role in P. aeruginosa acquisition.
|
59
|
Mercer L, Wakefield J, Chen C, Lumley T. A comparison of spatial smoothing methods for small area estimation with sampling weights. SPATIAL STATISTICS 2014; 8:69-85. [PMID: 24959396 PMCID: PMC4064473 DOI: 10.1016/j.spasta.2013.12.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 06/03/2023]
Abstract
Small area estimation (SAE) is an important endeavor in many fields and is used for resource allocation by both public health and government organizations. Often, complex surveys are carried out within areas, in which case it is common for the data to consist only of the response of interest and an associated sampling weight, reflecting the design. While it is appealing to use spatial smoothing models, and many approaches have been suggested for this endeavor, it is rare for spatial models to incorporate the weighting scheme, leaving the analysis potentially subject to bias. To examine the properties of various approaches to estimation we carry out a simulation study, looking at bias due to both non-response and non-random sampling. We also carry out SAE of smoking prevalence in Washington State, at the zip code level, using data from the 2006 Behavioral Risk Factor Surveillance System. The computation times for the methods we compare are short, and all approaches are implemented in R using currently available packages.
|
60
|
Wakefield J, Skrivankova V, Hsu FC, Sale M, Heagerty P. Detecting signals in pharmacogenomic genome-wide association studies. THE PHARMACOGENOMICS JOURNAL 2014; 14:309-15. [PMID: 24394200 PMCID: PMC4085158 DOI: 10.1038/tpj.2013.44] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/20/2013] [Revised: 11/02/2013] [Accepted: 11/12/2013] [Indexed: 11/24/2022]
Abstract
In one common pharmacogenomic scenario, outcome measures are compared for treated and untreated subjects across genotype defined subgroups. The key question is whether treatment benefit (or harm) is particularly strong in certain subgroups, and therefore statistical analysis focuses on the interaction between treatment and genotype. However, genome-wide analysis in such scenarios requires careful statistical thought since, in addition to the usual problems of multiple testing, the marker-defined sample sizes, and therefore power, vary across the individual genotypes being evaluated. The variability in power means the usual practice of using a common p-value threshold across tests has difficulties. The reason is that the use of a fixed threshold, with variable power, implies that the costs of type I and type II errors are varying across tests in a manner which is implicit rather than dictated by the analyst. In this paper we discuss this problem and describe an easily implementable solution based on Bayes factors. We pay particular attention to the specification of priors, which is not a straightforward task. The methods are illustrated using data from a randomized controlled clinical trial in which homocysteine levels are compared in individuals receiving low and high doses of folate supplements and across marker subgroups. The method we describe is implemented in the R computing environment with code available from http://faculty.washington.edu/jonno/cv.html.
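The point about a fixed p-value threshold can be illustrated with a small sketch. It assumes a normal approximation for the estimated interaction effect and a zero-centred normal prior under the alternative, which yields a closed-form asymptotic Bayes factor. Whether the paper uses exactly this form is an assumption, and the function name, prior standard deviation and numbers below are illustrative only.

```python
import math


def approx_bayes_factor(beta_hat, se, prior_sd):
    """Asymptotic Bayes factor for H0: beta = 0 against H1: beta ~ N(0, prior_sd^2).

    Assumes beta_hat ~ N(beta, se^2). Values below 1 favour H1.
    """
    v = se ** 2                      # sampling variance of the estimate
    w = prior_sd ** 2                # prior variance under H1 (an assumption)
    z2 = (beta_hat / se) ** 2        # squared Wald statistic
    shrink = w / (v + w)
    return math.sqrt((v + w) / v) * math.exp(-0.5 * z2 * shrink)


# Two hypothetical markers with the same z-statistic (hence the same p-value)
# but different standard errors, as happens when subgroup sizes differ.
large_subgroup = approx_bayes_factor(beta_hat=0.3, se=0.1, prior_sd=0.2)
small_subgroup = approx_bayes_factor(beta_hat=1.5, se=0.5, prior_sd=0.2)
print(large_subgroup, small_subgroup)
```

Under these assumptions the marker with the smaller standard error yields the smaller Bayes factor, that is, stronger evidence against the null, even though both markers would be treated identically by a common p-value threshold.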
|
61
|
Hamilton TW, Hutchings L, Alsousou J, Tutton E, Hodson E, Smith CH, Wakefield J, Gray B, Symonds S, Willett K. The treatment of stable paediatric forearm fractures using a cast that may be removed at home. Bone Joint J 2013; 95-B:1714-20. [DOI: 10.1302/0301-620x.95b12.31299] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 11/05/2022]
Abstract
We investigated whether, in the management of stable paediatric fractures of the forearm, flexible casts that can be removed at home are as clinically effective, cost-effective and acceptable to both patient and parent as management using a cast conventionally removed in hospital. A single-centre randomised controlled trial was performed on 317 children with a mean age of 9.3 years (2 to 16). No significant differences were seen in the change in Childhood Health Assessment Questionnaire index score (p = 0.10) or EuroQol 5-Dimensions domain scores between the two groups one week after removal of the cast or the absolute scores at six months. There was a significantly lower overall median treatment cost in the group whose casts were removed at home (£150.88 (sem 1.90) vs £251.62 (sem 2.68); p < 0.001). No difference was seen in satisfaction between the two groups (p = 0.48). Cite this article: Bone Joint J 2013;95-B:1714–20.
|
62
|
|
63
|
Psoter KJ, De Roos AJ, Wakefield J, Mayer J, Rosenfeld M. Season is associated with Pseudomonas aeruginosa acquisition in young children with cystic fibrosis. Clin Microbiol Infect 2013; 19:E483-9. [PMID: 23795938 DOI: 10.1111/1469-0691.12272] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 01/27/2013] [Revised: 03/19/2013] [Accepted: 05/17/2013] [Indexed: 11/29/2022]
Abstract
Pseudomonas aeruginosa, the principal respiratory pathogen in cystic fibrosis (CF) patients, is ubiquitous in the environment. Initial P. aeruginosa isolates in CF patients are generally environmental in nature. However, little information regarding seasonality of P. aeruginosa acquisition is available. We conducted a retrospective study to evaluate the seasonality of initial P. aeruginosa acquisition in young children with CF in the USA using the Cystic Fibrosis Foundation National Patient Registry from 2003 to 2009. Additionally, we assessed whether seasonal acquisition varied by climate zone. A total of 4123 children met inclusion criteria and 45% (n = 1866) acquired P. aeruginosa during a mean 2.0 years (SD 0.2 years) of follow up. Compared with winter, increased P. aeruginosa acquisition was observed in summer (incidence rate ratio (IRR): 1.22; 95% CI: 1.07-1.40) and autumn (IRR: 1.34; 95% CI: 1.18-1.52), with lower acquisition observed in spring (IRR: 0.81; 95% CI: 0.70-0.94). Seasonal variations in P. aeruginosa acquisition rates in the temperate and continental climate zones were similar to those in the overall cohort. In contrast, no significant seasonal effect was observed in the dry climate zone. In a corresponding analysis, no seasonal difference was observed in the rate of acquisition of Staphylococcus aureus, another common CF respiratory pathogen. These results provide preliminary support that climatic factors may be associated with initial P. aeruginosa acquisition in CF patients. Investigation and identification of specific risk factors, as well as awareness of seasonal variation, could potentially inform clinical recommendations including increased awareness of infection control and prevention strategies.
|
64
|
Skelly DA, Merrihew GE, Riffle M, Connelly CF, Kerr EO, Johansson M, Jaschob D, Graczyk B, Shulman NJ, Wakefield J, Cooper SJ, Fields S, Noble WS, Muller EGD, Davis TN, Dunham MJ, Maccoss MJ, Akey JM. Integrative phenomics reveals insight into the structure of phenotypic diversity in budding yeast. Genome Res 2013; 23:1496-504. [PMID: 23720455 PMCID: PMC3759725 DOI: 10.1101/gr.155762.113] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.9] [Indexed: 01/14/2023]
Abstract
To better understand the quantitative characteristics and structure of phenotypic diversity, we measured over 14,000 transcript, protein, metabolite, and morphological traits in 22 genetically diverse strains of Saccharomyces cerevisiae. More than 50% of all measured traits varied significantly across strains [false discovery rate (FDR) = 5%]. The structure of phenotypic correlations is complex, with 85% of all traits significantly correlated with at least one other phenotype (median = 6, maximum = 328). We show how high-dimensional molecular phenomics data sets can be leveraged to accurately predict phenotypic variation between strains, often with greater precision than afforded by DNA sequence information alone. These results provide new insights into the spectrum and structure of phenotypic diversity and the characteristics influencing the ability to accurately predict phenotypes.
|
65
|
Ross M, Wakefield J. Bayesian inference for two-phase studies with categorical covariates. Biometrics 2013; 69:469-77. [PMID: 23607570 DOI: 10.1111/biom.12019] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/01/2011] [Revised: 09/01/2012] [Accepted: 11/01/2012] [Indexed: 11/30/2022]
Abstract
In this article, we consider two-phase sampling in the situation in which all covariates are categorical. Two-phase designs are appealing from an efficiency perspective since they allow sampling to be concentrated in informative cells. A number of likelihood-based methods have been developed for the analysis of two-phase data, but we describe a Bayesian approach which has previously been unavailable. The methods are first compared with existing approaches via a simulation study, and are then applied to data collected on Wilms tumor. The benefits of a Bayesian approach include relaxation of the reliance on asymptotic inference, particularly in sparse data situations, and the potential to model data with complex dependencies, for example, via the introduction of random effects. The sparse data situation is illustrated via a simulated example.
|
66
|
Islami F, Pourshams A, Vedanthan R, Poustchi H, Kamangar F, Golozar A, Etemadi A, Khademi H, Freedman ND, Merat S, Garg V, Fuster V, Wakefield J, Dawsey SM, Pharoah P, Brennan P, Abnet CC, Malekzadeh R, Boffetta P. Smoking water-pipe, chewing nass and prevalence of heart disease: a cross-sectional analysis of baseline data from the Golestan Cohort Study, Iran. Heart 2013; 99:272-8. [PMID: 23257174 PMCID: PMC3671096 DOI: 10.1136/heartjnl-2012-302861] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023]
Abstract
OBJECTIVE Water-pipe and smokeless tobacco use have been associated with several adverse health outcomes. However, little information is available on the association between water-pipe use and heart disease (HD). Therefore, we investigated the association of smoking water-pipe and chewing nass (a mixture of tobacco, lime and ash) with prevalent HD. DESIGN Cross-sectional study. SETTING Baseline data (collected in 2004-2008) from a prospective population-based study in Golestan Province, Iran. PARTICIPANTS 50 045 residents of Golestan (40-75 years old; 42.4% men). MAIN OUTCOME MEASURES ORs and 95% CIs from multivariate logistic regression models for the association of water-pipe and nass use with HD prevalence. RESULTS A total of 3051 (6.1%) participants reported a history of HD, and 525 (1.1%) and 3726 (7.5%) reported ever water-pipe or nass use, respectively. Heavy water-pipe smoking was significantly associated with HD prevalence (highest level of cumulative use vs never use, OR=3.75; 95% CI 1.52 to 9.22; p for trend=0.04). This association persisted when using different cut-off points, when restricting HD to those taking nitrate compound medications, and among never cigarette smokers. There was no significant association between nass use and HD prevalence (highest category of use vs never use, OR=0.91; 95% CI 0.69 to 1.20). CONCLUSIONS Our study suggests a significant association between HD and heavy water-pipe smoking. Although the existing evidence suggesting similar biological consequences of water-pipe and cigarette smoking make this association plausible, results of our study were based on a modest number of water-pipe users and need to be replicated in further studies.
|
67
|
Méheust D, Le Cann P, Reponen T, Wakefield J, Vesper S, Gangneux JP. Possible application of the Environmental Relative Moldiness Index (ERMI) in Brittany, France. J Mycol Med 2012. [DOI: 10.1016/j.mycmed.2012.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
68
|
de Vocht F, Cherry N, Wakefield J. A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies. JOURNAL OF EXPOSURE SCIENCE & ENVIRONMENTAL EPIDEMIOLOGY 2012; 22:352-60. [PMID: 22588215 DOI: 10.1038/jes.2012.22] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 05/02/2023]
Abstract
Predisposition to a disease is usually caused by cumulative effects of a multitude of exposures and lifestyle factors in combination with individual susceptibility. Failure to include all relevant variables may result in biased risk estimates and decreased power, whereas inclusion of all variables may lead to computational difficulties, especially when variables are correlated. We describe a Bayesian Mixture Model (BMM) incorporating a variable-selection prior and compare its performance with that of a multiple logistic regression model (LM) in simulated case-control data with up to twenty exposures with varying prevalences and correlations. In addition, as a practical example we reanalyzed data on male infertility and occupational exposures (Chaps-UK). BMM mean-squared errors (MSE) were smaller than those of the LM, and were independent of the number of model parameters. BMM type I errors were minimal (≤1), whereas for the LM they increased with the number of parameters and the correlation between exposures. The numbers of type II errors were comparable. Reanalysis of the Chaps-UK data demonstrated more convincingly than an LM that occupational exposures to glycol ethers and VOCs are likely risk factors for male infertility. This BMM proves an appealing alternative to standard logistic regression when dealing with the analysis of (correlated) exposures in case-control studies.
|
69
|
Johansson M, Roberts A, Chen D, Li Y, Delahaye-Sourdeix M, Aswani N, Greenwood MA, Benhamou S, Lagiou P, Holcátová I, Richiardi L, Kjaerheim K, Agudo A, Castellsagué X, Macfarlane TV, Barzan L, Canova C, Thakker NS, Conway DI, Znaor A, Healy CM, Ahrens W, Zaridze D, Szeszenia-Dabrowska N, Lissowska J, Fabiánová E, Mates IN, Bencko V, Foretova L, Janout V, Curado MP, Koifman S, Menezes A, Wünsch-Filho V, Eluf-Neto J, Boffetta P, Franceschi S, Herrero R, Fernandez Garrote L, Talamini R, Boccia S, Galan P, Vatten L, Thomson P, Zelenika D, Lathrop M, Byrnes G, Cunningham H, Brennan P, Wakefield J, Mckay JD. Using prior information from the medical literature in GWAS of oral cancer identifies novel susceptibility variant on chromosome 4--the AdAPT method. PLoS One 2012; 7:e36888. [PMID: 22662130 PMCID: PMC3360735 DOI: 10.1371/journal.pone.0036888] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 12/20/2011] [Accepted: 04/09/2012] [Indexed: 11/18/2022]
Abstract
Background Genome-wide association studies (GWAS) require large sample sizes to obtain adequate statistical power, but it may be possible to increase the power by incorporating complementary data. In this study we investigated the feasibility of automatically retrieving information from the medical literature and leveraging this information in GWAS. Methods We developed a method that searches through PubMed abstracts for pre-assigned keywords and key concepts, and uses this information to assign prior probabilities of association for each single nucleotide polymorphism (SNP) with the phenotype of interest - the Adjusting Association Priors with Text (AdAPT) method. Association results from a GWAS can subsequently be ranked in the context of these priors using the Bayes False Discovery Probability (BFDP) framework. We initially tested AdAPT by comparing rankings of known susceptibility alleles in a previous lung cancer GWAS, and subsequently applied it in a two-phase GWAS of oral cancer. Results Known lung cancer susceptibility SNPs were consistently ranked higher by AdAPT BFDPs than by p-values. In the oral cancer GWAS, we sought to replicate the top five SNPs as ranked by AdAPT BFDPs, of which rs991316, located in the ADH gene region of 4q23, displayed a statistically significant association with oral cancer risk in the replication phase (per-rare-allele log additive p-value [ptrend] = 2.5×10⁻³). The combined OR for having one additional rare allele was 0.83 (95% CI: 0.76–0.90), and this association was independent of previously identified susceptibility SNPs that are associated with overall UADT cancer in this gene region. We also investigated if rs991316 was associated with other cancers of the upper aerodigestive tract (UADT), but no additional association signal was found. Conclusion This study highlights the potential utility of systematically incorporating prior knowledge from the medical literature in genome-wide analyses using the AdAPT methodology. AdAPT is available online (url: http://services.gate.ac.uk/lld/gwas/service/config).
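As a rough illustration of how per-SNP priors can change a ranking, the sketch below uses the usual definition of the Bayes false discovery probability: the posterior probability of the null, computed from a Bayes factor (null over alternative) and a prior probability of association. The SNP names, Bayes factors and prior values are hypothetical, and nothing here reproduces the text-mining step or the AdAPT service itself.

```python
def bfdp(bayes_factor, prior_assoc):
    """Posterior probability of no association.

    bayes_factor: evidence for the null relative to the alternative.
    prior_assoc:  prior probability that the SNP is associated (0 < p < 1).
    """
    if not 0.0 < prior_assoc < 1.0:
        raise ValueError("prior_assoc must lie strictly between 0 and 1")
    prior_odds_null = (1.0 - prior_assoc) / prior_assoc
    weighted = bayes_factor * prior_odds_null
    return weighted / (1.0 + weighted)


# (name, Bayes factor, prior probability of association); all values hypothetical.
snps = [
    ("snp_a", 0.02, 1e-4),   # default prior
    ("snp_b", 0.01, 1e-4),   # strongest data evidence, default prior
    ("snp_c", 0.02, 1e-2),   # same data evidence as snp_a, literature-boosted prior
]

# Rank from most to least noteworthy (smallest BFDP first).
ranked = sorted(snps, key=lambda s: bfdp(s[1], s[2]))
print([name for name, _, _ in ranked])
```

In this toy ranking the SNP with the raised prior moves ahead of a SNP with a smaller Bayes factor, which is the kind of reordering that prior information from the literature is intended to produce.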
|
70
|
Fong Y, Wakefield J, De Rosa S, Frahm N. A robust Bayesian random effects model for nonlinear calibration problems. Biometrics 2012; 68:1103-12. [PMID: 22551415 DOI: 10.1111/j.1541-0420.2012.01762.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022]
Abstract
In the context of a bioassay or an immunoassay, calibration means fitting a curve, usually nonlinear, through the observations collected on a set of samples containing known concentrations of a target substance, and then using the fitted curve and observations collected on samples of interest to predict the concentrations of the target substance in these samples. Recent technological advances have greatly improved our ability to quantify minute amounts of substance from a tiny volume of biological sample. This has in turn led to a need to improve statistical methods for calibration. In this article, we focus on developing calibration methods robust to dependent outliers. We introduce a novel normal mixture model with dependent error terms to model the experimental noise. In addition, we propose a reparameterization of the five parameter logistic nonlinear regression model that allows us to better incorporate prior information. We examine the performance of our methods with simulation studies and show that they lead to a substantial increase in performance measured in terms of mean squared error of estimation and a measure of the average prediction accuracy. A real data example from the HIV Vaccine Trials Network Laboratory is used to illustrate the methods.
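The calibration step described above (fit a curve on standards, then invert it for unknown samples) can be sketched with the commonly used five-parameter logistic curve. The parameterisation below is the standard textbook one, not the reparameterisation proposed in the paper, and it ignores the random effects and the robust mixture error model. All parameter values are made up for illustration.

```python
import math


def five_pl(x, a, b, c, d, g):
    """Standard five-parameter logistic curve.

    a: response at zero concentration, d: response at infinite concentration,
    c: location (concentration scale), b: slope, g: asymmetry.
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Concentration that produces response y; y must lie strictly between a and d."""
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical fitted curve parameters (in practice estimated from the standards).
params = dict(a=0.1, b=1.2, c=50.0, d=2.0, g=0.8)

observed = five_pl(10.0, **params)                # response for a known concentration
recovered = inverse_five_pl(observed, **params)   # calibration: response -> concentration
print(observed, recovered)
```

Inverting a fitted curve in this way gives only a point estimate. A Bayesian treatment such as the one described in the abstract would instead carry uncertainty in the curve parameters through to the predicted concentration.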
|
71
|
Wakefield J. Commentary: Genome-wide significance thresholds via Bayes factors. Int J Epidemiol 2012; 41:286-91. [PMID: 22345299 DOI: 10.1093/ije/dyr241] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022] Open
|
72
|
Fong Y, Wakefield J, Rice K. An Efficient Markov Chain Monte Carlo Method for Mixture Models by Neighborhood Pruning. J Comput Graph Stat 2012. [DOI: 10.1198/jcgs.2011.09187] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
|
73
|
Wakefield J. Nonparametric Regression with Multiple Predictors. Springer Series in Statistics 2012. [DOI: 10.1007/978-1-4419-0925-1_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/14/2023]
|
74
|
Wang Y, Feng Z, Yang Y, Self S, Gao Y, Longini IM, Wakefield J, Zhang J, Wang L, Chen X, Yao L, Stanaway JD, Wang Z, Yang W. Hand, foot, and mouth disease in China: patterns of spread and transmissibility. Epidemiology 2011; 22:781-92. [PMID: 21968769 PMCID: PMC3246273 DOI: 10.1097/ede.0b013e318231d67a] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Indexed: 11/27/2022]
Abstract
BACKGROUND There were large outbreaks of hand, foot, and mouth disease in both 2008 and 2009 in China. METHODS Using the national surveillance data since 2 May 2008, we summarized the epidemiologic characteristics of the recent outbreaks. Using a susceptible-infectious-recovered transmission model, we evaluated the transmissibility of the disease and potential risk factors. RESULTS Children ages 1.0 to 2.9 years were the most susceptible to hand, foot, and mouth disease (odds ratios [OR] >2.3 as compared with other age-groups). Infant cases had the highest incidences of severe disease (ORs >1.4) and death (ORs >2.4), as well as the longest delay from symptom onset to diagnosis (2.3 days). Boys were more susceptible than girls (OR = 1.56 [95% confidence interval = 1.56-1.57]). A 1-day delay in diagnosis was associated with increases in the odds of severe disease by 40% (39%-42%) and in the odds of death by 54% (44%-65%). Compared with Coxsackie A16, enterovirus 71 is more strongly associated with severe disease (OR = 16 [13-18]) and death (OR = 40 [13-127]). The estimated local effective reproductive numbers among prefectures ranged from 1.4 to 1.6 (median = 1.4) in spring and stayed below 1.2 in other seasons. A higher risk of transmission was associated with temperatures in the range of 70°F to 80°F, higher relative humidity, higher [corrected] wind speed, more precipitation, greater population density, and [corrected] periods during which schools were open. CONCLUSION Hand, foot, and mouth disease is a moderately transmittable infectious disease, mainly among preschool children. Enterovirus 71 was responsible for most severe cases and fatalities. Mixing of asymptomatically infected children in schools might have contributed to the spread of infection. Timely diagnosis may be [corrected] key to reducing the high mortality rate in infants.
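As a rough illustration of the susceptible-infectious-recovered framework mentioned in the abstract, the sketch below runs a deterministic, daily-step SIR model and tracks the effective reproductive number as susceptibles are depleted. It is not the authors' fitted model: the population size, infectious period, and the use of 1.4 as a basic reproductive number are assumptions chosen for illustration.

```python
def simulate_sir(population, initial_infected, r0, infectious_days, days):
    """Deterministic SIR model advanced in one-day steps.

    Returns a list of (day, S, I, R, effective_R) tuples, where
    effective_R = r0 * S / population.
    """
    gamma = 1.0 / infectious_days   # daily recovery rate
    beta = r0 * gamma               # daily transmission rate
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = []
    for day in range(days):
        history.append((day, s, i, r, r0 * s / population))
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return history


# Hypothetical prefecture: 100,000 children, 10 initial cases, 7-day infectious period.
history = simulate_sir(population=100_000, initial_infected=10,
                       r0=1.4, infectious_days=7.0, days=200)
print(history[0][4], history[-1][4])
```

The effective reproductive number starts just below the assumed value of 1.4 and declines as the susceptible pool shrinks, which is the quantity the abstract reports by season.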
|
75
|
Skelly DA, Johansson M, Madeoy J, Wakefield J, Akey JM. A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Res 2011; 21:1728-37. [PMID: 21873452 PMCID: PMC3202289 DOI: 10.1101/gr.119784.110] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.9] [Received: 12/22/2010] [Accepted: 07/12/2011] [Indexed: 11/24/2022]
Abstract
Variation in gene expression is thought to make a significant contribution to phenotypic diversity among individuals within populations. Although high-throughput cDNA sequencing offers a unique opportunity to delineate the genome-wide architecture of regulatory variation, new statistical methods need to be developed to capitalize on the wealth of information contained in RNA-seq data sets. To this end, we developed a powerful and flexible hierarchical Bayesian model that combines information across loci to allow both global and locus-specific inferences about allele-specific expression (ASE). We applied our methodology to a large RNA-seq data set obtained in a diploid hybrid of two diverse Saccharomyces cerevisiae strains, as well as to RNA-seq data from an individual human genome. Our statistical framework accurately quantifies levels of ASE with specified false-discovery rates, achieving high reproducibility between independent sequencing platforms. We pinpoint loci that show unusual and biologically interesting patterns of ASE, including allele-specific alternative splicing and transcription termination sites. Our methodology provides a rigorous, quantitative, and high-resolution tool for profiling ASE across whole genomes.
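The idea of combining information across loci can be sketched with a much simpler stand-in than the hierarchical model in the article: a Beta prior on the reference-allele fraction, shared by all loci, updated with each locus's read counts. The prior parameters, read counts, and function names below are hypothetical, and in a full hierarchical model the shared prior would itself be estimated from the data.

```python
import random


def posterior_allelic_fraction(ref_reads, alt_reads, prior_a=10.0, prior_b=10.0):
    """Posterior mean of the reference-allele fraction under a Beta(prior_a, prior_b) prior."""
    a = prior_a + ref_reads
    b = prior_b + alt_reads
    return a / (a + b)


def prob_reference_bias(ref_reads, alt_reads, prior_a=10.0, prior_b=10.0,
                        draws=20000, seed=0):
    """Monte Carlo estimate of the posterior probability that the fraction exceeds 0.5."""
    rng = random.Random(seed)
    a = prior_a + ref_reads
    b = prior_b + alt_reads
    hits = sum(rng.betavariate(a, b) > 0.5 for _ in range(draws))
    return hits / draws


# A low-coverage locus is shrunk strongly toward 0.5; a high-coverage locus is not.
low_coverage = posterior_allelic_fraction(8, 2)      # raw fraction 0.8
high_coverage = posterior_allelic_fraction(800, 200) # raw fraction 0.8
print(low_coverage, high_coverage, prob_reference_bias(80, 20))
```

The shrinkage toward the shared prior is what lets sparse loci borrow strength from the rest of the genome, at the cost of pulling genuinely imbalanced low-coverage loci toward balance.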
|