1. Lee SY. Using Bayesian statistics in confirmatory clinical trials in the regulatory setting: a tutorial review. BMC Med Res Methodol 2024; 24:110. PMID: 38714936. PMCID: PMC11077897. DOI: 10.1186/s12874-024-02235-0.
Abstract
Bayesian statistics plays a pivotal role in advancing medical science by enabling healthcare companies, regulators, and stakeholders to assess the safety and efficacy of new treatments, interventions, and medical procedures. The Bayesian framework offers a unique advantage over the classical framework, especially when incorporating prior information into a new trial with quality external data, such as historical data or another source of co-data. In recent years, there has been a significant increase in regulatory submissions using Bayesian statistics due to its flexibility and ability to provide valuable insights for decision-making, addressing the modern complexity of clinical trials where frequentist trials are inadequate. For regulatory submissions, companies often need to consider the frequentist operating characteristics of the Bayesian analysis strategy, regardless of the design complexity. In particular, the focus is on the frequentist type I error rate and power for all realistic alternatives. This tutorial review aims to provide a comprehensive overview of the use of Bayesian statistics in sample size determination, control of type I error rate, multiplicity adjustments, external data borrowing, etc., in the regulatory environment of clinical trials. Fundamental concepts of Bayesian sample size determination and illustrative examples are provided to serve as a valuable resource for researchers, clinicians, and statisticians seeking to develop more complex and innovative designs.
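The frequentist operating characteristics mentioned above can be checked by simulation. A minimal sketch, not taken from the tutorial: a single-arm trial with a binary endpoint declares success when the posterior probability that the response rate exceeds a null value is above a cutoff, under a Beta(1,1) prior; simulating data under the null estimates the frequentist type I error rate. The sample size, null rate, and cutoff here are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def bayesian_success(x, n, p0=0.3, a=1.0, b=1.0, cutoff=0.975):
    # Posterior is Beta(a + x, b + n - x); declare success if
    # P(p > p0 | data) exceeds the cutoff
    return stats.beta.sf(p0, a + x, b + n - x) > cutoff

def type1_error(n=50, p0=0.3, n_sim=20000, seed=1):
    # Simulate trials under the null (true rate = p0) and record
    # how often the Bayesian rule declares success
    rng = np.random.default_rng(seed)
    x = rng.binomial(n, p0, size=n_sim)
    return np.mean([bayesian_success(xi, n, p0) for xi in x])

print(round(type1_error(), 3))  # typically at or below 0.025 for this cutoff
```

In practice the cutoff (or the sample size) would be tuned by exactly this kind of simulation until the estimated type I error meets the regulatory target, which is the "frequentist operating characteristics" exercise the abstract describes.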
Affiliation: Se Yoon Lee, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843, USA.
2. Banerjee S. Discussion of "Optimal test procedures for multiple hypotheses controlling the familywise expected loss" by Willi Maurer, Frank Bretz, and Xiaolei Xun. Biometrics 2023; 79:2798-2801. PMID: 37463841. PMCID: PMC10794545. DOI: 10.1111/biom.13908.
Affiliation: Sudipto Banerjee, Department of Biostatistics, University of California Los Angeles, Los Angeles, California, USA.
3. Masharipov R, Knyazeva I, Nikolaev Y, Korotkov A, Didur M, Cherednichenko D, Kireev M. Providing Evidence for the Null Hypothesis in Functional Magnetic Resonance Imaging Using Group-Level Bayesian Inference. Front Neuroinform 2021; 15:738342. PMID: 34924989. PMCID: PMC8674455. DOI: 10.3389/fninf.2021.738342.
Abstract
Classical null hypothesis significance testing is limited to the rejection of the point-null hypothesis; it does not allow the interpretation of non-significant results. This leads to a bias against the null hypothesis. Herein, we discuss statistical approaches to ‘null effect’ assessment focusing on Bayesian parameter inference (BPI). Although Bayesian methods have been theoretically elaborated and implemented in common neuroimaging software packages, they are not widely used for ‘null effect’ assessment. BPI considers the posterior probability of finding the effect within or outside the region of practical equivalence to the null value. It can be used to find both ‘activated/deactivated’ and ‘not activated’ voxels, or to indicate that the obtained data are not sufficient, using a single decision rule. It also allows the data to be evaluated as the sample size increases and the experiment to be stopped once the obtained data are sufficient to make a confident inference. To demonstrate the advantages of using BPI for fMRI data group analysis, we compare it with classical null hypothesis significance testing on empirical data. We also use simulated data to show how BPI performs under different effect sizes, noise levels, noise distributions and sample sizes. Finally, we consider the problem of defining the region of practical equivalence for BPI and discuss possible applications of BPI in fMRI studies. To facilitate ‘null effect’ assessment for fMRI practitioners, we provide a Statistical Parametric Mapping 12 based toolbox for Bayesian inference.
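The single decision rule described in this abstract can be sketched for one voxel with a normal posterior: compare the posterior mass below, inside, and above the region of practical equivalence (ROPE). The ROPE half-width and the 95% probability level here are illustrative assumptions, not the authors' calibrated values.

```python
from scipy import stats

def rope_decision(post_mean, post_sd, rope=0.1, level=0.95):
    # Posterior probabilities of the effect lying below, inside,
    # and above the ROPE [-rope, rope], for a normal posterior
    p_below = stats.norm.cdf(-rope, post_mean, post_sd)
    p_above = stats.norm.sf(rope, post_mean, post_sd)
    p_inside = 1.0 - p_below - p_above
    if p_above > level:
        return "activated"
    if p_below > level:
        return "deactivated"
    if p_inside > level:
        return "not activated"
    return "insufficient data"  # keep scanning / collect more data

print(rope_decision(0.5, 0.1))   # activated
print(rope_decision(0.0, 0.02))  # not activated
print(rope_decision(0.0, 0.5))   # insufficient data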
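The single decision rule described in this abstract can be sketched for one voxel with a normal posterior: compare the posterior mass below, inside, and above the region of practical equivalence (ROPE). The ROPE half-width and the 95% probability level here are illustrative assumptions, not the authors' calibrated values.

```python
from scipy import stats

def rope_decision(post_mean, post_sd, rope=0.1, level=0.95):
    # Posterior probabilities of the effect lying below, inside,
    # and above the ROPE [-rope, rope], for a normal posterior
    p_below = stats.norm.cdf(-rope, post_mean, post_sd)
    p_above = stats.norm.sf(rope, post_mean, post_sd)
    p_inside = 1.0 - p_below - p_above
    if p_above > level:
        return "activated"
    if p_below > level:
        return "deactivated"
    if p_inside > level:
        return "not activated"
    return "insufficient data"  # collect more data

print(rope_decision(0.5, 0.1))   # activated
print(rope_decision(0.0, 0.02))  # not activated
print(rope_decision(0.0, 0.5))   # insufficient data
```

The fourth outcome is what distinguishes this scheme from significance testing: a wide posterior straddling the ROPE is reported as inconclusive rather than as evidence for the null.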
Affiliations: Ruslan Masharipov, Irina Knyazeva, Yaroslav Nikolaev, Alexander Korotkov, Michael Didur, Denis Cherednichenko, and Maxim Kireev: N. P. Bechtereva Institute of the Human Brain, Russian Academy of Sciences, Saint Petersburg, Russia.
4. Lock EF, Bandyopadhyay D. Bayesian nonparametric multiway regression for clustered binomial data. Stat (Int Stat Inst) 2021; 10. DOI: 10.1002/sta4.378.
Affiliation: Eric F. Lock, Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA.
5. Pereira LA, Taylor-Rodríguez D, Gutiérrez L. A Bayesian nonparametric testing procedure for paired samples. Biometrics 2020; 76:1133-1146. PMID: 32012223. DOI: 10.1111/biom.13234.
Abstract
We propose a Bayesian hypothesis testing procedure for comparing the distributions of paired samples. The procedure is based on a flexible model for the joint distribution of both samples. The flexibility is given by a mixture of Dirichlet processes. Our proposal uses a spike-slab prior specification for the base measure of the Dirichlet process and a particular parametrization for the kernel of the mixture in order to facilitate comparisons and posterior inference. The joint model allows us to derive the marginal distributions and test whether they differ or not. The procedure exploits the correlation between samples, relaxes the parametric assumptions, and detects possible differences throughout the entire distributions. A Monte Carlo simulation study comparing the performance of this strategy to other traditional alternatives is provided. Finally, we apply the proposed approach to spirometry data collected in the United States to investigate changes in pulmonary function in children and adolescents in response to air polluting factors.
Affiliation: Luis Gutiérrez, Departamento de Estadística, Pontificia Universidad Católica de Chile, Santiago, Chile; Millennium Nucleus Center for the Discovery of Structures in Complex Data, Santiago, Chile.
6. Identifying differentially expressed genes using the Polya urn scheme. Communications for Statistical Applications and Methods 2017. DOI: 10.29220/csam.2017.24.6.627.
7. Guhaniyogi R. Bayesian nonparametric areal wombling for small-scale maps with an application to urinary bladder cancer data from Connecticut. Stat Med 2017; 36:4007-4027. DOI: 10.1002/sim.7408.
Affiliation: Rajarshi Guhaniyogi, Department of Applied Mathematics & Statistics, University of California, Santa Cruz, SOE2, CA 95064, USA.
8. Shang K, Reilly C. Nonparametric Bayesian analysis of the two-sample problem with censoring. Commun Stat-Theor M 2017. DOI: 10.1080/03610926.2017.1288249.
Affiliations: Kan Shang, Department of Biomedical Informatics and Computational Biology and Division of Biostatistics, University of Minnesota Twin Cities, Minneapolis, MN, USA. Cavan Reilly, Division of Biostatistics, University of Minnesota Twin Cities, Minneapolis, MN, USA.
9.
10. Saraiva EF, Louzada F. A gene-by-gene multiple comparison analysis: A predictive Bayesian approach. Braz J Probab Stat 2015. DOI: 10.1214/13-bjps233.
11.
Abstract
Selection of the most important predictor variables in regression analysis is one of the key problems statistical research has been concerned with for a long time. In this article, we propose a methodology, the Dirichlet Lasso (abbreviated as DLASSO), to address this issue in a Bayesian framework. In many modern regression settings, large sets of predictor variables are grouped and the coefficients belonging to any one of these groups are either all redundant or all important in predicting the response; we say in those cases that the predictors exhibit a group structure. We show that DLASSO is particularly useful when the group structure is not fully known. We exploit the clustering property of Dirichlet process priors to infer the possibly missing group information. The Dirichlet process has the advantage of simultaneously clustering the variable coefficients and selecting the best set of predictor variables. We compare the predictive performance of DLASSO to the Group Lasso and the ordinary Lasso with real data and simulation studies. Our results demonstrate that the predictive performance of DLASSO is almost as good as that of the Group Lasso when group label information is given, and superior to the ordinary Lasso when group information is missing. For high-dimensional data (e.g., genetic data) with missing group information, DLASSO is a powerful approach to variable selection since it provides superior predictive performance and higher statistical accuracy.
Affiliations: Kiranmoy Das, Bayesian and Interdisciplinary Research Unit, Indian Statistical Institute, Kolkata, India. Marc Sobel, Department of Statistics, Temple University, Philadelphia, USA.
12. Shahbaba B, Johnson WO. Bayesian nonparametric variable selection as an exploratory tool for discovering differentially expressed genes. Stat Med 2013; 32:2114-26. PMID: 23172736. DOI: 10.1002/sim.5680.
Abstract
High-throughput scientific studies involving no clear a priori hypothesis are common. For example, a large-scale genomic study of a disease may examine thousands of genes without hypothesizing that any specific gene is responsible for the disease. In these studies, the objective is to explore a large number of possible factors (e.g., genes) in order to identify a small number that will be considered in follow-up studies that tend to be more thorough and on smaller scales. A simple, hierarchical, linear regression model with random coefficients is assumed for case-control data that correspond to each gene. The specific model used will be seen to be related to a standard Bayesian variable selection model. Relatively large regression coefficients correspond to potential differences in responses for cases versus controls and thus to genes that might 'matter'. For large-scale studies, and using a Dirichlet process mixture model for the regression coefficients, we are able to find clusters of regression effects of genes with increasing potential effect or 'relevance', in relation to the outcome of interest. One cluster will always correspond to genes whose coefficients are in a neighborhood that is relatively close to zero and will be deemed least relevant. Other clusters will correspond to increasing magnitudes of the random/latent regression coefficients. Using simulated data, we demonstrate that our approach could be quite effective in finding relevant genes compared with several alternative methods. We apply our model to two large-scale studies. The first study involves transcriptome analysis of infection by human cytomegalovirus. The second study's objective is to identify differentially expressed genes between two types of leukemia.
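The cluster-then-flag idea in this abstract, a Dirichlet process mixture over latent gene-level effects, with a cluster near zero deemed least relevant, can be sketched with scikit-learn's truncated Dirichlet process mixture. The simulated effect sizes, the truncation level of 10, and the 0.5 relevance cutoff are illustrative assumptions, not the authors' model.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Simulated gene-level effects: most near zero, a few clearly nonzero
effects = np.concatenate([rng.normal(0.0, 0.05, 180),
                          rng.normal(2.0, 0.20, 20)]).reshape(-1, 1)

# Truncated Dirichlet process mixture; unneeded components get
# near-zero weight, so the number of clusters is inferred from the data
dpm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(effects)

labels = dpm.predict(effects)
means = dpm.means_.ravel()
# Genes assigned to clusters away from zero are flagged as potentially
# relevant for follow-up; the near-zero cluster is deemed least relevant
relevant = np.abs(means[labels]) > 0.5
print(relevant.sum())
```

This is only the clustering step; the paper embeds it in a hierarchical case-control regression rather than clustering raw effect estimates.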
Affiliation: Babak Shahbaba, Department of Statistics, University of California at Irvine, CA, USA.
13. Schörgendorfer A, Branscum AJ, Hanson TE. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data. Biometrics 2013; 69:508-19. PMID: 23489010. DOI: 10.1111/biom.12007.
Abstract
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework.
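The Savage-Dickey ratio used above applies whenever the null model is nested in the alternative at a point θ = θ0: the Bayes factor in favor of the null equals the posterior density at θ0 divided by the prior density at θ0. A minimal conjugate normal-normal sketch with known variance (illustrative numbers, not the paper's semiparametric model):

```python
import numpy as np
from scipy import stats

def savage_dickey_bf01(ybar, n, sigma=1.0, mu0=0.0, tau=1.0, theta0=0.0):
    # Prior theta ~ N(mu0, tau^2); data y_i ~ N(theta, sigma^2), sigma known.
    # Conjugate posterior N(m, v) via precision-weighted updating:
    prec = 1.0 / tau**2 + n / sigma**2
    v = 1.0 / prec
    m = v * (mu0 / tau**2 + n * ybar / sigma**2)
    # Savage-Dickey: BF01 = p(theta0 | data) / p(theta0)
    return stats.norm.pdf(theta0, m, np.sqrt(v)) / stats.norm.pdf(theta0, mu0, tau)

print(savage_dickey_bf01(ybar=0.02, n=50))  # data near the null: BF01 > 1
print(savage_dickey_bf01(ybar=1.00, n=50))  # data far from the null: BF01 < 1
```

In the paper the same ratio is evaluated at the point in the semiparametric model that collapses it to parametric logistic regression; only the density evaluations change.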
Affiliation: Angela Schörgendorfer, IBM T.J. Watson Research Center, 1101 Kitchawan Road, Yorktown Heights, NY 10598, USA.
14. Berry D. Multiplicities in cancer research: ubiquitous and necessary evils. J Natl Cancer Inst 2012; 104:1124-32. PMID: 22859849. DOI: 10.1093/jnci/djs301.
Abstract
Scientific inquiry involves observations and measurements, some of which are planned and some of which are not. The most interesting or unusual observations might be regarded as discoveries and therefore particularly worthy of publication. However, the observational process is fraught with inferential land mines, especially if the discoveries are serendipitous. Multiple observations increase the probability of false-positive conclusions and have led to many false and otherwise misleading publications. Statisticians recommend adjustments to final inferences with the goal of reducing the rate of false positives, a strategy that increases the rate of false negatives. Some scientists object to making such adjustments, arguing that it should not be more difficult to determine the validity of a discovery simply because other observations were made. Which tack is right? How does one decide that any particular scientific discovery is real? Unfortunately, there is no panacea, no one-size-fits-all approach. The goal of this commentary is to elucidate the issues and provide recommendations for conducting and reporting results of empirical studies, with emphasis on the problems of multiple comparisons and other types of multiplicities, including what I call "silent multiplicities." Because of the many observations, outcomes, subsets, treatments, etc, that are typically made or addressed in epidemiology and biomarker research, these recommendations may be particularly relevant for such studies. However, the lessons apply quite generally. I consider both frequentist and Bayesian statistical approaches.
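The core arithmetic behind "multiple observations increase the probability of false-positive conclusions" is easy to make concrete: under independence, the chance of at least one false positive among m tests at level α is 1 - (1 - α)^m, and the Bonferroni adjustment divides α by m to pull that familywise rate back down (at the cost of the false negatives the abstract mentions).

```python
def familywise_error(alpha, m):
    # Probability of at least one false positive among m independent
    # tests, each run at level alpha
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 10, 100):
    print(m, round(familywise_error(0.05, m), 3))

# Bonferroni: test each hypothesis at alpha/m to keep the
# familywise rate near alpha
print(round(familywise_error(0.05 / 100, 100), 3))
```

At α = 0.05, ten tests already give roughly a 40% chance of a spurious "discovery", which is the quantitative force behind the commentary's warnings about silent multiplicities.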
Affiliation: Donald Berry, Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, 1400 Pressler St, 4-5062 Pickens Academic Tower, Houston, TX 77030-1402, USA.
15. Curtis SM, Ghosh SK. A Bayesian Approach to Multicollinearity and the Simultaneous Selection and Clustering of Predictors in Linear Regression. Journal of Statistical Theory and Practice 2011. DOI: 10.1080/15598608.2011.10483741.
16. Kruschke JK. Bayesian data analysis. Wiley Interdisciplinary Reviews: Cognitive Science 2010; 1:658-676. PMID: 26271651. DOI: 10.1002/wcs.72.
Abstract
Bayesian methods have garnered huge interest in cognitive science as an approach to models of cognition and perception. On the other hand, Bayesian methods for data analysis have not yet made much headway in cognitive science against the institutionalized inertia of 20th century null hypothesis significance testing (NHST). Ironically, specific Bayesian models of cognition and perception may not long endure the ravages of empirical verification, but generic Bayesian methods for data analysis will eventually dominate. It is time that Bayesian data analysis became the norm for empirical methods in cognitive science. This article reviews a fatal flaw of NHST and introduces the reader to some benefits of Bayesian data analysis. The article presents illustrative examples of multiple comparisons in Bayesian analysis of variance and Bayesian approaches to statistical power.
Affiliation: John K Kruschke, Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405-7007, USA.
17. Bush CA, Lee J, MacEachern SN. Minimally informative prior distributions for non-parametric Bayesian analysis. J R Stat Soc Series B Stat Methodol 2010. DOI: 10.1111/j.1467-9868.2009.00735.x.
18.
Abstract
We discuss a Bayesian discovery procedure for multiple comparison problems. We show that under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule based on a threshold of the posterior probability of the alternative. Under a semi-parametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure (ODP), recently introduced by Storey (2007a). Improving the approximation leads us to a Bayesian discovery procedure (BDP), which exploits the multiple shrinkage in clusters implied by the assumed nonparametric model. We compare the BDP and the ODP estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumor samples. We extend the setting of the ODP by discussing modifications of the loss function that lead to different single thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
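A standard posterior-probability thresholding rule of the kind this abstract builds on (a generic Bayesian FDR rule, not the BDP itself) flags the hypotheses with the largest posterior probabilities v_i of the alternative, growing the flagged set while the average of 1 - v_i among flagged hypotheses stays below a target false discovery rate. A sketch with made-up posterior probabilities:

```python
import numpy as np

def flag_by_bayes_fdr(post_probs, target_fdr=0.05):
    # Sort hypotheses by posterior probability of the alternative,
    # descending, and flag the largest prefix whose running mean of
    # (1 - v_i), the estimated Bayesian FDR, stays below the target
    order = np.argsort(post_probs)[::-1]
    v = np.asarray(post_probs, dtype=float)[order]
    running_fdr = np.cumsum(1.0 - v) / np.arange(1, len(v) + 1)
    k = int(np.sum(running_fdr <= target_fdr))
    flagged = np.zeros(len(v), dtype=bool)
    flagged[order[:k]] = True
    return flagged

v = [0.99, 0.98, 0.95, 0.60, 0.30, 0.05]
print(flag_by_bayes_fdr(v))  # flags the first three hypotheses
```

The BDP refines this picture by letting the nonparametric model share (shrink) information across hypotheses when computing the v_i themselves.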
19. Sebastiani P, Timofeev N, Dworkis DA, Perls TT, Steinberg MH. Genome-wide association studies and the genetic dissection of complex traits. Am J Hematol 2009; 84:504-15. PMID: 19569043. PMCID: PMC2895326. DOI: 10.1002/ajh.21440.
Abstract
The availability of affordable high throughput technology for parallel genotyping has opened the field of genetics to genome-wide association studies (GWAS), and in the last few years hundreds of articles reporting results of GWAS for a variety of heritable traits have been published. What do these results tell us? Although GWAS have discovered a few hundred reproducible associations, this number is underwhelming in relation to the huge amount of data produced, and challenges the conjecture that common variants may be the genetic causes of common diseases. We argue that the massive amount of genetic data that result from these studies remains largely unexplored and unexploited because of the challenge of mining and modeling enormous data sets, the difficulty of using nontraditional computational techniques and the focus of accepted statistical analyses on controlling the false positive rate rather than limiting the false negative rate. In this article, we will review the common approach to analysis of GWAS data and then discuss options to learn more from these data. We will use examples from our ongoing studies of sickle cell anemia and also GWAS in multigenic traits.
Affiliation: Paola Sebastiani, Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts 02118, USA.
20. Alber SA, Weiss RE. A model selection approach to analysis of variance and covariance. Stat Med 2009; 28:1821-40. DOI: 10.1002/sim.3595.
21. Hamilton C, Bratcher TL, Stamey JD. Bayesian subset selection approach to ranking normal means. J Appl Stat 2008. DOI: 10.1080/02664760802124174.
22.
Abstract
Multiplicities are ubiquitous. They threaten every inference in every aspect of life. Despite the focus in statistics on multiplicities, statisticians underestimate their importance. One reason is that the focus is on methodology for known multiplicities. Silent multiplicities are much more important and they are insidious. Both frequentists and Bayesians have important contributions to make regarding problems of multiplicities. But neither group has an inside track. Frequentists and Bayesians working together is a promising way of making inroads into this knotty set of problems. Two experiments with identical results may well lead to very different statistical conclusions. So we will never be able to use a software package with default settings to resolve all problems of multiplicities. Every problem has unique aspects. And all problems require understanding the substantive area of application.
Affiliation: Donald A Berry, Department of Biostatistics, M.D. Anderson Cancer Center, Houston, TX, USA.
23. Kim SY, Pritchard JK. Adaptive evolution of conserved noncoding elements in mammals. PLoS Genet 2007; 3:1572-86. PMID: 17845075. PMCID: PMC1971121. DOI: 10.1371/journal.pgen.0030147.
Abstract
Conserved noncoding elements (CNCs) are an abundant feature of vertebrate genomes. Some CNCs have been shown to act as cis-regulatory modules, but the function of most CNCs remains unclear. To study the evolution of CNCs, we have developed a statistical method called the "shared rates test" to identify CNCs that show significant variation in substitution rates across branches of a phylogenetic tree. We report an application of this method to alignments of 98,910 CNCs from the human, chimpanzee, dog, mouse, and rat genomes. We find that approximately 68% of CNCs evolve according to a null model where, for each CNC, a single parameter models the level of constraint acting throughout the phylogeny linking these five species. The remaining approximately 32% of CNCs show departures from the basic model including speed-ups and slow-downs on particular branches and occasionally multiple rate changes on different branches. We find that a subset of the significant CNCs have evolved significantly faster than the local neutral rate on a particular branch, providing strong evidence for adaptive evolution in these CNCs. The distribution of these signals on the phylogeny suggests that adaptive evolution of CNCs occurs in occasional short bursts of evolution. Our analyses suggest a large set of promising targets for future functional studies of adaptation.
Affiliations: Su Yeon Kim, Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America. Jonathan K Pritchard, Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America.
24.
Abstract
Studies that include individuals with multiple highly correlated exposures are common in epidemiology. Because standard maximum likelihood techniques often fail to converge in such instances, hierarchical regression methods have seen increasing use. Bayesian hierarchical regression places prior distributions on exposure-specific regression coefficients to stabilize estimation and incorporate prior knowledge, if available. A common parametric approach in epidemiology is to treat the prior mean and variance as fixed constants. An alternative parametric approach is to place distributions on the prior mean and variance to allow the data to help inform their values. As a more flexible semiparametric option, one can place an unknown distribution on the coefficients that simultaneously clusters exposures into groups using a Dirichlet process prior. We also present a semiparametric model with a variable-selection prior to allow clustering of coefficients at 0. We compare these 4 hierarchical regression methods and demonstrate their application in an example estimating the association of herbicides with retinal degeneration among wives of pesticide applicators.
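The stabilization these hierarchical methods provide comes from shrinkage: with a normal prior on each coefficient, the posterior mean is a precision-weighted average of the exposure-specific estimate and the prior mean, so noisy estimates are pulled harder toward the prior. A minimal sketch of the simplest parametric case described above (fixed prior mean and variance; the numbers are illustrative):

```python
def shrink(beta_hat, se, prior_mean=0.0, prior_sd=0.5):
    # Normal-normal conjugate posterior mean for one coefficient:
    # a precision-weighted average of the estimate and the prior mean
    w = prior_sd**2 / (prior_sd**2 + se**2)
    return w * beta_hat + (1 - w) * prior_mean

# Noisy estimates are pulled harder toward the prior mean
print(shrink(1.0, se=0.1))  # precise estimate: little shrinkage
print(shrink(1.0, se=1.0))  # noisy estimate: strong shrinkage
```

The other three methods in the abstract generalize this same formula: hyperpriors let the data choose prior_mean and prior_sd, and the Dirichlet process variants let coefficients share a prior mean by cluster (including a cluster at exactly 0).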
Affiliation: Richard F MacLehose, Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
25.
26.
Abstract
In clinical studies involving multiple variables, simultaneous tests are often considered where both the outcomes and hypotheses are correlated. This article proposes a multivariate mixture prior on treatment effects that allows positive probability of zero effect for each hypothesis, correlations among effect sizes, correlations among binary outcomes of zero versus nonzero effect, and correlations among the observed test statistics (conditional on the effects). We develop a Bayesian multiple testing procedure for the multivariate two-sample situation with unknown covariance structure and obtain the posterior probabilities of no difference between treatment regimens for specific variables. Prior selection methods and robustness issues are discussed in the context of a clinical example.
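The "positive probability of zero effect" ingredient is a spike-and-slab mixture prior. A univariate conjugate sketch shows the posterior probability of exactly zero effect computed from the marginal likelihoods of the two components; this deliberately ignores the paper's multivariate correlation structure, and the prior weight and slab scale are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def prob_null(ybar, n, sigma=1.0, pi0=0.5, slab_sd=1.0):
    # Mixture prior: theta = 0 with probability pi0,
    # otherwise theta ~ N(0, slab_sd^2); data y_i ~ N(theta, sigma^2).
    # Marginal density of the sample mean under each component:
    m0 = stats.norm.pdf(ybar, 0.0, sigma / np.sqrt(n))
    m1 = stats.norm.pdf(ybar, 0.0, np.sqrt(slab_sd**2 + sigma**2 / n))
    # Posterior probability of exactly zero effect (Bayes' rule)
    return pi0 * m0 / (pi0 * m0 + (1 - pi0) * m1)

print(prob_null(0.01, 100))  # near-zero sample mean: high prob of null
print(prob_null(1.00, 100))  # large sample mean: tiny prob of null
```

The paper's contribution is to correlate these per-variable spike indicators and effect sizes across outcomes rather than treating each test in isolation as done here.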
Affiliation: Mithat Gönen, Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA.