1
Chatterjee S, Chowdhury S, Ryu D, Basu S. Bayesian functional data analysis over dependent regions and its application for identification of differentially methylated regions. Biometrics 2023; 79:3294-3306. [PMID: 37479677 DOI: 10.1111/biom.13902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Accepted: 05/08/2023] [Indexed: 07/23/2023]
Abstract
We consider a Bayesian functional data analysis for observations measured as extremely long sequences. Splitting the sequence into several small windows with manageable lengths, the windows may not be independent especially when they are neighboring each other. We propose to utilize Bayesian smoothing splines to estimate individual functional patterns within each window and to establish transition models for parameters involved in each window to address the dependence structure between windows. The functional difference of groups of individuals at each window can be evaluated by the Bayes factor based on Markov Chain Monte Carlo samples in the analysis. In this paper, we examine the proposed method through simulation studies and apply it to identify differentially methylated genetic regions in TCGA lung adenocarcinoma data.
Affiliation(s)
- Suvo Chatterjee
  - Department of Epidemiology and Biostatistics, Indiana University, School of Public Health, Bloomington, Indiana, USA
- Shrabanti Chowdhury
  - Department of Genetics and Genomic Sciences and Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, USA
- Duchwan Ryu
  - Department of Statistics and Actuarial Science, Northern Illinois University, Illinois, USA
- Sanjib Basu
  - Division of Epidemiology and Biostatistics, University of Illinois Chicago, Illinois, USA
2
De Stefano D, Pauli F, Torelli N. Preelectoral polls variability: A hierarchical Bayesian model to assess the role of house effects with application to Italian elections. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1507] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Francesco Pauli
  - Department of Business, Economics, Mathematics, and Statistics, University of Trieste
- Nicola Torelli
  - Department of Business, Economics, Mathematics, and Statistics, University of Trieste
3
Faulkner JR, Magee AF, Shapiro B, Minin VN. Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories. Biometrics 2020; 76:677-690. [PMID: 32277713 DOI: 10.1111/biom.13276] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/13/2018] [Revised: 04/26/2019] [Accepted: 07/09/2019] [Indexed: 11/26/2022]
Abstract
Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods.
Affiliation(s)
- James R Faulkner
  - Quantitative Ecology and Resource Management, University of Washington, Seattle, Washington
  - Fish Ecology Division, Northwest Fisheries Science Center, National Marine Fisheries Service, NOAA, Seattle, Washington
- Andrew F Magee
  - Department of Biology, University of Washington, Seattle, Washington
- Beth Shapiro
  - Ecology and Evolutionary Biology Department and Genomics Institute, University of California Santa Cruz, Santa Cruz, California
  - Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, California
- Vladimir N Minin
  - Department of Statistics, University of California Irvine, Irvine, California
4
Madathil S, Joseph L, Hardy R, Rousseau MC, Nicolau B. A Bayesian approach to investigate life course hypotheses involving continuous exposures. Int J Epidemiol 2019; 47:1623-1635. [PMID: 29912384 PMCID: PMC6208282 DOI: 10.1093/ije/dyy107] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Accepted: 05/18/2018] [Indexed: 01/19/2023]
Abstract
Background: Different hypotheses have been proposed in life course epidemiology on how a time-varying exposure can affect health or disease later in life. Researchers are often interested in investigating the probability of these hypotheses based on observed life course data. However, current techniques based on model/variable selection do not provide a direct estimate of this probability. We propose an alternative technique for a continuous exposure, using a Bayesian approach that has specific advantages, to investigate which life course hypotheses are supported by the observed data. Methods: We demonstrate the technique, the relevant life course exposure model (RLM), using simulations. We also analyse data from a case-control study on risk factors of oral cancer, with repeated measurements of betel quid chewing across life. We investigate the relative importance of chewing one quid of betel per day, at three life periods: ≤20 years, 21–40 years and above 40 years of age, on the risk of developing oral cancer. Results: RLM was able to correctly identify the life course hypothesis under which the data were simulated. Results from the case-control study showed that there was 74.3% probability that betel quid exposure earlier in life, compared with later, results in higher odds of developing oral cancer later in life. Conclusions: RLM is a useful option to identify the life course hypothesis supported by the observed data prior to the estimation of a causal effect.
Affiliation(s)
- Sreenath Madathil
  - Faculty of Dentistry, McGill University, Montreal, QC, Canada
  - Epidemiology and Biostatistics Unit, Institut Armand-Frappier, INRS, Laval, QC, Canada
- Lawrence Joseph
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Rebecca Hardy
  - MRC Unit for Lifelong Health and Ageing at UCL, University College London, London, UK
- Marie-Claude Rousseau
  - Faculty of Dentistry, McGill University, Montreal, QC, Canada
  - Epidemiology and Biostatistics Unit, Institut Armand-Frappier, INRS, Laval, QC, Canada
- Belinda Nicolau
  - Faculty of Dentistry, McGill University, Montreal, QC, Canada
5
Faulkner JR, Minin VN. Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors. Bayesian Analysis 2018; 13:225-252. [PMID: 29755638 PMCID: PMC5942601 DOI: 10.1214/17-ba1050] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Indexed: 06/08/2023]
Abstract
We present a locally adaptive nonparametric curve fitting method that operates within a fully Bayesian framework. This method uses shrinkage priors to induce sparsity in order-k differences in the latent trend function, providing a combination of local adaptation and global control. Using a scale mixture of normals representation of shrinkage priors, we make explicit connections between our method and kth order Gaussian Markov random field smoothing. We call the resulting processes shrinkage prior Markov random fields (SPMRFs). We use Hamiltonian Monte Carlo to approximate the posterior distribution of model parameters because this method provides superior performance in the presence of the high dimensionality and strong parameter correlations exhibited by our models. We compare the performance of three prior formulations using simulated data and find the horseshoe prior provides the best compromise between bias and precision. We apply SPMRF models to two benchmark data examples frequently used to test nonparametric methods. We find that this method is flexible enough to accommodate a variety of data generating models and offers the adaptive properties and computational tractability to make it a useful addition to the Bayesian nonparametric toolbox.
Affiliation(s)
- James R Faulkner
  - Quantitative Ecology and Resource Management, University of Washington, Seattle, WA 98195
  - National Oceanic and Atmospheric Administration, Northwest Fisheries Science Center, Seattle, WA 98112
- Vladimir N Minin
  - Quantitative Ecology and Resource Management, University of Washington, Seattle, WA 98195
  - National Oceanic and Atmospheric Administration, Northwest Fisheries Science Center, Seattle, WA 98112
  - Departments of Statistics and Biology, University of Washington, Seattle, WA 98195
6
Rahnama Rad K, Machado TA, Paninski L. Robust and scalable Bayesian analysis of spatial neural tuning function data. Ann Appl Stat 2017. [DOI: 10.1214/16-aoas996] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
7
Okango E, Mwambi H, Ngesa O. Spatial modeling of HIV and HSV-2 among women in Kenya with spatially varying coefficients. BMC Public Health 2016; 16:355. [PMID: 27103038 PMCID: PMC4840964 DOI: 10.1186/s12889-016-3022-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/21/2015] [Accepted: 04/08/2016] [Indexed: 12/03/2022]
Abstract
Background: Disease mapping has become popular in the field of statistics as a method to explain the spatial distribution of disease outcomes and as a tool to help design targeted intervention strategies. Most of these models however have been implemented with assumptions that may be limiting or altogether lead to less meaningful results and hence interpretations. Some of these assumptions include the linearity, stationarity and normality assumptions. Studies have shown that the linearity assumption is not necessarily true for all covariates. Age for example has been found to have a non-linear relationship with HIV and HSV-2 prevalence. Other studies have made stationarity assumption in that one stimulus e.g. education, provokes the same response in all the regions under study and this is also quite restrictive. Responses to stimuli may vary from region to region due to aspects like culture, preferences and attitudes. Methods: We perform a spatial modeling of HIV and HSV-2 among women in Kenya, while relaxing these assumptions i.e. the linearity assumption by allowing the covariate age to have a non-linear effect on HIV and HSV-2 prevalence using the random walk model of order 2 and the stationarity assumption by allowing the rest of the covariates to vary spatially using the conditional autoregressive model. The women data used in this study were derived from the 2007 Kenya AIDS indicator survey where women aged 15–49 years were surveyed. A full Bayesian approach was used and the models were implemented in R-INLA software. Results: Age was found to have a non-linear relationship with both HIV and HSV-2 prevalence, and the spatially varying coefficient model provided a significantly better fit for HSV-2. Age at first sex also had a greater effect on HSV-2 prevalence in the Coastal and some parts of North Eastern regions suggesting either early marriages or child prostitution. The effect of education on HIV prevalence among women was more in the North Eastern, Coastal, Southern and parts of Central region. Conclusions: The models introduced in this study enable relaxation of two limiting assumptions in disease mapping. The effects of the covariates on HIV and HSV-2 were found to vary spatially. The effect of education on HSV-2 status for example was lower in North Eastern and parts of the Rift region than most of the other parts of the country. Age was found to have a non-linear effect on HIV and HSV-2 prevalence; a linearity assumption would have led to wrong results and hence interpretations. The findings are relevant in that they can be used in informing tailor made strategies for tackling HIV and HSV-2 in different counties. The methodology used here may also be replicated in other studies with similar data. Electronic supplementary material: The online version of this article (doi:10.1186/s12889-016-3022-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Elphas Okango
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Private Bag X01, 3201, Pietermaritzburg, South Africa
- Henry Mwambi
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Private Bag X01, 3201, Pietermaritzburg, South Africa
- Oscar Ngesa
  - School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Private Bag X01, 3201, Pietermaritzburg, South Africa
  - Mathematics and Informatics Department, Taita Taveta University College, P.O Box 635-80300, Voi, Kenya
8
Rakêt LL, Markussen B. Approximate inference for spatial functional data on massively parallel processors. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.10.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023]
9
Yue YR, Loh JM. Bayesian nonparametric estimation of pair correlation function for inhomogeneous spatial point processes. J Nonparametr Stat 2013. [DOI: 10.1080/10485252.2013.767337] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/27/2022]
10
Yue YR, Hong HG. Bayesian Tobit quantile regression model for medical expenditure panel survey data. Stat Model 2012. [DOI: 10.1177/1471082x1201200402] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
High expenditure on healthcare is an important segment of the U.S. economy, making healthcare cost modelling valuable in decision-making processes over a wide array of domains. In this paper, we analyze medical expenditure panel survey (MEPS) data. Tobit regression model has been popularly used for the medical expenditures. However, it is no longer sufficient for the MEPS data because: (i) the distribution of the expenditures shows skewness, heavy tails and heterogeneity; (ii) most predictors are categorical, including binary, nominal and ordinal variables; (iii) there are a few predictors which may be nonlinearly related to the response. We therefore propose a Bayesian Tobit quantile regression model to describe a complete distributional view on how the medical expenditures depend on the various predictors. Specifically, we assume an asymmetric Laplace error distribution to adapt the quantile regression to a Bayesian setting. Then, we propose a modified group Lasso for categorical factor selection, and a smoothing Gaussian prior for modelling the nonlinear effects. The estimates and their uncertainties are obtained using an efficient Monte Carlo Markov Chain sampling method. The effectiveness of our approach is demonstrated by modelling 2007 MEPS data.
Affiliation(s)
- Yu Ryan Yue
  - Zicklin School of Business, Baruch College, The City University of New York, New York
- Hyokyoung Grace Hong
  - Zicklin School of Business, Baruch College, The City University of New York, New York
11
Yue YR, Loh JM. Bayesian semiparametric intensity estimation for inhomogeneous spatial point processes. Biometrics 2010; 67:937-46. [PMID: 21175553 DOI: 10.1111/j.1541-0420.2010.01531.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
Abstract
In this work we propose a fully Bayesian semiparametric method to estimate the intensity of an inhomogeneous spatial point process. The basic idea is to first convert intensity estimation into a Poisson regression setting via binning data points on a regular grid, and then model the log intensity semiparametrically using an adaptive version of Gaussian Markov random fields to smooth the corresponding counts. The inference is carried by an efficient Markov chain Monte Carlo simulation algorithm. Compared to existing methods for intensity estimation, for example, parametric modeling and kernel smoothing, the proposed estimator not only provides inference regarding the dependence of the intensity function on possible covariates, but also uses information from the data to adaptively determine the amount of smoothing at the local level. The effectiveness of using our method is demonstrated through simulation studies and an application to a rainforest dataset.
Affiliation(s)
- Yu Ryan Yue
  - Baruch College, City University of New York, New York, New York 10010, USA