1. Chase EC, Taylor JMG, Boonstra PS. Modeling basal body temperature data using horseshoe process regression. Stat Med 2024; 43:817-832. [PMID: 38095078 DOI: 10.1002/sim.9991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 08/07/2023] [Accepted: 12/03/2023] [Indexed: 02/21/2024]
Abstract
Biomedical data often exhibit jumps or abrupt changes. For example, women's basal body temperature may jump at ovulation, menstruation, implantation, and miscarriage. These sudden changes make these data challenging to model: many methods will oversmooth the sharp changes or overfit in response to measurement error. We develop horseshoe process regression (HPR) to address this problem. We define a horseshoe process as a stochastic process in which each increment is horseshoe-distributed. We use the horseshoe process as a nonparametric Bayesian prior for modeling a potentially nonlinear association between an outcome and its continuous predictor, which we implement via Stan and in the R package HPR. We provide guidance and extensions to advance HPR's use in applied practice: we introduce a Bayesian imputation scheme to allow for interpolation at unobserved values of the predictor within the HPR; include additional covariates via a partial linear model framework; and allow for monotonicity constraints. We find that HPR performs well when fitting functions that have sharp changes. We apply HPR to model women's basal body temperatures over the course of the menstrual cycle.
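As a rough illustration of the definition in this abstract (a process whose increments are horseshoe-distributed), the sketch below simulates one path on an evenly spaced grid. It is an assumption-laden toy, not the HPR package or its Stan model: the function name `simulate_horseshoe_path`, the fixed global scale `tau`, and the unit grid spacing are all hypothetical choices made here.

```python
import math
import random


def simulate_horseshoe_path(n_steps, tau=0.1, seed=0):
    """Simulate a path whose increments are horseshoe-distributed.

    Each increment is Normal(0, (lam * tau)^2) with a half-Cauchy(0, 1)
    local scale lam. `tau` is a fixed global scale (an assumption of this
    sketch; a full Bayesian model would usually place a prior on it).
    """
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(n_steps):
        # Half-Cauchy(0, 1) draw via the inverse CDF: x = tan(pi * u / 2).
        lam = math.tan(math.pi * rng.random() / 2.0)
        increment = rng.gauss(0.0, lam * tau)
        path.append(path[-1] + increment)
    return path


if __name__ == "__main__":
    path = simulate_horseshoe_path(30, tau=0.05, seed=42)
    print([round(v, 3) for v in path])
```

Most increments are shrunk toward zero by the small global scale, while the heavy-tailed local scale occasionally produces a large jump, which is the behaviour the abstract relies on for modeling abrupt changes.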
Affiliation(s)
- Jeremy M G Taylor
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Philip S Boonstra
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
2. Qi X, Zhou S, Wang Y, Peterson C. Bayesian sparse modeling to identify high-risk subgroups in meta-analysis of safety data. Res Synth Methods 2022; 13:807-820. [PMID: 36054779 PMCID: PMC9649868 DOI: 10.1002/jrsm.1597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 06/16/2022] [Accepted: 07/13/2022] [Indexed: 11/08/2022]
Abstract
Meta-analysis allows researchers to combine evidence from multiple studies, making it a powerful tool for synthesizing information on the safety profiles of new medical interventions. There is a critical need to identify subgroups at high risk of experiencing treatment-related toxicities. However, this remains quite challenging from a statistical perspective as there are a variety of clinical risk factors that may be relevant for different types of adverse events, and adverse events of interest may be rare or incompletely reported. We frame this challenge as a variable selection problem and propose a Bayesian hierarchical model which incorporates a horseshoe prior on the interaction terms to identify high-risk groups. Our proposed model is motivated by a meta-analysis of adverse events in cancer immunotherapy, and our results uncover key factors driving the risk of specific types of treatment-related adverse events.
Affiliation(s)
- Xinyue Qi
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
- Shouhao Zhou
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, PA
- Yucai Wang
  - Division of Hematology, Mayo Clinic, Rochester, Minnesota
- Christine Peterson
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX
3. Parker PA, Holan SH. A Bayesian functional data model for surveys collected under informative sampling with application to mortality estimation using NHANES. Biometrics 2022. [PMID: 35561139 DOI: 10.1111/biom.13696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2020] [Accepted: 05/02/2022] [Indexed: 11/30/2022]
Abstract
Functional data are often extremely high-dimensional and exhibit strong dependence structures but can often prove valuable for both prediction and inference. The literature on functional data analysis is well developed; however, there has been very little work involving functional data in complex survey settings. Motivated by physical activity monitor data from the National Health and Nutrition Examination Survey (NHANES), we develop a Bayesian model for functional covariates that can properly account for the survey design. Our approach is intended for non-Gaussian data and can be applied in multivariate settings. In addition, we make use of a variety of Bayesian modeling techniques to ensure that the model is fit in a computationally efficient manner. We illustrate the value of our approach through two simulation studies as well as an example of mortality estimation using NHANES data.
Affiliation(s)
- Paul A Parker
  - Department of Statistics, University of California Santa Cruz, 1156 High St, Santa Cruz, CA
- Scott H Holan
  - Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO
  - Research and Methodology Directorate, U.S. Census Bureau, 4600 Silver Hill Road, Washington, D.C.
4. Ohigashi T, Maruo K, Sozu T, Gosho M. Using horseshoe prior for incorporating multiple historical control data in randomized controlled trials. Stat Methods Med Res 2022; 31:1392-1404. [PMID: 35379046 DOI: 10.1177/09622802221090752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Meta-analytic approaches and power priors are often used to incorporate historical controls into the analysis of a current randomized controlled trial. In this study, we propose a method for incorporating multiple historical controls based on a horseshoe prior, which is a type of global-local shrinkage prior. The method assumes that historical controls follow the same distribution as the current control. In the case in which only a few historical controls are heterogeneous, we consider them to follow a potentially biased distribution from the distribution of the current control. We analyze two clinical trial examples with binary and time-to-event endpoints and conduct simulation studies to compare the performance of the proposed and existing methods. In the analysis of the clinical trial example, the posterior standard deviation of the treatment effect is decreased by the proposed method by considering the bias between the current control and heterogeneous historical control. In the scenarios in which the current and historical controls follow the same distribution, the statistical power using the proposed method is higher than that using existing methods. The proposed method is advantageous when few or no heterogeneous historical controls are expected.
Affiliation(s)
- Tomohiro Ohigashi
  - Graduate School of Comprehensive Human Sciences, University of Tsukuba, Tsukuba, Japan
  - Department of Biostatistics, Tsukuba Clinical Research & Development Organization, University of Tsukuba, Tsukuba, Japan
- Kazushi Maruo
  - Department of Biostatistics, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
- Takashi Sozu
  - Department of Information and Computer Technology, Faculty of Engineering, Tokyo University of Science, Tokyo, Japan
- Masahiko Gosho
  - Department of Biostatistics, Faculty of Medicine, University of Tsukuba, Tsukuba, Japan
5. Mu J, Liu Q, Kuo L, Hu G. Bayesian variable selection for the Cox regression model with spatially varying coefficients with applications to Louisiana respiratory cancer data. Biom J 2021; 63:1607-1622. [PMID: 34319616 DOI: 10.1002/bimj.202000047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2020] [Revised: 10/20/2020] [Accepted: 11/07/2020] [Indexed: 11/11/2022]
Abstract
The Cox regression model is a commonly used model in survival analysis. In public health studies, clinical data are often collected from medical service providers in different locations. There are large geographical variations in the covariate effects on survival rates from particular diseases. In this paper, we focus on the variable selection issue for the Cox regression model with spatially varying coefficients. We propose a Bayesian hierarchical model which incorporates a horseshoe prior for sparsity and a point mass mixture prior to determine whether a regression coefficient is spatially varying. An efficient two-stage computational method is used for posterior inference and variable selection. It first maximizes the partial likelihood of the Cox model independently at each site using an existing method, and then applies a Markov chain Monte Carlo algorithm for variable selection based on the first-stage results. Extensive simulation studies are carried out to examine the empirical performance of the proposed method. Finally, we apply the proposed methodology to a real dataset on respiratory cancer in Louisiana from the Surveillance, Epidemiology, and End Results (SEER) program.
Affiliation(s)
- Jinjian Mu
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Qingyang Liu
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Lynn Kuo
  - Department of Statistics, University of Connecticut, Storrs, CT, USA
- Guanyu Hu
  - Department of Statistics, University of Missouri - Columbia, Columbia, MO, USA
6. Shioda K, Cai J, Warren JL, Weinberger DM. Incorporating Information on Control Diseases Across Space and Time to Improve Estimation of the Population-level Impact of Vaccines. Epidemiology 2021; 32:360-367. [PMID: 33783394 PMCID: PMC8011507 DOI: 10.1097/ede.0000000000001341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2020] [Accepted: 02/12/2021] [Indexed: 11/26/2022]
Abstract
BACKGROUND: The synthetic control method evaluates the impact of vaccines while adjusting for a set of control time series representing diseases that are unaffected by the vaccine. However, noise in control time series, particularly in areas with small counts, can obscure the association with the outcome, preventing proper adjustments. To overcome this issue, we investigated the use of temporal and spatial aggregation methods to smooth the controls and allow for adjustment of underlying trends.
METHODS: We evaluated the impact of pneumococcal conjugate vaccine on all-cause pneumonia hospitalizations among adults ≥80 years of age in 25 states in Brazil from 2005 to 2015. Pneumonia hospitalizations in this group indicated a strong increasing secular trend over time that may influence estimation of the vaccine impact. First, we aggregated control time series separately by time or space before incorporation into the synthetic control model. Next, we developed distributed lags models (DLMs) to automatically determine what level of aggregation was most appropriate for each control.
RESULTS: The aggregation of control time series enabled the synthetic control model to identify stronger associations between outcome and controls. As a result, the aggregation models and DLMs succeeded in adjusting for long-term trends even in smaller states with sparse data, leading to more reliable estimates of vaccine impact.
CONCLUSIONS: When synthetic control struggles to identify important prevaccine associations due to noise in control time series, users can aggregate controls over time or space to generate more robust estimates of the vaccine impact. DLMs automate this process without requiring prespecification of the aggregation level.
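To make the idea of temporal aggregation of a noisy control series concrete, here is a minimal sketch using a trailing moving sum. The abstract does not state which aggregation scheme the authors used, so the trailing-window choice, the function name `aggregate_over_time`, and the example counts are assumptions made purely for illustration.

```python
def aggregate_over_time(series, window):
    """Trailing moving sum: each output is the sum of the current value and
    up to `window - 1` preceding values (fewer at the start of the series)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    aggregated = []
    running = 0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            # Drop the value that has just left the trailing window.
            running -= series[i - window]
        aggregated.append(running)
    return aggregated


if __name__ == "__main__":
    # Hypothetical sparse monthly counts for one control disease.
    monthly_counts = [0, 2, 1, 0, 3, 0, 1, 4]
    print(aggregate_over_time(monthly_counts, window=3))
```

A wider window gives a smoother control series at the cost of temporal resolution, which is the trade-off the distributed lags models in the abstract are described as resolving automatically.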
Affiliation(s)
- Kayoko Shioda
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT
- Jiachen Cai
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT
- Joshua L. Warren
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT
- Daniel M. Weinberger
  - Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT
7. Maity AK, Carroll RJ, Mallick BK. Integration of Survival and Binary Data for Variable Selection and Prediction: A Bayesian Approach. J R Stat Soc Ser C Appl Stat 2020; 68:1577-1595. [PMID: 33311813 DOI: 10.1111/rssc.12377] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/26/2022]
Abstract
We consider the problem where the data consist of a survival time and a binary outcome measurement for each individual, as well as corresponding predictors. The goal is to select the common set of predictors that affect both responses, not just one of them. In addition, we develop a survival prediction model based on data integration. This article is motivated by The Cancer Genome Atlas (TCGA) databank, which is currently the largest genomics and transcriptomics database. The data contain cancer survival information along with cancer stages for each patient. Furthermore, they contain Reverse-phase Protein Array (RPPA) measurements for each individual, which are the predictors associated with these responses. The biological motivation is to identify the major actionable proteins associated with both survival outcomes and cancer stages. We develop a Bayesian hierarchical model to jointly model the survival time and the classification of the cancer stages. Moreover, to deal with the high dimensionality of the RPPA measurements, we use a shrinkage prior to identify significant proteins. Simulations and TCGA data analysis show that the joint integrated modeling approach improves survival prediction.
Affiliation(s)
- Arnab Kumar Maity
  - Early Clinical Development Oncology Statistics, 10777 Science Center Drive, Pfizer Inc., San Diego, CA 92121
- Raymond J Carroll
  - Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX, 77843-3143
  - School of Mathematical and Physical Sciences, University of Technology, Sydney, Broadway NSW 2007, Australia
- Bani K Mallick
  - Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX, 77843-3143
8. Melo D, Marroig G, Wolf JB. Genomic Perspective on Multivariate Variation, Pleiotropy, and Evolution. J Hered 2020; 110:479-493. [PMID: 30986303 DOI: 10.1093/jhered/esz011] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/30/2018] [Accepted: 02/13/2019] [Indexed: 11/14/2022]
Abstract
Multivariate quantitative genetics provides a powerful framework for understanding patterns and processes of phenotypic evolution. Quantitative genetics parameters, like trait heritability or the G-matrix for sets of traits, can be used to predict evolutionary response or to understand the evolutionary history of a population. These population-level approaches have proven to be extremely successful, but the underlying genetics of multivariate variation and evolutionary change typically remain a black box. Establishing a deeper empirical understanding of how individual genetic effects lead to genetic (co)variation is then crucial to our understanding of the evolutionary process. To delve into this black box, we exploit an experimental population of mice composed from lineages derived by artificial selection. We develop an approach to estimate the multivariate effect of loci and characterize these vectors of effects in terms of their magnitude and alignment with the direction of evolutionary divergence. Using these estimates, we reconstruct the traits in the ancestral populations and quantify how much of the divergence is due to genetic effects. Finally, we also use these vectors to decompose patterns of genetic covariation and examine the relationship between these components and the corresponding distribution of pleiotropic effects. We find that additive effects are much larger than dominance effects and are more closely aligned with the direction of selection and divergence, with larger effects being more aligned than smaller effects. Pleiotropic effects are highly variable but are, on average, modular. These results are consistent with pleiotropy being partly shaped by selection while reflecting underlying developmental constraints.
Affiliation(s)
- Diogo Melo
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brasil
- Gabriel Marroig
  - Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade de São Paulo, São Paulo, SP, Brasil
- Jason B Wolf
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK
9. Ruffieux H, Davison AC, Hager J, Inshaw J, Fairfax BP, Richardson S, Bottolo L. A Global-Local Approach for Detecting Hotspots in Multiple-Response Regression. Ann Appl Stat 2020; 14:905-928. [PMID: 34992707 PMCID: PMC7612176 DOI: 10.1214/20-aoas1332] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Abstract
We tackle modelling and inference for variable selection in regression problems with many predictors and many responses. We focus on detecting hotspots, that is, predictors associated with several responses. Such a task is critical in statistical genetics, as hotspot genetic variants shape the architecture of the genome by controlling the expression of many genes and may initiate decisive functional mechanisms underlying disease endpoints. Existing hierarchical regression approaches designed to model hotspots suffer from two limitations: their discrimination of hotspots is sensitive to the choice of top-level scale parameters for the propensity of predictors to be hotspots, and they do not scale to large predictor and response vectors, for example, of dimensions 10^3 to 10^5 in genetic applications. We address these shortcomings by introducing a flexible hierarchical regression framework that is tailored to the detection of hotspots and scalable to the above dimensions. Our proposal implements a fully Bayesian model for hotspots based on the horseshoe shrinkage prior. Its global-local formulation shrinks noise globally and, hence, accommodates the highly sparse nature of genetic analyses while being robust to individual signals, thus leaving the effects of hotspots unshrunk. Inference is carried out using a fast variational algorithm coupled with a novel simulated annealing procedure that allows efficient exploration of multimodal distributions.
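For orientation, the horseshoe prior referred to in this and several other abstracts in this list is commonly written as the global-local hierarchy below. This is the standard textbook form for a generic coefficient vector, not necessarily the exact parameterization used by Ruffieux et al.; the symbols $\beta_j$, $\lambda_j$, and $\tau$ are notation chosen here.

```latex
\begin{align*}
\beta_j \mid \lambda_j, \tau &\sim \mathcal{N}\!\left(0,\ \lambda_j^2 \tau^2\right), \\
\lambda_j &\sim \mathrm{C}^{+}(0, 1), \\
\tau &\sim \mathrm{C}^{+}(0, 1).
\end{align*}
```

Here $\mathrm{C}^{+}(0, 1)$ denotes the standard half-Cauchy distribution. The global scale $\tau$ pulls all coefficients toward zero, while the heavy-tailed local scales $\lambda_j$ let individual large signals escape that shrinkage, which is the "shrinks noise globally ... robust to individual signals" behaviour described above.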
Affiliation(s)
- Jamie Inshaw
  - Wellcome Centre for Human Genetics, Oxford, University of Oxford
- Benjamin P. Fairfax
  - Department of Oncology, MRC Weatherall Institute for Molecular Medicine, University of Oxford
- Sylvia Richardson
  - MRC Biostatistics Unit, University of Cambridge
  - Alan Turing Institute
- Leonardo Bottolo
  - MRC Biostatistics Unit, University of Cambridge
  - Alan Turing Institute
  - Department of Medical Genetics, University of Cambridge
10.
Abstract
Inferring gene regulatory networks from high-throughput 'omics' data has proven to be a computationally demanding task of critical importance. Frequently, the classical methods break down owing to the curse of dimensionality, and popular strategies to overcome this are typically based on regularized versions of the classical methods. However, these approaches rely on loss functions that may not be robust and usually do not allow for the incorporation of prior information in a straightforward way. Fully Bayesian methods are equipped to handle both of these shortcomings quite naturally, and they offer the potential for improvements in network structure learning. We propose a Bayesian hierarchical model to reconstruct gene regulatory networks from time-series gene expression data, such as those common in perturbation experiments of biological systems. The proposed methodology uses global-local shrinkage priors for posterior selection of regulatory edges and relaxes the common normal likelihood assumption in order to allow for heavy-tailed data, which were shown in several of the cited references to severely impact network inference. We provide a sufficient condition for posterior propriety and derive an efficient Markov chain Monte Carlo via Gibbs sampling in the electronic supplementary material. We describe a novel way to detect multiple scales based on the corresponding posterior quantities. Finally, we demonstrate the performance of our approach in a simulation study and compare it with existing methods on real data from a T-cell activation study.
Affiliation(s)
- Viral Panchal
  - Department of Mathematics and Statistics, University of North Carolina Wilmington, Wilmington, NC 28403, USA
- Daniel F Linder
  - Medical College of Georgia, Augusta University, Augusta, GA 30912, USA