1
Bhattacharyya A, Pal S, Mitra R, Rai S. Applications of Bayesian shrinkage prior models in clinical research with categorical responses. BMC Med Res Methodol 2022; 22:126. [PMID: 35484507 PMCID: PMC9046716 DOI: 10.1186/s12874-022-01560-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/29/2020] [Accepted: 02/10/2022] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Prediction and classification algorithms are commonly used in clinical research for identifying patients susceptible to clinical conditions such as diabetes, colon cancer, and Alzheimer's disease. Developing accurate prediction and classification methods benefits personalized medicine. Building an excellent predictive model involves selecting the features that are most significantly associated with the outcome. These features can include several biological and demographic characteristics, such as genomic biomarkers and health history. Such variable selection becomes challenging when the number of potential predictors is large. Bayesian shrinkage models have emerged as popular and flexible methods of variable selection in regression settings. This work discusses variable selection with three shrinkage priors and illustrates its application to clinical data such as Pima Indians Diabetes, Colon cancer, ADNI, and OASIS Alzheimer's real-world data. METHODS A unified Bayesian hierarchical framework that implements and compares shrinkage priors in binary and multinomial logistic regression models is presented. The key feature is the representation of the likelihood by a Polya-Gamma data augmentation, which admits a natural integration with a family of shrinkage priors, specifically focusing on Horseshoe, Dirichlet Laplace, and Double Pareto priors. Extensive simulation studies are conducted to assess the performances under different data dimensions and parameter settings. Measures of accuracy, AUC, Brier score, L1 error, cross-entropy, and ROC surface plots are used as evaluation criteria comparing the priors with frequentist methods such as Lasso, Elastic-Net, and Ridge regression. RESULTS All three priors can be used for robust prediction on significant metrics, irrespective of their categorical response model choices. Simulation studies achieved a mean prediction accuracy of 91.6% (95% CI: 88.5, 94.7) and 76.5% (95% CI: 69.3, 83.8) for logistic regression and multinomial logistic models, respectively. The model can identify significant variables for disease risk prediction and is computationally efficient. CONCLUSIONS The models are robust enough to conduct both variable selection and prediction because of their high shrinkage properties and applicability to a broad range of classification problems.
Affiliation(s)
- Arinjita Bhattacharyya
- Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY USA
- Subhadip Pal
- Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY USA
- Riten Mitra
- Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY USA
- Shesh Rai
- Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY USA
- Biostatistics & Bioinformatics Facility, JG Brown Cancer Center, University of Louisville, Louisville, KY USA
- The Christina Lee Brown Envirome Institute, University of Louisville, Louisville, KY USA
- University of Louisville Alcohol Research Center, University of Louisville, Louisville, KY USA
- University of Louisville Hepatobiology & Toxicology Center, University of Louisville, Louisville, KY USA
2
Kaciroti NA, Little RJA. Bayesian sensitivity analyses for longitudinal data with dropouts that are potentially missing not at random: A high dimensional pattern-mixture model. Stat Med 2021; 40:4609-4628. [PMID: 34405912 DOI: 10.1002/sim.9083] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2020] [Revised: 04/05/2021] [Accepted: 05/10/2021] [Indexed: 11/05/2022]
Abstract
Randomized clinical trials with outcome measured longitudinally are frequently analyzed using either random effect models or generalized estimating equations. Both approaches assume that the dropout mechanism is missing at random (MAR) or missing completely at random (MCAR). We propose a Bayesian pattern-mixture model to incorporate missingness mechanisms that might be missing not at random (MNAR), where the distribution of the outcome measure at the follow-up time t_k, conditional on the prior history, differs across the patterns of missing data. We then perform sensitivity analysis on estimates of the parameters of interest. The sensitivity parameters relate the distribution of the outcome of interest between subjects from a missing-data pattern at time t_k with that of the observed subjects at time t_k. The large number of the sensitivity parameters is reduced by treating them as random with a prior distribution having some pre-specified mean and variance, which are varied to explore the sensitivity of inferences. The missing at random (MAR) mechanism is a special case of the proposed model, allowing a sensitivity analysis of deviations from MAR. The proposed approach is applied to data from the Trial of Preventing Hypertension.
Affiliation(s)
- Niko A Kaciroti
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Department of Pediatrics, Medical School, University of Michigan, Ann Arbor, Michigan, USA
- Roderick J A Little
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
3
Shi F, Xia L, Shan F, Song B, Wu D, Wei Y, Yuan H, Jiang H, He Y, Gao Y, Sui H, Shen D. Large-scale screening to distinguish between COVID-19 and community-acquired pneumonia using infection size-aware classification. Phys Med Biol 2021; 66:065031. [DOI: 10.1088/1361-6560/abe838] [Citation(s) in RCA: 135] [Impact Index Per Article: 45.0] [Indexed: 12/15/2022]
4
Zhou T, Daniels MJ, Müller P. A Semiparametric Bayesian Approach to Dropout in Longitudinal Studies with Auxiliary Covariates. J Comput Graph Stat 2020; 29:1-12. [PMID: 33013150 DOI: 10.1080/10618600.2019.1617159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/26/2022]
Abstract
We develop a semiparametric Bayesian approach to missing outcome data in longitudinal studies in the presence of auxiliary covariates. We consider a joint model for the full data response, missingness and auxiliary covariates. We include auxiliary covariates to "move" the missingness "closer" to missing at random (MAR). In particular, we specify a semiparametric Bayesian model for the observed data via Gaussian process priors and Bayesian additive regression trees. These model specifications allow us to capture non-linear and non-additive effects, in contrast to existing parametric methods. We then separately specify the conditional distribution of the missing data response given the observed data response, missingness and auxiliary covariates (i.e. the extrapolation distribution) using identifying restrictions. We introduce meaningful sensitivity parameters that allow for a simple sensitivity analysis. Informative priors on those sensitivity parameters can be elicited from subject-matter experts. We use Monte Carlo integration to compute the full data estimands. Performance of our approach is assessed using simulated datasets. Our methodology is motivated by, and applied to, data from a clinical trial on treatments for schizophrenia.
Affiliation(s)
- Tianjian Zhou
- Department of Public Health Sciences, The University of Chicago
- Peter Müller
- Department of Mathematics, The University of Texas at Austin
5
Ji L, Chen M, Oravecz Z, Cummings EM, Lu ZH, Chow SM. A Bayesian Vector Autoregressive Model with Nonignorable Missingness in Dependent Variables and Covariates: Development, Evaluation, and Application to Family Processes. Structural Equation Modeling: A Multidisciplinary Journal 2020; 27:442-467. [PMID: 32601517 PMCID: PMC7323924 DOI: 10.1080/10705511.2019.1623681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/10/2023]
Abstract
Intensive longitudinal designs involving repeated assessments of constructs often face the problems of nonignorable attrition and selected omission of responses on particular occasions. However, time series models, such as vector autoregressive (VAR) models, are often fit to these data without consideration of nonignorable missingness. We introduce a Bayesian model that simultaneously represents the over-time dependencies in multivariate, multiple-subject time series data via a VAR model, and possible ignorable and nonignorable missingness in the data. We provide software code for implementing this model with application to an empirical data set. Moreover, simulation results comparing the joint approach with two-step multiple imputation procedures are included to shed light on the relative strengths and weaknesses of these approaches in practical data analytic scenarios.
6
Igari R, Hoshino T. A Bayesian data combination approach for repeated durations under unobserved missing indicators: Application to interpurchase-timing in marketing. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.04.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022]
7
8
Linero AR, Daniels MJ. Bayesian Approaches for Missing Not at Random Outcome Data: The Role of Identifying Restrictions. Stat Sci 2018; 33:198-213. [PMID: 31889740 PMCID: PMC6936760 DOI: 10.1214/17-sts630] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 11/19/2022]
Abstract
Missing data is almost always present in real datasets, and introduces several statistical issues. One fundamental issue is that, in the absence of strong uncheckable assumptions, effects of interest are typically not nonparametrically identified. In this article, we review the generic approach of the use of identifying restrictions from a likelihood-based perspective, and provide points of contact for several recently proposed methods. An emphasis of this review is on restrictions for nonmonotone missingness, a subject that has been treated sparingly in the literature. We also present a general, fully-Bayesian, approach which is widely applicable and capable of handling a variety of identifying restrictions in a uniform manner.
9
Affiliation(s)
- H. Tak
- Statistical and Applied Mathematical Sciences Institute, NC, USA
10
Bayesian nonparametric analysis of longitudinal studies in the presence of informative missingness. Biometrika 2017. [DOI: 10.1093/biomet/asx015] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
11
Gaskins JT, Daniels MJ, Marcus BH. Bayesian methods for nonignorable dropout in joint models in smoking cessation studies. J Am Stat Assoc 2017; 111:1454-1465. [PMID: 29104333 DOI: 10.1080/01621459.2016.1167693] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/22/2022]
Abstract
Inference on data with missingness can be challenging, particularly if the knowledge that a measurement was unobserved provides information about its distribution. Our work is motivated by the Commit to Quit II study, a smoking cessation trial that measured smoking status and weight change as weekly outcomes. It is expected that dropout in this study was informative and that patients with missed measurements are more likely to be smoking, even after conditioning on their observed smoking and weight history. We jointly model the categorical smoking status and continuous weight change outcomes by assuming normal latent variables for cessation and by extending the usual pattern mixture model to the bivariate case. The model includes a novel approach to sharing information across patterns through a Bayesian shrinkage framework to improve estimation stability for sparsely observed patterns. To accommodate the presumed informativeness of the missing data in a parsimonious manner, we model the unidentified components of the model under a non-future dependence assumption and specify departures from missing at random through sensitivity parameters, whose distributions are elicited from a subject-matter expert.
Affiliation(s)
- J T Gaskins
- Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY 40202
- M J Daniels
- Department of Integrative Biology, Department of Statistics & Data Sciences, University of Texas, Austin, TX 78712
- B H Marcus
- Department of Family and Preventive Medicine, UC San Diego, San Diego, CA 92093
12
Linero AR, Daniels MJ. A Flexible Bayesian Approach to Monotone Missing Data in Longitudinal Studies with Nonignorable Missingness with Application to an Acute Schizophrenia Clinical Trial. J Am Stat Assoc 2015; 110:45-55. [PMID: 26236060 PMCID: PMC4517693 DOI: 10.1080/01621459.2014.969424] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 10/24/2022]
Abstract
We develop a Bayesian nonparametric model for a longitudinal response in the presence of nonignorable missing data. Our general approach is to first specify a working model that flexibly models the missingness and full outcome processes jointly. We specify a Dirichlet process mixture of missing at random (MAR) models as a prior on the joint distribution of the working model. This aspect of the model governs the fit of the observed data by modeling the observed data distribution as the marginalization over the missing data in the working model. We then separately specify the conditional distribution of the missing data given the observed data and dropout. This approach allows us to identify the distribution of the missing data using identifying restrictions as a starting point. We propose a framework for introducing sensitivity parameters, allowing us to vary the untestable assumptions about the missing data mechanism smoothly. Informative priors on the space of missing data assumptions can be specified to combine inferences under many different assumptions into a final inference and accurately characterize uncertainty. These methods are motivated by, and applied to, data from a clinical trial assessing the efficacy of a new treatment for acute Schizophrenia.
Affiliation(s)
- Antonio R Linero
- Department of Statistics, University of Florida, Gainesville, FL, 32611
- Michael J Daniels
- Section of Integrative Biology, Department of Statistics & Data Sciences, University of Texas at Austin, Austin, TX 78712
13
Daniels MJ, Wang C, Marcus BH. Fully Bayesian inference under ignorable missingness in the presence of auxiliary covariates. Biometrics 2013; 70:62-72. [PMID: 24571539 DOI: 10.1111/biom.12121] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 08/01/2013] [Revised: 10/01/2013] [Accepted: 10/01/2013] [Indexed: 11/27/2022]
Abstract
In order to make a missing at random (MAR) or ignorability assumption realistic, auxiliary covariates are often required. However, the auxiliary covariates are not desired in the model for inference. Typical multiple imputation approaches do not assume that the imputation model marginalizes to the inference model. This has been termed "uncongenial" [Meng (1994, Statistical Science 9, 538-558)]. In order to make the two models congenial (or compatible), we would rather not assume a parametric model for the marginal distribution of the auxiliary covariates, but we typically do not have enough data to estimate the joint distribution well non-parametrically. In addition, when the imputation model uses a non-linear link function (e.g., the logistic link for a binary response), the marginalization over the auxiliary covariates to derive the inference model typically results in a difficult to interpret form for the effect of covariates. In this article, we propose a fully Bayesian approach to ensure that the models are compatible for incomplete longitudinal data by embedding an interpretable inference model within an imputation model and that also addresses the two complications described above. We evaluate the approach via simulations and implement it on a recent clinical trial.
Affiliation(s)
- M J Daniels
- Division of Statistics and Scientific Computation and Section of Integrative Biology, University of Texas at Austin, Austin, Texas 78712, U.S.A
14
Brogi S, Papazafiri P, Roussis V, Tafi A. 3D-QSAR using pharmacophore-based alignment and virtual screening for discovery of novel MCF-7 cell line inhibitors. Eur J Med Chem 2013; 67:344-51. [DOI: 10.1016/j.ejmech.2013.06.048] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/30/2013] [Revised: 05/10/2013] [Accepted: 06/19/2013] [Indexed: 02/06/2023]
15
Sensitivity analysis for nonignorable missingness and outcome misclassification from proxy reports. Epidemiology 2013; 24:215-23. [PMID: 23348065 DOI: 10.1097/ede.0b013e31827f4fa9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]
Abstract
Researchers often recruit proxy respondents, such as relatives or caregivers, for epidemiologic studies of older adults when study participants are unable to provide self-reports (eg, because of illness or cognitive impairment). In most studies involving proxy-reported outcomes, proxies are recruited only to report on behalf of participants who have missing self-reported outcomes; thus, either a proxy report or participant self-report, but not both, is available for each participant. When outcomes are binary and investigators conceptualize participant self-reports as gold standard measures, substituting proxy reports in place of missing participant self-reports in statistical analysis can introduce misclassification error and lead to biased parameter estimates. However, excluding observations from participants with missing self-reported outcomes may also lead to bias. We propose a pattern-mixture model that uses error-prone proxy reports to reduce selection bias from missing outcomes, and we describe a sensitivity analysis to address bias from differential outcome misclassification. We perform model estimation with high-dimensional (eg, continuous) covariates using propensity-score stratification and multiple imputation. We apply the methods to the Second Cohort of the Baltimore Hip Studies, a study of elderly hip fracture patients, to assess the relation between type of surgical treatment and perceived physical recovery. Simulation studies show that the proposed methods perform well. We provide SAS programs in the eAppendix (http://links.lww.com/EDE/A646) to enhance the methods' accessibility.
16
Accommodation of missing data in supportive and palliative care clinical trials. Curr Opin Support Palliat Care 2012; 6:465-70. [DOI: 10.1097/spc.0b013e328358441d] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/26/2022]
17
Wang C, Daniels MJ. A note on MAR, identifying restrictions, model comparison, and sensitivity analysis in pattern mixture models with and without covariates for incomplete data. Biometrics 2011; 67:810-8. [PMID: 21361893 DOI: 10.1111/j.1541-0420.2011.01565.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Abstract
Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions is one approach to mixture model identification (Little, 1995, Journal of the American Statistical Association 90, 1112-1121; Little and Wang, 1996, Biometrics 52, 98-111; Thijs et al., 2002, Biostatistics 3, 245-265; Kenward, Molenberghs, and Thijs, 2003, Biometrika 90, 53-71; Daniels and Hogan, 2008, in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis) and is a natural starting point for missing not at random sensitivity analysis (Thijs et al., 2002, Biostatistics 3, 245-265; Daniels and Hogan, 2008, in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis). However, when the pattern specific models are multivariate normal, identifying restrictions corresponding to missing at random (MAR) may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g., baseline covariates with time-invariant coefficients). In this article, we explore conditions necessary for identifying restrictions that result in MAR to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. In addition, we propose alternative modeling and sensitivity analysis strategies under a less restrictive assumption for the distribution of the observed response data. We adopt the deviance information criterion for model comparison and perform a simulation study to evaluate the performances of the different modeling approaches. We also apply the methods to a longitudinal clinical trial. Problems caused by baseline covariates with time-invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.
Affiliation(s)
- Chenguang Wang
- Division of Biostatistics, Center for Devices and Radiological Health, FDA, Silver Spring, Maryland 20993, USA.