1
An evaluation of computational methods for aggregate data meta-analyses of diagnostic test accuracy studies. BMC Med Res Methodol 2024; 24:111. [PMID: 38730436 PMCID: PMC11084104 DOI: 10.1186/s12874-024-02217-2]
Abstract
BACKGROUND A Generalized Linear Mixed Model (GLMM) is recommended for meta-analyzing diagnostic test accuracy studies (DTAs) based on aggregate or individual participant data. Since a GLMM does not have a closed-form likelihood function or parameter solutions, computational methods are conventionally used to approximate the likelihood and obtain parameter estimates. The most commonly used computational methods are Iteratively Reweighted Least Squares (IRLS), the Laplace approximation (LA), and Adaptive Gauss-Hermite quadrature (AGHQ). Despite their wide use, it has not been clear how these computational methods compare and perform in the context of an aggregate data meta-analysis (ADMA) of DTAs. METHODS We compared and evaluated the performance of three commonly used computational methods for GLMMs (IRLS, LA, and AGHQ) via a comprehensive simulation study and real-life data examples in the context of an ADMA of DTAs. By varying several parameters in our simulations, we assessed the performance of the three methods in terms of bias, root mean squared error, confidence interval (CI) width, coverage of the 95% CI, convergence rate, and computational speed. RESULTS For most scenarios, especially when the meta-analytic data were not sparse (i.e., there were no or only a negligible number of studies with perfect diagnosis), the three computational methods were comparable for the estimation of sensitivity and specificity. However, the LA had the largest bias and root mean squared error for pooled sensitivity and specificity when the meta-analytic data were sparse. Moreover, the AGHQ took longer to converge than the other two methods, although it had the best convergence rate. CONCLUSIONS We recommend that practitioners and researchers carefully choose an appropriate computational algorithm when fitting a GLMM to an ADMA of DTAs. We do not recommend the LA for sparse meta-analytic data sets.
However, either the AGHQ or the IRLS can be used regardless of the characteristics of the meta-analytic data.
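As a concrete illustration of why the choice of approximation matters, here is a minimal sketch (not from the paper; all values are assumed) of approximating the per-study marginal likelihood of a logistic-normal GLMM with Gauss-Hermite quadrature. A single node is the crudest approximation, while more nodes, as in AGHQ, tighten it toward the exact integral:

```python
import numpy as np

def gh_marginal(y, n, beta, sigma, k):
    """k-node Gauss-Hermite approximation of the marginal likelihood
    integral binom-lik(y | p(u)) * N(u; 0, 1) du, p(u) = expit(beta + sigma*u).
    The binomial coefficient is omitted (constant in the parameters)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(k)  # probabilists' GH
    p = 1.0 / (1.0 + np.exp(-(beta + sigma * nodes)))
    return float(np.sum(weights / np.sqrt(2 * np.pi) * p**y * (1 - p)**(n - y)))

# brute-force reference by dense trapezoidal integration over the random effect
y, n, beta, sigma = 8, 10, 1.0, 1.0
u = np.linspace(-8.0, 8.0, 20001)
h = u[1] - u[0]
p = 1.0 / (1.0 + np.exp(-(beta + sigma * u)))
f = p**y * (1 - p)**(n - y) * np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
ref = float(np.sum((f[:-1] + f[1:]) / 2) * h)

crude = gh_marginal(y, n, beta, sigma, 1)   # single node at the prior mean
fine = gh_marginal(y, n, beta, sigma, 21)   # many nodes: near-exact
```

Increasing the node count trades computation for accuracy, which mirrors the AGHQ-versus-LA trade-off the abstract describes.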
2
Bayesian design of multi-regional clinical trials with time-to-event endpoints. Biometrics 2023; 79:3586-3598. [PMID: 36594642 DOI: 10.1111/biom.13820]
Abstract
Sponsors often rely on multi-regional clinical trials (MRCTs) to introduce new treatments more rapidly into the global market. Many commonly used statistical methods do not account for regional differences, and small regional sample sizes frequently result in lower estimation quality of region-specific treatment effects. The International Council for Harmonization E17 guidelines suggest consideration of methods that allow for information borrowing across regions to improve estimation. In response to these guidelines, we develop a novel methodology to estimate global and region-specific treatment effects from MRCTs with time-to-event endpoints using Bayesian model averaging (BMA). This approach accounts for the possibility of heterogeneous treatment effects between regions, and we discuss how to assess the consistency of these effects using posterior model probabilities. We obtain posterior samples of the treatment effects using a Laplace approximation, and we show through simulation studies that the proposed modeling approach estimates region-specific treatment effects with lower mean squared error than a Cox proportional hazards model while resulting in a similar rejection rate of the global treatment effect. We then apply the BMA approach to data from the LEADER trial, an MRCT designed to evaluate the cardiovascular safety of an anti-diabetic treatment.
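Once each candidate model's marginal likelihood has been approximated (e.g., by a Laplace approximation, as the authors do), Bayesian model averaging reduces to a simple weighting. A minimal sketch with assumed numbers, not taken from the paper:

```python
import numpy as np

# assumed log marginal likelihoods of three candidate region-grouping models,
# and equal prior model probabilities
log_ml = np.array([-120.3, -118.7, -119.5])
log_prior = np.log(np.full(3, 1.0 / 3.0))

# posterior model probabilities via a max-stabilized softmax
w = log_prior + log_ml
w = np.exp(w - w.max())
pmp = w / w.sum()

# BMA estimate of a region-specific treatment effect: average the
# model-specific posterior means (assumed values) by the model probabilities
theta_by_model = np.array([-0.12, -0.20, -0.15])
theta_bma = float(pmp @ theta_by_model)
```

The posterior model probabilities `pmp` are also what the abstract refers to when assessing consistency of treatment effects across regions.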
3
Bayesian joint models for multi-regional clinical trials. Biostatistics 2023:kxad023. [PMID: 37669215 DOI: 10.1093/biostatistics/kxad023]
Abstract
In recent years, multi-regional clinical trials (MRCTs) have increased in popularity in the pharmaceutical industry due to their ability to accelerate the global drug development process. To address potential challenges with MRCTs, the International Council for Harmonisation released the E17 guidance document which suggests the use of statistical methods that utilize information borrowing across regions if regional sample sizes are small. We develop an approach that allows for information borrowing via Bayesian model averaging in the context of a joint analysis of survival and longitudinal data from MRCTs. In this novel application of joint models to MRCTs, we use Laplace's method to integrate over subject-specific random effects and to approximate posterior distributions for region-specific treatment effects on the time-to-event outcome. Through simulation studies, we demonstrate that the joint modeling approach can result in an increased rejection rate when testing the global treatment effect compared with methods that analyze survival data alone. We then apply the proposed approach to data from a cardiovascular outcomes MRCT.
4
Profile Likelihood for Hierarchical Models Using Data Doubling. Entropy (Basel) 2023; 25:1262. [PMID: 37761561 PMCID: PMC10530212 DOI: 10.3390/e25091262]
Abstract
In scientific problems, an appropriate statistical model often involves a large number of canonical parameters. Often, the quantities of scientific interest are real-valued functions of these canonical parameters. Statistical inference for a specified function of the canonical parameters can be carried out via the Bayesian approach by simply using the posterior distribution of the specified function of the parameter of interest. Frequentist inference is usually based on the profile likelihood for the parameter of interest. When the likelihood function is analytical, computing the profile likelihood is simply a constrained optimization problem with many numerical algorithms available. However, for hierarchical models, computing the likelihood function, and hence the profile likelihood function, is difficult because of the high-dimensional integration involved. We describe a simple computational method to compute the profile likelihood for any specified function of the parameters of a general hierarchical model using data doubling. We provide a mathematical proof of the validity of the method under regularity conditions that ensure that the distribution of the maximum likelihood estimator of the canonical parameters is a non-singular multivariate Gaussian.
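For a model with an analytical likelihood, profiling is exactly the constrained optimization the abstract mentions. A toy sketch (assumed data; a normal model where the nuisance variance can be profiled out in closed form — this is illustrative, not the paper's data-doubling method):

```python
import numpy as np

def profile_loglik(mu, x):
    # profile log-likelihood for the mean of a normal sample:
    # the nuisance sigma^2 is maximized out analytically as mean((x - mu)^2)
    n = len(x)
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.5, size=200)

grid = np.linspace(1.0, 3.0, 401)
lp = np.array([profile_loglik(m, x) for m in grid])
mu_hat = float(grid[lp.argmax()])  # profile MLE; sits at the sample mean
```

For hierarchical models the inner maximization has no such closed form, which is the difficulty the data-doubling method is designed to sidestep.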
5
Dynamic logistic state space prediction model for clinical decision making. Biometrics 2023; 79:73-85. [PMID: 34697801 PMCID: PMC9038961 DOI: 10.1111/biom.13593]
Abstract
Prediction modeling for clinical decision making is of great importance and needs to be updated frequently as the patient population and clinical practice change. Existing methods either operate in an ad hoc fashion, such as model recalibration, or focus on studying the relationship between predictors and outcome rather than on prediction. In this article, we propose a dynamic logistic state space model to continuously update the parameters whenever new information becomes available. The proposed model allows for both time-varying and time-invariant coefficients. The varying coefficients are modeled using smoothing splines to account for their smooth trends over time. The smoothing parameters are objectively chosen by maximum likelihood. The model is updated using batch data accumulated at prespecified time intervals, which allows for better approximation of the underlying binomial density function. In simulations, we show that the new model has significantly higher prediction accuracy than existing methods. We apply the method to predict 1-year survival after lung transplantation using United Network for Organ Sharing data.
6
An approximate Bayesian approach for estimation of the instantaneous reproduction number under misreported epidemic data. Biom J 2023:e2200024. [PMID: 36639234 DOI: 10.1002/bimj.202200024]
Abstract
In epidemic models, the effective reproduction number is of central importance to assess the transmission dynamics of an infectious disease and to orient health intervention strategies. Publicly shared data during an outbreak often suffer from two sources of misreporting (underreporting and delay in reporting) that should not be overlooked when estimating epidemiological parameters. The main statistical challenge in models that intrinsically account for a misreporting process lies in the joint estimation of the time-varying reproduction number and the delay/underreporting parameters. Existing Bayesian approaches typically rely on Markov chain Monte Carlo algorithms that are extremely costly from a computational perspective. We propose a much faster alternative based on Laplacian-P-splines (LPS) that combines Bayesian penalized B-splines for flexible and smooth estimation of the instantaneous reproduction number with Laplace approximations to selected posterior distributions for fast computation. Assuming a known generation interval distribution, the incidence at a given calendar time is governed by the epidemic renewal equation, and the delay structure is specified through a composite link framework. Laplace approximations to the conditional posterior of the spline vector are obtained from analytical versions of the gradient and Hessian of the log-likelihood, implying a drastic speed-up in the computation of posterior estimates. Furthermore, the proposed LPS approach can be used to obtain point estimates and approximate credible intervals for the delay and reporting probabilities. Simulations of epidemics with different combinations of underreporting rates and delay structures (one-day, two-day, and weekend delays) show that the proposed LPS methodology delivers fast and accurate estimates, outperforming existing methods that do not take underreporting and delay patterns into account. Finally, LPS is illustrated in two real case studies of epidemic outbreaks.
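The core Laplace step — finding a posterior mode with an analytical gradient and Hessian and reading off a Gaussian approximation — can be sketched on a one-parameter toy model (Poisson counts with a Gaussian prior on the log-rate; the data and prior are assumed, and this is not the paper's renewal-equation model):

```python
import numpy as np

# assumed daily counts; log-rate theta has prior N(0, tau2)
y = np.array([3, 5, 4, 6, 2])
n, s, tau2 = len(y), int(np.sum(y)), 100.0

def grad(t):
    # d/dtheta of the log posterior: s - n*exp(theta) - theta/tau2
    return s - n * np.exp(t) - t / tau2

def hess(t):
    return -n * np.exp(t) - 1.0 / tau2

t = 0.0
for _ in range(50):          # Newton ascent to the posterior mode
    t -= grad(t) / hess(t)

mode = float(t)
sd = float(np.sqrt(-1.0 / hess(t)))  # Laplace (Gaussian) posterior sd
```

Because the gradient and Hessian are analytical, no sampling is needed; this is the source of the speed-up the abstract claims over MCMC.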
7
Federated learning algorithms for generalized mixed-effects model (GLMM) on horizontally partitioned data from distributed sources. BMC Med Inform Decis Mak 2022; 22:269. [PMID: 36244993 PMCID: PMC9569919 DOI: 10.1186/s12911-022-02014-1]
Abstract
OBJECTIVES This paper develops federated solutions based on two approximation algorithms to achieve federated generalized linear mixed effect models (GLMMs). It also proposes a solution for numerical errors and singularity issues, and shows that the two proposed methods perform well in revealing the significance of parameters in distributed datasets, compared to a centralized GLMM algorithm from the R package 'lme4' as the baseline model. METHODS The log-likelihood function of the GLMM is approximated by two numerical methods (Laplace approximation and Gauss-Hermite approximation, abbreviated as LA and GH), which support federated decomposition of the GLMM to bring computation to the data. To address the numerical errors and singularity issues caused by the federated setting, a loss-less log-sum-exponential trick and an adaptive regularization strategy were used. RESULTS The proposed methods can handle GLMMs that accommodate hierarchical data with multiple non-independent levels of observations in a federated setting. The experimental results demonstrate comparable (LA) and superior (GH) performance on simulated and real-world data. CONCLUSION We modified and compared federated GLMMs with different approximations, which can support researchers in analyzing versatile biomedical data with mixed effects and non-independence due to hierarchical structures (e.g., institute, region, country).
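The log-sum-exponential trick mentioned above is a standard device for combining per-site log-likelihood contributions without floating-point underflow; a minimal sketch:

```python
import numpy as np

def log_sum_exp(a):
    # stable log(sum(exp(a))): shift by the max before exponentiating
    a = np.asarray(a, dtype=float)
    m = a.max()
    return float(m + np.log(np.sum(np.exp(a - m))))

# three sites report very negative log-likelihood terms
terms = [-1000.0, -1001.0, -1002.0]

with np.errstate(under="ignore", divide="ignore"):
    naive = np.log(np.sum(np.exp(terms)))   # exp underflows to 0 -> -inf

stable = log_sum_exp(terms)   # approx -999.59, computed without underflow
```

The shift by the maximum is exact in real arithmetic (hence "loss-less"), which is why the trick is safe to apply inside a likelihood computation.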
8
Multivariate generalized linear mixed models for continuous bounded outcomes: Analyzing the body fat percentage data. Stat Methods Med Res 2021; 30:2619-2633. [PMID: 34825852 DOI: 10.1177/09622802211043276]
Abstract
We propose a multivariate regression model to handle multiple continuous bounded outcomes. We adopted the maximum likelihood approach for parameter estimation and inference. The model is specified by the product of univariate probability distributions and the correlation between the response variables is obtained through the correlation matrix of the random intercepts. For modeling continuous bounded variables on the interval (0,1) we considered the beta and unit gamma distributions. The main advantage of the proposed model is that we can easily combine different marginal distributions for the response variable vector. The computational implementation is performed using Template Model Builder, which combines the Laplace approximation with automatic differentiation. Therefore, the proposed approach allows us to estimate the model parameters quickly and efficiently. We conducted a simulation study to evaluate the computational implementation and the properties of the maximum likelihood estimators under different scenarios. Moreover, we investigate the impact of distribution misspecification in the proposed model. Our model was motivated by a data set with multiple continuous bounded outcomes, which refer to the body fat percentage measured at five regions of the body. Simulation studies and data analysis showed that the proposed model provides a general and rich framework to deal with multiple continuous bounded outcomes.
9
High performance implementation of the hierarchical likelihood for generalized linear mixed models: an application to estimate the potassium reference range in massive electronic health records datasets. BMC Med Res Methodol 2021; 21:151. [PMID: 34303362 PMCID: PMC8310602 DOI: 10.1186/s12874-021-01318-6]
Abstract
Background Converting electronic health record (EHR) entries to useful clinical inferences requires one to address the poor scalability of existing implementations of Generalized Linear Mixed Models (GLMMs) for repeated measures. The major computational bottleneck concerns the numerical evaluation of multivariable integrals, which even for the simplest EHR analyses may involve millions of dimensions (one for each patient). The hierarchical likelihood (h-lik) approach to GLMMs is a methodologically rigorous framework for estimation that is based on the Laplace approximation (LA), which replaces integration with numerical optimization and thus scales very well with dimensionality. Methods We present a high-performance, direct implementation of the h-lik for GLMMs in the R package TMB. Using this approach, we examined the relation of repeated serum potassium measurements and survival in the Cerner Real World Data (CRWD) EHR database. Analyzing these data requires the evaluation of an integral in over 3 million dimensions, putting the problem beyond the reach of conventional approaches. We also assessed the scalability and accuracy of the LA in smaller samples of 1 and 10% of the full dataset, analyzed via (a) the original, interconnected Generalized Linear Models (iGLM) approach to the h-lik, (b) Adaptive Gauss-Hermite quadrature (AGH), and (c) Markov Chain Monte Carlo (MCMC), the gold standard for multivariate integration. Results Random effects estimates generated by the LA were within 10% of the values obtained by the iGLM, AGH, and MCMC techniques. The h-lik approach was 4-30 times faster than AGH and nearly 800 times faster than MCMC. The major clinical inferences in this problem are the establishment of the non-linear relationship between the potassium level and the risk of mortality, as well as estimates of the individual and health care facility sources of variation in mortality risk in CRWD.
Conclusions The direct implementation of the h-lik offers a computationally efficient, numerically accurate approach for the analysis of extremely large, real-world repeated measures data. The clinical inferences from our analysis may guide choices of treatment thresholds for potassium disorders in the clinic. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01318-6.
10
Improvement of Bobrovsky-Mayor-Wolf-Zakai Bound. Entropy (Basel) 2021; 23:e23020161. [PMID: 33525685 PMCID: PMC7912606 DOI: 10.3390/e23020161]
Abstract
This paper presents a difference-type lower bound for the Bayes risk as a difference-type extension of the Borovkov–Sakhanenko bound. The resulting bound asymptotically improves the Bobrovsky–Mayor–Wolf–Zakai bound, which is a difference-type extension of the Van Trees bound. Some examples are also given.
11
Modeling perinatal mortality in twins via generalized additive mixed models: a comparison of estimation approaches. BMC Med Res Methodol 2019; 19:209. [PMID: 31730446 PMCID: PMC6858726 DOI: 10.1186/s12874-019-0861-2]
Abstract
Background The analysis of twin data presents a unique challenge. Second-born twins on average weigh less than first-born twins and have an elevated risk of perinatal mortality. It is not clear whether the risk difference depends on birth order or their relative birth weight. This study evaluates the association between birth order and perinatal mortality by birth order-specific weight difference in twin pregnancies. Methods We adopt generalized additive mixed models (GAMMs) which are a flexible version of generalized linear mixed models (GLMMs), to model the association. Estimation of such models for correlated binary data is challenging. We compare both Bayesian and likelihood-based approaches for estimating GAMMs via simulation. We apply the methods to the US matched multiple birth data to evaluate the association between twins’ birth order and perinatal mortality. Results Perinatal mortality depends on both birth order and relative birthweight. Simulation results suggest that the Bayesian method with half-Cauchy priors for variance components performs well in estimating all components of the GAMM. The Bayesian results were sensitive to prior specifications. Conclusion We adopted a flexible statistical model, GAMM, to precisely estimate the perinatal mortality risk differences between first- and second-born twins whereby birthweight and gestational age are nonparametrically modelled to explicitly adjust for their effects. The risk of perinatal mortality in twins was found to depend on both birth order and relative birthweight. We demonstrated that the Bayesian method estimated the GAMM model components more reliably than the frequentist approaches.
12
MM algorithms for variance component estimation and selection in logistic linear mixed model. Stat Sin 2019; 29:1585-1605. [PMID: 32523320 PMCID: PMC7286582 DOI: 10.5705/ss.202017.0220]
Abstract
Logistic linear mixed models are widely used in experimental designs and genetic analyses of binary traits. Motivated by modern applications, we consider the case of many groups of random effects, where each group corresponds to a variance component. When the number of variance components is large, fitting a logistic linear mixed model is challenging. Thus, we develop two efficient and stable minorization-maximization (MM) algorithms for estimating variance components based on a Laplace approximation of the logistic model. One of these leads to a simple iterative soft-thresholding algorithm for variance component selection using the maximum penalized approximated likelihood. We demonstrate the variance component estimation and selection performance of our algorithms by means of simulation studies and an analysis of real data.
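The iterative soft-thresholding algorithm referenced here is built on the scalar soft-thresholding operator, which is what produces exact zeros and hence variance component selection. A minimal sketch of the operator (illustrative values, not the paper's full algorithm):

```python
import numpy as np

def soft_threshold(x, lam):
    # S(x, lam) = sign(x) * max(|x| - lam, 0): shrinks toward zero and
    # sets small entries exactly to zero, which is what drives selection
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

z = np.array([-3.0, -0.5, 0.0, 0.2, 2.5])
out = soft_threshold(z, 1.0)   # -> [-2.0, 0.0, 0.0, 0.0, 1.5]
```

In a penalized-likelihood iteration, each update is passed through this operator, so components whose updates stay below the penalty level are dropped from the model.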
13
A Fast ML-Based Single-Step Localization Method Using EM Algorithm Based on Time Delay and Doppler Shift for a Far-Field Scenario. Sensors (Basel) 2018; 18:s18124139. [PMID: 30486271 PMCID: PMC6308437 DOI: 10.3390/s18124139]
Abstract
This study discusses the localization problem based on time delay and Doppler shift for a far-field scenario. Conventional localization methods employ two steps: they first extract intermediate parameters from the received signals and then determine the source position from the measured parameters. As opposed to these traditional two-step methods, direct position determination (DPD) methods accomplish localization in a single step without computing intermediate parameters. However, the DPD cost function is often non-convex, so finding the estimated position by exhaustive search costs a large amount of computational resources. Weiss proposed a DPD estimator that mitigates the computational complexity via eigenvalue decomposition. Unfortunately, when computational resources are rather limited, Weiss's method cannot satisfy timeliness requirements. To solve this problem, this paper develops a DPD estimator using the expectation maximization (EM) algorithm based on time delay and Doppler shift. The proposed method starts by choosing the transmitter-receiver range vector as the hidden variable. The cost function is then separated and simplified via the hidden variable, transforming the high-dimensional nonlinear search problem into a few one-dimensional search subproblems. Finally, the expressions for the EM iterations are obtained through the Laplace approximation. In addition, we derive the Cramér-Rao bound to evaluate the best achievable localization performance. Simulation results confirm that, while guaranteeing high accuracy, the proposed algorithm strikes a good compromise between localization performance and computational complexity.
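The key move above — choosing a hidden variable so that the E and M steps decouple an intractable search — is the generic EM pattern. A sketch on a toy two-component Gaussian mixture (unit variances and equal weights assumed; this is illustrative of the E/M split only, not the DPD estimator):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

mu = np.array([-1.0, 1.0])   # initial component means
for _ in range(100):
    # E step: responsibilities r[i, k] (stabilized softmax over components)
    d = -0.5 * (x[:, None] - mu[None, :]) ** 2
    r = np.exp(d - d.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M step: given responsibilities, each mean solves its own
    # one-dimensional weighted-average subproblem
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
```

The M step here is a closed-form one-dimensional update per component, mirroring how the paper's hidden variable reduces a high-dimensional search to a few one-dimensional subproblems.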
14
Spherical Minimum Description Length. Entropy (Basel) 2018; 20:e20080575. [PMID: 33265664 PMCID: PMC7513099 DOI: 10.3390/e20080575]
Abstract
We consider the problem of model selection using the Minimum Description Length (MDL) criterion for distributions with parameters on the hypersphere. Model selection algorithms aim to find a compromise between goodness of fit and model complexity. Variables often considered for complexity penalties involve the number of parameters, sample size, and shape of the parameter space, with the penalty term often referred to as stochastic complexity. Current model selection criteria either ignore the shape of the parameter space or incorrectly penalize the complexity of the model, largely because typical Laplace approximation techniques yield inaccurate results for curved spaces. We demonstrate how the use of a constrained Laplace approximation on the hypersphere yields a novel complexity measure that more accurately reflects the geometry of these spherical parameter spaces. We refer to this modified model selection criterion as spherical MDL. As proof of concept, spherical MDL is used for bin selection in histogram density estimation, performing favorably against other model selection criteria.
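To give a flavor of MDL-based bin selection, here is a sketch of a two-part code score for histogram density estimation. The data are assumed, and a BIC-style (k-1)/2 * log n penalty is used as a simple stand-in for the paper's stochastic complexity term:

```python
import numpy as np

def mdl_score(x, k):
    # description length of a k-bin histogram on [0, 1]:
    # negative log-likelihood of the histogram density estimate plus a
    # BIC-style complexity penalty (a stand-in for stochastic complexity)
    n = len(x)
    counts, _ = np.histogram(x, bins=k, range=(0.0, 1.0))
    nz = counts[counts > 0]
    nll = -float(np.sum(nz * np.log(nz * k / n)))  # density in bin j: c_j*k/n
    return nll + 0.5 * (k - 1) * np.log(n)

rng = np.random.default_rng(1)
x = rng.beta(2.0, 5.0, size=2000)          # skewed density on (0, 1)

scores = {k: mdl_score(x, k) for k in range(1, 51)}
k_best = min(scores, key=scores.get)       # penalized fit picks a moderate k
```

The fit term rewards more bins while the complexity term penalizes them; the minimizing k is the MDL-selected bin count.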
15
Abstract
Next generation sequencing allows the identification of genes consisting of differentially expressed transcripts, a term which usually refers to changes in the overall expression level. A specific type of differential expression is differential transcript usage (DTU), which targets changes in the relative within-gene expression of a transcript. The contributions of this paper are to: (a) extend cjBitSeq, a previously introduced Bayesian model originally designed for identifying changes in overall expression levels, to the DTU context, and (b) propose a Bayesian version of DRIMSeq, a frequentist model for inferring DTU. cjBitSeq is a read-based model and performs fully Bayesian inference by MCMC sampling on the space of latent states of each transcript per gene. BayesDRIMSeq is a count-based model and estimates the Bayes factor of a DTU model against a null model using Laplace's approximation. The proposed models are benchmarked against existing ones using a recent independent simulation study as well as a real RNA-seq dataset. Our results suggest that the Bayesian methods exhibit similar performance to DRIMSeq in terms of precision/recall but offer better calibration of the False Discovery Rate.
16
Jointly modeling longitudinal proportional data and survival times with an application to the quality of life data in a breast cancer trial. Lifetime Data Anal 2017; 23:183-206. [PMID: 26403909 DOI: 10.1007/s10985-015-9346-8]
Abstract
Motivated by the joint analysis of longitudinal quality of life data and recurrence-free survival times from a cancer clinical trial, we present in this paper two approaches to jointly model longitudinal proportional measurements, which are confined to a finite interval, and survival data. Both approaches assume a proportional hazards model for the survival times. For the longitudinal component, the first approach applies the classical linear mixed model to logit-transformed responses, while the second approach directly models the responses using a simplex distribution. A semiparametric method based on a penalized joint likelihood generated by the Laplace approximation is derived to fit the joint model defined by the second approach. The proposed procedures are evaluated in a simulation study and applied to the analysis of the breast cancer data that motivated this research.
17
Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models. J Theor Biol 2017; 417:131-141. [PMID: 28088357 DOI: 10.1016/j.jtbi.2016.12.022]
Abstract
This study corresponds to the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and has the ability to consider records from non-genotyped individuals and individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends this family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects and contains derivations of approximate Bayes factors and fractional Bayes factors to compare models from part I, and those developed here, with their null versions. These null versions correspond to simpler models ignoring heterogeneity of populations, but still accounting for randomness of genotypes. For each marker locus, the spike component of the prior corresponded to a point mass at 0 in R^S, where S is the number of populations, and the slab component was an S-variate Gaussian distribution; independent conditional priors were assumed. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with the models derived in part I using measures of predictive performance based on squared Pearson correlations, the Deviance Information Criterion, Bayes factors, and fractional Bayes factors.
The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible in the sense that it now offers more modeling options.
|
18
|
Reliability analysis using an exponential power model with bathtub-shaped failure rate function: a Bayes study. SPRINGERPLUS 2016; 5:1076. [PMID: 27462524 PMCID: PMC4943921 DOI: 10.1186/s40064-016-2722-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/12/2016] [Accepted: 06/30/2016] [Indexed: 11/10/2022]
Abstract
Models with a bathtub-shaped hazard function have been widely accepted in the fields of reliability and medicine and are particularly useful in reliability-related decision making and cost analysis. This paper makes a Bayesian study of the exponential power model, which is capable of assuming an increasing as well as a bathtub-shaped failure rate, and shows how posterior simulation based on Markov chain Monte Carlo algorithms can be made straightforward and routine in R. The study is carried out for complete as well as censored data, under the assumption of weakly informative priors for the parameters. In addition, inferential interest focuses on the posterior distributions of non-linear functions of the parameters. The model is also extended to include continuous explanatory variables, and the corresponding R code is illustrated. Two real data sets are considered for illustrative purposes.
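Under one common parameterization of the exponential power model (the Smith-Bain form; the paper may use a different one), the bathtub shape of the failure rate for a shape parameter below one can be checked numerically. The values of `alpha` and `lam` here are illustrative, not taken from the paper:

```python
import numpy as np

def ep_hazard(t, alpha, lam):
    """Hazard of the exponential power model under the Smith-Bain
    parameterization S(t) = exp(1 - exp((lam*t)**alpha)), which gives
    h(t) = alpha * lam * (lam*t)**(alpha-1) * exp((lam*t)**alpha)."""
    u = (lam * t) ** alpha
    return alpha * lam * (lam * t) ** (alpha - 1) * np.exp(u)

t = np.linspace(0.01, 5.0, 500)
h = ep_hazard(t, alpha=0.5, lam=1.0)    # alpha < 1 gives a bathtub shape
i_min = int(np.argmin(h))               # interior minimum of the hazard
```

For `alpha = 0.5`, `lam = 1.0` the hazard decreases to an interior minimum (at t = 1 for this parameterization) and then increases, i.e., the classic bathtub.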
|
19
|
Abstract
We use exponential tilting to obtain versions of asymptotic formulae for Bayesian computation that do not involve conditional maxima of the likelihood function, yielding a more stable computational procedure and significantly reducing computational time. In particular we present an alternative version of the Laplace approximation for a marginal posterior density. Implementation of the asymptotic formulae and a modified signed root based importance sampler are illustrated with an example.
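The standard (untilted) Laplace approximation that this paper refines can be sketched on a toy integral with a known closed form. The example integrand is ours, chosen only for checkability, and is not from the paper:

```python
import math

def laplace_integral(log_g, dlog_g2, theta_hat):
    """Laplace approximation to the integral of g(theta) around its mode:
    g(theta_hat) * sqrt(2*pi / (-(log g)''(theta_hat)))."""
    return math.exp(log_g(theta_hat)) * math.sqrt(2 * math.pi / -dlog_g2(theta_hat))

# Toy marginal-likelihood integral: g(theta) = theta^s * (1-theta)^(n-s),
# whose exact value is the Beta function B(s+1, n-s+1).
n, s = 100, 30
log_g = lambda th: s * math.log(th) + (n - s) * math.log(1 - th)
dlog_g2 = lambda th: -s / th**2 - (n - s) / (1 - th) ** 2   # (log g)''
approx = laplace_integral(log_g, dlog_g2, theta_hat=s / n)  # mode at s/n
exact = math.exp(math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2))
```

For this sample size the relative error is under one percent; the instability the abstract targets arises when the conditional maxima required by more elaborate versions of such formulae are hard to locate.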
|
20
|
Counter-factual mathematics of counterfactual predictive models. Front Psychol 2014; 5:801. [PMID: 25202284 PMCID: PMC4142339 DOI: 10.3389/fpsyg.2014.00801] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/06/2014] [Accepted: 07/07/2014] [Indexed: 12/05/2022] Open
|
21
|
Bayesian analysis of generalized log-Burr family with R. SPRINGERPLUS 2014; 3:185. [PMID: 24839586 PMCID: PMC4022970 DOI: 10.1186/2193-1801-3-185] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/04/2014] [Accepted: 03/23/2014] [Indexed: 11/10/2022]
Abstract
The log-Burr distribution is a generalization of the logistic and extreme value distributions, which are important reliability models. In this paper, a Bayesian approach is used to model reliability data under the log-Burr model using analytic and simulation tools. The Laplace approximation is implemented for approximating the posterior densities of the parameters, and parallel simulation tools are also implemented using the ‘LaplacesDemon’ package in R.
|
22
|
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages. AM STAT 2013; 67. [PMID: 24288415 DOI: 10.1080/00031305.2013.817357] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 10/26/2022]
Abstract
Several statistical packages are capable of estimating generalized linear mixed models, and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, these studies focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another, even when the same estimation method was used. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performance of the three methods across statistical packages via simulation, involving two- and three-level logistic regression models with at least three correlated random effects, and we apply our findings to a real dataset. Our results suggest that two packages, SAS GLIMMIX with the Laplace method and SuperMix with Gaussian quadrature, perform well in terms of accuracy, precision, convergence rate, and computing speed. We also discuss the strengths and weaknesses of the two packages with regard to sample size.
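The Gauss-Hermite method compared in this study can be illustrated with a minimal non-adaptive quadrature for the marginal likelihood of a single-random-intercept logistic model (far simpler than the multiple correlated random effects studied above). The data and parameter values below are simulated placeholders:

```python
import numpy as np

def marginal_loglik_gh(y, x, beta0, beta1, sigma, n_nodes=20):
    """Marginal log-likelihood of a random-intercept logistic model,
    logit P(y_ij = 1) = beta0 + beta1 * x_ij + u_i with u_i ~ N(0, sigma^2),
    where the integral over each u_i is approximated by (non-adaptive)
    Gauss-Hermite quadrature. y and x are lists of per-cluster arrays."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma * nodes        # change of variables for N(0, sigma^2)
    w = weights / np.sqrt(np.pi)
    total = 0.0
    for yi, xi in zip(y, x):
        eta = beta0 + beta1 * xi[:, None] + u[None, :]      # (n_obs, n_nodes)
        p = 1.0 / (1.0 + np.exp(-eta))
        lik_given_u = np.prod(np.where(yi[:, None] == 1, p, 1.0 - p), axis=0)
        total += np.log(np.sum(w * lik_given_u))            # quadrature over u_i
    return total

rng = np.random.default_rng(1)
x = [rng.normal(size=5) for _ in range(10)]                 # 10 clusters of 5 obs
y = [(rng.random(5) < 0.5).astype(int) for _ in range(10)]
ll = marginal_loglik_gh(y, x, beta0=0.2, beta1=1.0, sigma=0.8)
```

Doubling the number of nodes barely changes the result here, which is why quadrature is accurate for low-dimensional random effects; with several correlated random effects the node grid grows exponentially, which is exactly the computational burden the article discusses.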
|
23
|
Inference for Size Demography from Point Pattern Data using Integral Projection Models. JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS 2012; 17. [PMID: 24223480 DOI: 10.1007/s13253-012-0123-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
Abstract
Population dynamics with regard to the evolution of traits have typically been studied using matrix projection models (MPMs). Recently, integral projection models (IPMs) have been proposed to work with continuous traits. Following the approach taken with MPMs, IPMs are handled in two stages: a fitting stage and a projection stage. Fitting these models has so far been done only with individual-level transition data. These data are used to estimate the demographic functions (survival, growth, fecundity) that comprise the kernel of the IPM specification. The estimated kernel is then iterated from an initial trait distribution to project steady-state population behavior under this kernel. When trait distributions are observed over time, such an approach does not align projected distributions with these observed temporal benchmarks. The contribution here, focusing on size distributions, is to address this issue. Our concern is that the above approach introduces an inherent mismatch in scales. The redistribution kernel in the IPM proposes a mechanistic description of population-level redistribution, whereas a kernel of the same functional form, fitted to data at the individual level, provides a mechanistic model for individual-level processes. The resulting parameter estimates and the associated estimated kernel are at the wrong scale and do not allow population-level interpretation. Our approach views the observed size distribution at a given time as a point pattern over a bounded interval. We build a three-stage hierarchical model to infer the dynamic intensities used to explain the observed point patterns. This model is driven by a latent deterministic IPM, and we introduce uncertainty by letting the operating IPM vary around this deterministic specification. Further uncertainty arises in the realization of the point pattern given the operating IPM. Fitted within a Bayesian framework, such modeling enables full inference about all features of the model. 
Such dynamic modeling, optimized by fitting data observed over time, is better suited to projection. Exact Bayesian model fitting is very computationally challenging, so we offer approximate strategies to facilitate computation. We illustrate with simulated data examples as well as a set of annual tree growth data from Duke Forest in North Carolina. A further example shows the benefit of our approach, in terms of projection, compared with the foregoing individual-level fitting.
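A minimal deterministic IPM of the kind that drives the latent process above can be sketched by discretizing the kernel on a size mesh. The survival and growth functions below are hypothetical placeholders, and fecundity is omitted to keep the sketch short; this is not the paper's hierarchical Bayesian model:

```python
import numpy as np

# Discretize the redistribution kernel K(y, x) = survival(x) * growth(y | x)
# on a size mesh and iterate n_{t+1}(y) = integral of K(y, x) * n_t(x) dx
# via the midpoint rule.
L, U, m = 0.0, 10.0, 200
h = (U - L) / m
x = L + h * (np.arange(m) + 0.5)                       # mesh midpoints

surv = 1.0 / (1.0 + np.exp(-(0.3 * x - 1.0)))          # survival rises with size
mu = 0.9 * x + 0.8                                     # mean size next year
growth = np.exp(-0.5 * ((x[:, None] - mu[None, :]) / 0.5) ** 2) \
         / (0.5 * np.sqrt(2 * np.pi))                  # Gaussian growth density
K = growth * surv[None, :]                             # K[y_index, x_index]

n = np.exp(-0.5 * ((x - 2.0) / 1.0) ** 2)              # initial size distribution
for _ in range(50):
    n = h * (K @ n)                                    # one projection step

lam = np.max(np.abs(np.linalg.eigvals(h * K)))         # asymptotic growth rate
```

With no fecundity and survival below one everywhere, the dominant eigenvalue is below one and the projected population declines; the paper's point is that such a kernel should be inferred at the population level, from observed size distributions, rather than assembled from individual-level fits.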
|
24
|
Assessing variance components in multilevel linear models using approximate Bayes factors: A case study of ethnic disparities in birthweight. JOURNAL OF THE ROYAL STATISTICAL SOCIETY. SERIES A, (STATISTICS IN SOCIETY) 2011; 174:785-804. [PMID: 24082430 PMCID: PMC3784317 DOI: 10.1111/j.1467-985x.2011.00685.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
Racial/ethnic disparities in birthweight are a large source of differential morbidity and mortality worldwide and have remained largely unexplained in epidemiologic models. We assess the impact of maternal ancestry and census tract residence on infant birth weights in New York City and the modifying effects of race and nativity by incorporating random effects in a multilevel linear model. Evaluating the significance of these predictors involves the test of whether the variances of the random effects are equal to zero. This is problematic because the null hypothesis lies on the boundary of the parameter space. We generalize an approach for assessing random effects in the two-level linear model to a broader class of multilevel linear models by scaling the random effects to the residual variance and introducing parameters that control the relative contribution of the random effects. After integrating over the random effects and variance components, the resulting integrals needed to calculate the Bayes factor can be efficiently approximated with Laplace's method.
|
25
|
USING PROFILE LIKELIHOOD FOR SEMIPARAMETRIC MODEL SELECTION WITH APPLICATION TO PROPORTIONAL HAZARDS MIXED MODELS. Stat Sin 2009; 19:819-842. [PMID: 31762585 PMCID: PMC6874104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
We consider selection of nested and non-nested semiparametric models. Using the profile likelihood, we can define both a likelihood ratio statistic and an Akaike information for models with nuisance parameters. An asymptotic quadratic expansion of the log profile likelihood allows derivation of the asymptotic null distribution of the likelihood ratio statistic, including the boundary cases, as well as unbiased estimation of the Akaike information by an Akaike information criterion. Our work was motivated by the proportional hazards mixed-effects model (PHMM), which incorporates general random effects of arbitrary covariates and includes the frailty model as a special case. The asymptotic properties of its parameter estimates have recently been established, which enables the quadratic expansion of the log profile likelihood. For computation of the (profile) likelihood under the PHMM we apply three algorithms: the Laplace approximation, reciprocal importance sampling, and bridge sampling. We compare the three algorithms under different data structures and apply the methods to a multi-center lung cancer clinical trial.
|