101. Zhou H, Alexander D, Lange K. A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat Comput 2011; 21:261-273. [PMID: 21359052 PMCID: PMC3045213 DOI: 10.1007/s11222-009-9166-3] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4]
Abstract
In many statistical problems, maximum likelihood estimation by an EM or MM algorithm suffers from excruciatingly slow convergence. This tendency limits the application of these algorithms to modern high-dimensional problems in data mining, genomics, and imaging. Unfortunately, most existing acceleration techniques are ill-suited to complicated models involving large numbers of parameters. The squared iterative methods (SQUAREM) recently proposed by Varadhan and Roland constitute one notable exception. This paper presents a new quasi-Newton acceleration scheme that requires only modest increments in computation per iteration and overall storage and rivals or surpasses the performance of SQUAREM on several representative test problems.
Affiliation(s)
- Hua Zhou
- Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
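The abstract of entry 101 describes accelerating a slowly converging EM or MM map with a quasi-Newton step built from secant conditions. The block below is a minimal sketch under stated assumptions: a generic fixed-point map `F`, the simplest case of a single secant pair (our reading of the idea), and no objective-based safeguard. Each iteration evaluates F twice and extrapolates along F(F(x)) - F(x). The names `accelerate` and `F_lin` are hypothetical; this is not the authors' code.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product."""
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vector, b: Vector) -> Vector:
    """Element-wise difference a - b."""
    return [x - y for x, y in zip(a, b)]


def accelerate(F: Callable[[Vector], Vector], x: Vector,
               tol: float = 1e-10, max_iter: int = 1000) -> Tuple[Vector, int]:
    """Iterate x <- F(x) + c * (F(F(x)) - F(x)) until ||F(x) - x|| < tol.

    Hypothetical helper (assumed one-secant-pair case): with u = F(x) - x and
    v = F(F(x)) - F(x), the scalar is c = <u, u> / (<u, u> - <u, v>).
    """
    for it in range(1, max_iter + 1):
        fx = F(x)
        u = sub(fx, x)                      # u = F(x) - x
        if dot(u, u) ** 0.5 < tol:
            return fx, it
        ffx = F(fx)
        v = sub(ffx, fx)                    # v = F(F(x)) - F(x)
        denom = dot(u, u) - dot(u, v)
        if denom == 0.0:
            x = ffx                         # degenerate secant: two plain steps
            continue
        c = dot(u, u) / denom
        x = [a + c * b for a, b in zip(fx, v)]   # x_new = F(x) + c * v
    return x, max_iter


def F_lin(x: Vector) -> Vector:
    """Toy linear map with fixed point (1, 2); coordinate 0 contracts only by 0.99."""
    return [0.99 * x[0] + 0.01, 0.5 * x[1] + 1.0]


x_star, iters = accelerate(F_lin, [0.0, 0.0])
print(x_star, iters)
```

On this toy map, plain iteration shrinks the first coordinate's error by only 0.99 per step, so it needs on the order of two thousand iterations to reach the same tolerance. The secant step removes the slow direction in a handful of iterations. A practical version would also check that the objective (for example, the log-likelihood) does not get worse, and fall back to the plain update when it does.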
102. Yu Y, Meng XL. To Center or Not to Center: That Is Not the Question—An Ancillarity–Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency. J Comput Graph Stat 2011. [DOI: 10.1198/jcgs.2011.203main] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0]
103. Lachos VH, Labra FV, Bolfarine H, Ghosh P. Multivariate measurement error models based on scale mixtures of the skew–normal distribution. Statistics 2010. [DOI: 10.1080/02331880903236926] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4]
104. van Dyk DA, Meng XL. Cross-Fertilizing Strategies for Better EM Mountain Climbing and DA Field Exploration: A Graphical Guide Book. Stat Sci 2010. [DOI: 10.1214/09-sts309] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0]
105.
106. Tanner MA, Wong WH. From EM to Data Augmentation: The Emergence of MCMC Bayesian Computation in the 1980s. Stat Sci 2010. [DOI: 10.1214/10-sts341] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5]
107. Xu H, Craig BA. Likelihood Analysis of Multivariate Probit Models Using a Parameter Expanded MCEM Algorithm. Technometrics 2010; 52:340-348. [PMID: 21042430 PMCID: PMC2966284 DOI: 10.1198/tech.2010.09055] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5]
Abstract
Multivariate binary data arise in a variety of settings. In this paper, we propose a practical and efficient computational framework for maximum likelihood estimation of multivariate probit regression models. This approach uses the Monte Carlo EM (MCEM) algorithm, with parameter expansion to complete the M-step, to avoid the direct evaluation of the intractable multivariate normal orthant probabilities. The parameter expansion not only enables a closed-form solution in the M-step but also improves efficiency. Using simulation studies, we compare the performance of our approach with the MCEM algorithms developed by Chib and Greenberg (1998) and Song and Lee (2005), as well as the iterative approach proposed by Li and Schafer (2008). Our approach is further illustrated using a real-world example.
Affiliation(s)
- Huiping Xu
- Department of Mathematics and Statistics, Mississippi State University, Mississippi State, MS 39762
- Bruce A. Craig
- Department of Statistics, Purdue University, West Lafayette, IN 47907
108. Sun J, Kabán A. A fast algorithm for robust mixtures in the presence of measurement errors. IEEE Trans Neural Netw 2010; 21:1206-20. [PMID: 20639180 DOI: 10.1109/tnn.2010.2048219] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8]
Abstract
In experimental and observational sciences, detecting atypical, peculiar data from large sets of measurements has the potential of highlighting candidates of interesting new types of objects that deserve more detailed domain-specific follow-up study. However, measurement data is nearly never free of measurement errors. These errors can generate false outliers that are not truly interesting. Although many approaches exist for finding outliers, they have no means to tell to what extent the peculiarity is not simply due to measurement errors. To address this issue, we have developed a model-based approach to infer genuine outliers from multivariate data sets when measurement error information is available. This is based on a probabilistic mixture of hierarchical density models, in which parameter estimation is made feasible by a tree-structured variational expectation-maximization algorithm. Here, we further develop an algorithmic enhancement to address the scalability of this approach, in order to make it applicable to large data sets, via a K-dimensional-tree based partitioning of the variational posterior assignments. This creates a non-trivial tradeoff between a more detailed noise model to enhance the detection accuracy, and the coarsened posterior representation to obtain computational speedup. Hence, we conduct extensive experimental validation to study the accuracy/speed tradeoffs achievable in a variety of data conditions. We find that, at low-to-moderate error levels, a speedup factor that is at least linear in the number of data points can be achieved without significantly sacrificing the detection accuracy. The benefits of including measurement error information into the modeling are evident in all situations, and the gain roughly recovers the loss incurred by the speedup procedure in large error conditions.
We analyze and discuss in detail the characteristics of our algorithm based on results obtained on appropriately designed synthetic data experiments, and we also demonstrate its working in a real application example.
Affiliation(s)
- Jianyong Sun
- Center for Plant Integrative Biology, School of Bioscience, The University of Nottingham, Sutton Bonington LE12 5RD, UK.
109. On Monte Carlo methods for Bayesian multivariate regression models with heavy-tailed errors. J Multivar Anal 2010. [DOI: 10.1016/j.jmva.2009.12.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1]
110. Saâdaoui F. Acceleration of the EM algorithm via extrapolation methods: Review, comparison and new methods. Comput Stat Data Anal 2010. [DOI: 10.1016/j.csda.2008.11.011] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6]
111. Poon WY, Wang HB. Analysis of ordinal categorical data with misclassification. Br J Math Stat Psychol 2010; 63:17-42. [PMID: 19364445 DOI: 10.1348/000711008x401314] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8]
Abstract
We develop a method for the analysis of multivariate ordinal categorical data with misclassification based on the latent normal variable approach. Misclassification arises if a subject has been classified into a category that does not truly reflect its actual state, and can occur with one or more variables. A basic framework is developed to enable the analysis of two types of data. The first corresponds to a single sample that is obtained from a fallible design that may lead to misclassified data. The other corresponds to data that is obtained by double sampling. Double sampling data consists of two parts: a sample that is obtained by classifying subjects using the fallible design only and a sample that is obtained by classifying subjects using both fallible and true designs, which is assumed to have no misclassification. A unified expectation-maximization approach is developed to find the maximum likelihood estimate of model parameters. Simulation studies and examples that are based on real data are used to demonstrate the applicability and practicability of the proposed methods.
Affiliation(s)
- Wai-Yin Poon
- Department of Statistics, Chinese University of Hong Kong, Shatin, Hong Kong, People's Republic of China.
112.
113. Staudenmayer J, Lake EE, Wand MP. Robustness for general design mixed models using the t-distribution. Stat Model 2009. [DOI: 10.1177/1471082x0800900304] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1]
Abstract
The t-distribution allows the incorporation of outlier robustness into statistical models while retaining the elegance of likelihood-based inference. In this paper, we develop and implement a version of the general design linear mixed model that uses the univariate t-distribution. This general design allows a considerably richer class of models to be fit than is possible with existing methods. Included in this class are semiparametric regression and smoothing models, and spatial models.
Affiliation(s)
- J Staudenmayer
- Department of Mathematics and Statistics, University of Massachusetts, USA
- E E Lake
- Eigenstat Inc., Newton, Massachusetts, USA
- M P Wand
- School of Mathematics and Applied Statistics, University of Wollongong, Australia
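To illustrate the robustness mechanism named in the abstract of entry 113, the block below fits a univariate Student-t location and scale with the textbook EM algorithm that treats the t as a scale mixture of normals, with the degrees of freedom fixed at an assumed value of 4. It is a minimal sketch of the general idea, not the general design mixed-model algorithm of Staudenmayer, Lake and Wand; `t_em` and its starting values are hypothetical choices.

```python
from statistics import median
from typing import List, Tuple


def t_em(y: List[float], nu: float = 4.0, n_iter: int = 200) -> Tuple[float, float]:
    """EM for location mu and squared scale sigma2 of a Student-t with fixed nu (assumed)."""
    n = len(y)
    mu = median(y)   # robust starting location
    sigma2 = 1.0     # arbitrary starting scale (assumption)
    for _ in range(n_iter):
        # E-step: expected latent precision weight of each observation
        w = [(nu + 1.0) / (nu + (yi - mu) ** 2 / sigma2) for yi in y]
        # M-step: weighted mean, then weighted residual variance at the new mean
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        sigma2 = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / n
    return mu, sigma2


data = [1.0, 1.2, 0.9, 1.1, 1.0, 50.0]   # one gross outlier
mu_t, s2_t = t_em(data)
mean_y = sum(data) / len(data)           # normal-model estimate, pulled to about 9.2
print(round(mu_t, 3), round(mean_y, 3))
```

The E-step weight (nu + 1) / (nu + d_i) goes toward zero for an observation with a large standardized squared residual d_i, so the value 50.0 barely moves the location estimate, which stays with the bulk of the data near 1, while the sample mean is dragged far away.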
114.
Abstract
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling.
Affiliation(s)
- Hao Wang
- Department of Statistical Science, Duke University, Durham, North Carolina 27708, USA
115.
116. Browne WJ, Steele F, Golalizadeh M, Green MJ. The use of simple reparameterizations to improve the efficiency of Markov chain Monte Carlo estimation for multilevel models with applications to discrete time survival models. J R Stat Soc Ser A Stat Soc 2009; 172:579-598. [PMID: 19649268 PMCID: PMC2718325 DOI: 10.1111/j.1467-985x.2009.00586.x] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8]
Abstract
We consider the application of Markov chain Monte Carlo (MCMC) estimation methods to random-effects models and in particular the family of discrete time survival models. Survival models can be used in many situations in the medical and social sciences and we illustrate their use through two examples that differ in terms of both substantive area and data structure. A multilevel discrete time survival analysis involves expanding the data set so that the model can be cast as a standard multilevel binary response model. For such models it has been shown that MCMC methods have advantages in terms of reducing estimate bias. However, the data expansion results in very large data sets for which MCMC estimation is often slow and can produce chains that exhibit poor mixing. Any way of improving the mixing will result in both speeding up the methods and more confidence in the estimates that are produced. The MCMC methodological literature is full of alternative algorithms designed to improve mixing of chains and we describe three reparameterization techniques that are easy to implement in available software. We consider two examples of multilevel survival analysis: incidence of mastitis in dairy cattle and contraceptive use dynamics in Indonesia. For each application we show where the reparameterization techniques can be used and assess their performance.
117. Ghosh P, Tu W. Assessing Sexual Attitudes and Behaviors of Young Women: A Joint Model with Nonlinear Time Effects, Time Varying Covariates, and Dropouts. J Am Stat Assoc 2009. [DOI: 10.1198/jasa.2009.0013] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8]
118. Liu X, Daniels MJ, Marcus B. Joint Models for the Association of Longitudinal Binary and Continuous Processes With Application to a Smoking Cessation Trial. J Am Stat Assoc 2009; 104:429-438. [PMID: 20161053 PMCID: PMC2746699 DOI: 10.1198/016214508000000904] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2]
Abstract
Joint models for the association of a longitudinal binary and a longitudinal continuous process are proposed for situations in which their association is of direct interest. The models are parameterized such that the dependence between the two processes is characterized by unconstrained regression coefficients. Bayesian variable selection techniques are used to parsimoniously model these coefficients. A Markov chain Monte Carlo (MCMC) sampling algorithm is developed for sampling from the posterior distribution, using data augmentation steps to handle missing data. Several technical issues are addressed to implement the MCMC algorithm efficiently. The models are motivated by, and are used for, the analysis of a smoking cessation clinical trial in which an important question of interest was the effect of the (exercise) treatment on the relationship between smoking cessation and weight gain.
Affiliation(s)
- Xuefeng Liu
- Department of Biostatistics and Epidemiology, East Tennessee State University, Johnson City, TN 37614
119. Dudley RM, Sidenko S, Wang Z. Differentiability of t-functionals of location and scatter. Ann Stat 2009. [DOI: 10.1214/08-aos592] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3]
120. Xu H, Craig BA. A Probit Latent Class Model with General Correlation Structures for Evaluating Accuracy of Diagnostic Tests. Biometrics 2009; 65:1145-55. [DOI: 10.1111/j.1541-0420.2008.01194.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4]
121. Ghosh J, Dunson DB. Default Prior Distributions and Efficient Posterior Computation in Bayesian Factor Analysis. J Comput Graph Stat 2009; 18:306-320. [PMID: 23997568 DOI: 10.1198/jcgs.2009.07145] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4]
Abstract
Factor analytic models are widely used in social sciences. These models have also proven useful for sparse modeling of the covariance structure in multidimensional data. Normal prior distributions for factor loadings and inverse gamma prior distributions for residual variances are a popular choice because of their conditionally conjugate form. However, such prior distributions require elicitation of many hyperparameters and tend to result in poorly behaved Gibbs samplers. In addition, one must choose an informative specification, as high variance prior distributions face problems due to impropriety of the posterior distribution. This article proposes a default, heavy-tailed prior distribution specification, which is induced through parameter expansion while facilitating efficient posterior computation. We also develop an approach to allow uncertainty in the number of factors. The methods are illustrated through simulated examples and epidemiology and toxicology applications. Data sets and computer code used in this article are available online.
Affiliation(s)
- Joyee Ghosh
- Department of Biostatistics, The University of North Carolina, Chapel Hill, NC 27599
122. Xie X, Yan S, Kwok JT, Huang TS. Matrix-variate factor analysis and its applications. IEEE Trans Neural Netw 2008; 19:1821-6. [PMID: 18842486 DOI: 10.1109/tnn.2008.2004963] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2]
Abstract
Factor analysis (FA) seeks to reveal the relationship between an observed vector variable and a latent variable of reduced dimension. It has been widely used in many applications involving high-dimensional data, such as image representation and face recognition. An intrinsic limitation of FA lies in its potentially poor performance when the data dimension is high, a problem known as the curse of dimensionality. Motivated by the fact that images are inherently matrices, we develop, in this brief, an FA model for matrix-variate variables and present an efficient parameter estimation algorithm. Experiments on both toy and real-world image data demonstrate that the proposed matrix-variate FA model is more efficient and accurate than the classical FA approach, especially when the observed variable is high-dimensional and the samples available are limited.
Affiliation(s)
- Xianchao Xie
- Department of Statistics, Harvard University Science Center, Cambridge, MA 02138-2901, USA.
123. Robust reconstruction of low-resolution document images by exploiting repetitive character behaviour. Int J Doc Anal Recognit 2008. [DOI: 10.1007/s10032-008-0068-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6]
124. Boscardin WJ, Zhang X, Belin TR. Modeling a mixture of ordinal and continuous repeated measures. J Stat Comput Simul 2008. [DOI: 10.1080/00949650701480259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2]
125. Computationally efficient learning of multivariate t mixture models with missing information. Comput Stat 2008. [DOI: 10.1007/s00180-008-0129-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6]
126. Pilla RS, Kim Y, Lee H. On casting random-effects models in a survival framework. J R Stat Soc Series B Stat Methodol 2008. [DOI: 10.1111/j.1467-9868.2007.00652.x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
127. Varadhan R, Roland C. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm. Scand Stat Theory Appl 2008. [DOI: 10.1111/j.1467-9469.2007.00585.x] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.7]
128. Lawrence E, Bingham D, Liu C, Nair VN. Bayesian Inference for Multivariate Ordinal Data Using Parameter Expansion. Technometrics 2008. [DOI: 10.1198/004017008000000064] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7]
129. Gelman A, van Dyk DA, Huang Z, Boscardin WJ. Using Redundant Parameterizations to Fit Hierarchical Models. J Comput Graph Stat 2008. [DOI: 10.1198/106186008x287337] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3]
130.
131. Meyer K. Parameter expansion for estimation of reduced rank covariance matrices (Open Access publication). Genet Sel Evol 2008. [DOI: 10.1051/gse:2007032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1]
132. Meyer K. WOMBAT: a tool for mixed model analyses in quantitative genetics by restricted maximum likelihood (REML). J Zhejiang Univ Sci B 2007; 8:815-21. [PMID: 17973343 DOI: 10.1631/jzus.2007.b0815] [Citation(s) in RCA: 536] [Impact Index Per Article: 29.8]
Abstract
WOMBAT is a software package for quantitative genetic analyses of continuous traits, fitting a linear, mixed model; estimates of covariance components and the resulting genetic parameters are obtained by restricted maximum likelihood. A wide range of models, comprising numerous traits, multiple fixed and random effects, selected genetic covariance structures, random regression models and reduced rank estimation, is accommodated. WOMBAT employs up-to-date numerical and computational methods. Together with the use of efficient compilers, this generates fast executable programs, suitable for large scale analyses. Use of WOMBAT is illustrated for a bivariate analysis. The package consists of the executable program, available for LINUX and WINDOWS environments, a manual and a set of worked examples, and can be downloaded free of charge from http://agbu.une.edu.au/~kmeyer/wombat.html.
Affiliation(s)
- Karin Meyer
- Animal Genetics and Breeding Unit, University of New England, Armidale, NSW, Australia.
133. Rodrigue N, Philippe H, Lartillot N. Exploring Fast Computational Strategies for Probabilistic Phylogenetic Analysis. Syst Biol 2007; 56:711-26. [PMID: 17849326 DOI: 10.1080/10635150701611258] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6]
Abstract
In recent years, the advent of Markov chain Monte Carlo (MCMC) techniques, coupled with modern computational capabilities, has enabled the study of evolutionary models without a closed-form solution of the likelihood function. However, current Bayesian MCMC applications can incur significant computational costs, as they are based on a full sampling from the posterior probability distribution of the parameters of interest. Here, we draw attention to how MCMC techniques can be embedded within normal approximation strategies for more economical statistical computation. The overall procedure is based on an estimate of the first and second moments of the likelihood function, as well as a maximum likelihood estimate. Through examples, we review several MCMC-based methods used in the statistical literature for such estimation, applying the approaches to constructing posterior distributions under non-analytical evolutionary models relaxing the assumptions of rate homogeneity, and of independence between sites. Finally, we use the procedures for conducting Bayesian model selection, based on Laplace approximations of Bayes factors, which we find to be accurate and computationally advantageous. Altogether, the methods we expound here, as well as other related approaches from the statistical literature, should prove useful when investigating increasingly complex descriptions of molecular evolution, alleviating some of the difficulties associated with non-analytical models.
Affiliation(s)
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Québec, Canada.
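The abstract of entry 133 mentions Bayes factors built from Laplace approximations. As a hedged illustration only, the block below evaluates the standard Laplace formula log p(y) ≈ log p(y | θ̂) + log p(θ̂) + 0.5 log(2π / H), where θ̂ is the posterior mode and H is the negative second derivative of the log posterior kernel at θ̂. It assumes a one-parameter binomial model with a uniform prior, chosen because the exact marginal likelihood, 1 / (n + 1), is known; `laplace_log_marginal` is a hypothetical name, and this is not the phylogenetic procedure of the paper.

```python
import math


def log_lik(theta: float, y: int, n: int) -> float:
    """Binomial log-likelihood log p(y | theta)."""
    return (math.log(math.comb(n, y)) + y * math.log(theta)
            + (n - y) * math.log(1.0 - theta))


def laplace_log_marginal(y: int, n: int) -> float:
    """Laplace approximation to log p(y) under an assumed Uniform(0, 1) prior; needs 0 < y < n."""
    theta_hat = y / n                       # posterior mode (= MLE under a flat prior)
    # H = -d^2/dtheta^2 [log p(y | theta) + log p(theta)] at the mode
    h = y / theta_hat ** 2 + (n - y) / (1.0 - theta_hat) ** 2
    log_prior = 0.0                         # log density of Uniform(0, 1)
    return log_lik(theta_hat, y, n) + log_prior + 0.5 * math.log(2.0 * math.pi / h)


y, n = 10, 20
approx = math.exp(laplace_log_marginal(y, n))
exact = 1.0 / (n + 1)                       # exact beta-binomial marginal under a flat prior
print(approx, exact)
```

Here the approximation lands within about 4 percent of the exact value. A log Bayes factor between two models would then be approximated by the difference of two such log marginal likelihoods.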
134. Meza C, Jaffrézic F, Foulley JL. REML Estimation of Variance Parameters in Nonlinear Mixed Effects Models Using the SAEM Algorithm. Biom J 2007; 49:876-88. [PMID: 17638294 DOI: 10.1002/bimj.200610348] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9]
Abstract
Nonlinear mixed effects models are now widely used in biometrical studies, especially in pharmacokinetic research or for the analysis of growth traits for agricultural and laboratory species. Most of these studies, however, rely on ML estimation procedures, whose variance parameter estimates are known to be biased downwards. A few REML extensions have been proposed, but only for approximate methods. The aim of this paper is to present a REML implementation for nonlinear mixed effects models within an exact estimation scheme, based on an integration of the fixed effects and a stochastic estimation procedure. This method was implemented via a stochastic EM, namely the SAEM algorithm. The simulation study showed that the proposed REML estimation procedure considerably reduced the bias observed with the ML estimation, as well as the residual mean squared error of the variance parameter estimates, especially in the unbalanced cases. ML and REML based estimators of fixed effects were also compared via simulation. Although the two kinds of estimates were very close in terms of bias and mean square error, predictions of individual profiles were clearly improved when using REML vs. ML. An application of this estimation procedure is presented for the modelling of growth in lines of chicken.
Affiliation(s)
- Cristian Meza
- Laboratoire de Mathématiques, Université Paris-Sud, Bât. 425, 91405 Orsay Cedex, France.
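The method in entry 134 runs on SAEM, a stochastic EM variant that replaces the E-step with a simulation step followed by a stochastic-approximation update of the complete-data sufficient statistics. The block below is a minimal sketch on an assumed toy model, z_i ~ N(mu, 1) and y_i | z_i ~ N(z_i, 1), chosen because its MLE is known in closed form (the sample mean of y). It does not implement the REML integration over fixed effects described in the paper; the names `saem` and `burn` and the step-size schedule are illustrative assumptions.

```python
import random
from typing import List


def saem(y: List[float], n_iter: int = 500, burn: int = 50, seed: int = 1) -> float:
    """SAEM for mu in the assumed toy model z_i ~ N(mu, 1), y_i | z_i ~ N(z_i, 1)."""
    rng = random.Random(seed)
    mu, s = 0.0, 0.0
    for k in range(1, n_iter + 1):
        # Simulation step: draw z_i from p(z_i | y_i, mu) = N((mu + y_i) / 2, 1/2)
        z = [rng.gauss((mu + yi) / 2.0, 0.5 ** 0.5) for yi in y]
        # Step size: 1 during burn-in, then decreasing as 1 / (k - burn)
        gamma = 1.0 if k <= burn else 1.0 / (k - burn)
        # Stochastic approximation of the sufficient statistic S = mean(z)
        s = s + gamma * (sum(z) / len(z) - s)
        # M-step: complete-data MLE of mu given the approximated statistic
        mu = s
    return mu


data_rng = random.Random(0)
y_obs = [data_rng.gauss(3.0, 2 ** 0.5) for _ in range(200)]   # marginally y ~ N(mu, 2)
mu_hat = saem(y_obs)
print(mu_hat, sum(y_obs) / len(y_obs))
```

A step size of 1 during an initial burn-in followed by a decreasing sequence is a common SAEM schedule: the burn-in moves quickly toward the solution, and the decreasing phase averages out the simulation noise so that the estimate settles near the sample mean of `y_obs`.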
135.
136.
Abstract
We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.
Affiliation(s)
- Satkartar K Kinney
- Institute of Statistics and Decision Sciences, Duke University, Box 90251, Durham, North Carolina 27705, USA.
137. Rässler S, Rubin DB, Zell ER. Incomplete Data in Epidemiology and Medical Statistics. Handbook of Statistics 2007; 27 (chapter 19). [DOI: 10.1016/s0169-7161(07)27019-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
138. Zhang X, Boscardin WJ, Belin TR. Sampling Correlation Matrices in Bayesian Models With Correlated Latent Variables. J Comput Graph Stat 2006. [DOI: 10.1198/106186006x160050] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2]
139. Samson A, Lavielle M, Mentré F. Extension of the SAEM algorithm to left-censored data in nonlinear mixed-effects model: Application to HIV dynamics model. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2006.05.007] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.9]
140.
141. Li H. The covariance structure and likelihood function for multivariate dyadic data. J Multivar Anal 2006. [DOI: 10.1016/j.jmva.2005.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
142. Tan M, Tian GL, Ng KW. Hierarchical models for repeated binary data using the IBF sampler. Comput Stat Data Anal 2006. [DOI: 10.1016/j.csda.2004.12.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3]
143. Ma J, Xu L. Asymptotic convergence properties of the EM algorithm with respect to the overlap in the mixture. Neurocomputing 2005. [DOI: 10.1016/j.neucom.2004.12.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8]
144. Thompson R, Brotherstone S, White IMS. Estimation of quantitative genetic parameters. Philos Trans R Soc Lond B Biol Sci 2005; 360:1469-77. [PMID: 16048789 PMCID: PMC1569516 DOI: 10.1098/rstb.2005.1676] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3]
Abstract
This paper gives a short review of the development of genetic parameter estimation over the last 40 years. This shows the development of more statistically and computationally efficient methods that allow the fitting of more biologically appropriate models. Methods have evolved from direct methods based on covariances between relatives to methods based on individual animal models. Maximum-likelihood methods have a natural interpretation in terms of best linear unbiased predictors. Improvements in iterative schemes to give estimates are discussed. As an example, a recent estimation of genetic parameters for a British population of dairy cattle is discussed. The development makes a connection to relevant work by Bill Hill.
145.
146. Shi NZ, Zheng SR, Guo J. The restricted EM algorithm under inequality restrictions on the parameters. J Multivar Anal 2005. [DOI: 10.1016/s0047-259x(03)00134-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8]
147. Chen MH, Ibrahim JG, Sinha D. A new joint model for longitudinal and survival data with a cure fraction. J Multivar Anal 2004. [DOI: 10.1016/j.jmva.2004.04.005] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4]
148. Fearnhead P. A Second-Order Approximation to the Log-Likelihood Surface for Mixture Models, With Application to the EM Algorithm. J Comput Graph Stat 2004. [DOI: 10.1198/106186004x2570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
149. Wang HX, Zhang QB, Luo B, Wei S. Robust mixture modelling using multivariate t-distribution with missing information. Pattern Recognit Lett 2004. [DOI: 10.1016/j.patrec.2004.01.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3]
150.
Abstract
In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With a vision of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work establishes an investigation of a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented for the determination of the exact distribution of this bivariate runs statistic under an independent identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles in conjunction with binary HMMs for pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
Affiliation(s)
- Leo Wang-Kit Cheung
- Epidemiology Section, Cancer Etiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, HI 96813-2479, USA.
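The abstract of entry 150 derives exact distributions of runs statistics by finite Markov chain imbedding (FMCI). The block below is a minimal sketch that assumes i.i.d. Bernoulli(p) trials rather than a binary HMM, and the simpler statistic L_n (the longest run of successes) rather than the bivariate runs statistic studied in the paper. It imbeds the trials in a Markov chain on k + 1 states and propagates the state distribution forward; `prob_longest_run_below` is a hypothetical helper name.

```python
from typing import List


def prob_longest_run_below(n: int, k: int, p: float) -> float:
    """Return P(L_n < k) for n i.i.d. Bernoulli(p) trials via FMCI.

    States 0..k-1 record the length of the current trailing success run;
    state k is absorbing and means a run of length >= k has already occurred.
    """
    q = 1.0 - p
    dist: List[float] = [1.0] + [0.0] * k   # before any trial: run length 0
    for _ in range(n):
        new = [0.0] * (k + 1)
        for i in range(k):                  # transient states
            new[0] += dist[i] * q           # a failure resets the run
            new[i + 1] += dist[i] * p       # a success extends it
        new[k] += dist[k]                   # absorbing state keeps its mass
        dist = new
    return 1.0 - dist[k]


# 3 fair coin flips with no two successes in a row: 5 of the 8 sequences
print(prob_longest_run_below(3, 2, 0.5))    # → 0.625
```

Under a hidden Markov model, the imbedded state would additionally carry the hidden state, which enlarges the transition matrix but keeps the same forward-propagation idea.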