51
Liang F, Jia B, Xue J, Li Q, Luo Y. An imputation-regularized optimization algorithm for high dimensional missing data problems and beyond. J R Stat Soc Series B Stat Methodol 2018; 80:899-926. [PMID: 31130816 PMCID: PMC6533005 DOI: 10.1111/rssb.12279] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
Missing data are frequently encountered in high dimensional problems, but they are usually difficult to deal with by using standard algorithms, such as the expectation-maximization algorithm and its variants. To tackle this difficulty, some problem-specific algorithms have been developed in the literature, but a general algorithm is still lacking. This work fills the gap: we propose a general algorithm for high dimensional missing data problems. The algorithm works by iterating between an imputation step and a regularized optimization step. At the imputation step, the missing data are imputed conditionally on the observed data and the current estimates of parameters and, at the regularized optimization step, a consistent estimate is found via the regularization approach for the minimizer of a Kullback-Leibler divergence defined on the pseudocomplete data. For high dimensional problems, the consistent estimate can be found under sparsity constraints. The consistency of the averaged estimate for the true parameter can be established under quite general conditions. The algorithm is illustrated by using high dimensional Gaussian graphical models, high dimensional variable selection and a random-coefficient model.
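The imputation / regularized-optimization alternation described in this abstract can be sketched in a toy setting. This is a minimal illustration under our own assumptions, not the authors' implementation: a sparse linear model y = Xβ + ε with some responses missing at random; an imputation step that draws the missing responses from N(Xβ, σ²) at the current estimates; a regularized optimization step that fits a lasso by cyclic coordinate descent; and a final average of the post-burn-in estimates. The function names, the penalty `lam`, the burn-in length and the simulated data are hypothetical choices. Because only responses are missing here, imputation adds no information beyond the observed rows, so the example shows the control flow rather than the benefit of the method.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, beta0=None, n_sweeps=50):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]            # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]            # add the updated coordinate back
    return beta


def iro_missing_response(X, y, missing, lam, n_iter=50, burn_in=10, seed=0):
    """Hypothetical IRO-style loop for a linear model with missing responses."""
    rng = np.random.default_rng(seed)
    y_work = y.astype(float)                      # astype returns a copy
    beta = np.zeros(X.shape[1])
    sigma2 = float(np.var(y[~missing]))
    kept = []
    for t in range(n_iter):
        # Imputation step: draw missing responses given the current (beta, sigma2).
        mu = X[missing] @ beta
        y_work[missing] = mu + rng.normal(0.0, np.sqrt(sigma2), size=mu.shape)
        # Regularized optimization step on the pseudo-complete data (lasso).
        beta = lasso_cd(X, y_work, lam, beta0=beta)
        resid = y_work - X @ beta
        sigma2 = max(float(resid @ resid) / len(y_work), 1e-8)
        if t >= burn_in:
            kept.append(beta.copy())
    # Average the post-burn-in estimates.
    return np.mean(kept, axis=0)


# Usage on simulated data (all values here are arbitrary illustrations).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(0.0, 0.5, size=200)
missing = rng.random(200) < 0.2
y[missing] = np.nan                               # mark about 20% of responses as missing
beta_hat = iro_missing_response(X, y, missing, lam=0.05)
```

Warm-starting each lasso fit from the previous β keeps the inner solver cheap, since consecutive pseudo-complete datasets differ only in the imputed rows.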
Affiliation(s)
- Qizhai Li
- Chinese Academy of Sciences, Beijing, People's Republic of China
- Ye Luo
- University of Florida, Gainesville, USA
52
Chan J, Leon-Gonzalez R, Strachan RW. Invariant Inference and Efficient Computation in the Static Factor Model. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1287080] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 10/19/2022]
Affiliation(s)
- Joshua Chan
- Centre for Applied Macroeconomic Analysis, University of Technology Sydney, Sydney, NSW, Australia
- Roberto Leon-Gonzalez
- National Graduate Institute for Policy Studies, Tokyo, Japan
- Rimini Center for Economic Analysis, Rimini, Italy
- Rodney W. Strachan
- Rimini Center for Economic Analysis, Rimini, Italy
- Centre for Applied Macroeconomic Analysis, Canberra, ACT, Australia
- University of Queensland, Brisbane, Queensland, Australia
53
Guillaume B, Wang C, Poh J, Shen MJ, Ong ML, Tan PF, Karnani N, Meaney M, Qiu A. Improving mass-univariate analysis of neuroimaging data by modelling important unknown covariates: Application to Epigenome-Wide Association Studies. Neuroimage 2018; 173:57-71. [PMID: 29448075 DOI: 10.1016/j.neuroimage.2018.01.073] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 11/01/2017] [Revised: 01/03/2018] [Accepted: 01/28/2018] [Indexed: 10/18/2022]
Abstract
Statistical inference on neuroimaging data is often conducted using a mass-univariate model, equivalent to fitting a linear model at every voxel with a known set of covariates. Due to the large number of linear models, it is challenging to check if the selection of covariates is appropriate and to modify this selection adequately. The use of standard diagnostics, such as residual plotting, is clearly not practical for neuroimaging data. However, the selection of covariates is crucial for linear regression to ensure valid statistical inference. In particular, the mean model of regression needs to be reasonably well specified. Unfortunately, this issue is often overlooked in the field of neuroimaging. This study aims to adopt the existing Confounder Adjusted Testing and Estimation (CATE) approach and to extend it for use with neuroimaging data. We propose a modification of CATE that can yield valid statistical inferences using Principal Component Analysis (PCA) estimators instead of Maximum Likelihood (ML) estimators. We then propose a non-parametric hypothesis testing procedure that can improve upon parametric testing. Monte Carlo simulations show that the modification of CATE allows for more accurate modelling of neuroimaging data and can in turn yield a better control of False Positive Rate (FPR) and Family-Wise Error Rate (FWER). We demonstrate its application to an Epigenome-Wide Association Study (EWAS) on neonatal brain imaging and umbilical cord DNA methylation data obtained as part of a longitudinal cohort study. Software for this CATE study is freely available at http://www.bioeng.nus.edu.sg/cfa/Imaging_Genetics2.html.
Affiliation(s)
- Bryan Guillaume
- Department of Biomedical Engineering, National University of Singapore, Singapore
- Changqing Wang
- Department of Biomedical Engineering, National University of Singapore, Singapore
- Joann Poh
- Department of Biomedical Engineering, National University of Singapore, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Mo Jun Shen
- Department of Biomedical Engineering, National University of Singapore, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Mei Lyn Ong
- Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Pei Fang Tan
- Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Neerja Karnani
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 119228, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Michael Meaney
- Ludmer Centre for Neuroinformatics and Mental Health, Douglas Mental Health University Institute, McGill University, Canada; Sackler Program for Epigenetics and Psychobiology at McGill University, Canada; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore
- Anqi Qiu
- Department of Biomedical Engineering, National University of Singapore, Singapore; Clinical Imaging Research Centre, National University of Singapore, Singapore; Singapore Institute for Clinical Sciences, Agency for Science, Technology, and Research, Singapore.
54
Diffey SM, Smith AB, Welsh AH, Cullis BR. A new REML (parameter expanded) EM algorithm for linear mixed models. AUST NZ J STAT 2017. [DOI: 10.1111/anzs.12208] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022]
Affiliation(s)
- S. M. Diffey
- Centre for Bioinformatics and Biometrics; University of Wollongong; Wollongong NSW 2522 Australia
- A. B. Smith
- Centre for Bioinformatics and Biometrics; University of Wollongong; Wollongong NSW 2522 Australia
- A. H. Welsh
- Mathematical Sciences Institute; Australian National University; Canberra ACT 0200 Australia
- B. R. Cullis
- Centre for Bioinformatics and Biometrics; University of Wollongong; Wollongong NSW 2522 Australia
55
Kurban H, Jenne M, Dalkilic MM. Using data to build a better EM: EM* for big data. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2017. [DOI: 10.1007/s41060-017-0062-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
56
57
Hughes RA, Kenward MG, Sterne JAC, Tilling K. Estimation of the linear mixed integrated Ornstein-Uhlenbeck model. J STAT COMPUT SIM 2017; 87:1541-1558. [PMID: 28515536 PMCID: PMC5407356 DOI: 10.1080/00949655.2016.1277425] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/26/2015] [Accepted: 12/25/2016] [Indexed: 12/01/2022]
Abstract
The linear mixed model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed IOU model) allows for serial correlation and estimation of the degree of derivative tracking. It is rarely used, partly due to the lack of available software. We implemented the linear mixed IOU model in Stata and using simulations we assessed the feasibility of fitting the model by restricted maximum likelihood when applied to balanced and unbalanced data. We compared different (1) optimization algorithms, (2) parameterizations of the IOU process, (3) data structures and (4) random-effects structures. Fitting the model was practical and feasible when applied to large and moderately sized balanced datasets (20,000 and 500 observations), and large unbalanced datasets with (non-informative) dropout and intermittent missingness. Analysis of a real dataset showed that the linear mixed IOU model was a better fit to the data than the standard linear mixed model (i.e. independent within-subject errors with constant variance).
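For readers unfamiliar with the IOU component, the process W(t) is the integral of an Ornstein-Uhlenbeck "velocity" process and is usually specified through its covariance function. Assuming the OU process is stationary with variance σ²/(2α) and W(0) = 0, integrating its covariance gives Cov(W(s), W(t)) = σ²/(2α³)[2α·min(s, t) + exp(-αs) + exp(-αt) - 1 - exp(-α|t - s|)]. The sketch below only evaluates that formula; the names `alpha` and `sigma2` are our own, and the paper's Stata implementation may parameterize the process differently (the abstract notes that several parameterizations were compared). As α grows with σ²/α² held fixed, the covariance approaches (σ²/α²)·min(s, t), the Brownian motion covariance.

```python
import math


def iou_cov(s, t, alpha, sigma2):
    """Covariance of an integrated OU process at times s, t >= 0.

    Assumes (our choice) a stationary OU velocity with variance sigma2 / (2 * alpha)
    and W(0) = 0; the paper may use another parameterization.
    """
    return sigma2 / (2.0 * alpha ** 3) * (
        2.0 * alpha * min(s, t)
        + math.exp(-alpha * s)
        + math.exp(-alpha * t)
        - 1.0
        - math.exp(-alpha * abs(t - s))
    )


def iou_cov_matrix(times, alpha, sigma2):
    """Within-subject covariance matrix of the IOU term at the given times."""
    return [[iou_cov(s, t, alpha, sigma2) for t in times] for s in times]


# Example: three measurement times for one subject (illustrative values).
K = iou_cov_matrix([0.5, 1.0, 2.0], alpha=0.8, sigma2=1.0)
```

In a linear mixed IOU model this matrix would be added to the random-effects and measurement-error covariances for each subject's observation times.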
Affiliation(s)
- Rachael A Hughes
- School of Social and Community Medicine, University of Bristol, Bristol, UK
- Michael G Kenward
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
- Kate Tilling
- School of Social and Community Medicine, University of Bristol, Bristol, UK
58
Lofland CL, Rodríguez A, Moser S. Assessing differences in legislators’ revealed preferences: A case study on the 107th U.S. Senate. Ann Appl Stat 2017. [DOI: 10.1214/16-aoas951] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
59
Affiliation(s)
- Veronika Ročková
- Booth School of Business, University of Chicago, Chicago, IL, USA
- The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Edward I. George
- The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
60
Montano D. Multivariate hierarchical Bayesian models and choice of priors in the analysis of survey data. J Appl Stat 2016. [DOI: 10.1080/02664763.2016.1267120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
61
Grigorova D, Gueorguieva R. Correlated probit analysis of repeatedly measured ordinal and continuous outcomes with application to the Health and Retirement Study. Stat Med 2016; 35:4202-25. [PMID: 27222058 DOI: 10.1002/sim.6982] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/10/2014] [Revised: 03/29/2016] [Accepted: 04/17/2016] [Indexed: 12/18/2022]
Abstract
The Health and Retirement Study was designed to evaluate changes in health and labor force participation during and after the transition from working to retirement. Every 2 years, participants provided information about their self-rated health (SRH), body mass index (BMI), smoking status, and other characteristics. Our goal was to assess the effects of smoking and gender on trajectories of change in BMI and SRH over time. Joint longitudinal analysis of outcome measures is preferable to separate analyses because it makes it possible to account for the correlation between the measures, to test the effects of predictors while controlling type I error, and potentially to improve efficiency. However, because SRH is an ordinal measure while BMI is continuous, formulating a joint model and estimating its parameters are challenging. A joint correlated probit model allowed us to seamlessly account for the correlations between the measures over time. Established estimating procedures for such models are based on quasi-likelihood or numerical approximations that may be biased or fail to converge. Therefore, we proposed a novel expectation-maximization algorithm for parameter estimation and a Monte Carlo bootstrap approach for approximating standard errors. Expectation-maximization algorithms have been previously considered for combinations of binary and/or continuous repeated measures; however, modifications were needed to handle combinations of ordinal and continuous responses. A simulation study demonstrated that the algorithm converged and provided approximately unbiased estimates with sufficiently large sample sizes. In the Health and Retirement Study, male gender and smoking were independently associated with steeper deterioration in self-rated health and with lower average BMI. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- D Grigorova
- Department of Probability, Operational Research and Statistics, Faculty of Mathematics and Informatics, Sofia University 'St. Kliment Ohridski', 5 James Bourchier Blvd., 1164 Sofia, Bulgaria
- R Gueorguieva
- Department of Biostatistics, School of Public Health, Yale University, 60 College St, New Haven, CT 06520, U.S.A
62
Jiang Z, Ding P. Robust modeling using non-elliptically contoured multivariate t distributions. J Stat Plan Inference 2016. [DOI: 10.1016/j.jspi.2016.04.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]
63
Abstract
Joint modelling of location and scale parameters has generally been confined to exponential families. In this paper the location and scale parameters of the t distribution are allowed to depend on covariates. The closed form of the likelihood allows inference to proceed in a similar fashion to the Gaussian location and scale model and provides a framework for a simple scoring algorithm to estimate the parameters. The algorithm includes a procedure to estimate the degrees of freedom parameter of the t distribution. Homogeneity and asymptotic tests are discussed and a methodology is derived to detect heteroscedasticity when the response is t distributed. Simulations reveal considerable bias in the estimates of the degrees of freedom parameter and only minor bias in the estimated fixed effects associated with the scale parameter. In comparison, the estimated location effects are well behaved. To illustrate the joint modelling of location and scale parameters of the t distribution the methodology is applied to two data sets.
Affiliation(s)
- Julian Taylor
- BiometricsSA, The University of Adelaide and South Australian Research and Development Institute, Australia
- Arūnas Verbyla
- BiometricsSA, The University of Adelaide and South Australian Research and Development Institute, Australia
64
Regier MD, Moodie EEM. The Orthogonally Partitioned EM Algorithm: Extending the EM Algorithm for Algorithmic Stability and Bias Correction Due to Imperfect Data. Int J Biostat 2016; 12:65-77. [DOI: 10.1515/ijb-2015-0016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there is missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to break down a complicated problem into a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience.
65
El Leithy HA, Abdel Wahed ZA, Abdallah MS. On non-negative estimation of variance components in mixed linear models. J Adv Res 2016; 7:59-68. [PMID: 26843971 PMCID: PMC4703422 DOI: 10.1016/j.jare.2015.02.001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 09/02/2014] [Revised: 02/04/2015] [Accepted: 02/12/2015] [Indexed: 10/30/2022]
Abstract
Alternative estimators of the variance components have been derived based on Iterative Almost Unbiased Estimation (IAUE). As a result, two modified IAUEs are introduced. The relative performances of the proposed estimators and other estimators are studied by simulating their bias, Mean Square Error and the probability of getting negative estimates under an unbalanced nested-factorial model with two fixed crossed factors and one nested random factor. Finally, the Empirical Quantile Dispersion Graph (EQDG), which provides a comprehensive picture of the quality of estimation, is depicted for all the studied methods.
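The negative-estimate problem this abstract refers to is easy to reproduce in the simplest setting. The sketch below uses the classical ANOVA (method-of-moments) estimator for a balanced one-way random-effects model, σ̂²_a = (MSA - MSE)/n with n observations per level, not the IAUE estimators or the unbalanced nested-factorial design studied in the paper. Truncating at zero is shown only as the most naive non-negative fix; the function names are hypothetical.

```python
def anova_variance_components(groups):
    """ANOVA estimates (sigma_a^2, sigma_e^2) for a balanced one-way random model."""
    a = len(groups)             # number of levels of the random factor
    n = len(groups[0])          # observations per level (balanced design assumed)
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    msa = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    mse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (a * (n - 1))
    return (msa - mse) / n, mse


def truncated_sigma_a2(groups):
    """Naive non-negative version: clip the between-level estimate at zero."""
    sigma_a2, _ = anova_variance_components(groups)
    return max(sigma_a2, 0.0)


# Identical level means give MSA = 0 < MSE = 1, so the raw estimate is -1/3.
example = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
raw_estimate, _ = anova_variance_components(example)
```

Clipping removes the negative value but introduces bias of its own, which is one motivation for purpose-built non-negative estimators such as those compared in the paper.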
Affiliation(s)
- Heba A El Leithy
- Statistical Department, Faculty of Political and Economic Sciences, Cairo University, Egypt
- Zakaria A Abdel Wahed
- Statistical Department, Faculty of Political and Economic Sciences, Cairo University, Egypt
- Mohamed S Abdallah
- Statistical Department, Faculty of Political and Economic Sciences, Cairo University, Egypt
66
Extending mixtures of factor models using the restricted multivariate skew-normal distribution. J MULTIVARIATE ANAL 2016. [DOI: 10.1016/j.jmva.2015.09.025] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 11/19/2022]
67
Frumento P, Mealli F, Pacini B, Rubin DB. The fragility of standard inferential approaches in principal stratification models relative to direct likelihood approaches. Stat Anal Data Min 2015. [DOI: 10.1002/sam.11299] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Affiliation(s)
- Paolo Frumento
- Institute of Environmental Medicine (IMM), Karolinska Institutet; Stockholm Sweden
- Fabrizia Mealli
- Department of Statistics, Informatics, Applications; University of Florence; Florence Italy
- Barbara Pacini
- Department of Political Science; University of Pisa; Via Serafini 3 Pisa 56126 Italy
68
Ala-Luhtala J, Piché R. Gaussian Scale Mixture Models for Robust Linear Multivariate Regression with Missing Data. COMMUN STAT-SIMUL C 2015. [DOI: 10.1080/03610918.2013.875565] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 10/25/2022]
Affiliation(s)
- Juha Ala-Luhtala
- Department of Mathematics, Tampere University of Technology, Tampere, Finland
- Robert Piché
- Department of Automation Science and Engineering, Tampere University of Technology, Tampere, Finland
69
Data augmentation and parameter expansion for independent or spatially correlated ordinal data. Comput Stat Data Anal 2015. [DOI: 10.1016/j.csda.2015.03.020] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
70
Rodríguez A, Moser S. Measuring and accounting for strategic abstentions in the US Senate, 1989-2012. J R Stat Soc Ser C Appl Stat 2015. [DOI: 10.1111/rssc.12099] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/01/2022]
71
Wang WL, Lin TI. Bayesian analysis of multivariate t linear mixed models with missing responses at random. J STAT COMPUT SIM 2014. [DOI: 10.1080/00949655.2014.989852] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 01/31/2023]
72
73
Lee KE, Kim Y, Xu R. Bayesian Variable Selection under the Proportional Hazards Mixed-effects Model. Comput Stat Data Anal 2014; 75:53-65. [PMID: 24795490 PMCID: PMC4005803 DOI: 10.1016/j.csda.2014.02.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
Abstract
Over the past decade much statistical research has been carried out to develop models for correlated survival data; however, methods for model selection are still very limited. A stochastic search variable selection (SSVS) approach under the proportional hazards mixed-effects model (PHMM) is developed. The SSVS method has previously been applied to linear and generalized linear mixed models, and to the proportional hazards model with high dimensional data. Because the method has mainly been developed for hierarchical normal mixture distributions, it operates on the linear predictor under the Cox type models. The PHMM naturally incorporates the normal distribution via the random effects, which enables SSVS to efficiently search through the candidate variable space. The approach was evaluated through simulation and applied to a multi-center lung cancer clinical trial data set, for which the variable selection problem had previously been debated in the literature.
Affiliation(s)
- Kyeong Eun Lee
- Department of Statistics, Kyungpook National University, Daegu, 702-701, Korea
- Yongku Kim
- Department of Statistics, Kyungpook National University, Daegu, 702-701, Korea
- Ronghui Xu
- Division of Biostatistics and Bioinformatics, Department of Family and Preventive Medicine and Department of Mathematics, University of California, San Diego, USA
74
Chen SC(G), Lindsay B. Improving mixture tree construction using better EM algorithms. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.11.010] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
75
76
Zhao J, Shi L. Automated learning of factor analysis with complete and incomplete data. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.11.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
77
Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition. Comput Stat Data Anal 2014. [DOI: 10.1016/j.csda.2013.02.020] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Indexed: 11/21/2022]
78
79
Maximum likelihood estimation for second level fMRI data analysis with expectation trust region algorithm. Magn Reson Imaging 2013; 32:132-49. [PMID: 24321307 DOI: 10.1016/j.mri.2013.10.007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/18/2012] [Revised: 08/06/2013] [Accepted: 10/11/2013] [Indexed: 11/24/2022]
Abstract
Trust region methods, which originated from the Levenberg-Marquardt (LM) algorithm, are considered for mixed effect model estimation in the context of second level functional magnetic resonance imaging (fMRI) data analysis. We first present the mathematical and optimization details of the method for mixed effect model analysis, and then we compare the proposed methods with the conventional expectation-maximization (EM) algorithm on a series of datasets (synthetic and real human fMRI datasets). In simulation studies, we found that a higher damping factor for the LM algorithm performs better than a lower damping factor for fMRI data analysis. More importantly, in most cases the expectation trust region algorithm is more accurate than the EM algorithm when the random effect variance is large. We also compare these algorithms on real human datasets comprising repeated fMRI measures in phase-encoded and random block experimental designs. We observed that the proposed method is computationally faster and robust to Gaussian noise in fMRI analysis. The advantages and limitations of the suggested methods are discussed.
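The role of the damping factor can be seen in a generic Levenberg-Marquardt iteration, where each step δ solves (JᵀJ + λI)δ = -Jᵀr for the Jacobian J and residual vector r: a large λ gives short, gradient-descent-like steps and a small λ gives Gauss-Newton steps. The sketch below is the textbook adaptive-damping variant applied to a toy nonlinear least-squares fit. It is not the authors' expectation trust region algorithm for mixed effect models, and the exponential model, starting values and update factor of 10 are our own illustrative assumptions.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, theta0, damping=1.0, n_iter=200, tol=1e-12):
    """Adaptive-damping Levenberg-Marquardt for min 0.5 * ||r(theta)||^2."""
    theta = np.asarray(theta0, dtype=float)
    lam = damping                                 # initial damping factor
    r = residual(theta)
    cost = 0.5 * r @ r
    for _ in range(n_iter):
        J = jacobian(theta)
        A = J.T @ J + lam * np.eye(theta.size)
        step = np.linalg.solve(A, -J.T @ r)
        cand = theta + step
        r_cand = residual(cand)
        cost_cand = 0.5 * r_cand @ r_cand
        if cost_cand < cost:
            # Accept: move towards Gauss-Newton by shrinking the damping.
            theta, r, cost = cand, r_cand, cost_cand
            lam /= 10.0
            if np.linalg.norm(step) < tol:
                break
        else:
            # Reject: increase damping for a shorter, gradient-like step.
            lam *= 10.0
    return theta


# Toy problem (our assumption): fit y = a * exp(b * x) to noiseless data.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)


def residual(theta):
    return theta[0] * np.exp(theta[1] * x) - y


def jacobian(theta):
    e = np.exp(theta[1] * x)
    return np.column_stack([e, theta[0] * x * e])


theta_hat = levenberg_marquardt(residual, jacobian, [1.0, 0.0], damping=100.0)
```

With adaptive damping the initial λ mainly affects the first few steps; a fixed-damping variant would make the sensitivity to λ that the abstract reports more pronounced.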
80
Hether TD, Hohenlohe PA. Genetic regulatory network motifs constrain adaptation through curvature in the landscape of mutational (co)variance. Evolution 2013; 68:950-64. [PMID: 24219635 DOI: 10.1111/evo.12313] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/25/2013] [Accepted: 10/29/2013] [Indexed: 01/02/2023]
Abstract
Systems biology is accumulating a wealth of understanding about the structure of genetic regulatory networks, leading to a more complete picture of the complex genotype-phenotype relationship. However, models of multivariate phenotypic evolution based on quantitative genetics have largely not incorporated a network-based view of genetic variation. Here we model a set of two-node, two-phenotype genetic network motifs, covering a full range of regulatory interactions. We find that network interactions result in different patterns of mutational (co)variance at the phenotypic level (the M-matrix), not only across network motifs but also across phenotypic space within single motifs. This effect is due almost entirely to mutational input of additive genetic (co)variance. Variation in M has the effect of stretching and bending phenotypic space with respect to evolvability, analogous to the curvature of space-time under general relativity, and similar mathematical tools may apply in each case. We explored the consequences of curvature in mutational variation by simulating adaptation under divergent selection with gene flow. Both standing genetic variation (the G-matrix) and rate of adaptation are constrained by M, so that G and adaptive trajectories are curved across phenotypic space. Under weak selection the phenotypic mean at migration-selection balance also depends on M.
Affiliation(s)
- Tyler D Hether
- Department of Biological Sciences and Institute for Bioinformatics and Evolutionary Studies, University of Idaho, Moscow, Idaho, 83844-3051
81
Gruhl J, Erosheva EA, Crane PK. A semiparametric approach to mixed outcome latent variable models: Estimating the association between cognition and regional brain volumes. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas675] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
82
83
Rundel CW, Wunder MB, Alvarado AH, Ruegg KC, Harrigan R, Schuh A, Kelly JF, Siegel RB, DeSante DF, Smith TB, Novembre J. Novel statistical methods for integrating genetic and stable isotope data to infer individual-level migratory connectivity. Mol Ecol 2013; 22:4163-4176. [DOI: 10.1111/mec.12393] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 01/19/2013] [Revised: 04/27/2013] [Accepted: 05/03/2013] [Indexed: 12/01/2022]
Affiliation(s)
- Colin W. Rundel
- Department of Statistical Sciences; Duke University; Durham NC 27708 USA
- Department of Statistics; University of California, Los Angeles; Los Angeles CA 90095 USA
- Michael B. Wunder
- Department of Integrative Biology; University of Colorado, Denver; Denver CO 80217 USA
- Allison H. Alvarado
- Center for Tropical Research, Institute of the Environment and Sustainability; University of California, Los Angeles; Los Angeles CA 90095 USA
- Department of Ecology and Evolutionary Biology; University of California, Los Angeles; Los Angeles CA 90095 USA
- Kristen C. Ruegg
- Department of Integrative Biology; University of Colorado, Denver; Denver CO 80217 USA
- Department of Ecology and Evolutionary Biology; University of California, Santa Cruz; Santa Cruz CA 95064 USA
- Ryan Harrigan
- Center for Tropical Research, Institute of the Environment and Sustainability; University of California, Los Angeles; Los Angeles CA 90095 USA
- Andrew Schuh
- Cooperative Institute for Research in the Atmosphere (CIRA); Fort Collins CO 80523 USA
- Jeffrey F. Kelly
- Oklahoma Biological Survey and Department of Biology; Ecology and Evolutionary Biology Program; University of Oklahoma; Norman OK 73019 USA
- Rodney B. Siegel
- The Institute for Bird Populations; Point Reyes Station CA 94956 USA
- David F. DeSante
- The Institute for Bird Populations; Point Reyes Station CA 94956 USA
- Thomas B. Smith
- Center for Tropical Research, Institute of the Environment and Sustainability; University of California, Los Angeles; Los Angeles CA 90095 USA
- Department of Ecology and Evolutionary Biology; University of California, Los Angeles; Los Angeles CA 90095 USA
- John Novembre
- Department of Ecology and Evolutionary Biology; University of California, Los Angeles; Los Angeles CA 90095 USA
- Department of Human Genetics; Chicago IL 60637 USA
84
Solgi R, Mira A. A Bayesian Semiparametric Multiplicative Error Model With an Application to Realized Volatility. J Comput Graph Stat 2013. [DOI: 10.1080/10618600.2013.810151] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
85
An X, Yang Q, Bentler PM. A latent factor linear mixed model for high-dimensional longitudinal data analysis. Stat Med 2013; 32:4229-39. [PMID: 23640746 DOI: 10.1002/sim.5825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/06/2012] [Revised: 02/17/2013] [Accepted: 03/21/2013] [Indexed: 11/07/2022]
Abstract
High-dimensional longitudinal data involving latent variables such as depression and anxiety that cannot be quantified directly are often encountered in biomedical and social sciences. Multiple responses are used to characterize these latent quantities, and repeated measures are collected to capture their trends over time. Furthermore, substantive research questions may concern issues such as interrelated trends among latent variables that can only be addressed by modeling them jointly. Although statistical analysis of univariate longitudinal data has been well developed, methods for modeling multivariate high-dimensional longitudinal data are still under development. In this paper, we propose a latent factor linear mixed model (LFLMM) for analyzing this type of data. This model is a combination of the factor analysis and multivariate linear mixed models. Under this modeling framework, we reduced the high-dimensional responses to low-dimensional latent factors by the factor analysis model, and then we used the multivariate linear mixed model to study the longitudinal trends of these latent factors. We developed an expectation-maximization algorithm to estimate the model. We used simulation studies to investigate the computational properties of the expectation-maximization algorithm and compare the LFLMM model with other approaches for high-dimensional longitudinal data analysis. We used a real data example to illustrate the practical usefulness of the model.

86

Azevedo CLN, Andrade DF. CADEM: A conditional augmented data EM algorithm for fitting one parameter probit models. Braz J Probab Stat 2013. [DOI: 10.1214/11-bjps172] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]

87

References. Comput Stat 2013. [DOI: 10.1002/9781118555552.refs] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

88

89

Abstract
The item factor analysis model for investigating multidimensional latent spaces has proved to be useful. Parameter estimation in this model requires computationally demanding high-dimensional integrations. While several approaches to approximating such integrations have been proposed, they suffer from various computational difficulties. This paper proposes a Nesting Monte Carlo Expectation-Maximization (MCEM) algorithm for item factor analysis with binary data. Simulation studies and a real data example suggest that the Nesting MCEM approach can significantly improve computational efficiency while also enjoying the good properties of stable convergence and easy implementation.
Affiliation(s)
- Xinming An
- Department of Psychology, University of California, 1285 Franz Hall, Box 951563, Los Angeles, CA, USA
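
The abstract does not spell out how the Nesting MCEM organizes its Monte Carlo draws, so the sketch below illustrates only the plain MCEM idea it builds on: the intractable E-step expectation is replaced by an average over simulated missing data, followed by an ordinary M-step. The toy model (draws from N(mu, 1) that are right-censored at a threshold `c`) and the function name `mcem_censored_mean` are hypothetical assumptions for illustration, not the paper's item factor model.

```python
import random
from statistics import NormalDist, fmean


def mcem_censored_mean(observed, n_censored, c, mu0=0.0, n_iter=50, m=2000, seed=0):
    """Hypothetical toy Monte Carlo EM (plain MCEM, not Nesting MCEM).

    Model: y ~ N(mu, 1). `observed` holds fully seen values; `n_censored`
    further values are only known to exceed the threshold `c`.
    """
    rng = random.Random(seed)
    std = NormalDist()  # standard normal, provides cdf() and inv_cdf()
    mu = mu0
    for _ in range(n_iter):
        # Monte Carlo E-step: every censored value shares the same conditional
        # law N(mu, 1) truncated to (c, inf), so m inverse-CDF draws from it
        # approximate E[y | y > c, mu].
        lo = std.cdf(c - mu)
        draws = []
        for _ in range(m):
            u = lo + (1.0 - lo) * rng.random()
            u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf argument in (0, 1)
            draws.append(mu + std.inv_cdf(u))
        imputed_mean = fmean(draws)
        # M-step: the complete-data MLE of mu is the mean of the observed
        # values together with the imputed ones.
        total = sum(observed) + n_censored * imputed_mean
        mu = total / (len(observed) + n_censored)
    return mu
```

For example, with values drawn from N(1, 1) and everything above `c = 1.5` censored, the estimate returns close to 1, whereas the naive mean of the uncensored values alone is pulled well below it. Increasing `m` as the iterations proceed is a common way to keep Monte Carlo noise from dominating near convergence.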

90

Yang M. Bayesian nonparametric centered random effects models with variable selection. Biom J 2013; 55:217-30. [PMID: 23322356 DOI: 10.1002/bimj.201100149] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/06/2011] [Revised: 11/19/2012] [Accepted: 11/30/2012] [Indexed: 11/08/2022]
Abstract
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject-specific random effects nonparametrically with a Dirichlet process and simultaneously resolve the bias. In particular, we propose flexible modeling of the conditional distribution of the random effects, which changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach with existing ones. We then apply the new approach to a real data example, the cross-country and interlaboratory rodent uterotrophic bioassay.
Affiliation(s)
- Mingan Yang
- Department of Mathematics, Central Michigan University, Mt. Pleasant, MI 48859, USA.
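
The abstract names two ingredients: a Dirichlet process prior for the random effects, and the fact that a draw from such a prior generally has a nonzero mean, which confounds the fixed effects paired with those random effects. A minimal sketch of both, assuming a truncated stick-breaking approximation with base measure N(0, 1) and a simple centering step that subtracts the random measure's mean from its atoms. The function names are hypothetical, and neither the paper's own centered construction nor its stochastic search Gibbs sampler is reproduced here.

```python
import random


def truncated_stick_breaking(alpha, n_atoms, rng):
    """Hypothetical helper: approximate draw G = sum_k w_k * delta(theta_k)
    from DP(alpha, N(0, 1)) by truncated stick-breaking."""
    weights, atoms = [], []
    remaining = 1.0  # stick length not yet broken off
    for k in range(n_atoms):
        # v_k ~ Beta(1, alpha); the last atom takes all leftover mass.
        v = 1.0 if k == n_atoms - 1 else rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
        atoms.append(rng.gauss(0.0, 1.0))  # atom from base measure G0 = N(0, 1)
    return weights, atoms


def center_atoms(weights, atoms):
    """Hypothetical helper: shift atoms so the discrete measure has mean zero."""
    mean = sum(w * a for w, a in zip(weights, atoms))
    return [a - mean for a in atoms], mean


rng = random.Random(42)
w, theta = truncated_stick_breaking(alpha=1.0, n_atoms=50, rng=rng)
centered, offset = center_atoms(w, theta)
# Subject-specific random effects drawn from the centered measure.
effects = rng.choices(centered, weights=w, k=10)
```

The returned `offset` is the mean of the uncentered draw; it is typically not zero even though G0 has mean zero, which is the identifiability issue the abstract describes.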

91

On the statistical interpretation of site-specific variables in phylogeny-based substitution models. Genetics 2012; 193:557-64. [PMID: 23222651 DOI: 10.1534/genetics.112.145722] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 01/23/2023]
Abstract
Phylogeny-based modeling of heterogeneity across the positions of multiple-sequence alignments has generally been approached from two main perspectives. The first treats site specificities as random variables drawn from a statistical law, and the likelihood function takes the form of an integral over this law. The second assigns distinct variables to each position, and, in a maximum-likelihood context, adjusts these variables, along with global parameters, to optimize a joint likelihood function. Here, it is emphasized that while the first approach directly enjoys the statistical guarantees of traditional likelihood theory, the latter does not, and should be approached with particular caution when the site-specific variables are high dimensional. Using a phylogeny-based mutation-selection framework, it is shown that the difference in interpretation of site-specific variables explains the incongruities in recent studies regarding distributions of selection coefficients.
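
The contrast drawn in this abstract, integrating site-specific variables out versus maximizing a joint likelihood over them, can be illustrated with the classic Neyman-Scott problem, swapped in here as a non-phylogenetic stand-in: each site i has two observations from N(mu_i, sigma^2). Maximizing jointly over sigma^2 and every mu_i gives an estimate that converges to sigma^2 / 2 however many sites are added, while the within-site mean square (which, in this balanced design, is the random-effects ML estimate of sigma^2 as long as the between-site variance estimate is not truncated at zero) is consistent. The function name `neyman_scott` and all simulation settings are hypothetical assumptions.

```python
import random


def neyman_scott(n_sites, sigma=1.0, seed=0):
    """Hypothetical toy: return (joint MLE, within-site mean square) of
    sigma^2 with two observations per site and one free mean per site."""
    rng = random.Random(seed)
    ss_within = 0.0
    for _ in range(n_sites):
        mu_i = rng.gauss(0.0, 3.0)  # site-specific mean (assumed spread)
        y1, y2 = rng.gauss(mu_i, sigma), rng.gauss(mu_i, sigma)
        ybar = 0.5 * (y1 + y2)
        ss_within += (y1 - ybar) ** 2 + (y2 - ybar) ** 2
    # Joint maximization over sigma^2 and all mu_i divides by all 2n
    # observations, ignoring the n degrees of freedom spent on the means.
    joint_mle = ss_within / (2 * n_sites)
    # Each site retains one degree of freedom for sigma^2.
    within_ms = ss_within / n_sites
    return joint_mle, within_ms


joint, within = neyman_scott(20000)
```

With `sigma=1.0`, `joint` settles near 0.5 and `within` near 1.0; the gap does not shrink as `n_sites` grows, because the number of site-specific parameters grows with it.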

92

Berlinet A, Roland C. Acceleration of the EM algorithm: P-EM versus epsilon algorithm. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2012.03.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]

93

Bayesian variable selection for logistic mixed model with nonparametric random effects. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.12.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

94

Wu SH, Black MA, North RA, Rodrigo AG. A Bayesian model for classifying all differentially expressed proteins simultaneously in 2D PAGE gels. BMC Bioinformatics 2012; 13:137. [PMID: 22712439 PMCID: PMC3505467 DOI: 10.1186/1471-2105-13-137] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/14/2011] [Accepted: 05/30/2012] [Indexed: 11/23/2022]
Abstract
Background: Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is commonly used to identify differentially expressed proteins under two or more experimental or observational conditions. Wu et al. (2009) developed a univariate probabilistic model which was used to identify differential expression between Case and Control groups, by applying a Likelihood Ratio Test (LRT) to each protein on a 2D PAGE. In contrast to commonly used statistical approaches, this model takes into account the two possible causes of missing values in 2D PAGE: either (1) the non-expression of a protein; or (2) a level of expression that falls below the limit of detection. Results: We develop a global Bayesian model which extends the previously described model. Unlike the univariate approach, the model reported here is able to treat all differentially expressed proteins simultaneously. Whereas each protein is modelled by the univariate likelihood function previously described, several global distributions are used to model the underlying relationship between the parameters associated with individual proteins. These global distributions are able to combine information from each protein to give more accurate estimates of the true parameters. In our implementation of the procedure, all parameters are recovered by Markov chain Monte Carlo (MCMC) integration. The 95% highest posterior density (HPD) intervals for the marginal posterior distributions are used to determine whether differences in protein expression are due to differences in mean expression intensities, and/or differences in the probabilities of expression. Conclusions: Simulation analyses showed that the global model is able to accurately recover the underlying global distributions, and identify more differentially expressed proteins than the simple application of an LRT.
Simulations also indicate that the probability of incorrectly identifying a protein as differentially expressed (i.e., the False Discovery Rate) is very low. The source code is available at https://github.com/stevenhwu/BIDE-2D.
Affiliation(s)
- Steven H Wu
- Bioinformatics Institute, University of Auckland, Private Bag 92019, Auckland, New Zealand.
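
The decision rule in this abstract rests on 95% HPD intervals computed from MCMC output. A common way to estimate such an interval from draws of a unimodal marginal posterior is to take the shortest window that contains 95% of the sorted draws. A minimal sketch, assuming the draws are of a difference parameter (for example, Case minus Control mean intensity) and that an interval excluding zero is read as evidence of a difference; the function names are hypothetical, and the code is not taken from the BIDE-2D source.

```python
import math
import random


def hpd_interval(samples, prob=0.95):
    """Hypothetical helper: empirical HPD interval, i.e. the shortest interval
    containing a fraction `prob` of the draws (for unimodal marginals)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n))  # number of draws the interval must cover
    # Slide a window of k consecutive order statistics; keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def excludes_zero(samples, prob=0.95):
    """Hypothetical helper: True when the HPD interval lies entirely above
    or entirely below zero."""
    lo, hi = hpd_interval(samples, prob)
    return lo > 0 or hi < 0


rng = random.Random(7)
shifted = [rng.gauss(2.0, 0.5) for _ in range(5000)]    # stand-in MCMC draws
no_effect = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # stand-in MCMC draws
```

Here `excludes_zero(shifted)` is true and `excludes_zero(no_effect)` is false. For multimodal marginals the single-window estimate can overstate the HPD region, since a true HPD region may then be a union of disjoint intervals.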

95

96

Yang Y, Longini IM, Halloran ME, Obenchain V. A hybrid EM and Monte Carlo EM algorithm and its application to analysis of transmission of infectious diseases. Biometrics 2012; 68:1238-49. [PMID: 22506893 DOI: 10.1111/j.1541-0420.2012.01757.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
In epidemics of infectious diseases such as influenza, an individual may have one of four possible final states: prior immune, escaped from infection, infected with symptoms, and infected asymptomatically. The exact state is often not observed. In addition, the unobserved transmission times of asymptomatic infections further complicate analysis. Under the assumption of missing at random, data-augmentation techniques can be used to integrate out such uncertainties. We adapt an importance-sampling-based Monte Carlo Expectation-Maximization (MCEM) algorithm to the setting of an infectious disease transmitted in close contact groups. Assuming independence between close contact groups, we propose a hybrid EM-MCEM algorithm that applies either the MCEM or the traditional EM algorithm to each close contact group, depending on the dimension of the missing data in that group, and we discuss variance estimation for this practice. In addition, we propose a bootstrap approach to assess the total Monte Carlo error and factor that error into the variance estimation. The proposed methods are evaluated using simulation studies. We use the hybrid EM-MCEM algorithm to analyze two influenza epidemics in the late 1970s to assess the effects of age and preseason antibody levels on the transmissibility and pathogenicity of the viruses.
Affiliation(s)
- Yang Yang
- Department of Biostatistics and Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA.
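
The hybrid idea in this abstract, an exact E-step where a group's missing data are low dimensional and a Monte Carlo E-step where they are not, can be sketched for a group whose missing data are d binary indicators (for example, unobserved infection states). Up to a hypothetical threshold `max_exact_dim`, all 2**d configurations are enumerated; above it, a self-normalized importance-sampling estimate with a uniform proposal is used. The proposal, the threshold and the function name `group_expectation` are assumptions; the paper's importance-sampling scheme, transmission model and bootstrap assessment of Monte Carlo error are not reproduced.

```python
import itertools
import math
import random


def group_expectation(log_weight, stat, d, max_exact_dim=12, n_mc=20000, rng=None):
    """Hypothetical helper: E[stat(z)] for z in {0,1}^d with p(z) proportional
    to exp(log_weight(z)).

    d <= max_exact_dim: exact enumeration of all 2**d configurations (EM).
    d >  max_exact_dim: self-normalized importance sampling with a uniform
    proposal over {0,1}^d (MCEM). Because the uniform proposal has constant
    density, the importance weights are proportional to exp(log_weight(z)).
    """
    if d <= max_exact_dim:
        configs = list(itertools.product((0, 1), repeat=d))
    else:
        rng = rng or random.Random()
        configs = [tuple(rng.randint(0, 1) for _ in range(d)) for _ in range(n_mc)]
    logs = [log_weight(z) for z in configs]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    return sum(w * stat(z) for w, z in zip(weights, configs)) / sum(weights)


# Toy check: independent indicators with P(z_j = 1) = p, so E[sum z] = d * p.
p = 0.3
log_odds = math.log(p / (1 - p))
small = group_expectation(lambda z: log_odds * sum(z), sum, d=5)
large = group_expectation(lambda z: log_odds * sum(z), sum, d=20, rng=random.Random(0))
```

Sharing one weighted average between both branches keeps the two E-steps directly comparable: for small `d` the average is exact (here `small` equals 1.5 up to rounding), and only the source of the configurations changes when `d` is large.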

97

Tian GL, Tang ML, Liu C. Accelerating the quadratic lower-bound algorithm via optimizing the shrinkage parameter. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.07.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]

98

An X, Bentler PM. Efficient direct sampling MCEM algorithm for latent variable models with binary responses. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.06.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]

99

He Y, Liu C. The dynamic ‘expectation-conditional maximization either’ algorithm. J R Stat Soc Series B Stat Methodol 2012. [DOI: 10.1111/j.1467-9868.2011.01013.x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]

100

Chrétien S, Hero A, Perdry H. Space alternating penalized Kullback proximal point algorithms for maximizing likelihood with nondifferentiable penalty. Ann Inst Stat Math 2011. [DOI: 10.1007/s10463-011-0333-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/17/2022]
