1
Xu T, Chen K, Li G. Tensor regression for incomplete observations with application to longitudinal studies. Ann Appl Stat 2024; 18:1195-1212. [PMID: 39360180 PMCID: PMC11446469 DOI: 10.1214/23-aoas1830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2024]
Abstract
Multivariate longitudinal data are frequently encountered in practice such as in our motivating longitudinal microbiome study. It is of general interest to associate such high-dimensional, longitudinal measures with some univariate continuous outcome. However, incomplete observations are common in a regular study design, as not all samples are measured at every time point, giving rise to the so-called blockwise missing values. Such missing structure imposes significant challenges for association analysis and defies many existing methods that require complete samples. In this paper we propose to represent multivariate longitudinal data as a three-way tensor array (i.e., sample-by-feature-by-time) and exploit a parsimonious scalar-on-tensor regression model for association analysis. We develop a regularized covariance-based estimation procedure that effectively leverages all available observations without imputation. The method achieves variable selection and smooth estimation of time-varying effects. The application to the motivating microbiome study reveals interesting links between the preterm infant's gut microbiome dynamics and their neurodevelopment. Additional numerical studies on synthetic data and a longitudinal aging study further demonstrate the efficacy of the proposed method.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut
- Gen Li
- Department of Biostatistics, University of Michigan, Ann Arbor
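The abstract above describes a covariance-based estimator that uses all available observations without imputation. The following is a minimal sketch of that general idea only, not the authors' procedure: it flattens the data to a samples-by-columns matrix, estimates each covariance entry from the rows where both columns are observed, and plugs the result into a ridge-type solve. The function names, the `np.nan` encoding of missing values, and the ridge penalty are all assumptions made for illustration.

```python
import numpy as np

def available_case_moments(X, y):
    """Pairwise-available estimates of cov(X) and cov(X, y).

    X: (n, p) array with np.nan marking missing entries (hypothetical layout).
    y: (n,) fully observed outcome.
    Assumes every pair of columns is observed together at least once.
    """
    obs = ~np.isnan(X)                               # True where a value is observed
    counts = obs.sum(axis=0)                         # observations per column
    means = np.where(obs, X, 0.0).sum(axis=0) / counts
    Xc = np.where(obs, X - means, 0.0)               # centered, zero where missing
    pair_counts = obs.T.astype(float) @ obs.astype(float)
    Sigma = (Xc.T @ Xc) / pair_counts                # entry (j, k) uses co-observed rows only
    c = (Xc.T @ (y - y.mean())) / counts             # cross-covariance with the outcome
    return Sigma, c

def ridge_from_moments(Sigma, c, lam=0.1):
    """Solve (Sigma + lam * I) beta = c; the ridge penalty is an illustrative choice."""
    p = Sigma.shape[0]
    return np.linalg.solve(Sigma + lam * np.eye(p), c)
```

A pairwise-available covariance matrix is not guaranteed to be positive semidefinite, which is one reason a regularization term is included in the solve here.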
5
Yang D, Goh G, Wang H. A fully Bayesian approach to sparse reduced-rank multivariate regression. Stat Model 2020. [DOI: 10.1177/1471082x20948697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the context of high-dimensional multivariate linear regression, sparse reduced-rank regression (SRRR) provides a way to handle both variable selection and low-rank estimation problems. Although there has been extensive research on SRRR, statistical inference procedures that deal with the uncertainty due to variable selection and rank reduction are still limited. To fill this research gap, we develop a fully Bayesian approach to SRRR. A major difficulty that occurs in a fully Bayesian framework is that the dimension of the parameter space varies with the selected variables and the reduced rank. Because of this varying-dimensional problem, traditional Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler and the Metropolis-Hastings algorithm are inapplicable in our Bayesian framework. To address this issue, we propose a new posterior computation procedure based on the Laplace approximation within the collapsed Gibbs sampler. A key feature of our fully Bayesian method is that the model uncertainty is automatically integrated out by the proposed MCMC computation. The proposed method is examined via a simulation study and real data analysis.
Affiliation(s)
- Dunfu Yang
- Department of Statistics, Kansas State University, Manhattan, KS, USA
- Gyuhyeong Goh
- Department of Statistics, Kansas State University, Manhattan, KS, USA
- Haiyan Wang
- Department of Statistics, Kansas State University, Manhattan, KS, USA
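For readers unfamiliar with the low-rank part of SRRR mentioned in the abstract above, the sketch below shows classical (non-sparse, non-Bayesian) reduced-rank regression with identity weighting: fit ordinary least squares, then project the coefficient matrix onto the leading right singular vectors of the fitted values. It is a textbook baseline, not the fully Bayesian procedure of the cited paper, and the function name `reduced_rank_regression` is a hypothetical one chosen for this example.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Classical reduced-rank regression (identity weighting).

    X: (n, p) predictors, Y: (n, q) responses, rank: target rank r <= min(p, q).
    Returns a (p, q) coefficient matrix of rank at most r.
    """
    # Unconstrained least-squares coefficients.
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Leading right singular vectors of the fitted values span the response directions kept.
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V_r = Vt[:rank].T
    # Project the OLS coefficients onto that r-dimensional subspace.
    return B_ols @ V_r @ V_r.T
```

Sparse variants add a row-wise penalty so that entire predictors drop out, and the Bayesian approach in the abstract additionally quantifies uncertainty over both the selected variables and the rank.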
7
Lesot MJ, Vieira S, Reformat MZ, Carvalho JP, Wilbik A, Bouchon-Meunier B, Yager RR. High Dimensional Bayesian Regularization in Regressions Involving Symmetric Tensors. Information Processing and Management of Uncertainty in Knowledge-Based Systems 2020. [PMCID: PMC7274680 DOI: 10.1007/978-3-030-50153-2_26] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
This article develops a regression framework with a symmetric tensor response and vector predictors. The existing literature involving symmetric tensor response and vector predictors proceeds by vectorizing the tensor response to a multivariate vector, thus ignoring the structural information in the tensor. A few recent approaches have proposed novel regression frameworks that exploit the structure of the symmetric tensor and assume the symmetric tensor coefficients corresponding to scalar predictors to be low-rank. Although low-rank constraints on coefficient tensors are computationally efficient, they may be restrictive in some real data applications. Motivated by this, we propose a novel class of regularization or shrinkage priors for the symmetric tensor coefficients. Our modeling framework a priori expresses a symmetric tensor coefficient as the sum of low-rank and sparse structures, with both structures suitably regularized using Bayesian regularization techniques. The proposed framework allows identification of tensor nodes significantly influenced by each scalar predictor. Our framework is implemented using an efficient Markov chain Monte Carlo algorithm. Empirical results in simulation studies show competitive performance of the proposed approach relative to its competitors.
Affiliation(s)
- Susana Vieira
- IDMEC, IST, Universidade de Lisboa, Lisbon, Portugal
- Anna Wilbik
- Eindhoven University of Technology, Eindhoven, The Netherlands
8
Jiang D, Armour CR, Hu C, Mei M, Tian C, Sharpton TJ, Jiang Y. Microbiome Multi-Omics Network Analysis: Statistical Considerations, Limitations, and Opportunities. Front Genet 2019; 10:995. [PMID: 31781153 PMCID: PMC6857202 DOI: 10.3389/fgene.2019.00995] [Citation(s) in RCA: 87] [Impact Index Per Article: 14.5] [Received: 02/13/2019] [Accepted: 09/18/2019] [Indexed: 12/21/2022]
Abstract
The advent of large-scale microbiome studies affords newfound analytical opportunities to understand how these communities of microbes operate and relate to their environment. However, the analytical methodology needed to model microbiome data and integrate them with other data constructs remains nascent. This emergent analytical toolset frequently ports over techniques developed in other multi-omics investigations, especially the growing array of statistical and computational techniques for integrating and representing data through networks. While network analysis has emerged as a powerful approach to modeling microbiome data, oftentimes by integrating these data with other types of omics data to discern their functional linkages, it is not always evident if the statistical details of the approach being applied are consistent with the assumptions of microbiome data or how they impact data interpretation. In this review, we overview some of the most important network methods for integrative analysis, with an emphasis on methods that have been applied or have great potential to be applied to the analysis of multi-omics integration of microbiome data. We compare advantages and disadvantages of various statistical tools, assess their applicability to microbiome data, and discuss their biological interpretability. We also highlight on-going statistical challenges and opportunities for integrative network analysis of microbiome data.
Affiliation(s)
- Duo Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Courtney R Armour
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Chenxiao Hu
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Meng Mei
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Chuan Tian
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Thomas J Sharpton
- Department of Statistics, Oregon State University, Corvallis, OR, United States
- Department of Microbiology, Oregon State University, Corvallis, OR, United States
- Yuan Jiang
- Department of Statistics, Oregon State University, Corvallis, OR, United States
9
Uematsu Y, Fan Y, Chen K, Lv J, Lin W. SOFAR: Large-Scale Association Network Learning. IEEE Transactions on Information Theory 2019; 65:4924-4939. [PMID: 33746241 PMCID: PMC7970712 DOI: 10.1109/tit.2019.2909889] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulations and real data examples.
Affiliation(s)
- Yoshimasa Uematsu
- Assistant Professor, Department of Economics and Management, Tohoku University, Sendai 980-8576, Japan
- Yingying Fan
- Dean's Associate Professor in Business Administration, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
- Kun Chen
- Associate Professor, Department of Statistics, University of Connecticut, Storrs, CT 06269
- Jinchi Lv
- Kenneth King Stonier Chair in Business Administration and Professor, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
- Wei Lin
- Assistant Professor, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China 100871
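The SOFAR abstract above builds on the sparse singular value decomposition. As a rough illustration of that building block only, the sketch below extracts a single sparse rank-one layer by alternating soft-thresholded power iterations, a common heuristic for sparse SVD. It is not the SOFAR algorithm, it does not enforce orthogonality across multiple layers, and the names `soft_threshold` and `sparse_rank_one` and the threshold parameters are assumptions introduced here.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: shrink toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_rank_one(C, lam_u=0.0, lam_v=0.0, n_iter=100):
    """One sparse singular layer (d, u, v) of C via thresholded power iterations.

    lam_u and lam_v are illustrative thresholds; larger values give sparser vectors.
    """
    # Start from the ordinary leading right singular vector.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        u = soft_threshold(C @ v, lam_u)
        norm_u = np.linalg.norm(u)
        if norm_u == 0.0:          # threshold removed everything
            break
        u = u / norm_u
        v = soft_threshold(C.T @ u, lam_v)
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            break
        v = v / norm_v
    d = float(u @ C @ v)           # strength of the extracted layer
    return d, u, v
```

With both thresholds set to zero the iteration reduces to the ordinary power method and returns the leading singular triplet. Further layers could be extracted by deflation, but the resulting vectors are generally not orthogonal, which is the tension between sparsity and orthogonality that the abstract points to.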