1
Wang K, Xu Y. Bayesian tensor-on-tensor regression with efficient computation. Statistics and Its Interface 2024; 17:199-217. [PMID: 38469276 PMCID: PMC10927259 DOI: 10.4310/23-sii786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024]
Abstract
We propose a Bayesian tensor-on-tensor regression approach to predict a multidimensional array (tensor) of arbitrary dimensions from another tensor of arbitrary dimensions, building upon the Tucker decomposition of the regression coefficient tensor. Traditional tensor regression methods making use of the Tucker decomposition either assume the dimension of the core tensor to be known or estimate it via cross-validation or some model selection criteria. However, no existing method can simultaneously estimate the model dimension (the dimension of the core tensor) and other model parameters. To fill this gap, we develop an efficient Markov Chain Monte Carlo (MCMC) algorithm to estimate both the model dimension and parameters for posterior inference. Besides the MCMC sampler, we also develop an ultra-fast optimization-based computing algorithm wherein the maximum a posteriori estimators for parameters are computed, and the model dimension is optimized via a simulated annealing algorithm. The proposed Bayesian framework provides a natural way for uncertainty quantification. Through extensive simulation studies, we evaluate the proposed Bayesian tensor-on-tensor regression model and show its superior performance compared to alternative methods. We also demonstrate its practical effectiveness by applying it to two real-world datasets, including facial imaging data and 3D motion data.
Affiliation(s)
- Kunbo Wang
- 3400 N. Charles Street, Baltimore, MD 21218
- Yanxun Xu
- 3400 N. Charles Street, Baltimore, MD 21218
2
Kim J, Sandri BJ, Rao RB, Lock EF. Bayesian predictive modeling of multi-source multi-way data. Comput Stat Data Anal 2023; 186:107783. [PMID: 37274461 PMCID: PMC10237362 DOI: 10.1016/j.csda.2023.107783] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
A Bayesian approach to predict a continuous or binary outcome from data that are collected from multiple sources with a multi-way (i.e., multidimensional tensor) structure is described. As a motivating example, molecular data from multiple 'omics sources, each measured over multiple developmental time points, as predictors of early-life iron deficiency (ID) in a rhesus monkey model are considered. The method uses a linear model with a low-rank structure on the coefficients to capture multi-way dependence and model the variance of the coefficients separately across each source to infer their relative contributions. Conjugate priors facilitate an efficient Gibbs sampling algorithm for posterior inference, assuming a continuous outcome with normal errors or a binary outcome with a probit link. Simulations demonstrate that the model performs as expected in terms of misclassification rates and correlation of estimated coefficients with true coefficients, with large gains in performance by incorporating multi-way structure and modest gains when accounting for differing signal sizes across the different sources. Moreover, it provides robust classification of ID monkeys for the motivating application.
Affiliation(s)
- Jonathan Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
- Brian J. Sandri
- Division of Neonatology, Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA
- Raghavendra B. Rao
- Division of Neonatology, Department of Pediatrics, University of Minnesota, Minneapolis, MN, USA
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, MN, USA
- Eric F. Lock
- Division of Biostatistics, University of Minnesota, Minneapolis, 55455, USA
3
Zhang Y, Zhang X, Zhang H, Liu A, Liu CC. Low-rank latent matrix-factor prediction modeling for generalized high-dimensional matrix-variate regression. Stat Med 2023; 42:3616-3635. [PMID: 37314066 DOI: 10.1002/sim.9821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/19/2023] [Accepted: 06/01/2023] [Indexed: 06/15/2023]
Abstract
Motivated by diagnosing the COVID-19 disease using two-dimensional (2D) image biomarkers from computed tomography (CT) scans, we propose a novel latent matrix-factor regression model to predict responses that may come from an exponential distribution family, where covariates include high-dimensional matrix-variate biomarkers. A latent generalized matrix regression (LaGMaR) is formulated, where the latent predictor is a low-dimensional matrix factor score extracted from the low-rank signal of the matrix variate through a cutting-edge matrix factor model. Unlike the prevailing approach in the literature, which penalizes the vectorized covariate and requires tuning parameters, our prediction modeling in LaGMaR conducts dimension reduction that respects the geometric characteristic of the intrinsic 2D structure of the matrix covariate and thus avoids iteration. This greatly relieves the computational burden while maintaining structural information, so that the latent matrix factor feature can perfectly replace the matrix variate, which is intractable owing to its high dimensionality. The estimation procedure of LaGMaR is subtly derived by transforming the bilinear-form matrix factor model into a high-dimensional vector factor model, so that the method of principal components can be applied. We establish bilinear-form consistency of the estimated matrix coefficient of the latent predictor and consistency of prediction. The proposed approach can be implemented conveniently. Through simulation experiments, the prediction capability of LaGMaR is shown to outperform some existing penalized methods under diverse scenarios of generalized matrix regressions. Through the application to a real COVID-19 dataset, the proposed approach is shown to predict COVID-19 efficiently.
Affiliation(s)
- Yuzhe Zhang
- School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Xu Zhang
- School of Mathematical Sciences, South China Normal University, Guangzhou, Guangdong, China
- Hong Zhang
- School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Aiyi Liu
- National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, USA
- Catherine C Liu
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR
4
Chakraborty D, Zhuang Z, Xue H, Fiecas MB, Shen X, Pan W. Deep Learning-Based Feature Extraction with MRI Data in Neuroimaging Genetics for Alzheimer's Disease. Genes (Basel) 2023; 14:626. [PMID: 36980898 PMCID: PMC10047952 DOI: 10.3390/genes14030626] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/09/2023] [Revised: 02/27/2023] [Accepted: 02/27/2023] [Indexed: 03/06/2023] [Open Access]
Abstract
The prognosis and treatment of patients suffering from Alzheimer's disease (AD) have been among the most important and challenging problems over the last few decades. To better understand the mechanism of AD, it is of great interest to identify genetic variants associated with brain atrophy. Commonly, in these analyses, neuroimaging features are extracted based on one of many possible brain atlases with FreeSurfer and other popular software; this, however, may cause the loss of important information due to our incomplete knowledge about brain function embedded in these suboptimal atlases. To address the issue, we propose convolutional neural network (CNN) models applied to three-dimensional MRI data for the whole brain or multiple, divided brain regions to perform completely data-driven and automatic feature extraction. These image-derived features are then used as endophenotypes in genome-wide association studies (GWASs) to identify associated genetic variants. When we applied this method to ADNI data, we identified several associated SNPs that have been previously shown to be related to several neurodegenerative/mental disorders, such as AD, depression, and schizophrenia.
Affiliation(s)
- Dipnil Chakraborty
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Zhong Zhuang
- Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Haoran Xue
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Mark B. Fiecas
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, USA
5
Kang K, Song X. Joint Modeling of Longitudinal Imaging and Survival Data. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2102027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Kai Kang
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
- Xinyuan Song
- Department of Statistics, The Chinese University of Hong Kong, Hong Kong, China
6
Zhou Y, He K. An improved tensor regression model via location smoothing. Stat (Int Stat Inst) 2021. [DOI: 10.1002/sta4.377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Ya Zhou
- Center for Applied Statistics and Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Kejun He
- Center for Applied Statistics and Institute of Statistics and Big Data, Renmin University of China, Beijing, China
7
Tang X, Bi X, Qu A. Individualized Multilayer Tensor Learning With an Application in Imaging Analysis. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1585254] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/04/2023]
Affiliation(s)
- Xiwei Tang
- Department of Statistics, University of Virginia, Charlottesville, VA
- Xuan Bi
- Department of Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, MN
- Annie Qu
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
8
Bi X, Qu A, Shen X. Multilayer tensor factorization with applications to recommender systems. Ann Stat 2018. [DOI: 10.1214/17-aos1659] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/19/2022]
9
Happ C, Greven S, Schmid VJ. The impact of model assumptions in scalar-on-image regression. Stat Med 2018; 37:4298-4317. [PMID: 30132932 DOI: 10.1002/sim.7915] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/10/2018] [Revised: 06/20/2018] [Accepted: 06/27/2018] [Indexed: 11/11/2022]
Abstract
Complex statistical models such as scalar-on-image regression often require strong assumptions to overcome the issue of nonidentifiability. While in theory, it is well understood that model assumptions can strongly influence the results, this seems to be underappreciated, or played down, in practice. This article gives a systematic overview of the main approaches for scalar-on-image regression with a special focus on their assumptions. We categorize the assumptions and develop measures to quantify the degree to which they are met. The impact of model assumptions and the practical usage of the proposed measures are illustrated in a simulation study and in an application to neuroimaging data. The results show that different assumptions indeed lead to quite different estimates with similar predictive ability, raising the question of their interpretability. We give recommendations for making modeling and interpretation decisions in practice based on the new measures and simulations using hypothetic coefficient images and the observed data.
Affiliation(s)
- Clara Happ
- Department of Statistics, LMU Munich, Munich, Germany
- Sonja Greven
- Department of Statistics, LMU Munich, Munich, Germany