1
Tuft M, Hall MH, Krafty RT. Spectra in low-rank localized layers (SpeLLL) for interpretable time-frequency analysis. Biometrics 2023; 79:304-318. [PMID: 34609738 PMCID: PMC8980115 DOI: 10.1111/biom.13577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2020] [Revised: 07/25/2021] [Indexed: 11/26/2022]
Abstract
The time-varying frequency characteristics of many biomedical time series contain important scientific information. However, the high-dimensional nature of the time-varying power spectrum as a surface in time and frequency limits its direct use by applied researchers and clinicians for elucidating complex mechanisms. In this article, we introduce a new approach to time-frequency analysis that decomposes the time-varying power spectrum into orthogonal rank-one layers in time and frequency to provide a parsimonious representation that illustrates relationships between power at different times and frequencies. The approach can be used in fully nonparametric analyses or in semiparametric analyses that account for exogenous information and time-varying covariates. An estimation procedure is formulated within a penalized reduced-rank regression framework that provides estimates of layers that are interpretable as power localized within time blocks and frequency bands. Empirical properties of the procedure are illustrated in simulation studies, and its practical use is demonstrated through an analysis of heart rate variability during sleep.
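The layered representation can be illustrated with an ordinary singular value decomposition of a time-by-frequency matrix of local periodograms. This is a minimal Python sketch under stated assumptions, not the SpeLLL estimator: the block length, the raw periodogram, the simulated signal and the helper names `local_periodograms` and `rank_one_layers` are all hypothetical, and the paper's penalized reduced-rank regression (which additionally localizes layers within time blocks and frequency bands) is not reproduced.

```python
import numpy as np

def local_periodograms(x, block_len):
    """Split x into non-overlapping blocks and return a (blocks x frequencies)
    matrix of raw periodograms, one row per time block."""
    n_blocks = len(x) // block_len
    blocks = x[: n_blocks * block_len].reshape(n_blocks, block_len)
    blocks = blocks - blocks.mean(axis=1, keepdims=True)  # remove each block's mean
    return np.abs(np.fft.rfft(blocks, axis=1)) ** 2 / block_len

def rank_one_layers(power, n_layers):
    """Leading rank-one layers s_k * outer(u_k, v_k) of a time-by-frequency matrix:
    u_k is a time profile, v_k a frequency profile, and distinct layers are orthogonal."""
    u, s, vt = np.linalg.svd(power, full_matrices=False)
    return [s[k] * np.outer(u[:, k], vt[k]) for k in range(n_layers)]

# Hypothetical signal: a 0.1 cycles/sample oscillation whose amplitude grows over time
rng = np.random.default_rng(0)
t = np.arange(4096)
x = (t / t.size) * np.sin(2 * np.pi * 0.1 * t) + 0.1 * rng.standard_normal(t.size)

power = local_periodograms(x, block_len=256)   # 16 time blocks x 129 frequencies
layers = rank_one_layers(power, n_layers=2)
rel_err = np.linalg.norm(power - sum(layers)) / np.linalg.norm(power)
print(power.shape, rel_err)
```

Each layer is the outer product of a time profile `u[:, k]` and a frequency profile `vt[k]`, and layers from distinct singular triples are orthogonal in the Frobenius inner product, which is what makes such a decomposition parsimonious and readable.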
Affiliation(s)
- Marie Tuft
  - Statistical Sciences, Sandia National Laboratories, Albuquerque, New Mexico, 87185, U.S.A.
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, U.S.A.
- Martica H. Hall
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, 15213, U.S.A.
- Robert T. Krafty
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania, 15261, U.S.A.
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, 30322, U.S.A.
2
Guo W, Balakrishnan N, He M. Envelope-based sparse reduced-rank regression for multivariate linear model. J MULTIVARIATE ANAL 2023. [DOI: 10.1016/j.jmva.2023.105159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
3
Wen C, Wang Q, Jiang Y. Stability Approach to Regularization Selection for Reduced-Rank Regression. J Comput Graph Stat 2022; 32:974-984. [PMID: 37810194 PMCID: PMC10554232 DOI: 10.1080/10618600.2022.2119986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Accepted: 08/22/2022] [Indexed: 10/17/2022]
Abstract
The reduced-rank regression model is a popular model for dealing with multivariate responses and multiple predictors, and is widely used in biology, chemometrics, econometrics, engineering, and other fields. In reduced-rank regression modelling, a central objective is to estimate the rank of the coefficient matrix, which represents the number of effective latent factors in predicting the multivariate response. Although theoretical results such as rank estimation consistency have been established for various methods, in practice rank determination still relies on information-criterion-based methods such as AIC and BIC or subsampling-based methods such as cross-validation. Unfortunately, the theoretical properties of these practical methods are largely unknown. In this paper, we present a novel method called StARS-RRR that selects the tuning parameter and then estimates the rank of the coefficient matrix for reduced-rank regression based on the stability approach. We prove that StARS-RRR achieves rank estimation consistency, i.e., the rank estimated with the tuning parameter selected by StARS-RRR is consistent for the true rank. Through a simulation study, we show that StARS-RRR outperforms other tuning parameter selection methods, including AIC, BIC, and cross-validation, as it provides the most accurate estimated rank. In addition, when applied to a breast cancer dataset, StARS-RRR discovers a reasonable number of genetic pathways that affect DNA copy number variations, and it yields a smaller prediction error than the other methods when evaluated by random splitting of the data.
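For context on the baselines the abstract mentions, the following is a hedged sketch of classical reduced-rank regression with the rank chosen by BIC. It assumes identity weighting, Gaussian errors, and r(p + q - r) free parameters for a rank-r coefficient matrix; the helper names `rrr_fit` and `bic_rank` are hypothetical, and this is not the StARS-RRR procedure, which instead tunes the rank through subsampling stability.

```python
import numpy as np

def rrr_fit(X, Y, rank):
    """Classical reduced-rank regression (identity weight matrix): project the OLS
    coefficient matrix onto the top-`rank` right singular vectors of the fitted values."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    _, _, vt = np.linalg.svd(X @ B_ols, full_matrices=False)
    V = vt[:rank].T                      # q x rank
    return B_ols @ V @ V.T               # p x q, rank at most `rank`

def bic_rank(X, Y, max_rank):
    """Return the rank in 1..max_rank minimizing a Gaussian BIC with r * (p + q - r) parameters."""
    n, p = X.shape
    q = Y.shape[1]
    scores = []
    for r in range(1, max_rank + 1):
        rss = np.sum((Y - X @ rrr_fit(X, Y, r)) ** 2)
        df = r * (p + q - r)
        scores.append(n * q * np.log(rss / (n * q)) + np.log(n * q) * df)
    return 1 + int(np.argmin(scores))

# Simulated data with a rank-2 coefficient matrix (illustrative only)
rng = np.random.default_rng(1)
n, p, q, true_rank = 200, 10, 8, 2
X = rng.standard_normal((n, p))
B = rng.standard_normal((p, true_rank)) @ rng.standard_normal((true_rank, q))
Y = X @ B + 0.5 * rng.standard_normal((n, q))

print(bic_rank(X, Y, max_rank=min(p, q)))
```

Only ranks 1 through `max_rank` are compared here; a rank-0 model would need its own residual sum of squares term.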
Affiliation(s)
- Canhong Wen
  - International Institute of Finance, School of Management, University of Science and Technology of China
- Qin Wang
  - International Institute of Finance, School of Management, University of Science and Technology of China
- Yuan Jiang
  - Department of Statistics, Oregon State University
4
Robust Sparse Reduced-Rank Regression with Response Dependency. Symmetry (Basel) 2022. [DOI: 10.3390/sym14081617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
In multiple response regression, the reduced-rank regression model is an effective way to reduce the number of model parameters, and it takes advantage of the interrelations among the response variables. To improve the prediction performance of multiple response regression, a sparse robust reduced-rank regression method with covariance estimation (Cov-SR4) is proposed, which carries out variable selection, outlier detection, and covariance estimation simultaneously. The random error term of this model follows a multivariate normal distribution, which is symmetric, and the covariance matrix (or precision matrix) is necessarily symmetric, which reduces the number of parameters. Both element-wise and row-wise penalty functions can be used to handle different types of outliers. A numerical algorithm with a covariance estimation method is proposed to solve the robust sparse reduced-rank regression problem. We compare our method with three recent reduced-rank regression methods in a simulation study and a real data analysis. Our method exhibits competitive performance in both prediction error and variable selection accuracy.
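The abstract does not give the Cov-SR4 objective, so this sketch assumes the common mean-shift formulation of robust reduced-rank regression, in which a sparse outlier matrix is added to the linear predictor. Under that assumption, the two penalties act through their proximal operators: element-wise soft-thresholding flags individual corrupted cells, while row-wise group soft-thresholding flags whole observations. The function names and the toy residual matrix are hypothetical.

```python
import numpy as np

def prox_elementwise(R, lam):
    """Entry-wise soft-thresholding: proximal operator of lam * sum_ij |C_ij|."""
    return np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)

def prox_rowwise(R, lam):
    """Row-wise group soft-thresholding: proximal operator of lam * sum_i ||C_i||_2."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    # Rows with norm <= lam become zero; the remaining rows are shrunk toward zero
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * R

# Hypothetical residual matrix (observations x responses)
R = np.array([[0.1, -0.2,  0.05],
              [0.0,  5.0,  0.1 ],   # one corrupted cell
              [3.0,  3.0, -3.0 ],   # whole observation shifted
              [0.2,  0.1, -0.1 ]])

C_cell = prox_elementwise(R, lam=1.0)  # keeps only entries with |value| > 1, shrunk by 1
C_row = prox_rowwise(R, lam=1.0)       # zeroes rows 0 and 3, shrinks rows 1 and 2
print(C_cell)
print(C_row)
```

The element-wise operator isolates the single bad cell in row 1 and drops its small neighbour, whereas the row-wise operator treats rows 1 and 2 as outlying observations and keeps all of their nonzero entries.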
5
Hu J, Huang J, Liu X, Liu X. Response Best-subset Selector for Multivariate Regression with High-dimensional Response Variables. Biometrika 2022. [DOI: 10.1093/biomet/asac037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Summary
This article investigates the statistical problem of response-variable selection with high-dimensional response variables and a number of predictor variables that diverges with the sample size, in the framework of multivariate linear regression. A response best-subset selection model is proposed by introducing a 0–1 selection indicator for each response variable, and a response best-subset selector is then developed by introducing a separation parameter and a novel penalized least-squares function. The developed procedure can perform response-variable selection and regression-coefficient estimation simultaneously, and the proposed response best-subset selector has model consistency under mild conditions for both fixed and diverging numbers of predictor variables. Consistency and asymptotic normality of the regression-coefficient estimators are also presented for the fixed-dimension case, and the Bonferroni test turns out to be a special case of the response best-subset selector. Finite-sample simulations show that the response best-subset selector has strong advantages over existing competitors in terms of the Matthews correlation coefficient, a criterion aimed at balancing accuracies for both true and false response variables. An analysis of real data demonstrates the effectiveness of the response best-subset selector in an application involving the identification of dosage-sensitive genes.
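The Matthews correlation coefficient treats response selection as a binary classification of responses into relevant and irrelevant ones. A small sketch, with a hypothetical helper name and made-up index sets:

```python
import math

def matthews_cc(selected, truth, n_responses):
    """MCC between a selected set of response indices and the true relevant set."""
    selected, truth = set(selected), set(truth)
    tp = len(selected & truth)          # relevant and selected
    fp = len(selected - truth)          # selected but irrelevant
    fn = len(truth - selected)          # relevant but missed
    tn = n_responses - tp - fp - fn     # irrelevant and not selected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0 when any margin of the confusion table is empty
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(matthews_cc({0, 1, 2, 7}, {0, 1, 2, 3}, n_responses=10))  # 14 / 24, about 0.583
```

Unlike plain accuracy, MCC stays informative when only a few of many responses are truly relevant.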
Affiliation(s)
- Jianhua Hu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
- Jian Huang
  - Department of Statistics and Actuarial Science, University of Iowa, Iowa, U.S.A.
- Xiaoqian Liu
  - Department of Mathematics and Statistics, York University, Toronto, Ontario M3J 1P3, Canada
- Xu Liu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China
6
Mishra AK, Müller CL. Negative binomial factor regression with application to microbiome data analysis. Stat Med 2022; 41:2786-2803. [PMID: 35466418 PMCID: PMC9325477 DOI: 10.1002/sim.9384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2021] [Revised: 02/28/2022] [Accepted: 03/07/2022] [Indexed: 11/17/2022]
Abstract
The human microbiome provides essential physiological functions and helps maintain host homeostasis via the formation of intricate ecological host‐microbiome relationships. While it is well established that the lifestyle of the host, dietary preferences, demographic background, and health status can influence microbial community composition and dynamics, robust, generalizable associations between specific host‐associated factors and specific microbial taxa have remained largely elusive. Here, we propose factor regression models that allow the estimation of structured parsimonious associations between host‐related features and amplicon‐derived microbial taxa. To account for the overdispersed nature of the amplicon sequencing count data, we propose negative binomial reduced rank regression (NB‐RRR) and negative binomial co‐sparse factor regression (NB‐FAR). While NB‐RRR encodes the underlying dependency among the microbial abundances as outcomes and the host‐associated features as predictors through a rank‐constrained coefficient matrix, NB‐FAR uses a sparse singular value decomposition of the coefficient matrix. The latter approach avoids the notoriously difficult joint parameter estimation by extracting sparse unit‐rank components of the coefficient matrix sequentially, effectively delivering interpretable bi‐clusters of taxa and host‐associated factors. To solve the nonconvex optimization problems associated with these factor regression models, we present a novel iterative block‐wise majorization procedure. Extensive simulation studies and an application to the microbial abundance data from the American Gut Project (AGP) demonstrate the efficacy of the proposed procedure. In the AGP data, we identify several factors that strongly link dietary habits and host lifestyle to specific microbial families.
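Overdispersion is commonly modelled with the NB2 form of the negative binomial, whose variance mu + mu**2 / phi exceeds the Poisson variance mu. The sketch below writes its log-likelihood under a log link with a rank-r coefficient matrix C = U V^T. The parameterization, the single shared dispersion `phi`, and the helper names are assumptions for illustration; this is not the NB-RRR fitting procedure or its block-wise majorization algorithm.

```python
import math
import numpy as np

# Vectorized log-gamma from the standard library (avoids a SciPy dependency)
_lgamma = np.vectorize(math.lgamma, otypes=[float])

def nb_logpmf(y, mu, phi):
    """NB2 log-probability of counts y with mean mu and dispersion phi (variance mu + mu**2 / phi)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    return (_lgamma(y + phi) - _lgamma(phi) - _lgamma(y + 1.0)
            + phi * np.log(phi / (phi + mu)) + y * np.log(mu / (phi + mu)))

def nb_rrr_negloglik(Y, X, U, V, phi):
    """Negative log-likelihood of counts Y (n x q) when log E[Y] = X @ U @ V.T,
    with U (p x r) and V (q x r), so the coefficient matrix has rank at most r."""
    mu = np.exp(X @ U @ V.T)
    return -float(np.sum(nb_logpmf(Y, mu, phi)))

rng = np.random.default_rng(2)
n, p, q, r = 50, 4, 6, 2
X = rng.standard_normal((n, p))
U = 0.3 * rng.standard_normal((p, r))
V = 0.3 * rng.standard_normal((q, r))
phi = 2.0
mu = np.exp(X @ U @ V.T)
# NumPy's negative_binomial(n, p) counts failures before n successes (mean n * (1 - p) / p),
# so n = phi and p = phi / (phi + mu) give counts with mean mu
Y = rng.negative_binomial(phi, phi / (phi + mu))
print(nb_rrr_negloglik(Y, X, U, V, phi))
```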
Affiliation(s)
- Aditya K. Mishra
  - Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, New York, USA
- Christian L. Müller
  - Center for Computational Mathematics, Flatiron Institute, Simons Foundation, New York, New York, USA
  - Department of Statistics, LMU München, Munich, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, Munich, Germany
7
Tan KM, Sun Q, Witten D. Sparse Reduced Rank Huber Regression in High Dimensions. J Am Stat Assoc 2022; 118:2383-2393. [PMID: 38283734 PMCID: PMC10812838 DOI: 10.1080/01621459.2022.2050243] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2019] [Accepted: 02/04/2022] [Indexed: 10/18/2022]
Abstract
We propose a sparse reduced rank Huber regression for analyzing large and complex high-dimensional data with heavy-tailed random noise. The proposed method is based on a convex relaxation of a rank- and sparsity-constrained nonconvex optimization problem, which is then solved using a block coordinate descent and an alternating direction method of multipliers algorithm. We establish nonasymptotic estimation error bounds under both Frobenius and nuclear norms in the high-dimensional setting. This is a major contribution over existing results in reduced rank regression, which mainly focus on rank selection and prediction consistency. Our theoretical results quantify the tradeoff between heavy-tailedness of the random noise and statistical bias. For random noise with bounded $(1+\delta)$th moment with $\delta \in (0, 1)$, the rate of convergence is a function of $\delta$, and is slower than the sub-Gaussian-type deviation bounds; for random noise with bounded second moment, we obtain a rate of convergence as if sub-Gaussian noise were assumed. We illustrate the performance of the proposed method via extensive numerical studies and a data application. Supplementary materials for this article are available online.
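The robustness comes from the Huber loss, which is quadratic for small residuals and linear for large ones, so its derivative is the residual clipped to [-tau, tau]. A minimal sketch of the loss and of the smooth part of a reduced-rank Huber objective; the 1/n scaling, the fixed robustification parameter `tau`, and the helper names are assumptions, and the paper's convex relaxation and ADMM solver are not shown.

```python
import numpy as np

def huber(x, tau):
    """Huber loss: 0.5 * x**2 if |x| <= tau, else tau * |x| - 0.5 * tau**2."""
    a = np.abs(x)
    return np.where(a <= tau, 0.5 * x ** 2, tau * a - 0.5 * tau ** 2)

def huber_psi(x, tau):
    """Derivative of the Huber loss: the residual clipped to [-tau, tau]."""
    return np.clip(x, -tau, tau)

def huber_objective(B, X, Y, tau):
    """Average Huber loss of the residual matrix Y - X @ B, and its gradient with respect to B."""
    n = X.shape[0]
    R = Y - X @ B
    value = huber(R, tau).sum() / n
    grad = -X.T @ huber_psi(R, tau) / n
    return value, grad

print(huber(np.array([0.5, 3.0]), tau=1.0))  # 0.125 and 2.5
```

Large residuals contribute only linearly and their gradient is capped at `tau`, which bounds the influence of heavy-tailed noise on each update.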
Affiliation(s)
- Kean Ming Tan
  - Department of Statistics, University of Michigan, Ann Arbor, MI
- Qiang Sun
  - Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
- Daniela Witten
  - Departments of Statistics and Biostatistics, University of Washington, Seattle, WA
8
Sparse reduced-rank regression for simultaneous rank and variable selection via manifold optimization. Comput Stat 2022. [DOI: 10.1007/s00180-022-01216-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
We consider the problem of constructing a reduced-rank regression model whose coefficient parameter is represented as a singular value decomposition with sparse singular vectors. The traditional estimation procedure for the coefficient parameter often fails when the true rank of the parameter is high. To overcome this issue, we develop an estimation algorithm with rank and variable selection via sparse regularization and manifold optimization, which enables us to obtain an accurate estimation of the coefficient parameter even if the true rank of the coefficient parameter is high. Using sparse regularization, we can also select an optimal value of the rank. We conduct Monte Carlo experiments and a real data analysis to illustrate the effectiveness of our proposed method.
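Manifold optimization keeps singular-vector matrices on the Stiefel manifold of matrices with orthonormal columns. One standard way to do this, assumed here because the abstract does not specify the algorithm, is a Riemannian gradient step followed by a QR retraction. The helper names are hypothetical, and the usage example minimizes -trace(U^T A U), whose minimizer spans the leading eigenvectors of A, purely as a correctness check.

```python
import numpy as np

def qr_retraction(M):
    """Map a full-column-rank matrix onto the Stiefel manifold via its QR factor,
    flipping column signs so that diag(R) > 0 and the map is well defined."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

def riemannian_step(U, euclid_grad, step):
    """One descent step on the Stiefel manifold: project the Euclidean gradient onto the
    tangent space at U, move against it, then retract back onto the manifold."""
    sym = 0.5 * (U.T @ euclid_grad + euclid_grad.T @ U)
    tangent = euclid_grad - U @ sym
    return qr_retraction(U - step * tangent)

# Check: minimizing -trace(U^T A U) should recover the top-2 eigenspace of A
A = np.diag([5.0, 3.0, 1.0, 0.5])
rng = np.random.default_rng(4)
U = qr_retraction(rng.standard_normal((4, 2)))
for _ in range(500):
    U = riemannian_step(U, -2.0 * A @ U, step=0.05)  # Euclidean gradient of -trace(U^T A U)
print(np.trace(U.T @ A @ U))  # approaches 5 + 3 = 8
```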
9
Some aspects of response variable selection and estimation in multivariate linear regression. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2021.104821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
10
Liu X, Ma S, Chen K. Multivariate Functional Regression Via Nested Reduced-Rank Regularization. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1960850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Xiaokang Liu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA
- Shujie Ma
  - Department of Statistics, University of California, Riverside, CA
- Kun Chen
  - Department of Statistics, University of Connecticut, Storrs, CT
11
Dong R, Li D, Zheng Z. Parallel integrative learning for large-scale multi-response regression with incomplete outcomes. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
12
Kobak D, Bernaerts Y, Weis MA, Scala F, Tolias AS, Berens P. Sparse reduced‐rank regression for exploratory visualisation of paired multivariate data. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12494] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Dmitry Kobak
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Yves Bernaerts
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - International Max Planck Research School for Intelligent Systems, Germany
- Marissa A. Weis
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
- Federico Scala
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA
- Andreas S. Tolias
  - Department of Neuroscience, Baylor College of Medicine, Houston, Texas, USA
- Philipp Berens
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany
  - Department of Computer Science, University of Tübingen, Tübingen, Germany
13
14
Wang D, Zheng Y, Lian H, Li G. High-Dimensional Vector Autoregressive Time Series Modeling via Tensor Decomposition. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2020.1855183] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Affiliation(s)
- Di Wang
  - Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong
- Yao Zheng
  - Department of Statistics, University of Connecticut, Storrs, CT
- Heng Lian
  - Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong
- Guodong Li
  - Department of Statistics and Actuarial Science, University of Hong Kong, Pok Fu Lam, Hong Kong
15
Mokhtaridoost M, Gönen M. An efficient framework to identify key miRNA-mRNA regulatory modules in cancer. Bioinformatics 2020; 36:i592-i600. [PMID: 33381822 DOI: 10.1093/bioinformatics/btaa798] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION Micro-RNAs (miRNAs) are important components of RNA silencing and post-transcriptional gene regulation, and they interact with messenger RNAs (mRNAs) either by degradation or by translational repression. miRNA alterations have a significant impact on the formation and progression of human cancers. Accordingly, it is important to establish computational methods with high predictive performance to identify cancer-specific miRNA-mRNA regulatory modules. RESULTS We present a two-step framework to model miRNA-mRNA relationships and identify cancer-specific modules between miRNAs and mRNAs from their matched expression profiles of more than 9000 primary tumors. We first estimated the regulatory matrix between miRNA and mRNA expression profiles by solving multiple linear programming problems. We then formulated a unified regularized factor regression (RFR) model that simultaneously estimates the effective number of modules (i.e. latent factors) and extracts modules by decomposing the regulatory matrix into two low-rank matrices. Our RFR model groups correlated miRNAs together and correlated mRNAs together, and also controls the sparsity levels of both matrices. These attributes lead to interpretable results with high predictive performance. We applied our method to a very comprehensive data collection including 32 TCGA cancer types. To assess the biological relevance of our approach, we performed functional gene set enrichment and survival analyses. A large portion of the identified modules are significantly enriched in Hallmark, PID and KEGG pathways/gene sets. To validate the identified modules, we also performed literature validation as well as validation using the experimentally supported miRTarBase database. AVAILABILITY AND IMPLEMENTATION Our R implementation of the proposed two-step RFR algorithm is available at https://github.com/MiladMokhtaridoost/2sRFR together with the scripts that replicate the reported experiments.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
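The decomposition of the regulatory matrix into two low-rank factors can be sketched with ridge-regularized alternating least squares. This is a simplification under stated assumptions: it fixes the number of modules `k` and omits the sparsity controls and automatic estimation of the number of modules in the RFR model; `als_factorize` is a hypothetical name.

```python
import numpy as np

def als_factorize(W, k, ridge=1e-2, n_iter=200, seed=0):
    """Approximate W (m x n) by A @ B.T with A (m x k) and B (n x k), minimizing
    ||W - A B^T||_F^2 + ridge * (||A||_F^2 + ||B||_F^2) by alternating least squares."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    A = rng.standard_normal((m, k))
    B = rng.standard_normal((n, k))
    I = np.eye(k)
    for _ in range(n_iter):
        # With one factor fixed, the other has a closed-form ridge-regression update
        A = np.linalg.solve(B.T @ B + ridge * I, B.T @ W.T).T
        B = np.linalg.solve(A.T @ A + ridge * I, A.T @ W).T
    return A, B

# Hypothetical regulatory matrix of exact rank 3 (e.g. 20 miRNAs x 30 mRNAs)
rng = np.random.default_rng(5)
W = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 30))
A, B = als_factorize(W, k=3)
print(np.linalg.norm(W - A @ B.T) / np.linalg.norm(W))
```

In this reading, each column of `A` (over rows of `W`, e.g. miRNAs) paired with the same column of `B` (over columns of `W`, e.g. mRNAs) plays the role of one module.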
Affiliation(s)
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, İstanbul 34450, Turkey
  - School of Medicine, Koç University, İstanbul 34450, Turkey
  - Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, OR 97239, USA
16
Yang D, Goh G, Wang H. A fully Bayesian approach to sparse reduced-rank multivariate regression. STAT MODEL 2020. [DOI: 10.1177/1471082x20948697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In the context of high-dimensional multivariate linear regression, sparse reduced-rank regression (SRRR) provides a way to handle both variable selection and low-rank estimation problems. Although there has been extensive research on SRRR, statistical inference procedures that deal with the uncertainty due to variable selection and rank reduction are still limited. To fill this research gap, we develop a fully Bayesian approach to SRRR. A major difficulty in a fully Bayesian framework is that the dimension of the parameter space varies with the selected variables and the reduced rank. Because of this varying dimension, traditional Markov chain Monte Carlo (MCMC) methods such as the Gibbs sampler and the Metropolis-Hastings algorithm are inapplicable in our Bayesian framework. To address this issue, we propose a new posterior computation procedure based on the Laplace approximation within the collapsed Gibbs sampler. A key feature of our fully Bayesian method is that the model uncertainty is automatically integrated out by the proposed MCMC computation. The proposed method is examined via a simulation study and a real data analysis.
Affiliation(s)
- Dunfu Yang
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
- Gyuhyeong Goh
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
- Haiyan Wang
  - Department of Statistics, Kansas State University, Manhattan, KS, USA
17
Hilafu H, Safo SE, Haine L. Sparse reduced-rank regression for integrating omics data. BMC Bioinformatics 2020; 21:283. [PMID: 32620072 PMCID: PMC7333421 DOI: 10.1186/s12859-020-03606-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/11/2020] [Accepted: 06/16/2020] [Indexed: 12/04/2022]
Abstract
BACKGROUND The problem of assessing associations between multiple omics data, including genomics and metabolomics data, to identify biomarkers potentially predictive of complex diseases has garnered considerable research interest. A popular epidemiological approach is to consider the association of each predictor with each response using a univariate linear regression model, and to select predictors that meet a prespecified significance level. Although this approach is simple and intuitive, it tends to require a larger sample size, which is costly. It also assumes that the variables of each data type are independent, and thus ignores correlations that exist between variables both within each data type and across data types. RESULTS We consider a multivariate linear regression model that relates multiple predictors to multiple responses, and aim to identify multiple relevant predictors that are simultaneously associated with the responses. We assume the coefficient matrix of the responses on the predictors is both row-sparse and low-rank, and propose a group Dantzig-type formulation to estimate the coefficient matrix. CONCLUSION Extensive simulations demonstrate the competitive performance of our proposed method compared to existing methods in terms of estimation, prediction, and variable selection. We use the proposed method to integrate genomics and metabolomics data to identify genetic variants that are potentially predictive of atherosclerotic cardiovascular disease (ASCVD) beyond well-established risk factors. Our analysis identifies some genetic variants that improve prediction of ASCVD beyond some well-established risk factors of ASCVD, and also suggests a potential utility of the identified genetic variants in explaining a possible association between certain metabolites and ASCVD.
Affiliation(s)
- Haileab Hilafu
  - Department of Business Analytics and Statistics, University of Tennessee, Knoxville, TN 37996, USA
- Sandra E. Safo
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Lillian Haine
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
18
Yu M, Gupta V, Kolar M. Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach. Electron J Stat 2020. [DOI: 10.1214/19-ejs1658] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
19
Zhao W, Zhang F, Li R, Lian H. Principal single-index varying-coefficient models for dimension reduction in quantile regression. J STAT COMPUT SIM 2019. [DOI: 10.1080/00949655.2019.1707831] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Weihua Zhao
  - School of Science, Nantong University, Nantong, People's Republic of China
- Fode Zhang
  - Center of Statistical Research and School of Statistics, Southwestern University of Finance and Economics, Chengdu, People's Republic of China
- Rui Li
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, People's Republic of China
- Heng Lian
  - Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong
  - City University of Hong Kong, Shenzhen Research Institute, Shenzhen, People's Republic of China
20
Li R, Duan R, Kember RL, Rader DJ, Damrauer SM, Moore JH, Chen Y. A regression framework to uncover pleiotropy in large-scale electronic health record data. J Am Med Inform Assoc 2019; 26:1083-1090. [PMID: 31529123 DOI: 10.1093/jamia/ocz084] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/05/2018] [Revised: 04/17/2019] [Accepted: 05/16/2019] [Indexed: 11/12/2022]
Abstract
OBJECTIVE Pleiotropy, where one genetic locus affects multiple phenotypes, can offer significant insights into the complex genotype-phenotype relationship. Although individual genotype-phenotype associations have been thoroughly explored, seemingly unrelated phenotypes can be connected genetically through common pleiotropic loci or genes. However, current analyses of pleiotropy have been challenged by both methodologic limitations and a lack of suitable data sources. MATERIALS AND METHODS In this study, we propose to use a new regression framework, reduced rank regression, to simultaneously analyze multiple phenotypes and genotypes to detect pleiotropic effects. We used large-scale biobank-linked electronic health record data from the Penn Medicine BioBank to select 5 cardiovascular diseases (hypertension, cardiac dysrhythmias, ischemic heart disease, congestive heart failure, and heart valve disorders) and 5 mental disorders (mood disorders; anxiety, phobic and dissociative disorders; alcohol-related disorders; neurological disorders; and delirium dementia) to validate our framework. RESULTS Compared with existing methods, reduced rank regression showed higher power to distinguish known associated single-nucleotide polymorphisms from random single-nucleotide polymorphisms. In addition, a genome-wide gene-based investigation of pleiotropy showed that reduced rank regression was able to identify candidate genetic variants with novel pleiotropic effects compared to existing methods. CONCLUSION The proposed regression framework offers a new approach to account for the phenotype and genotype correlations when identifying pleiotropic effects. By jointly modeling multiple phenotypes and genotypes, the method has the potential to distinguish confounding from causal genotype and phenotype associations.
Affiliation(s)
- Ruowang Li
  - Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rui Duan
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Rachel L Kember
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
  - Regeneron Genetics Center, Tarrytown, New York, USA
- Daniel J Rader
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Scott M Damrauer
  - Corporal Michael J. Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
  - Department of Surgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Jason H Moore
  - Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Yong Chen
  - Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
21
Uematsu Y, Fan Y, Chen K, Lv J, Lin W. SOFAR: Large-Scale Association Network Learning. IEEE TRANSACTIONS ON INFORMATION THEORY 2019; 65:4924-4939. [PMID: 33746241 PMCID: PMC7970712 DOI: 10.1109/tit.2019.2909889] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Many modern big data applications feature large scale in both numbers of responses and predictors. Better statistical efficiency and scientific insights can be enabled by understanding the large-scale response-predictor association network structures via layers of sparse latent factors ranked by importance. Yet sparsity and orthogonality have been two largely incompatible goals. To accommodate both features, in this paper we suggest the method of sparse orthogonal factor regression (SOFAR) via the sparse singular value decomposition with orthogonality constrained optimization to learn the underlying association networks, with broad applications to both unsupervised and supervised learning tasks such as biclustering with sparse singular value decomposition, sparse principal component analysis, sparse factor analysis, and sparse vector autoregression analysis. Exploiting the framework of convexity-assisted nonconvex optimization, we derive nonasymptotic error bounds for the suggested procedure characterizing the theoretical advantages. The statistical guarantees are powered by an efficient SOFAR algorithm with a convergence property. Both computational and theoretical advantages of our procedure are demonstrated with several simulations and real data examples.
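A single sparse layer of a sparse singular value decomposition can be sketched with alternating soft-thresholded power iterations. This is a deliberately simpler technique than SOFAR: it enforces sparsity but not SOFAR's orthogonality constraints across layers, and the thresholds `lam_u`, `lam_v` and the helper names are hypothetical.

```python
import numpy as np

def soft(x, lam):
    """Entry-wise soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_rank_one(M, lam_u, lam_v, n_iter=100):
    """One sparse singular triple (d, u, v) of M via alternating soft-thresholded power iterations."""
    v = np.linalg.svd(M, full_matrices=False)[2][0]  # start from the ordinary leading right vector
    u = np.zeros(M.shape[0])
    for _ in range(n_iter):
        u = soft(M @ v, lam_u)
        if not u.any():
            break                                    # threshold too large: empty layer
        u /= np.linalg.norm(u)
        v = soft(M.T @ u, lam_v)
        if not v.any():
            break
        v /= np.linalg.norm(v)
    d = float(u @ M @ v)
    return d, u, v

# Hypothetical association matrix with one sparse layer plus noise
rng = np.random.default_rng(6)
u0 = np.zeros(20); u0[:3] = 1 / np.sqrt(3)
v0 = np.zeros(15); v0[:4] = 0.5
M = 5.0 * np.outer(u0, v0) + 0.1 * rng.standard_normal((20, 15))
d, u, v = sparse_rank_one(M, lam_u=0.5, lam_v=0.5)
print(d, np.flatnonzero(u), np.flatnonzero(v))
```

Further layers would typically be extracted after deflating M by d * outer(u, v); SOFAR instead estimates all layers jointly under orthogonality constraints.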
Affiliation(s)
- Yoshimasa Uematsu
- Assistant Professor, Department of Economics and Management, Tohoku University, Sendai 980-8576, Japan
- Yingying Fan
- Dean's Associate Professor in Business Administration, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
- Kun Chen
- Associate Professor, Department of Statistics, University of Connecticut, Storrs, CT 06269
- Jinchi Lv
- Kenneth King Stonier Chair in Business Administration and Professor, Data Sciences and Operations Department, Marshall School of Business, University of Southern California, Los Angeles, CA 90089
- Wei Lin
- Assistant Professor, School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing, China 100871
22
Li G, Liu X, Chen K. Integrative multi-view regression: Bridging group-sparse and low-rank models. Biometrics 2019; 75:593-602. [PMID: 30456759 PMCID: PMC6849205 DOI: 10.1111/biom.13006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/13/2018] [Accepted: 10/24/2018] [Indexed: 11/30/2022]
Abstract
Multi-view data have been routinely collected in various fields of science and engineering. A general problem is to study the predictive association between multivariate responses and multi-view predictor sets, all of which can be of high dimensionality. It is likely that only a few views are relevant to prediction, and the predictors within each relevant view contribute to the prediction collectively rather than sparsely. We cast this new problem under the familiar multivariate regression framework and propose an integrative reduced-rank regression (iRRR), where each view has its own low-rank coefficient matrix. As such, latent features are extracted from each view in a supervised fashion. For model estimation, we develop a convex composite nuclear norm penalization approach, which admits an efficient algorithm via the alternating direction method of multipliers. Extensions to non-Gaussian and incomplete data are discussed. Theoretically, we derive non-asymptotic oracle bounds of iRRR under a restricted eigenvalue condition. Our results recover oracle bounds of several special cases of iRRR including Lasso, group Lasso, and nuclear norm penalized regression. Therefore, iRRR seamlessly bridges group-sparse and low-rank methods and can achieve substantially faster convergence rates under realistic settings of multi-view learning. Simulation studies and an application in the Longitudinal Studies of Aging further showcase the efficacy of the proposed methods.
Affiliation(s)
- Gen Li
- Department of Biostatistics, Columbia University, New York
- Xiaokang Liu
- Department of Statistics, University of Connecticut, Storrs, Connecticut
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut
23
A principal varying-coefficient model for quantile regression: Joint variable selection and dimension reduction. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2018.05.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
24
She Y, Tran H. On cross-validation for sparse reduced rank regression. J R Stat Soc Series B Stat Methodol 2018. [DOI: 10.1111/rssb.12295] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Affiliation(s)
- Yiyuan She
- Florida State University, Tallahassee, USA
- Hoang Tran
- Florida State University, Tallahassee, USA
25
Luo C, Liang J, Li G, Wang F, Zhang C, Dey DK, Chen K. Leveraging mixed and incomplete outcomes via reduced-rank modeling. J Multivariate Anal 2018. [DOI: 10.1016/j.jmva.2018.04.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 10/28/2022]
26
Liang J, Chen K, Lin M, Zhang C, Wang F. Robust finite mixture regression for heterogeneous targets. Data Min Knowl Discov 2018. [DOI: 10.1007/s10618-018-0564-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
27
He K, Lian H, Ma S, Huang JZ. Dimensionality Reduction and Variable Selection in Multivariate Varying-Coefficient Models With a Large Number of Covariates. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2017.1285774] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
Affiliation(s)
- Kejun He
- Institute of Statistics and Big Data, Renmin University of China, Beijing, China
- Heng Lian
- Department of Mathematics, City University of Hong Kong, Kowloon Tong, Hong Kong
- Shujie Ma
- Department of Statistics, University of California, Riverside, Riverside, CA
- Jianhua Z. Huang
- Department of Statistics, Texas A&M University, College Station, TX
28
Xin X, Hu J, Liu L. On the oracle property of a generalized adaptive elastic-net for multivariate linear regression with a diverging number of parameters. J Multivariate Anal 2017. [DOI: 10.1016/j.jmva.2017.08.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
29
Abstract
Many modern statistical problems can be cast in the framework of multivariate regression, where the main task is to make statistical inference for a possibly sparse and low-rank coefficient matrix. The low-rank structure in the coefficient matrix is intrinsically multivariate and, when combined with sparsity, can further enhance dimension reduction, enable variable selection, and facilitate model interpretation. Using a Bayesian approach, we develop a unified sparse and low-rank multivariate regression method to both estimate the coefficient matrix and obtain its credible region for making inference. The newly developed sparse and low-rank prior for the coefficient matrix enables rank reduction, predictor selection and response selection simultaneously. We use the marginal likelihood to determine the regularization hyperparameter, so that the selected value maximizes its posterior probability given the data. On the theoretical side, posterior consistency is established to characterize the asymptotic behavior of the proposed method. The efficacy of the proposed approach is demonstrated via simulation studies and a real application on yeast cell cycle data.
Affiliation(s)
- Gyuhyeong Goh
- Department of Statistics, Kansas State University, Manhattan, KS 66506, United States
- Dipak K Dey
- Department of Statistics, University of Connecticut, Storrs, CT 06269, United States
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, CT 06269, United States
30
31
Abstract
In multivariate regression models, a sparse singular value decomposition of the regression component matrix is appealing for reducing dimensionality and facilitating interpretation. However, the recovery of such a decomposition remains very challenging, largely due to the simultaneous presence of orthogonality constraints and co-sparsity regularization. By delving into the underlying statistical data generation mechanism, we reformulate the problem as a supervised co-sparse factor analysis, and develop an efficient computational procedure, named sequential factor extraction via co-sparse unit-rank estimation (SeCURE), that completely bypasses the orthogonality requirements. At each step, the problem reduces to a sparse multivariate regression with a unit-rank constraint. Notably, each sequentially extracted sparse and unit-rank coefficient matrix automatically leads to co-sparsity in its pair of singular vectors. Each latent factor is thus a sparse linear combination of the predictors and may influence only a subset of responses. The proposed algorithm is guaranteed to converge, and it ensures efficient computation even with incomplete data and/or when enforcing exact orthogonality is desired. Our estimators enjoy the oracle properties asymptotically; a non-asymptotic error bound further reveals some interesting finite-sample behaviors of the estimators. The efficacy of SeCURE is demonstrated by simulation studies and two applications in genetics.
Affiliation(s)
- Dipak K Dey
- Department of Statistics, University of Connecticut
- Kun Chen
- Department of Statistics, University of Connecticut
32
33
Chen K, Hoffman EA, Seetharaman I, Jiao F, Lin CL, Chan KS. Linking lung airway structure to pulmonary function via composite bridge regression. Ann Appl Stat 2016; 10:1880-1906. [PMID: 28280520 PMCID: PMC5340208 DOI: 10.1214/16-aoas947] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Abstract
The human lung airway is a complex inverted tree-like structure. Detailed airway measurements can be extracted from MDCT-scanned lung images, such as segmental wall thickness, airway diameter, parent-child branch angles, etc. The wealth of lung airway data provides a unique opportunity for advancing our understanding of the fundamental structure-function relationships within the lung. An important problem is to construct and identify important lung airway features in normal subjects and connect these to standardized pulmonary function test results such as FEV1%. Among other things, the problem is complicated by the fact that a particular airway feature may be an important (relevant) predictor only when it pertains to segments of certain generations. Thus, the key is an efficient, consistent method for simultaneously conducting group selection (lung airway feature types) and within-group variable selection (airway generations), i.e., bi-level selection. Here we streamline a comprehensive procedure to process the lung airway data via imputation, normalization, transformation and groupwise principal component analysis, and then adopt a new composite penalized regression approach for conducting bi-level feature selection. As a prototype of composite penalization, the proposed composite bridge regression method is shown to admit an efficient algorithm, enjoy bi-level oracle properties, and outperform several existing methods. We analyze the MDCT lung image data from a cohort of 132 subjects with normal lung function. Our results show that lung function in terms of FEV1% is promoted by having a less dense and more homogeneous lung comprising an airway whose segments enjoy more heterogeneity in wall thicknesses, larger mean diameters, lumen areas and branch angles. These data hold the potential to define more accurately the "normal" subject population with borderline atypical lung functions that are clearly influenced by many genetic and environmental factors.
34
Feng S, Lian H, Zhu F. Reduced rank regression with possibly non-smooth criterion functions: An empirical likelihood approach. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2016.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
35
Abstract
Motivated by problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Since nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we seek a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modeling and estimation procedure in a multi-covariate multi-response problem concerning concrete.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269, U.S.A
- Yanyuan Ma
- Department of Statistics, University of South Carolina, 1523 Greene Street, Columbia, SC 29208, U.S.A
36
Chen K, Chan KS. A note on rank reduction in sparse multivariate regression. Journal of Statistical Theory and Practice 2016; 10:100-120. [PMID: 26997938 PMCID: PMC4797956 DOI: 10.1080/15598608.2015.1081573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Abstract
A reduced-rank regression with sparse singular value decomposition (RSSVD) approach was proposed by Chen et al. for conducting variable selection in a reduced-rank model. To jointly model the multivariate response, the method efficiently constructs a prespecified number of latent variables as sparse linear combinations of the predictors. Here, we generalize the method to also perform rank reduction, and enable its use in reduced-rank vector autoregressive (VAR) modeling to perform automatic rank determination and order selection. We show that in the context of stationary time-series data, the generalized approach correctly identifies both the model rank and the sparse dependence structure between the multivariate response and the predictors, with probability tending to one. We demonstrate the efficacy of the proposed method by simulations and by analyzing a macroeconomic multivariate time series using a reduced-rank VAR model.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
- Kung-Sik Chan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, USA
37
Chen J, Zhang S. Integrative analysis for identifying joint modular patterns of gene-expression and drug-response data. Bioinformatics 2016; 32:1724-32. [DOI: 10.1093/bioinformatics/btw059] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 09/25/2015] [Accepted: 01/27/2016] [Indexed: 12/13/2022]
38
Lian H, Kim Y. Nonconvex penalized reduced rank regression and its oracle properties in high dimensions. J Multivariate Anal 2016. [DOI: 10.1016/j.jmva.2015.09.023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
39
Lin W, Feng R, Li H. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics. J Am Stat Assoc 2015; 110:270-288. [PMID: 26392642 DOI: 10.1080/01621459.2014.908125] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 10/25/2022]
Abstract
In genetical genomics studies, it is important to jointly analyze gene expression data and genetic variants in exploring their associations with complex traits, where the dimensionality of gene expressions and genetic variants can both be much larger than the sample size. Motivated by such modern applications, we consider the problem of variable selection and estimation in high-dimensional sparse instrumental variables models. To overcome the difficulty of high dimensionality and unknown optimal instruments, we propose a two-stage regularization framework for identifying and estimating important covariate effects while selecting and estimating optimal instruments. The methodology extends the classical two-stage least squares estimator to high dimensions by applying sparsity-inducing penalty functions in both stages. The resulting procedure is efficiently implemented by coordinate descent optimization. For the representative L1 regularization and a class of concave regularization methods, we establish estimation, prediction, and model selection properties of the two-stage regularized estimators in the high-dimensional setting where the dimensionalities of the covariates and instruments are both allowed to grow exponentially with the sample size. The practical performance of the proposed method is evaluated by simulation studies and its usefulness is illustrated by an analysis of mouse obesity data. Supplementary materials for this article are available online.
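The two-stage idea can be illustrated with a short sketch. It is not the authors' implementation. It assumes an L1 (lasso) penalty in both stages, fixed tuning parameters `lam1` and `lam2`, and a plain cyclic coordinate-descent solver. The function names `lasso_cd` and `two_stage_lasso` are hypothetical.

```python
import numpy as np


def lasso_cd(A, b, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||b - A beta||^2 + lam * ||beta||_1."""
    n, p = A.shape
    beta = np.zeros(p)
    col_sq = (A ** 2).sum(axis=0) / n      # per-coordinate curvature
    r = np.array(b, dtype=float)           # residual b - A @ beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += A[:, j] * beta[j]         # partial residual excluding coordinate j
            rho = A[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= A[:, j] * beta[j]         # restore full residual
    return beta


def two_stage_lasso(y, X, Z, lam1, lam2):
    """Hypothetical two-stage lasso for a sparse instrumental-variables model.

    Stage 1: lasso-regress each covariate X[:, j] on the instruments Z.
    Stage 2: lasso-regress the response y on the stage-1 fitted covariates.
    """
    Gamma = np.column_stack([lasso_cd(Z, X[:, j], lam1) for j in range(X.shape[1])])
    X_hat = Z @ Gamma                      # estimated optimal instruments
    beta = lasso_cd(X_hat, y, lam2)
    return beta, Gamma
```

Stage 1 plays the role of the first-stage regression in two-stage least squares, and stage 2 replaces its second-stage least squares fit with a penalized one. In practice, `lam1` and `lam2` would have to be tuned, for example by cross-validation. The abstract also covers concave penalties, which this sketch does not implement.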
Affiliation(s)
- Wei Lin
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Rui Feng
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
40
41
42
Mukherjee A, Chen K, Wang N, Zhu J. On the degrees of freedom of reduced-rank estimators in multivariate regression. Biometrika 2015; 102:457-477. [PMID: 26702155 DOI: 10.1093/biomet/asu067] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly-used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.
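For orientation, we assume the "naive" estimator mentioned above is the free-parameter count of a rank-r coefficient matrix with p predictors and q responses, r(p + q - r). A useful sanity check is that at full rank r = min(p, q) this count equals pq. The hypothetical helper below computes only this naive count. It does not reproduce the paper's exact unbiased estimator based on the thresholded singular values.

```python
def naive_rrr_df(p, q, r):
    """Hypothetical helper: free parameters of a rank-r p x q matrix, r * (p + q - r)."""
    if not 0 <= r <= min(p, q):
        raise ValueError("rank r must lie between 0 and min(p, q)")
    return r * (p + q - r)


# Example: 10 predictors, 5 responses, rank 2.
print(naive_rrr_df(10, 5, 2))  # 26
```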
Affiliation(s)
- A Mukherjee
- Smart Forecasting Team, @WalmartLabs, 850 Cherry Avenue, San Bruno, California 94066, U.S.A
- K Chen
- Department of Statistics, University of Connecticut, 215 Glenbrook Road U-4120, Storrs, Connecticut 06269, U.S.A
- N Wang
- Department of Statistics, University of Michigan, 1085 S. University Avenue, Ann Arbor, Michigan 48109, U.S.A
- J Zhu
- Department of Statistics, University of Michigan, 1085 S. University Avenue, Ann Arbor, Michigan 48109, U.S.A
43
Abstract
We formulate a statistical model for the regulation of global gene expression by multiple regulatory programs and propose a thresholding singular value decomposition (T-SVD) regression method for learning such a model from data. Extensive simulations demonstrate that this method offers improved computational speed and higher sensitivity and specificity over competing approaches. The method is used to analyze microRNA (miRNA) and long noncoding RNA (lncRNA) data from The Cancer Genome Atlas (TCGA) consortium. The analysis yields previously unidentified insights into the combinatorial regulation of gene expression by noncoding RNAs, as well as findings that are supported by evidence from the literature.
44
Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian Generalized Low Rank Regression Models for Neuroimaging Phenotypes and Genetic Markers. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.923775] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]
45
Wang Z, Curry E, Montana G. Network-guided regression for detecting associations between DNA methylation and gene expression. Bioinformatics 2014; 30:2693-701. [PMID: 24919878 DOI: 10.1093/bioinformatics/btu361] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]
Abstract
Motivation: High-throughput profiling in biological research has resulted in the availability of a wealth of data cataloguing the genetic, epigenetic and transcriptional states of cells. These data could yield discoveries that may lead to breakthroughs in the diagnosis and treatment of human disease, but require statistical methods designed to find the most relevant patterns from millions of potential interactions. Aberrant DNA methylation is often a feature of cancer, and has been proposed as a therapeutic target. However, the relationship between DNA methylation and gene expression remains poorly understood.
Results: We propose Network-sparse Reduced-Rank Regression (NsRRR), a multivariate regression framework capable of using prior biological knowledge expressed as gene interaction networks to guide the search for associations between gene expression and DNA methylation signatures. We use simulations to show the advantage of our proposed model in terms of variable selection accuracy over alternative models that do not use prior network information. We discuss an application of NsRRR to The Cancer Genome Atlas datasets on primary ovarian tumours.
Availability and implementation: R code implementing the NsRRR model is available at http://www2.imperial.ac.uk/∼gmontana
Contact: giovanni.montana@kcl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Zi Wang
- Department of Mathematics, Imperial College London, London SW7 2AZ, Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN and Department of Biomedical Engineering, King's College London, St Thomas' Hospital, London SE1 7EH, UK
- Edward Curry
- Department of Mathematics, Imperial College London, London SW7 2AZ, Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN and Department of Biomedical Engineering, King's College London, St Thomas' Hospital, London SE1 7EH, UK
- Giovanni Montana
- Department of Mathematics, Imperial College London, London SW7 2AZ, Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN and Department of Biomedical Engineering, King's College London, St Thomas' Hospital, London SE1 7EH, UK
46
Chen K, Ciannelli L, Decker MB, Ladd C, Cheng W, Zhou Z, Chan KS. Reconstructing source-sink dynamics in a population with a pelagic dispersal phase. PLoS One 2014; 9:e95316. [PMID: 24835251 PMCID: PMC4023943 DOI: 10.1371/journal.pone.0095316] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/12/2013] [Accepted: 03/25/2014] [Indexed: 11/21/2022]
Abstract
For many organisms, the reconstruction of source-sink dynamics is hampered by limited knowledge of the spatial assemblage of either the source or sink components or lack of information on the strength of the linkage for any source-sink pair. In the case of marine species with a pelagic dispersal phase, these problems may be mitigated through the use of particle drift simulations based on an ocean circulation model. However, when simulated particle trajectories do not intersect sampling sites, the corroboration of model drift simulations with field data is hampered. Here, we apply a new statistical approach for reconstructing source-sink dynamics that overcomes the aforementioned problems. Our research is motivated by the need for understanding observed changes in jellyfish distributions in the eastern Bering Sea since 1990. By contrasting the source-sink dynamics reconstructed with data from the pre-1990 period with that from the post-1990 period, it appears that changes in jellyfish distribution resulted from the combined effects of higher jellyfish productivity and longer dispersal of jellyfish resulting from a shift in the ocean circulation starting in 1991. A sensitivity analysis suggests that the source-sink reconstruction is robust to typical systematic and random errors in the ocean circulation model driving the particle drift simulations. The jellyfish analysis illustrates that new insights can be gained by studying structural changes in source-sink dynamics. The proposed approach is applicable for the spatial source-sink reconstruction of other species and even abiotic processes, such as sediment transport.
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut, United States of America
- Lorenzo Ciannelli
- College of Earth, Ocean and Atmospheric Sciences, Oregon State University, Corvallis, Oregon, United States of America
- Mary Beth Decker
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Carol Ladd
- PMEL, NOAA, Seattle, Washington, United States of America
- Wei Cheng
- PMEL, NOAA, Seattle, Washington, United States of America
- Joint Institute for the Study of the Atmosphere and Ocean (JISAO), University of Washington, Seattle, Washington, United States of America
- Ziqian Zhou
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, United States of America
- Kung-Sik Chan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, United States of America
47
Chen K, Chan KS, Stenseth NC. Source-Sink Reconstruction Through Regularized Multicomponent Regression Analysis—With Application to Assessing Whether North Sea Cod Larvae Contributed to Local Fjord Cod in Skagerrak. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.898583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
48
Hu J, Xin X, You J. Model determination and estimation for the growth curve model via group SCAD penalty. J Multivariate Anal 2014. [DOI: 10.1016/j.jmva.2013.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/26/2022]
49
Chen K, Dong H, Chan KS. Reduced rank regression via adaptive nuclear norm penalization. Biometrika 2013; 100:901-920. [PMID: 25045172 DOI: 10.1093/biomet/ast036] [Citation(s) in RCA: 63] [Impact Index Per Article: 5.3] [Indexed: 11/13/2022]
Abstract
We propose an adaptive nuclear norm penalization approach for low-rank matrix approximation, and use it to develop a new reduced rank estimation method for high-dimensional multivariate regression. The adaptive nuclear norm is defined as the weighted sum of the singular values of the matrix, and it is generally non-convex under the natural restriction that the weight decreases with the singular value. However, we show that the proposed non-convex penalized regression method has a global optimal solution obtained from an adaptively soft-thresholded singular value decomposition. The method is computationally efficient, and the resulting solution path is continuous. Rank consistency and bounds on prediction and estimation performance are established for the estimator in a high-dimensional asymptotic regime. Simulation studies and an application in genetics demonstrate its efficacy.
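The adaptively soft-thresholded SVD can be illustrated for the plain matrix-approximation problem

min over C of (1/2)‖Y - C‖²_F + λ Σ_i w_i d_i(C).

This is a sketch under stated assumptions, not the paper's code:

- It assumes weights w_i = d_i(Y)^(-γ). These decrease as the singular value grows, matching the restriction described above.
- It covers only the approximation step. The paper builds its regression estimator on top of this step.
- The function name `adaptive_svt` is hypothetical.

```python
import numpy as np


def adaptive_svt(Y, lam, gamma=2.0):
    """Adaptively soft-threshold the singular values of Y.

    Assumed weights: w_i = d_i ** (-gamma), so larger singular values are
    shrunk less. Returns the low-rank approximation and its rank.
    """
    U, d, Vt = np.linalg.svd(Y, full_matrices=False)
    with np.errstate(divide="ignore", invalid="ignore"):
        w = d ** (-gamma)                          # inf for exactly-zero singular values
        d_new = np.where(d > 0, np.maximum(d - lam * w, 0.0), 0.0)
    C = (U * d_new) @ Vt                           # same as U @ diag(d_new) @ Vt
    return C, int(np.count_nonzero(d_new))
```

For example, with Y = diag(3, 1), λ = 1 and γ = 1, the weights are (1/3, 1). The singular values therefore become (3 - 1/3, 0), giving a rank-one result. The largest singular value is shrunk least, which is the intended contrast with the ordinary nuclear norm's uniform shrinkage.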
Affiliation(s)
- Kun Chen
- Department of Statistics, University of Connecticut, 215 Glenbrook Road, Storrs, Connecticut 06269, U.S.A
- Hongbo Dong
- Wisconsin Institutes for Discovery, University of Wisconsin, 330 N. Orchard St., Madison, Wisconsin 53715, U.S.A
- Kung-Sik Chan
- Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa 52242, U.S.A
50
Liu J, Huang J, Ma S. Analysis of genome-wide association studies with multiple outcomes using penalization. PLoS One 2012; 7:e51198. [PMID: 23272092 PMCID: PMC3522680 DOI: 10.1371/journal.pone.0051198] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/23/2012] [Accepted: 10/30/2012] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies have been extensively conducted, searching for markers for biologically meaningful outcomes and phenotypes. Penalization methods have been adopted in the analysis of the joint effects of a large number of SNPs (single nucleotide polymorphisms) and marker identification. This study is partly motivated by the analysis of a heterogeneous stock mice dataset, in which multiple correlated phenotypes and a large number of SNPs are available. Existing penalization methods designed to analyze a single response variable cannot accommodate the correlation among multiple response variables. With multiple response variables sharing the same set of markers, joint modeling is first employed to accommodate the correlation. The group Lasso approach is adopted to select markers associated with all the outcome variables. An efficient computational algorithm is developed. Simulation study and analysis of the heterogeneous stock mice dataset show that the proposed method can outperform existing penalization methods.
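In the group Lasso described in the abstract, each SNP's coefficients across all outcomes form one group, so a marker is either kept for every outcome or dropped for all of them. The abstract does not specify its computational algorithm. The sketch below swaps in plain proximal gradient descent, which is not necessarily the authors' method. It works on an assumed objective (1/(2n)) * ||Y - X B||_F^2 + lam * sum_j ||B_j||_2, where B_j is row j of the p-by-q coefficient matrix. All function names are hypothetical.

```python
import numpy as np


def group_soft_threshold(B, t):
    """Row-wise group soft-thresholding: the proximal operator of t * sum_j ||B_j||_2."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Each row is scaled by max(1 - t / ||B_j||, 0); all-zero rows stay zero.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return B * scale


def multi_response_group_lasso(X, Y, lam, n_iter=500):
    """Proximal gradient sketch for a multi-response group Lasso (hypothetical helper).

    X: n-by-p marker matrix; Y: n-by-q response matrix. Row j of the returned
    p-by-q matrix holds marker j's effects on all q outcomes.
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 / n is the Lipschitz constant of the gradient.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = group_soft_threshold(B - step * grad, step * lam)
    return B


def selected_markers(B):
    """Indices of markers (rows) with a nonzero effect on at least one outcome."""
    return np.flatnonzero(np.linalg.norm(B, axis=1) > 0)


# Toy data: orthogonal design with 3 markers and 2 outcomes.
X = np.sqrt(3.0) * np.eye(3)
Z = np.array([[3.0, 4.0], [0.1, 0.0], [0.0, 0.0]])
B = multi_response_group_lasso(X, np.sqrt(3.0) * Z, lam=1.0)
print(selected_markers(B))  # [0]
```

Rows driven exactly to zero correspond to markers excluded for all outcomes at once. This is the joint selection behavior the abstract describes.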
Affiliation(s)
- Jin Liu
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
- Jian Huang
- Department of Statistics & Actuarial Science, Department of Biostatistics, University of Iowa, Iowa City, Iowa, United States of America
- Shuangge Ma
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America