1
Lee K, Park Y. Bayesian inference for multivariate probit model with latent envelope. Biometrics 2024; 80:ujae059. [PMID: 38949889 DOI: 10.1093/biomtc/ujae059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 05/15/2024] [Accepted: 06/11/2024] [Indexed: 07/03/2024]
Abstract
The response envelope model proposed by Cook et al. (2010) is an efficient method to estimate the regression coefficient under the context of the multivariate linear regression model. It improves estimation efficiency by identifying material and immaterial parts of responses and removing the immaterial variation. The response envelope model has been investigated only for continuous response variables. In this paper, we propose the multivariate probit model with latent envelope, in short, the probit envelope model, as a response envelope model for multivariate binary response variables. The probit envelope model takes into account relations between Gaussian latent variables of the multivariate probit model by using the idea of the response envelope model. We address the identifiability of the probit envelope model by employing the essential identifiability concept and suggest a Bayesian method for the parameter estimation. We illustrate the probit envelope model via simulation studies and real-data analysis. The simulation studies show that the probit envelope model has the potential to gain efficiency in estimation compared to the multivariate probit model. The real data analysis shows that the probit envelope model is useful for multi-label classification.
Affiliation(s)
- Kwangmin Lee
  - Department of Big Data Convergence, Chonnam National University, Gwangju 61186, South Korea
- Yeonhee Park
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53726, United States
2
Carabajal MD, Bortolato SA, Lisandrini FT, Olivieri AC. An exhaustive analysis of the use of image moments for second-order calibration. A comparison with multivariate curve resolution-alternating least-squares. Anal Chim Acta 2024; 1288:342177. [PMID: 38220307 DOI: 10.1016/j.aca.2023.342177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/21/2023] [Accepted: 12/22/2023] [Indexed: 01/16/2024]
Abstract
BACKGROUND: The chemometric processing of second-order chromatographic-spectral data is usually carried out with the aid of multivariate curve resolution-alternating least-squares (MCR-ALS). Recently, an alternative procedure was described based on the estimation of image moments for each data matrix and subsequent application of multiple linear regression after suitable variable selection. RESULTS: The analysis of both simulated and experimental data leads to the conclusion that the image moment method, although it can cope with chromatographic lack of reproducibility across injections, performs well only in the absence of uncalibrated interferents. MCR-ALS, on the other hand, provides good analytical results in all studied situations, whether the test samples contain uncalibrated interferents or not. SIGNIFICANCE: The results help assess the practical value of newly proposed methodologies for second-order calibration in the case of chromatographic-spectral data sets, especially when samples contain unexpected chemical constituents.
Affiliation(s)
- Maira D Carabajal
  - Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina; Instituto de Química Rosario (CONICET-UNR), 27 de Febrero 210 Bis, 2000, Rosario, Argentina
- Santiago A Bortolato
  - Departamento de Matemática, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina; Instituto de Química Rosario (CONICET-UNR), 27 de Febrero 210 Bis, 2000, Rosario, Argentina
- Franco T Lisandrini
  - Physikalisches Institut, University of Bonn, Nussallee 12, 53115, Bonn, Germany
- Alejandro C Olivieri
  - Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, 2000, Rosario, Argentina; Instituto de Química Rosario (CONICET-UNR), 27 de Febrero 210 Bis, 2000, Rosario, Argentina
3
Eck DJ. General model-free weighted envelope estimation. Electron J Stat 2023. [DOI: 10.1214/23-ejs2105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Daniel J. Eck
  - Department of Statistics, University of Illinois, 605 E. Springfield Ave., Champaign, IL 61820 USA
4
Guo W, Balakrishnan N, He M. Envelope-based sparse reduced-rank regression for multivariate linear model. J MULTIVARIATE ANAL 2023. [DOI: 10.1016/j.jmva.2023.105159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
5
Franks AM. Reducing subspace models for large-scale covariance regression. Biometrics 2022; 78:1604-1613. [PMID: 34458980 DOI: 10.1111/biom.13531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2020] [Revised: 06/29/2021] [Accepted: 07/08/2021] [Indexed: 12/30/2022]
Abstract
We develop an envelope model for joint mean and covariance regression in the large p, small n setting. In contrast to existing envelope methods, which improve mean estimates by incorporating estimates of the covariance structure, we focus on identifying covariance heterogeneity by incorporating information about mean-level differences. We use a Monte Carlo EM algorithm to identify a low-dimensional subspace that explains differences in both means and covariances as a function of covariates, and then use MCMC to estimate the posterior uncertainty conditional on the inferred low-dimensional subspace. We demonstrate the utility of our model on a motivating application on the metabolomics of aging. We also provide R code that can be used to develop and test other generalizations of the response envelope model.
Affiliation(s)
- Alexander M Franks
  - Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, California, USA
6
Park Y, Su Z, Chung D. Envelope-based partial partial least squares with application to cytokine-based biomarker analysis for COVID-19. Stat Med 2022; 41:4578-4592. [PMID: 36111618 PMCID: PMC9350235 DOI: 10.1002/sim.9526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 05/27/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
Partial least squares (PLS) regression is a popular alternative to ordinary least squares regression because of its superior prediction performance demonstrated in many cases. In various contemporary applications, the predictors include both continuous and categorical variables. A common practice in PLS regression is to treat the categorical variable as continuous. However, studies find that this practice may lead to biased estimates and invalid inferences (Schuberth et al., 2018). Based on a connection between the envelope model and PLS, we develop an envelope-based partial PLS estimator that considers the PLS regression on the conditional distributions of the response(s) and continuous predictors on the categorical predictors. Root-n consistency and asymptotic normality are established for this estimator. Numerical study shows that this approach can achieve more efficiency gains in estimation and produce better predictions. The method is applied for the identification of cytokine-based biomarkers for COVID-19 patients, which reveals the association between the cytokine-based biomarkers and patients' clinical information including disease status at admission and demographical characteristics. The efficient estimation leads to a clear scientific interpretation of the results.
Affiliation(s)
- Yeonhee Park
  - Department of Biostatistics and Medical Informatics, University of Wisconsin‐Madison, Madison, Wisconsin, USA
- Zhihua Su
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
- Dongjun Chung
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
7
Hu J, Huang J, Liu X, Liu X. Response Best-subset Selector for Multivariate Regression with High-dimensional Response Variables. Biometrika 2022. [DOI: 10.1093/biomet/asac037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Summary
This article investigates the statistical problem of response-variable selection with high-dimensional response variables and a diverging number of predictor variables with respect to the sample size in the framework of multivariate linear regression. A response best-subset selection model is proposed by introducing a 0–1 selection indicator for each response variable; a response best-subset selector is then developed by introducing a separation parameter and a novel penalized least-squares function. The developed procedure can perform response-variable selection and regression-coefficient estimation simultaneously, and the proposed response best-subset selector has model consistency under mild conditions for both fixed and diverging numbers of predictor variables. Also, consistency and asymptotic normality of regression-coefficient estimators are presented for cases with a fixed dimension, and it is discovered that the Bonferroni test is a special response best-subset selector. Finite-sample simulations show that the response best-subset selector has strong advantages over existing competitors in terms of the Matthews correlation coefficient, a criterion aimed at balancing accuracies for both true and false response variables. An analysis of actual data demonstrates the effectiveness of the response best-subset selector in an application involving the identification of dosage-sensitive genes.
Affiliation(s)
- Jianhua Hu
  - Shanghai University of Finance and Economics School of Statistics and Management, Shanghai 200433, China
- Jian Huang
  - University of Iowa Department of Statistics and Actuarial Science, Iowa, U.S.A.
- Xiaoqian Liu
  - York University Department of Mathematics and Statistics, Toronto, Ontario M3J 1P3, Canada
- Xu Liu
  - Shanghai University of Finance and Economics School of Statistics and Management, Shanghai 200433, China
8
Zhao Y, Van Keilegom I, Ding S. Envelopes for censored quantile regression. Scand Stat Theory Appl 2022. [DOI: 10.1111/sjos.12602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Yue Zhao
  - Research Centre for Operations Research and Statistics (ORSTAT), KU Leuven
- Shanshan Ding
  - Department of Applied Economics and Statistics, University of Delaware
9
Some aspects of response variable selection and estimation in multivariate linear regression. J MULTIVARIATE ANAL 2022. [DOI: 10.1016/j.jmva.2021.104821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
10
Zhang J, Huang Z. Efficient simultaneous partial envelope model in multivariate linear regression. J STAT COMPUT SIM 2021. [DOI: 10.1080/00949655.2021.1995866] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Jing Zhang
  - School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, People's Republic of China
  - School of Mathematics and Finance, Chuzhou University, Chuzhou, Anhui, People's Republic of China
- Zhensheng Huang
  - School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, People's Republic of China
11
Ma L, Liu L, Yang W. Envelope method with ignorable missing data. Electron J Stat 2021; 15:4420-4461. [PMID: 37842008 PMCID: PMC10571183 DOI: 10.1214/21-ejs1881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]
Abstract
The envelope method was recently proposed as a method to reduce the dimension of responses in multivariate regressions. However, when data are missing, the envelope method using the complete case observations may lead to biased and inefficient results. In this paper, we generalize the envelope estimation when the predictors and/or the responses are missing at random. Specifically, we incorporate the envelope structure in the expectation-maximization (EM) algorithm. As the parameters under the envelope method are not pointwise identifiable, the EM algorithm for the envelope method is not straightforward and requires a special decomposition. Our method is guaranteed to be at least as efficient as the standard EM algorithm. Moreover, our method has the potential to outperform the full data MLE. We give asymptotic properties of our method under both normal and non-normal cases. The efficiency gain over the standard EM is confirmed in simulation studies and in an application to the Chronic Renal Insufficiency Cohort (CRIC) study.
Affiliation(s)
- Linquan Ma
  - Department of Statistics, University of Wisconsin - Madison, Madison, Wisconsin, USA
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Lan Liu
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Wei Yang
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
12
Zhang J, Huang Z, Jiang Z. Groupwise partial envelope model: efficient estimation in multivariate linear regression. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2021.1921800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Jing Zhang
  - School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
  - School of Mathematics and Finance, Chuzhou University, Chuzhou, Anhui, P. R. China
- Zhensheng Huang
  - School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
- Zhiqiang Jiang
  - School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu, P. R. China
13
Liu L, Li W, Su Z, Cook D, Vizioli L, Yacoub E. Efficient estimation via envelope chain in magnetic resonance imaging‐based studies. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Lan Liu
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Wei Li
  - Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
- Zhihua Su
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
- Dennis Cook
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
- Luca Vizioli
  - Center for Magnetic Resonance Research, University of Minnesota, 2021 6th St SE, Minneapolis, USA
  - Department of Neurosurgery, University of Minnesota, 500 SE Harvard St, Minneapolis, USA
- Essa Yacoub
  - Department of Radiology, University of Minnesota at Twin Cities, Minneapolis, Minnesota, USA
14
Zhu G, Zhao T. Deep-gKnock: Nonlinear group-feature selection with deep neural networks. Neural Netw 2021; 135:139-147. [PMID: 33385830 DOI: 10.1016/j.neunet.2020.12.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/08/2020] [Revised: 11/26/2020] [Accepted: 12/02/2020] [Indexed: 01/21/2023]
Abstract
Feature selection is central to contemporary high-dimensional data analysis. Group structure among features arises naturally in various scientific problems. Many methods have been proposed to incorporate the group structure information into feature selection. However, these methods are normally restricted to a linear regression setting. To relax the linear constraint, we design a new Deep Neural Network (DNN) architecture and integrate it with the recently proposed knockoff technique to perform nonlinear group-feature selection with controlled group-wise False Discovery Rate (gFDR). Experimental results on high-dimensional synthetic data demonstrate that our method achieves the highest power and accurate gFDR control compared with state-of-the-art methods. The performance of Deep-gKnock is especially superior in the following five situations: (1) nonlinear relationships; (2) dimension p greater than sample size n; (3) high between-group correlation; (4) high within-group correlation; (5) a large number of associated groups. Deep-gKnock is also demonstrated to be robust to misspecification of the feature distribution and to changes in network architecture. Moreover, Deep-gKnock achieves scientifically meaningful group-feature selection results for cutting-edge real-world datasets.
Affiliation(s)
- Guangyu Zhu
  - Department of Computer Science and Statistics, University of Rhode Island, United States of America
- Tingting Zhao
  - Department of Electrical and Computer Engineering, Northeastern University, United States of America
15
Affiliation(s)
- Yuyang Shi
  - School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
- Linquan Ma
  - Department of Statistics, University of Wisconsin‐Madison, Madison, WI 53706, USA
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
- Lan Liu
  - School of Statistics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
16
Feng Y, Xiao L, Chi EC. Sparse Single Index Models for Multivariate Responses. J Comput Graph Stat 2020; 30:115-124. [PMID: 34025100 DOI: 10.1080/10618600.2020.1779080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
Joint models are popular for analyzing data with multivariate responses. We propose a sparse multivariate single index model, where responses and predictors are linked by unspecified smooth functions and multiple matrix-level penalties are employed to select predictors and induce low-rank structures across responses. An alternating direction method of multipliers (ADMM) based algorithm is proposed for model estimation. We demonstrate the effectiveness of the proposed model in simulation studies and an application to a genetic association study.
Affiliation(s)
- Yuan Feng
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203
- Luo Xiao
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203
- Eric C Chi
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203
17
18
Affiliation(s)
- Minji Lee
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
- Zhihua Su
  - Department of Statistics, University of Florida, Gainesville, Florida, USA
19
Chen T, Su Z, Yang Y, Ding S. Efficient estimation in expectile regression using envelope models. Electron J Stat 2020. [DOI: 10.1214/19-ejs1664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
20
Wang W, Zhang X, Li L. Common reducing subspace model and network alternation analysis. Biometrics 2019; 75:1109-1120. [DOI: 10.1111/biom.13099] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/18/2018] [Accepted: 05/22/2019] [Indexed: 12/21/2022]
Affiliation(s)
- Wenjing Wang
  - Department of Statistics, Florida State University, Tallahassee, Florida
- Xin Zhang
  - Department of Statistics, Florida State University, Tallahassee, Florida
- Lexin Li
  - Department of Biostatistics and Epidemiology, University of California, Berkeley, California
21
Xiao X, Zhou Y. Two-Dimensional Quaternion PCA and Sparse PCA. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:2028-2042. [PMID: 30418886 DOI: 10.1109/tnnls.2018.2872541] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 06/09/2023]
Abstract
Benefiting from the quaternion representation, which can encode the cross-channel correlation of color images, quaternion principal component analysis (QPCA) was proposed to extract features from color images while reducing the feature dimension. A quaternion covariance matrix (QCM) of input samples was constructed, and its eigenvectors were derived to find the solution of QPCA. However, eigen-decomposition leads to a fixed solution for the same input. This solution is susceptible to outliers and cannot be further optimized. To solve this problem, this paper proposes a novel quaternion ridge regression (QRR) model for two-dimensional QPCA (2D-QPCA). We mathematically prove that this QRR model is equivalent to the QCM model of 2D-QPCA. The QRR model is a general framework and is flexible enough to combine 2D-QPCA with other technologies or constraints to meet the different requirements of real-world applications. By including sparsity constraints, we then propose a quaternion sparse regression model for 2D-QSPCA to improve its robustness for classification. An alternating minimization algorithm is developed to iteratively learn the solution of 2D-QSPCA in the equivalent complex domain. In addition, 2D-QPCA and 2D-QSPCA can preserve the spatial structure of color images and have a low computation cost. Experiments on several challenging databases demonstrate that 2D-QPCA and 2D-QSPCA are effective in color face recognition, and 2D-QSPCA outperforms the state of the art.
22
Li G, Liu X, Chen K. Integrative multi-view regression: Bridging group-sparse and low-rank models. Biometrics 2019; 75:593-602. [PMID: 30456759 PMCID: PMC6849205 DOI: 10.1111/biom.13006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/13/2018] [Accepted: 10/24/2018] [Indexed: 11/30/2022]
Abstract
Multi-view data have been routinely collected in various fields of science and engineering. A general problem is to study the predictive association between multivariate responses and multi-view predictor sets, all of which can be of high dimensionality. It is likely that only a few views are relevant to prediction, and the predictors within each relevant view contribute to the prediction collectively rather than sparsely. We cast this new problem under the familiar multivariate regression framework and propose an integrative reduced-rank regression (iRRR), where each view has its own low-rank coefficient matrix. As such, latent features are extracted from each view in a supervised fashion. For model estimation, we develop a convex composite nuclear norm penalization approach, which admits an efficient algorithm via alternating direction method of multipliers. Extensions to non-Gaussian and incomplete data are discussed. Theoretically, we derive non-asymptotic oracle bounds of iRRR under a restricted eigenvalue condition. Our results recover oracle bounds of several special cases of iRRR including Lasso, group Lasso, and nuclear norm penalized regression. Therefore, iRRR seamlessly bridges group-sparse and low-rank methods and can achieve substantially faster convergence rate under realistic settings of multi-view learning. Simulation studies and an application in the Longitudinal Studies of Aging further showcase the efficacy of the proposed methods.
Affiliation(s)
- Gen Li
  - Department of Biostatistics, Columbia University, New York
- Xiaokang Liu
  - Department of Statistics, University of Connecticut, Storrs, Connecticut
- Kun Chen
  - Department of Statistics, University of Connecticut, Storrs, Connecticut
23
Jain Y, Ding S, Qiu J. Sliced inverse regression for integrative multi-omics data analysis. Stat Appl Genet Mol Biol 2019; 18. [PMID: 30685747 DOI: 10.1515/sagmb-2018-0028] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Advances in next-generation sequencing, transcriptomics, proteomics and other high-throughput technologies have enabled simultaneous measurement of multiple types of genomic data for cancer samples. Together, these data may reveal new biological insights compared with analyzing a single genomic data type. This study proposes a novel use of a supervised dimension reduction method, called sliced inverse regression, in multi-omics data analysis to improve prediction over a single data type analysis. The study further proposes an integrative sliced inverse regression method (integrative SIR) for simultaneous analysis of multiple omics data types of cancer samples, including miRNA, mRNA and proteomics, to achieve integrative dimension reduction and to further improve prediction performance. Numerical results show that integrative analysis of multi-omics data is beneficial compared with single data source analysis and, more importantly, that supervised dimension reduction methods have advantages over unsupervised dimension reduction methods in integrative data analysis in terms of classification and prediction.
Affiliation(s)
- Yashita Jain
  - Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
- Shanshan Ding
  - Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
  - Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
- Jing Qiu
  - Center for Bioinformatics and Computational Biology, University of Delaware, 15 Innovation Way, Newark, DE 19711, USA
  - Department of Applied Economics and Statistics, University of Delaware, 531 S College Ave., Newark, DE 19711, USA
24
Affiliation(s)
- Lei Wang
  - Department of Applied Economics and Statistics, University of Delaware, Newark, DE 19716, USA
- Shanshan Ding
  - Department of Applied Economics and Statistics, University of Delaware, Newark, DE 19716, USA
25
Ding S, Cook RD. Matrix variate regressions and envelope models. J R Stat Soc Series B Stat Methodol 2017. [DOI: 10.1111/rssb.12247] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022]
26
Simultaneous selection of predictors and responses for high dimensional multivariate linear regression. Stat Probab Lett 2017. [DOI: 10.1016/j.spl.2017.04.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
27
Eck DJ, Cook RD. Weighted envelope estimation to handle variability in model selection. Biometrika 2017. [DOI: 10.1093/biomet/asx035] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]