1. Qin X, Hu J, Ma S, Wu M. Estimation of multiple networks with common structures in heterogeneous subgroups. J Multivariate Anal 2024; 202:105298. [PMID: 38433779 PMCID: PMC10907012 DOI: 10.1016/j.jmva.2024.105298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups. This makes the proposed estimator a significant advance over existing heterogeneity network analyses based directly on the regularized likelihood of the GGM, and gives it scale-invariance, tuning-insensitivity, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.
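For context, the decomposition of a GGM into regression problems mentioned in the abstract rests on a standard identity for multivariate normal vectors. The notation below ($\Omega$, $\omega_{jk}$, $\beta_{jk}$, $\varepsilon_j$) is assumed here for illustration and is not taken from the paper, whose estimator additionally involves a reparameterization and a composite penalty not shown here.

```latex
% Assumed setting: X = (X_1, \dots, X_p) \sim N(0, \Sigma),
% with precision matrix \Omega = \Sigma^{-1} = (\omega_{jk}).
X_j = \sum_{k \neq j} \beta_{jk} X_k + \varepsilon_j,
\qquad
\beta_{jk} = -\frac{\omega_{jk}}{\omega_{jj}},
\qquad
\operatorname{Var}(\varepsilon_j) = \frac{1}{\omega_{jj}} .
```

Since $\omega_{jk} = 0$ exactly when $\beta_{jk} = 0$, a sparse regression for node $j$ recovers the $j$-th row of the network, and the $p$ node-wise regressions can in principle be fitted in parallel.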
Affiliation(s)
- Xing Qin
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
- Jianhua Hu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
2. Zhou RR, Zucker DM, Zhao SD. Power of testing for exposure effects under incomplete mediation. Int J Biostat 2024; 20:217-228. [PMID: 37084462 DOI: 10.1515/ijb-2022-0106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 03/25/2023] [Indexed: 04/23/2023]
Abstract
Mediation analysis studies situations where an exposure may affect an outcome both directly and indirectly through intervening variables called mediators. It is frequently of interest to test for the effect of the exposure on the outcome, and the standard approach is simply to regress the latter on the former. However, it seems plausible that a more powerful test statistic could be achieved by also incorporating the mediators. This would be useful in cases where the exposure effect size might be small, which for example is common in genomics applications. Previous work has shown that this is indeed possible under complete mediation, where there is no direct effect. In most applications, however, the direct effect is likely nonzero. In this paper we study linear mediation models and find that under certain conditions, power gain is still possible under this incomplete mediation setting for testing the null hypothesis that there is neither a direct nor an indirect effect. We study a class of procedures that can achieve this performance and develop their application to both low- and high-dimensional mediators. We then illustrate their performances in simulations as well as in an analysis using DNA methylation mediators to study the effect of cigarette smoking on gene expression.
Affiliation(s)
- David M Zucker
  - Department of Statistics and Data Science, Hebrew University, Jerusalem, Israel
- Sihai D Zhao
  - Department of Statistics, University of Illinois Urbana-Champaign, Champaign, IL, USA
3. Wang Y, Shojaie A, Randolph T, Knight P, Ma J. Generalized matrix decomposition regression: Estimation and inference for two-way structured data. Ann Appl Stat 2023; 17:2944-2969. [PMID: 38149262 PMCID: PMC10751029 DOI: 10.1214/23-aoas1746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Motivated by emerging applications in ecology, microbiology, and neuroscience, this paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose the generalized matrix decomposition regression (GMDR) to efficiently leverage auxiliary information on row and column structures. GMDR extends the principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on regression coefficients of individual variables, we propose the generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that include the proposed GMDR estimator. GMDI provides more flexibility for incorporating relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse, but constrains the coordinate system representing the regression coefficients according to the column structure. GMDI also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type-I error rate and power and demonstrate the effectiveness of GMDR and GMDI in simulation studies and an application to human microbiome data.
Affiliation(s)
- Yue Wang
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus
- Ali Shojaie
  - Department of Biostatistics, University of Washington
- Jing Ma
  - Public Health Sciences Division, Fred Hutchinson Cancer Center
4. Chen J, Li Q, Chen HY. Testing generalized linear models with high-dimensional nuisance parameter. Biometrika 2023; 110:83-99. [PMID: 36816791 PMCID: PMC9933885 DOI: 10.1093/biomet/asac021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Generalized linear models often have high-dimensional nuisance parameters, as seen in applications such as testing gene-environment interactions or gene-gene interactions. In these scenarios, it is essential to test the significance of a high-dimensional sub-vector of the model's coefficients. Although some existing methods can tackle this problem, they often rely on the bootstrap to approximate the asymptotic distribution of the test statistic, and thus are computationally expensive. Here, we propose a computationally efficient test with a closed-form limiting distribution, which allows the parameter being tested to be either sparse or dense. We show that under certain regularity conditions, the type I error of the proposed method is asymptotically correct, and we establish its power under high-dimensional alternatives. Extensive simulations demonstrate the good performance of the proposed test and its robustness when certain sparsity assumptions are violated. We also apply the proposed method to Chinese famine sample data in order to show its performance when testing the significance of gene-environment interactions.
Affiliation(s)
- Jinsong Chen
  - College of Applied Health Sciences, University of Illinois at Chicago, 1919 W Taylor St, Chicago, Illinois 60612, U.S.A.
- Quefeng Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A.
- Hua Yun Chen
  - School of Public Health, University of Illinois at Chicago, 2121 W Taylor St, Chicago, Illinois 60612, U.S.A.
5. Fan J, Lou Z, Yu M. Are Latent Factor Regression and Sparse Regression Adequate? J Am Stat Assoc 2023. [DOI: 10.1080/01621459.2023.2169700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Affiliation(s)
- Jianqing Fan
  - Frederick L. Moore ’18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at Princeton University
- Zhipeng Lou
  - Department of Operations Research and Financial Engineering, Princeton University
- Mengxin Yu
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
6. A unified precision matrix estimation framework via sparse column-wise inverse operator under weak sparsity. Ann Inst Stat Math 2022. [DOI: 10.1007/s10463-022-00856-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
7. Carpentier A, Collier O, Comminges L, Tsybakov AB, Wang Y. Estimation of the ℓ2-norm and testing in sparse linear regression with unknown variance. Bernoulli 2022. [DOI: 10.3150/21-bej1436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Olivier Collier
  - Modal’X, Université Paris-Nanterre, Nanterre and CREST, Paris, France
- Yuhao Wang
  - Tsinghua University, Beijing, China and Shanghai Qi Zhi Institute, Shanghai, China
8. Gao F, Wang T. Two-sample testing of high-dimensional linear regression coefficients via complementary sketching. Ann Stat 2022. [DOI: 10.1214/22-aos2216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Fengnan Gao
  - School of Data Science, Shanghai Center for Mathematical Sciences, Fudan University
- Tengyao Wang
  - Department of Statistics, London School of Economics
9. Cai TT, Zhang AR, Zhou Y. Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference. IEEE Transactions on Information Theory 2022; 68:5975-6002. [PMID: 36865503 PMCID: PMC9974176 DOI: 10.1109/tit.2022.3175455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model - an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.
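As a reading aid, a commonly used form of the sparse group Lasso objective is sketched below. The symbols ($\lambda$, $\lambda_g$, the group blocks $\beta_{(j)}$, the number of groups $d$) and the $1/(2n)$ scaling are assumptions made here for illustration; the paper's exact normalization and group weights may differ.

```latex
% Assumed notation: y \in \mathbb{R}^n, X \in \mathbb{R}^{n \times p},
% \beta partitioned into d groups \beta_{(1)}, \dots, \beta_{(d)}.
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2
  + \lambda \lVert \beta \rVert_1
  + \lambda_g \sum_{j=1}^{d} \lVert \beta_{(j)} \rVert_2 .
```

The $\ell_1$ term promotes element-wise sparsity and the sum of group-wise $\ell_2$ norms promotes group-wise sparsity, which together match the "double sparse" structure described in the abstract.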
Affiliation(s)
- T Tony Cai
  - Department of Statistics & Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
- Anru R Zhang
  - Departments of Biostatistics & Bioinformatics, Computer Science, Mathematics, and Statistical Science, Duke University, Durham, NC 27710
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706
- Yuchen Zhou
  - Department of Statistics & Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706
10. Zamanzadeh A, Cavoli T. The effect of nonpharmaceutical interventions on COVID-19 infections for lower and middle-income countries: A debiased LASSO approach. PLoS One 2022; 17:e0271586. [PMID: 35867692 PMCID: PMC9307185 DOI: 10.1371/journal.pone.0271586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 07/05/2022] [Indexed: 11/18/2022]
Abstract
This paper investigates the determinants of COVID-19 infection in the first 100 days of government actions. Using a debiased LASSO estimator, we explore how different measures of government nonpharmaceutical interventions affect new infections of COVID-19 for 37 lower and middle-income countries (LMCs). We find that closing schools, stay-at-home restrictions, and contact tracing reduce the growth of new infections, as do economic support to households and the number of health care workers. Notably, we find no significant effects of business closures. Finally, infections become higher in countries with greater income inequality, higher tourist inflows, poorly educated adults, and weak governance quality. We conclude that several policy interventions reduce infection rates for poorer countries. Further, economic and institutional factors are important; thereby justifying the use, and ultimately success, of economic support to households during the initial infection period.
Affiliation(s)
- Akbar Zamanzadeh
  - UniSA Business School, University of South Australia, Adelaide, SA, Australia
- Tony Cavoli
  - UniSA Business School, University of South Australia, Adelaide, SA, Australia
11. Liu Y, Pi P, Luo S. A semi-parametric approach to feature selection in high-dimensional linear regression models. Comput Stat 2022. [DOI: 10.1007/s00180-022-01254-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
12. Estimation of Error Variance in Regularized Regression Models via Adaptive Lasso. Mathematics 2022. [DOI: 10.3390/math10111937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Estimation of error variance in a regression model is a fundamental problem in statistical modeling and inference. In high-dimensional linear models, variance estimation is a difficult problem, due to the issue of model selection. In this paper, we propose a novel approach for variance estimation that combines the reparameterization technique and the adaptive lasso, which is called the natural adaptive lasso. This method can, simultaneously, select and estimate the regression and variance parameters. Moreover, we show that the natural adaptive lasso, for regression parameters, is equivalent to the adaptive lasso. We establish the asymptotic properties of the natural adaptive lasso, for regression parameters, and derive the mean squared error bound for the variance estimator. Our theoretical results show that under appropriate regularity conditions, the natural adaptive lasso for error variance is closer to the so-called oracle estimator than some other existing methods. Finally, Monte Carlo simulations are presented, to demonstrate the superiority of the proposed method.
13. Zhang Y, Politis DN. Ridge regression revisited: Debiasing, thresholding and bootstrap. Ann Stat 2022. [DOI: 10.1214/21-aos2156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yunyi Zhang
  - Department of Mathematics, University of California, San Diego
- Dimitris N. Politis
  - Department of Mathematics and Halicioglu Data Science Institute, University of California, San Diego
14. Klaassen S, Kueck J, Spindler M, Chernozhukov V. Uniform Inference in High-Dimensional Gaussian Graphical Models. Biometrika 2022. [DOI: 10.1093/biomet/asac030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Graphical models have become a popular tool for representing dependencies within large sets of variables and are crucial for representing causal structures. We provide results for uniform inference on high-dimensional graphical models in which the number of target parameters d is potentially much larger than the sample size under approximate sparsity. Our results highlight how graphical models can be estimated and recovered using modern machine learning methods in high-dimensional complex settings. To construct simultaneous confidence regions on many target parameters, it is crucial to have sufficiently fast estimation rates of the nuisance functions. In this context, we establish uniform estimation rates and sparsity guarantees for the square-root lasso estimator in a random design under approximate sparsity conditions. These might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties in comparison to existing methods, and we present two empirical applications.
Affiliation(s)
- S Klaassen
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- J Kueck
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- M Spindler
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- V Chernozhukov
  - Massachusetts Institute of Technology Department of Economics and Center for Statistics and Data Science, 50 Memorial Drive, Cambridge, Massachusetts 02142, U.S.A.
15.
Affiliation(s)
- Pierre C. Bellec
  - Department of Statistics, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA
- Cun-Hui Zhang
  - Department of Statistics, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA
16. Lee S, Kim SC, Yu D. An efficient GPU-parallel coordinate descent algorithm for sparse precision matrix estimation via scaled lasso. Comput Stat 2022. [DOI: 10.1007/s00180-022-01224-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
17. Wang B, Yan L, Duan X, Yu T, Zhang H. An integrated surrogate model constructing method: Annealing combinable Gaussian process. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.01.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
18. Contraction of a quasi-Bayesian model with shrinkage priors in precision matrix estimation. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.03.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
19. Guo X, Li R, Liu J, Zeng M. High-dimensional mediation analysis for selecting DNA methylation loci mediating childhood trauma and cortisol stress reactivity. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2053136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Xu Guo
  - School of Statistics, Beijing Normal University, P.R. China
- Runze Li
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802-2111, USA
- Jingyuan Liu
  - MOE Key Laboratory of Econometrics, Department of Statistics, School of Economics, Wang Yanan Institute for Studies in Economics
  - Fujian Key Lab of Statistics, Xiamen University, P.R. China
- Mudong Zeng
  - Department of Statistics, The Pennsylvania State University, University Park, PA 16802-2111, USA
20. Li X, Wang Y, Ruiz R. A Survey on Sparse Learning Models for Feature Selection. IEEE Transactions on Cybernetics 2022; 52:1642-1660. [PMID: 32386172 DOI: 10.1109/tcyb.2020.2982445] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 06/11/2023]
Abstract
Feature selection is important in both machine learning and pattern recognition. Successfully selecting informative features can significantly increase learning accuracy and improve result comprehensibility. Various methods have been proposed to identify informative features from high-dimensional data by removing redundant and irrelevant features to improve classification accuracy. In this article, we systematically survey existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyze the differences and connections among various sparse learning models. Promising research directions and topics on sparse learning models are analyzed.
21. Zhou RR, Zhao SD, Parast L. Estimation of the proportion of treatment effect explained by a high-dimensional surrogate. Stat Med 2022; 41:2227-2246. [PMID: 35189671 DOI: 10.1002/sim.9352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/24/2021] [Revised: 12/23/2021] [Accepted: 01/27/2022] [Indexed: 11/07/2022]
Abstract
Clinical studies examining the effectiveness of a treatment with respect to some primary outcome often require long-term follow-up of patients and/or costly or burdensome measurements of the primary outcome of interest. Identifying a surrogate marker for the primary outcome of interest may allow one to evaluate a treatment effect with less follow-up time, less cost, or less burden. While much clinical and statistical work has focused on identifying and validating surrogate markers, available approaches tend to focus on settings in which only a single surrogate marker is of interest. Limited work has been done to accommodate the high-dimensional surrogate marker setting where the number of potential surrogates is greater than the sample size. In this article, we develop methods to estimate the proportion of treatment effect explained by high-dimensional surrogates. We study the asymptotic properties of our proposed estimator, propose inference procedures, and examine finite sample performance via a simulation study. We illustrate our proposed methods using data from a randomized study comparing a novel whey-based oral nutrition supplement with a standard supplement with respect to change in body fat percentage over 12 weeks, where the surrogate markers of interest are gene expression probesets.
Affiliation(s)
- Sihai Dave Zhao
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
- Layla Parast
  - Department of Statistics and Data Sciences, University of Texas at Austin, Austin, USA
22. Li S, Cai TT, Li H. Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality. J R Stat Soc Series B Stat Methodol 2022; 84:149-173. [PMID: 35210933 PMCID: PMC8863181 DOI: 10.1111/rssb.12479] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 02/03/2023]
Abstract
This paper considers estimation and prediction of a high-dimensional linear regression in the setting of transfer learning where, in addition to observations from the target model, auxiliary samples from different but possibly related regression models are available. When the set of informative auxiliary studies is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. When the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and show its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating data from multiple different tissues as auxiliary samples.
Affiliation(s)
- Sai Li
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- T. Tony Cai
  - Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
- Hongzhe Li
  - Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
23. Shah RD, Bühlmann P. Double-Estimation-Friendly Inference for High-Dimensional Misspecified Models. Stat Sci 2022. [DOI: 10.1214/22-sts850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Rajen D. Shah
  - Rajen D. Shah is Professor of Statistics, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
- Peter Bühlmann
  - Peter Bühlmann is Professor of Statistics, Seminar for Statistics, ETH Zurich, Zurich, Switzerland
24. Livne I, Azriel D, Goldberg Y. Improved estimators for semi-supervised high-dimensional regression model. Electron J Stat 2022. [DOI: 10.1214/22-ejs2070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Ilan Livne
  - The Faculty of Industrial Engineering and Management, Technion, Israel
- David Azriel
  - The Faculty of Industrial Engineering and Management, Technion, Israel
- Yair Goldberg
  - The Faculty of Industrial Engineering and Management, Technion, Israel
25. Liu X, Cong X, Li G, Maas K, Chen K. Multivariate log-contrast regression with sub-compositional predictors: Testing the association between preterm infants' gut microbiome and neurobehavioral outcomes. Stat Med 2021; 41:580-594. [PMID: 34897772 DOI: 10.1002/sim.9273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Revised: 09/25/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
To link a clinical outcome with compositional predictors in microbiome analysis, the linear log-contrast model is a popular choice, and the inference procedure for assessing the significance of each covariate is also available. However, with the existence of multiple potentially interrelated outcomes and the information of the taxonomic hierarchy of bacteria, a multivariate analysis method that considers the group structure of compositional covariates and an accompanying group inference method are still lacking. Motivated by a study for identifying the microbes in the gut microbiome of preterm infants that impact their later neurobehavioral outcomes, we formulate a constrained integrative multi-view regression. The neurobehavioral scores form multivariate responses, the log-transformed sub-compositional microbiome data form multi-view feature matrices, and a set of linear constraints on their corresponding sub-coefficient matrices ensures the sub-compositional nature. We assume all the sub-coefficient matrices are possibly of low rank to enable joint selection and inference of sub-compositions/views. We propose a scaled composite nuclear norm penalization approach for model estimation and develop a hypothesis testing procedure through de-biasing to assess the significance of different views. Simulation studies confirm the effectiveness of the proposed procedure. We apply the method to the preterm infant study, and the identified microbes are mostly consistent with existing studies and biological understandings.
Affiliation(s)
- Xiaokang Liu
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Xiaomei Cong
  - School of Nursing, University of Connecticut, Storrs, Connecticut, USA
- Gen Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
- Kendra Maas
  - Microbial Analysis, Resources, and Services, University of Connecticut, Storrs, Connecticut, USA
- Kun Chen
  - Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
26. Deshpande Y, Javanmard A, Mehrabi M. Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1979011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Affiliation(s)
- Yash Deshpande
  - Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA
- Adel Javanmard
  - Data Sciences and Operations Department, University of Southern California, Los Angeles, CA
- Mohammad Mehrabi
  - Data Sciences and Operations Department, University of Southern California, Los Angeles, CA
27. Zhao S, Witten D, Shojaie A. In Defense of the Indefensible: A Very Naïve Approach to High-Dimensional Inference. Stat Sci 2021; 36:562-577. [PMID: 37860618 PMCID: PMC10586523 DOI: 10.1214/20-sts815] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 10/21/2023]
Abstract
A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test the hypotheses regarding the full-model regression coefficients.
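The two-step procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `lasso_cd` and `naive_two_step`, the penalty level `lam`, the coordinate-descent solver, and the normal quantile 1.96 are all assumptions made here, and the intervals are only meaningful under the conditions the paper states.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Hypothetical helper for illustration; any lasso solver could be used instead.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # per-column scale x_j'x_j / n
    resid = y.astype(float)             # residual y - X @ beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (coordinate j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)  # keep the residual in sync
            beta[j] = new
    return beta


def naive_two_step(X, y, lam):
    """Step (i): lasso selection. Step (ii): least squares on the selected set,
    with classical 95% intervals computed as if the set had been fixed in advance."""
    n = X.shape[0]
    selected = np.flatnonzero(lasso_cd(X, y, lam))
    if selected.size == 0:
        return selected, np.empty(0), np.empty((0, 2))
    Xs = X[:, selected]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    sigma2 = resid @ resid / (n - selected.size)   # assumes n > number selected
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    z = 1.96  # assumed normal quantile for a 95% interval
    return selected, coef, np.column_stack([coef - z * se, coef + z * se])
```

Calling `naive_two_step(X, y, lam)` returns the indices of the lasso-selected variables, their refitted coefficients, and one interval per selected variable; the choice of `lam` is left to the user in this sketch.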
Affiliation(s)
- Sen Zhao
- 1600 Amphitheatre Parkway, Mountain View, California 94043, USA
| | - Daniela Witten
- University of Washington, Health Sciences Building, Box 357232, Seattle, Washington 98195, USA
| | - Ali Shojaie
- University of Washington, Health Sciences Building, Box 357232, Seattle, Washington 98195, USA
| |
|
28
|
Zhang Q. High-Dimensional Mediation Analysis with Applications to Causal Gene Identification. STATISTICS IN BIOSCIENCES 2021. [DOI: 10.1007/s12561-021-09328-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/01/2022]
|
29
|
Hierarchical correction of p-values via an ultrametric tree running Ornstein-Uhlenbeck process. Comput Stat 2021. [DOI: 10.1007/s00180-021-01148-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
Statistical testing is classically used as an exploratory tool to search for association between a phenotype and many possible explanatory variables. This approach often leads to multiple testing under dependence. We assume a hierarchical structure between tests via an Ornstein-Uhlenbeck process on a tree. The process correlation structure is used for smoothing the p-values. We design a penalized estimation of the mean of the Ornstein-Uhlenbeck process for p-value computation. The performances of the algorithm are assessed via simulations. Its ability to discover new associations is demonstrated on a metagenomic dataset. The corresponding R package is available from https://github.com/abichat/zazou.
|
30
|
Freijeiro‐González L, Febrero‐Bande M, González‐Manteiga W. A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates. Int Stat Rev 2021. [DOI: 10.1111/insr.12469] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/27/2022]
Affiliation(s)
- Laura Freijeiro‐González
- Department of Statistics Mathematical Analysis and Optimization; Santiago de Compostela University Santiago de Compostela Spain
| | - Manuel Febrero‐Bande
- Department of Statistics Mathematical Analysis and Optimization; Santiago de Compostela University Santiago de Compostela Spain
| | - Wenceslao González‐Manteiga
- Department of Statistics Mathematical Analysis and Optimization; Santiago de Compostela University Santiago de Compostela Spain
| |
|
31
|
Liu Y, Gao Y, Fang R, Cao H, Sa J, Wang J, Liu H, Wang T, Cui Y. Identifying complex gene-gene interactions: a mixed kernel omnibus testing approach. Brief Bioinform 2021; 22:6346804. [PMID: 34373892 DOI: 10.1093/bib/bbab305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2021] [Revised: 07/01/2021] [Accepted: 07/17/2021] [Indexed: 11/12/2022] Open
Abstract
Genes do not function independently; rather, they interact with each other to fulfill their joint tasks. Identification of gene-gene interactions has been critically important in elucidating the molecular mechanisms responsible for the variation of a phenotype. Regression models are commonly used to model the interaction between two genes with a linear product term. The interaction effect of two genes can be linear or nonlinear, depending on the true nature of the data. When nonlinear interactions exist, the linear interaction model may not be able to detect such interactions; hence, it suffers from substantial power loss. While the true interaction mechanism (linear or nonlinear) is generally unknown in practice, it is critical to develop statistical methods that can be flexible to capture the underlying interaction mechanism without assuming a specific model assumption. In this study, we develop a mixed kernel function which combines both linear and Gaussian kernels with different weights to capture the linear or nonlinear interaction of two genes. Instead of optimizing the weight function, we propose a grid search strategy and use a Cauchy transformation of the P-values obtained under different weights to aggregate the P-values. We further extend the two-gene interaction model to a high-dimensional setup using a de-biased LASSO algorithm. Extensive simulation studies are conducted to verify the performance of the proposed method. Application to two case studies further demonstrates the utility of the model. Our method provides a flexible and computationally efficient tool for disentangling complex gene-gene interactions associated with complex traits.
Affiliation(s)
- Yan Liu
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Yuzhao Gao
- School of Statistics, Shanxi University of Finance and Economics, Taiyuan, PR China
| | - Ruiling Fang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Jian Sa
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Hongqi Liu
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Tong Wang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
| |
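The abstract of Liu et al. above mentions aggregating the P-values obtained under different kernel weights through a Cauchy transformation. The sketch below shows one common form of that idea, the equal-weight Cauchy combination rule; it is an illustration only and not the authors' implementation. The function name `cauchy_combine`, the equal default weights, and the example P-values are assumptions made here for illustration.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine P-values with a Cauchy combination rule.

    Assumption: weights are non-negative and sum to 1; equal weights
    are used when none are given.
    """
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    # Map each P-value to a Cauchy-distributed score and take the weighted sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    # Upper-tail probability of a standard Cauchy variable at `stat`.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical P-values, one per kernel weight on a search grid.
grid_p_values = [0.04, 0.20, 0.51, 0.03]
combined = cauchy_combine(grid_p_values)
```

A single P-value passes through unchanged, and the combined value is driven mostly by the smallest P-values on the grid, which is why this kind of rule is convenient when the best weight is unknown in advance.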
|
32
|
Bellec PC, Zhang CH. Second-order Stein: SURE for SURE and other applications in high-dimensional inference. Ann Stat 2021. [DOI: 10.1214/20-aos2005] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
|
33
|
Zhou K, Li KC, Zhou Q. Honest Confidence Sets for High-Dimensional Regression by Projection and Shrinkage. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1938581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Kun Zhou
- Department of Statistics, University of California, Los Angeles, CA
| | - Ker-Chau Li
- Department of Statistics, University of California, Los Angeles, CA
- Institute of Statistical Science, Academia Sinica, Nangang, Taiwan
| | - Qing Zhou
- Department of Statistics, University of California, Los Angeles, CA
| |
|
34
|
Qiu Y, Zhou XH. Inference on Multi-level Partial Correlations Based on Multi-subject Time Series Data. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1917417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Yumou Qiu
- Department of Statistics, Iowa State University, Ames, IA
| | - Xiao-Hua Zhou
- Beijing International Center for Mathematical Research, Department of Biostatistics, and National Engineering Lab for Big Data Analysis and Applications, Peking University, Beijing, China
| |
|
35
|
Yu Q, Li Y, Wang Y, Yang Y, Zheng Z. Scalable and efficient inference via CPE. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.1936044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Qin Yu
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yang Li
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yumeng Wang
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yachong Yang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Zemin Zheng
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| |
|
36
|
Zhou J, Zheng Z, Zhou H, Dong R. Innovated scalable efficient inference for ultra-large graphical models. Stat Probab Lett 2021. [DOI: 10.1016/j.spl.2021.109085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
37
|
Comminges L, Collier O, Ndaoud M, Tsybakov AB. Adaptive robust estimation in sparse vector model. Ann Stat 2021. [DOI: 10.1214/20-aos2002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Affiliation(s)
- L. Comminges
- CEREMADE, Université Paris-Dauphine, PSL and CREST
| | - O. Collier
- Modal’X, UPL, Université Paris Nanterre and CREST
| | | | | |
|
38
|
Evaluating Visual Properties via Robust HodgeRank. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01438-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
39
|
Law M, Ritov Y. Inference without compatibility: Using exponential weighting for inference on a parameter of a linear model. BERNOULLI 2021. [DOI: 10.3150/20-bej1280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
Affiliation(s)
- Michael Law
- Department of Statistics, University of Michigan, Ann Arbor, USA
| | - Ya’acov Ritov
- Department of Statistics, University of Michigan, Ann Arbor, USA
| |
|
40
|
Li S, Cai TT, Li H. Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. J Am Stat Assoc 2021; 117:1835-1846. [PMID: 36793369 PMCID: PMC9928173 DOI: 10.1080/01621459.2021.1888740] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/21/2019] [Revised: 01/15/2021] [Accepted: 02/04/2021] [Indexed: 10/22/2022]
Abstract
Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. Regarding the fixed effects, we provide rate optimal estimators and valid inference procedures that do not rely on the structural information of the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study regarding the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.
Affiliation(s)
- Sai Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| |
|
41
|
Guo X, Cheng G. Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1893177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Xiao Guo
- International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China
| | - Guang Cheng
- Department of Statistics, Purdue University, West Lafayette, IN
| |
|
42
|
Zheng Z, Liu L, Li Y, Zhao N. High-dimensional statistical inference via DATE. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.1909733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Zemin Zheng
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Lei Liu
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yang Li
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Ni Zhao
- School of Mathematics and Physics Sciences, Anhui Jianzhu University, Hefei, Anhui, P. R. China
| |
|
43
|
Wang Y, Zhao SD. A nonparametric empirical Bayes approach to large-scale multivariate regression. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107130] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
|
44
|
Pun CS, Hadimaja MZ. A self-calibrated direct approach to precision matrix estimation and linear discriminant analysis in high dimensions. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
|
45
|
Kim J, Zhu H, Wang X, Do K. Scalable network estimation with L0 penalty. Stat Anal Data Min 2021; 14:18-30. [DOI: 10.1002/sam.11483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Affiliation(s)
- Junghi Kim
- Center for Drug Evaluation and Research U.S. Food and Drug Administration Silver Spring Maryland USA
| | - Hongtu Zhu
- Department of Biostatistics University of North Carolina Chapel Hill North Carolina USA
| | - Xiao Wang
- Department of Statistics Purdue University West Lafayette Indiana USA
| | - Kim‐Anh Do
- Department of Biostatistics University of Texas MD Anderson Cancer Center Houston Texas USA
| |
|
46
|
Affiliation(s)
- Xiaoou Pan
- Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA
| | - Qiang Sun
- Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
| | - Wen-Xin Zhou
- Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA
| |
|
47
|
Fan J, Ma C, Wang K. Comment on “A Tuning-Free Robust and Efficient Approach to High-Dimensional Regression”. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1837138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University , Princeton , NJ
| | - Cong Ma
- Department of Electrical Engineering and Computer Sciences, UC Berkeley , Berkeley , CA
| | - Kaizheng Wang
- Department of Industrial Engineering and Operations Research, Columbia University , New York , NY
| |
|
48
|
Wang L, Peng B, Bradic J, Li R, Wu Y. Rejoinder to “A Tuning-Free Robust and Efficient Approach to High-Dimensional Regression”. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1843865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Lan Wang
- Department of Management Science, University of Miami , Coral Gables , FL
| | - Bo Peng
- Adobe Systems, Inc. San Jose , CA
| | - Jelena Bradic
- Department of Mathematics, Halicioglu Data Science Institute, University of California at San Diego , La Jolla , CA
| | - Runze Li
- Department of Statistics, Pennsylvania State University , University Park , PA
| | - Yunan Wu
- School of Statistics, University of Minnesota , Minneapolis , MN
| |
|
49
|
Li X, Shojaie A. Discussion of “A Tuning-Free Robust and Efficient Approach to High-Dimensional Regression”. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1837139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Xiudi Li
- Department of Biostatistics, University of Washington , Seattle , WA , USA
| | - Ali Shojaie
- Department of Biostatistics, University of Washington , Seattle , WA , USA
| |
|
50
|
Lin L, Drton M, Shojaie A. Statistical significance in high-dimensional linear mixed models. FODS '20 : PROCEEDINGS OF THE 2020 ACM-IMS FOUNDATIONS OF DATA SCIENCE CONFERENCE : OCTOBER 19-20, 2020, VIRTUAL EVENT, USA. ACM-IMS FOUNDATIONS OF DATA SCIENCE CONFERENCE (2020 : ONLINE) 2020; 2020:171-181. [PMID: 35497571 PMCID: PMC9053448 DOI: 10.1145/3412815.3416883] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
This paper concerns the development of an inferential framework for high-dimensional linear mixed effect models. These are suitable models, for instance, when we have n repeated measurements for M subjects. We consider a scenario where the number of fixed effects p is large (and may be larger than M), but the number of random effects q is small. Our framework is inspired by a recent line of work that proposes de-biasing penalized estimators to perform inference for high-dimensional linear models with fixed effects only. In particular, we demonstrate how to correct a 'naive' ridge estimator in extension of work by Bühlmann (2013) to build asymptotically valid confidence intervals for mixed effect models. We validate our theoretical results with numerical experiments, in which we show our method outperforms those that fail to account for correlation induced by the random effects. For a practical demonstration we consider a riboflavin production dataset that exhibits group structure, and show that conclusions drawn using our method are consistent with those obtained on a similar dataset without group structure.
Affiliation(s)
- Lina Lin
- Department of Statistics, University of Washington
| | - Mathias Drton
- Department of Mathematics, Technical University of Munich
| | - Ali Shojaie
- Department of Biostatistics, University of Washington
| |
|