1
|
Ahmed SE, Arabi Belaghi R, Hussein AA. Efficient Post-Shrinkage Estimation Strategies in High-Dimensional Cox's Proportional Hazards Models. ENTROPY (BASEL, SWITZERLAND) 2025; 27:254. [PMID: 40149178 PMCID: PMC11941331 DOI: 10.3390/e27030254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2025] [Revised: 02/17/2025] [Accepted: 02/19/2025] [Indexed: 03/29/2025]
Abstract
Regularization methods such as LASSO, adaptive LASSO, Elastic-Net, and SCAD are widely employed for variable selection in statistical modeling. However, these methods primarily focus on variables with strong effects while often overlooking weaker signals, potentially leading to biased parameter estimates. To address this limitation, Gao, Ahmed, and Feng (2017) introduced a corrected shrinkage estimator that incorporates both weak and strong signals, though their results were confined to linear models. The applicability of such approaches to survival data remains unclear, despite the prevalence of survival regression involving both strong and weak effects in biomedical research. To bridge this gap, we propose a novel class of post-selection shrinkage estimators tailored to the Cox model framework. We establish the asymptotic properties of the proposed estimators and demonstrate their potential to enhance estimation and prediction accuracy through simulations that explicitly incorporate weak signals. Finally, we validate the practical utility of our approach by applying it to two real-world datasets, showcasing its advantages over existing methods.
Affiliation(s)
- Syed Ejaz Ahmed
  - Department of Mathematics and Statistics, Brock University, St. Catharines, ON L2S 3A1, Canada
- Reza Arabi Belaghi
  - Department of Energy and Technology, Swedish University of Agricultural Sciences, P.O. Box 7032, 750 07 Uppsala, Sweden
|
2
|
Adcock B, Brugiapaglia S, Dexter N, Moraga S. Near-optimal learning of Banach-valued, high-dimensional functions via deep neural networks. Neural Netw 2025; 181:106761. [PMID: 39454372 DOI: 10.1016/j.neunet.2024.106761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 07/15/2024] [Accepted: 09/23/2024] [Indexed: 10/28/2024]
Abstract
The past decade has seen increasing interest in applying Deep Learning (DL) to Computational Science and Engineering (CSE). Driven by impressive results in applications such as computer vision, Uncertainty Quantification (UQ), genetics, simulations and image processing, DL is increasingly supplanting classical algorithms, and seems poised to revolutionize scientific computing. However, DL is not yet well-understood from the standpoint of numerical analysis. Little is known about the efficiency and reliability of DL from the perspectives of stability, robustness, accuracy, and, crucially, sample complexity. For example, approximating solutions to parametric PDEs is a key task in UQ for CSE. Yet, training data for such problems is often scarce and corrupted by errors. Moreover, the target function, while often smooth, is a potentially infinite-dimensional function taking values in the PDE solution space, which is generally an infinite-dimensional Banach space. This paper provides arguments for Deep Neural Network (DNN) approximation of such functions, with both known and unknown parametric dependence, that overcome the curse of dimensionality. We establish practical existence theorems that describe classes of DNNs with dimension-independent architecture widths and depths, and training procedures based on minimizing a (regularized) ℓ2-loss which achieve near-optimal algebraic rates of convergence in terms of the amount of training data m. These results involve key extensions of compressed sensing for recovering Banach-valued vectors and polynomial emulation with DNNs. When approximating solutions of parametric PDEs, our results account for all sources of error, i.e., sampling, optimization, approximation and physical discretization, and allow for training high-fidelity DNN approximations from coarse-grained sample data. 
Our theoretical results fall into the category of non-intrusive methods, providing a theoretical alternative to classical methods for high-dimensional approximation.
Affiliation(s)
- Ben Adcock
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby BC, Canada, V5A 1S6.
- Simone Brugiapaglia
  - Department of Mathematics and Statistics, Concordia University, J.W. McConnell Building, 1400 De Maisonneuve Blvd. W., Montréal, QC, Canada, H3G 1M8.
- Nick Dexter
  - Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL, 32306-4120, USA.
- Sebastian Moraga
  - Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby BC, Canada, V5A 1S6.
|
3
|
Zhao H, Wang T. Debiased high-dimensional regression calibration for errors-in-variables log-contrast models. Biometrics 2024; 80:ujae153. [PMID: 39679737 DOI: 10.1093/biomtc/ujae153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 10/31/2024] [Accepted: 11/25/2024] [Indexed: 12/17/2024]
Abstract
Motivated by the challenges in analyzing gut microbiome and metagenomic data, this work aims to tackle the issue of measurement errors in high-dimensional regression models that involve compositional covariates. This paper marks a pioneering effort in conducting statistical inference on high-dimensional compositional data affected by mismeasured or contaminated data. We introduce a calibration approach tailored for the linear log-contrast model. Under relatively lenient conditions regarding the sparsity level of the parameter, we have established the asymptotic normality of the estimator for inference. Numerical experiments and an application in microbiome study have demonstrated the efficacy of our high-dimensional calibration strategy in minimizing bias and achieving the expected coverage rates for confidence intervals. Moreover, the potential application of our proposed methodology extends well beyond compositional data, suggesting its adaptability for a wide range of research contexts.
Affiliation(s)
- Huali Zhao
  - Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China
- Tianying Wang
  - Department of Statistics, Colorado State University, Fort Collins, CO 80523, United States
|
4
|
Fan K, Subedi S, Yang G, Lu X, Ren J, Wu C. Is Seeing Believing? A Practitioner's Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies. ENTROPY (BASEL, SWITZERLAND) 2024; 26:794. [PMID: 39330127 PMCID: PMC11430850 DOI: 10.3390/e26090794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 08/23/2024] [Accepted: 09/06/2024] [Indexed: 09/28/2024]
Abstract
Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question if valid inferential procedures are not available to quantify the uncertainty of the findings. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models which can yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p values and false discovery rates (FDR). Connections in high-dimensional inferences between the two realms have been fully exploited under the "unpenalized loss function + penalty term" formulation for regularization methods and the "likelihood function × shrinkage prior" framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies due to its ability to accommodate disease heterogeneity in the form of heavy-tailed errors and structured sparsity while providing valid statistical inference. The numerical results show that robust Bayesian analysis incorporating exact sparsity has yielded not only superior estimation and identification results but also valid Bayesian credible intervals under nominal coverage probabilities compared with alternative methods, especially in the presence of heavy-tailed model errors and outliers.
Affiliation(s)
- Kun Fan
  - Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Srijana Subedi
  - Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Gongshun Yang
  - Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Xi Lu
  - Department of Pharmaceutical Health Outcomes and Policy, College of Pharmacy, University of Houston, Houston, TX 77204, USA
- Jie Ren
  - Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Cen Wu
  - Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
|
5
|
Qin X, Hu J, Ma S, Wu M. Estimation of multiple networks with common structures in heterogeneous subgroups. J MULTIVARIATE ANAL 2024; 202:105298. [PMID: 38433779 PMCID: PMC10907012 DOI: 10.1016/j.jmva.2024.105298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Network estimation has been a critical component of high-dimensional data analysis and can provide an understanding of the underlying complex dependence structures. Among the existing studies, Gaussian graphical models have been highly popular. However, they still have limitations due to the homogeneous distribution assumption and the fact that they are only applicable to small-scale data. For example, cancers have various levels of unknown heterogeneity, and biological networks, which include thousands of molecular components, often differ across subgroups while also sharing some commonalities. In this article, we propose a new joint estimation approach for multiple networks with unknown sample heterogeneity, by decomposing the Gaussian graphical model (GGM) into a collection of sparse regression problems. A reparameterization technique and a composite minimax concave penalty are introduced to effectively accommodate the specific and common information across the networks of multiple subgroups, making the proposed estimator significantly advancing from the existing heterogeneity network analysis based on the regularized likelihood of GGM directly and enjoying scale-invariant, tuning-insensitive, and optimization convexity properties. The proposed analysis can be effectively realized using parallel computing. The estimation and selection consistency properties are rigorously established. The proposed approach allows the theoretical studies to focus on independent network estimation only and has the significant advantage of being both theoretically and computationally applicable to large-scale data. Extensive numerical experiments with simulated data and the TCGA breast cancer data demonstrate the prominent performance of the proposed approach in both subgroup and network identifications.
Affiliation(s)
- Xing Qin
  - School of Statistics and Information, Shanghai University of International Business and Economics, Shanghai, China
- Jianhua Hu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, New Haven, USA
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
|
6
|
Zhou RR, Zucker DM, Zhao SD. Power of testing for exposure effects under incomplete mediation. Int J Biostat 2024; 20:217-228. [PMID: 37084462 DOI: 10.1515/ijb-2022-0106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 03/25/2023] [Indexed: 04/23/2023]
Abstract
Mediation analysis studies situations where an exposure may affect an outcome both directly and indirectly through intervening variables called mediators. It is frequently of interest to test for the effect of the exposure on the outcome, and the standard approach is simply to regress the latter on the former. However, it seems plausible that a more powerful test statistic could be achieved by also incorporating the mediators. This would be useful in cases where the exposure effect size might be small, which for example is common in genomics applications. Previous work has shown that this is indeed possible under complete mediation, where there is no direct effect. In most applications, however, the direct effect is likely nonzero. In this paper we study linear mediation models and find that under certain conditions, power gain is still possible under this incomplete mediation setting for testing the null hypothesis that there is neither a direct nor an indirect effect. We study a class of procedures that can achieve this performance and develop their application to both low- and high-dimensional mediators. We then illustrate their performances in simulations as well as in an analysis using DNA methylation mediators to study the effect of cigarette smoking on gene expression.
Affiliation(s)
- David M Zucker
  - Department of Statistics and Data Science, Hebrew University, Jerusalem, Israel
- Sihai D Zhao
  - Department of Statistics, University of Illinois Urbana-Champaign, Champaign, IL, USA
|
7
|
Wang Y, Shojaie A, Randolph T, Knight P, Ma J. GENERALIZED MATRIX DECOMPOSITION REGRESSION: ESTIMATION AND INFERENCE FOR TWO-WAY STRUCTURED DATA. Ann Appl Stat 2023; 17:2944-2969. [PMID: 38149262 PMCID: PMC10751029 DOI: 10.1214/23-aoas1746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
Motivated by emerging applications in ecology, microbiology, and neuroscience, this paper studies high-dimensional regression with two-way structured data. To estimate the high-dimensional coefficient vector, we propose the generalized matrix decomposition regression (GMDR) to efficiently leverage auxiliary information on row and column structures. GMDR extends the principal component regression (PCR) to two-way structured data, but unlike PCR, GMDR selects the components that are most predictive of the outcome, leading to more accurate prediction. For inference on regression coefficients of individual variables, we propose the generalized matrix decomposition inference (GMDI), a general high-dimensional inferential framework for a large family of estimators that include the proposed GMDR estimator. GMDI provides more flexibility for incorporating relevant auxiliary row and column structures. As a result, GMDI does not require the true regression coefficients to be sparse, but constrains the coordinate system representing the regression coefficients according to the column structure. GMDI also allows dependent and heteroscedastic observations. We study the theoretical properties of GMDI in terms of both the type-I error rate and power and demonstrate the effectiveness of GMDR and GMDI in simulation studies and an application to human microbiome data.
Affiliation(s)
- Yue Wang
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus
- Ali Shojaie
  - Department of Biostatistics, University of Washington
- Jing Ma
  - Public Health Sciences Division, Fred Hutchinson Cancer Center
|
8
|
Chen J, Li Q, Chen HY. Testing generalized linear models with high-dimensional nuisance parameter. Biometrika 2023; 110:83-99. [PMID: 36816791 PMCID: PMC9933885 DOI: 10.1093/biomet/asac021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Generalized linear models often have high-dimensional nuisance parameters, as seen in applications such as testing gene-environment interactions or gene-gene interactions. In these scenarios, it is essential to test the significance of a high-dimensional sub-vector of the model's coefficients. Although some existing methods can tackle this problem, they often rely on the bootstrap to approximate the asymptotic distribution of the test statistic, and thus are computationally expensive. Here, we propose a computationally efficient test with a closed-form limiting distribution, which allows the parameter being tested to be either sparse or dense. We show that under certain regularity conditions, the type I error of the proposed method is asymptotically correct, and we establish its power under high-dimensional alternatives. Extensive simulations demonstrate the good performance of the proposed test and its robustness when certain sparsity assumptions are violated. We also apply the proposed method to Chinese famine sample data in order to show its performance when testing the significance of gene-environment interactions.
Affiliation(s)
- Jinsong Chen
  - College of Applied Health Sciences, University of Illinois at Chicago, 1919 W Taylor St, Chicago, Illinois 60612, U.S.A
- Quefeng Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Hua Yun Chen
  - School of Public Health, University of Illinois at Chicago, 2121 W Taylor St, Chicago, Illinois 60612, U.S.A
|
9
|
Fan J, Lou Z, Yu M. Are Latent Factor Regression and Sparse Regression Adequate? J Am Stat Assoc 2023; 119:1076-1088. [PMID: 39268549 PMCID: PMC11390100 DOI: 10.1080/01621459.2023.2169700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Accepted: 01/13/2023] [Indexed: 01/19/2023]
Abstract
We propose the Factor Augmented (sparse linear) Regression Model (FARM) that not only admits both the latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression together. We provide theoretical guarantees for the estimation of our model under the existence of sub-Gaussian and heavy-tailed noises (with bounded (1 + ϑ)-th moment, for all ϑ > 0) respectively. In addition, the existing works on supervised learning often assume the latent factor regression or sparse linear regression is the true underlying model without justifying its adequacy. To fill in such an important gap on high-dimensional inference, we also leverage our model as the alternative model to test the sufficiency of the latent factor regression and the sparse linear regression models. To accomplish these goals, we propose the Factor-Adjusted deBiased Test (FabTest) and a two-stage ANOVA type test respectively. We also conduct large-scale numerical experiments including both synthetic and FRED macroeconomics data to corroborate the theoretical properties of our methods. Numerical results illustrate the robustness and effectiveness of our model against latent factor regression and sparse linear regression models.
Affiliation(s)
- Jianqing Fan
  - Frederick L. Moore '18 Professor of Finance, Professor of Statistics, and Professor of Operations Research and Financial Engineering at the Princeton University
- Zhipeng Lou
  - Department of Operations Research and Financial Engineering, Princeton University
- Mengxin Yu
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
|
10
|
A unified precision matrix estimation framework via sparse column-wise inverse operator under weak sparsity. ANN I STAT MATH 2022. [DOI: 10.1007/s10463-022-00856-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
11
|
Carpentier A, Collier O, Comminges L, Tsybakov AB, Wang Y. Estimation of the ℓ2-norm and testing in sparse linear regression with unknown variance. BERNOULLI 2022. [DOI: 10.3150/21-bej1436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Olivier Collier
  - Modal’X, Université Paris-Nanterre, Nanterre and CREST, Paris, France
- Yuhao Wang
  - Tsinghua University, Beijing, China and Shanghai Qi Zhi Institute, Shanghai, China
|
12
|
Gao F, Wang T. Two-sample testing of high-dimensional linear regression coefficients via complementary sketching. Ann Stat 2022. [DOI: 10.1214/22-aos2216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Fengnan Gao
  - School of Data Science, Shanghai Center for Mathematical Sciences, Fudan University
- Tengyao Wang
  - Department of Statistics, London School of Economics
|
13
|
Cai TT, Zhang AR, Zhou Y. Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference. IEEE TRANSACTIONS ON INFORMATION THEORY 2022; 68:5975-6002. [PMID: 36865503 PMCID: PMC9974176 DOI: 10.1109/tit.2022.3175455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model - an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.
Affiliation(s)
- T Tony Cai
  - Department of Statistics & Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
- Anru R Zhang
  - Departments of Biostatistics & Bioinformatics, Computer Science, Mathematics, and Statistical Science, Duke University, Durham, NC 27710
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706
- Yuchen Zhou
  - Department of Statistics & Data Science, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
  - Department of Statistics, University of Wisconsin-Madison, Madison, WI 53706
|
14
|
Zamanzadeh A, Cavoli T. The effect of nonpharmaceutical interventions on COVID-19 infections for lower and middle-income countries: A debiased LASSO approach. PLoS One 2022; 17:e0271586. [PMID: 35867692 PMCID: PMC9307185 DOI: 10.1371/journal.pone.0271586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Accepted: 07/05/2022] [Indexed: 11/18/2022] Open
Abstract
This paper investigates the determinants of COVID-19 infection in the first 100 days of government actions. Using a debiased LASSO estimator, we explore how different measures of government nonpharmaceutical interventions affect new infections of COVID-19 for 37 lower and middle-income countries (LMCs). We find that closing schools, stay-at-home restrictions, and contact tracing reduce the growth of new infections, as do economic support to households and the number of health care workers. Notably, we find no significant effects of business closures. Finally, infections become higher in countries with greater income inequality, higher tourist inflows, poorly educated adults, and weak governance quality. We conclude that several policy interventions reduce infection rates for poorer countries. Further, economic and institutional factors are important; thereby justifying the use, and ultimately success, of economic support to households during the initial infection period.
Affiliation(s)
- Akbar Zamanzadeh
  - UniSA Business School, University of South Australia, Adelaide, SA, Australia
- Tony Cavoli
  - UniSA Business School, University of South Australia, Adelaide, SA, Australia
|
15
|
Liu Y, Pi P, Luo S. A semi-parametric approach to feature selection in high-dimensional linear regression models. Comput Stat 2022. [DOI: 10.1007/s00180-022-01254-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
16
|
Estimation of Error Variance in Regularized Regression Models via Adaptive Lasso. MATHEMATICS 2022. [DOI: 10.3390/math10111937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Estimation of error variance in a regression model is a fundamental problem in statistical modeling and inference. In high-dimensional linear models, variance estimation is a difficult problem, due to the issue of model selection. In this paper, we propose a novel approach for variance estimation that combines the reparameterization technique and the adaptive lasso, which is called the natural adaptive lasso. This method can, simultaneously, select and estimate the regression and variance parameters. Moreover, we show that the natural adaptive lasso, for regression parameters, is equivalent to the adaptive lasso. We establish the asymptotic properties of the natural adaptive lasso, for regression parameters, and derive the mean squared error bound for the variance estimator. Our theoretical results show that under appropriate regularity conditions, the natural adaptive lasso for error variance is closer to the so-called oracle estimator than some other existing methods. Finally, Monte Carlo simulations are presented, to demonstrate the superiority of the proposed method.
|
17
|
Zhang Y, Politis DN. Ridge regression revisited: Debiasing, thresholding and bootstrap. Ann Stat 2022. [DOI: 10.1214/21-aos2156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yunyi Zhang
  - Department of Mathematics, University of California, San Diego
- Dimitris N. Politis
  - Department of Mathematics and Halicioglu Data Science Institute, University of California, San Diego
|
18
|
Klaassen S, Kueck J, Spindler M, Chernozhukov V. Uniform Inference in high-Dimensional Gaussian Graphical Models. Biometrika 2022. [DOI: 10.1093/biomet/asac030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Graphical models have become a popular tool for representing dependencies within large sets of variables and are crucial for representing causal structures. We provide results for uniform inference on high-dimensional graphical models in which the number of target parameters d is potentially much larger than the sample size under approximate sparsity. Our results highlight how graphical models can be estimated and recovered using modern machine learning methods in high-dimensional complex settings. To construct simultaneous confidence regions on many target parameters, it is crucial to have sufficiently fast estimation rates of the nuisance functions. In this context, we establish uniform estimation rates and sparsity guarantees for the square-root lasso estimator in a random design under approximate sparsity conditions. These might be of independent interest for related problems in high dimensions. We also demonstrate in a comprehensive simulation study that our procedure has good small sample properties in comparison to existing methods, and we present two empirical applications.
Affiliation(s)
- S Klaassen
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- J Kueck
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- M Spindler
  - University of Hamburg Department of Statistics, Moorweidenstr. 18, 20148 Hamburg, Germany
- V Chernozhukov
  - Massachusetts Institute of Technology Department of Economics and Center for Statistics and Data Science, 50 Memorial Drive, Cambridge, Massachusetts 02142, U.S.A.
|
19
|
Affiliation(s)
- Pierre C. Bellec
  - Department of Statistics, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA
- Cun-Hui Zhang
  - Department of Statistics, Hill Center, Busch Campus, Rutgers University, Piscataway, NJ 08854, USA
|
20
|
Lee S, Kim SC, Yu D. An efficient GPU-parallel coordinate descent algorithm for sparse precision matrix estimation via scaled lasso. Comput Stat 2022. [DOI: 10.1007/s00180-022-01224-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
21
|
Wang B, Yan L, Duan X, Yu T, Zhang H. An integrated surrogate model constructing method: Annealing combinable Gaussian process. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.01.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
|
22
|
Contraction of a quasi-Bayesian model with shrinkage priors in precision matrix estimation. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2022.03.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
23
|
Guo X, Li R, Liu J, Zeng M. High-dimensional mediation analysis for selecting DNA methylation Loci mediating childhood trauma and cortisol stress reactivity*. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2053136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Xu Guo
- School of Statistics, Beijing Normal University, P.R China
| | - Runze Li
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802-2111, USA
| | - Jingyuan Liu
- MOE Key Laboratory of Econometrics, Department of Statistics, School of Economics, Wang Yanan Institute for Studies in Economics
- Fujian Key Lab of Statistics, Xiamen University, P.R China
| | - Mudong Zeng
- Department of Statistics, The Pennsylvania State University, University Park, PA 16802-2111, USA
| |
|
24
|
Li X, Wang Y, Ruiz R. A Survey on Sparse Learning Models for Feature Selection. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:1642-1660. [PMID: 32386172 DOI: 10.1109/tcyb.2020.2982445] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Feature selection is important in both machine learning and pattern recognition. Successfully selecting informative features can significantly increase learning accuracy and improve result comprehensibility. Various methods have been proposed to identify informative features from high-dimensional data by removing redundant and irrelevant features to improve classification accuracy. In this article, we systematically survey existing sparse learning models for feature selection from the perspectives of individual sparse feature selection and group sparse feature selection, and analyze the differences and connections among various sparse learning models. Promising research directions and topics on sparse learning models are analyzed.
|
25
|
Zhou RR, Zhao SD, Parast L. Estimation of the proportion of treatment effect explained by a high-dimensional surrogate. Stat Med 2022; 41:2227-2246. [PMID: 35189671 DOI: 10.1002/sim.9352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2021] [Revised: 12/23/2021] [Accepted: 01/27/2022] [Indexed: 11/07/2022]
Abstract
Clinical studies examining the effectiveness of a treatment with respect to some primary outcome often require long-term follow-up of patients and/or costly or burdensome measurements of the primary outcome of interest. Identifying a surrogate marker for the primary outcome of interest may allow one to evaluate a treatment effect with less follow-up time, less cost, or less burden. While much clinical and statistical work has focused on identifying and validating surrogate markers, available approaches tend to focus on settings in which only a single surrogate marker is of interest. Limited work has been done to accommodate the high-dimensional surrogate marker setting where the number of potential surrogates is greater than the sample size. In this article, we develop methods to estimate the proportion of treatment effect explained by high-dimensional surrogates. We study the asymptotic properties of our proposed estimator, propose inference procedures, and examine finite sample performance via a simulation study. We illustrate our proposed methods using data from a randomized study comparing a novel whey-based oral nutrition supplement with a standard supplement with respect to change in body fat percentage over 12 weeks, where the surrogate markers of interest are gene expression probesets.
Affiliation(s)
| | - Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA
| | - Layla Parast
- Department of Statistics and Data Sciences, University of Texas at Austin, Austin, USA
| |
|
26
|
Liu X, Cong X, Li G, Maas K, Chen K. Multivariate log-contrast regression with sub-compositional predictors: Testing the association between preterm infants' gut microbiome and neurobehavioral outcomes. Stat Med 2022; 41:580-594. [PMID: 34897772 DOI: 10.1002/sim.9273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2020] [Revised: 09/25/2021] [Accepted: 11/15/2021] [Indexed: 11/10/2022]
Abstract
To link a clinical outcome with compositional predictors in microbiome analysis, the linear log-contrast model is a popular choice, and the inference procedure for assessing the significance of each covariate is also available. However, with the existence of multiple potentially interrelated outcomes and the information of the taxonomic hierarchy of bacteria, a multivariate analysis method that considers the group structure of compositional covariates and an accompanying group inference method are still lacking. Motivated by a study for identifying the microbes in the gut microbiome of preterm infants that impact their later neurobehavioral outcomes, we formulate a constrained integrative multi-view regression. The neurobehavioral scores form multivariate responses, the log-transformed sub-compositional microbiome data form multi-view feature matrices, and a set of linear constraints on their corresponding sub-coefficient matrices ensures the sub-compositional nature. We assume all the sub-coefficient matrices are possible of low-rank to enable joint selection and inference of sub-compositions/views. We propose a scaled composite nuclear norm penalization approach for model estimation and develop a hypothesis testing procedure through de-biasing to assess the significance of different views. Simulation studies confirm the effectiveness of the proposed procedure. We apply the method to the preterm infant study, and the identified microbes are mostly consistent with existing studies and biological understandings.
Affiliation(s)
- Xiaokang Liu
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Xiaomei Cong
- School of Nursing, University of Connecticut, Storrs, Connecticut, USA
| | - Gen Li
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Kendra Maas
- Microbial Analysis, Resources, and Services, University of Connecticut, Storrs, Connecticut, USA
| | - Kun Chen
- Department of Statistics, University of Connecticut, Storrs, Connecticut, USA
| |
|
27
|
Li S, Cai TT, Li H. Transfer Learning for High-Dimensional Linear Regression: Prediction, Estimation and Minimax Optimality. J R Stat Soc Series B Stat Methodol 2022; 84:149-173. [PMID: 35210933 PMCID: PMC8863181 DOI: 10.1111/rssb.12479] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
This paper considers estimation and prediction of a high-dimensional linear regression in the setting of transfer learning where, in addition to observations from the target model, auxiliary samples from different but possibly related regression models are available. When the set of informative auxiliary studies is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. When the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and show its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating data from multiple different tissues as auxiliary samples.
Affiliation(s)
- Sai Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - T. Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| |
|
28
|
Shah RD, Bühlmann P. Double-Estimation-Friendly Inference for High-Dimensional Misspecified Models. Stat Sci 2022. [DOI: 10.1214/22-sts850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Affiliation(s)
- Rajen D. Shah
- Rajen D. Shah is Professor of Statistics, Statistical Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - Peter Bühlmann
- Peter Bühlmann is Professor of Statistics, Seminar for Statistics, ETH Zurich, Zurich, Switzerland
| |
|
29
|
Livne I, Azriel D, Goldberg Y. Improved estimators for semi-supervised high-dimensional regression model. Electron J Stat 2022. [DOI: 10.1214/22-ejs2070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- Ilan Livne
- The Faculty of Industrial Engineering and Management, Technion, Israel
| | - David Azriel
- The Faculty of Industrial Engineering and Management, Technion, Israel
| | - Yair Goldberg
- The Faculty of Industrial Engineering and Management, Technion, Israel
| |
|
30
|
Deshpande Y, Javanmard A, Mehrabi M. Online Debiasing for Adaptively Collected High-Dimensional Data With Applications to Time Series Analysis. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1979011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
- Yash Deshpande
- Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA
| | - Adel Javanmard
- Data Sciences and Operations Department, University of Southern California, Los Angeles, CA
| | - Mohammad Mehrabi
- Data Sciences and Operations Department, University of Southern California, Los Angeles, CA
| |
|
31
|
Zhao S, Witten D, Shojaie A. In Defense of the Indefensible: A Very Naïve Approach to High-Dimensional Inference. Stat Sci 2021; 36:562-577. [PMID: 37860618 PMCID: PMC10586523 DOI: 10.1214/20-sts815] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2023]
Abstract
A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test the hypotheses regarding the full-model regression coefficients.
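The two-step procedure described in this abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names `lasso_cd` and `naive_two_step`, the coordinate-descent solver, the penalty level `lam`, and the 1.96 normal quantile for roughly 95% intervals are all assumptions introduced here.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1 (hypothetical helper)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()                      # residual y - X @ beta, kept up to date
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual that excludes j
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def naive_two_step(X, y, lam):
    """Step (i): lasso selection. Step (ii): least squares on the selected set."""
    selected = np.flatnonzero(lasso_cd(X, y, lam))
    Xs = X[:, selected]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    n, k = Xs.shape
    resid = y - Xs @ coef
    sigma2 = resid @ resid / (n - k)      # usual OLS noise-variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
    # naive intervals: standard OLS intervals that ignore the selection step
    return selected, coef, coef - 1.96 * se, coef + 1.96 * se

# Usage on simulated data (assumed setup: two strong signals, the rest zero).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:2] = [2.0, -1.5]
y = X @ beta_true + rng.standard_normal(200)
selected, coef, lower, upper = naive_two_step(X, y, lam=0.3)
```

Whether such intervals are valid depends on the assumptions the abstract refers to; the sketch only shows the mechanics of the two steps.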
Affiliation(s)
- Sen Zhao
- 1600 Amphitheatre Parkway, Mountain View, California 94043, USA
| | - Daniela Witten
- University of Washington, Health Sciences Building, Box 357232, Seattle, Washington 98195, USA
| | - Ali Shojaie
- University of Washington, Health Sciences Building, Box 357232, Seattle, Washington 98195, USA
| |
|
32
|
Zhang Q. High-Dimensional Mediation Analysis with Applications to Causal Gene Identification. STATISTICS IN BIOSCIENCES 2021. [DOI: 10.1007/s12561-021-09328-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
|
33
|
Hierarchical correction of p-values via an ultrametric tree running Ornstein-Uhlenbeck process. Comput Stat 2021. [DOI: 10.1007/s00180-021-01148-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Abstract
Statistical testing is classically used as an exploratory tool to search for association between a phenotype and many possible explanatory variables. This approach often leads to multiple testing under dependence. We assume a hierarchical structure between tests via an Ornstein-Uhlenbeck process on a tree. The process correlation structure is used for smoothing the p-values. We design a penalized estimation of the mean of the Ornstein-Uhlenbeck process for p-value computation. The performances of the algorithm are assessed via simulations. Its ability to discover new associations is demonstrated on a metagenomic dataset. The corresponding R package is available from https://github.com/abichat/zazou.
|
34
|
Freijeiro‐González L, Febrero‐Bande M, González‐Manteiga W. A Critical Review of LASSO and Its Derivatives for Variable Selection Under Dependence Among Covariates. Int Stat Rev 2021. [DOI: 10.1111/insr.12469] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
Affiliation(s)
- Laura Freijeiro‐González
- Department of Statistics, Mathematical Analysis and Optimization, Santiago de Compostela University, Santiago de Compostela, Spain
| | - Manuel Febrero‐Bande
- Department of Statistics, Mathematical Analysis and Optimization, Santiago de Compostela University, Santiago de Compostela, Spain
| | - Wenceslao González‐Manteiga
- Department of Statistics, Mathematical Analysis and Optimization, Santiago de Compostela University, Santiago de Compostela, Spain
| |
|
35
|
Liu Y, Gao Y, Fang R, Cao H, Sa J, Wang J, Liu H, Wang T, Cui Y. Identifying complex gene-gene interactions: a mixed kernel omnibus testing approach. Brief Bioinform 2021; 22:6346804. [PMID: 34373892 DOI: 10.1093/bib/bbab305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 07/01/2021] [Accepted: 07/17/2021] [Indexed: 11/12/2022] Open
Abstract
Genes do not function independently; rather, they interact with each other to fulfill their joint tasks. Identification of gene-gene interactions has been critically important in elucidating the molecular mechanisms responsible for the variation of a phenotype. Regression models are commonly used to model the interaction between two genes with a linear product term. The interaction effect of two genes can be linear or nonlinear, depending on the true nature of the data. When nonlinear interactions exist, the linear interaction model may not be able to detect such interactions; hence, it suffers from substantial power loss. While the true interaction mechanism (linear or nonlinear) is generally unknown in practice, it is critical to develop statistical methods that can be flexible to capture the underlying interaction mechanism without assuming a specific model assumption. In this study, we develop a mixed kernel function which combines both linear and Gaussian kernels with different weights to capture the linear or nonlinear interaction of two genes. Instead of optimizing the weight function, we propose a grid search strategy and use a Cauchy transformation of the P-values obtained under different weights to aggregate the P-values. We further extend the two-gene interaction model to a high-dimensional setup using a de-biased LASSO algorithm. Extensive simulation studies are conducted to verify the performance of the proposed method. Application to two case studies further demonstrates the utility of the model. Our method provides a flexible and computationally efficient tool for disentangling complex gene-gene interactions associated with complex traits.
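The P-value aggregation step mentioned in this abstract can be illustrated with the standard Cauchy combination rule. The abstract does not spell out the exact formula, so this is a sketch under assumptions: the function name `cauchy_combine` and the equal default weights are introduced here for illustration and are not taken from the paper.

```python
import math

def cauchy_combine(pvalues, weights=None):
    """Combine P-values with the Cauchy combination rule (hypothetical helper).

    Each P-value is mapped to a standard Cauchy variate, the variates are
    averaged with weights summing to one, and the result is mapped back.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k           # assumed default: equal weights
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi

# Usage: aggregate P-values obtained under different kernel weights
# (the numbers below are made up for illustration).
combined = cauchy_combine([0.01, 0.5, 0.9])
```

A small P-value dominates the weighted sum because the tangent transform grows quickly near zero, which is why this rule is often used to pool results from a grid of candidate settings.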
Affiliation(s)
- Yan Liu
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Yuzhao Gao
- School of Statistics, Shanxi University of Finance and Economics, Taiyuan, PR China
| | - Ruiling Fang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Hongyan Cao
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Jian Sa
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Jianrong Wang
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Hongqi Liu
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Tong Wang
- Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
| | - Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
| |
|
36
|
Bellec PC, Zhang CH. Second-order Stein: SURE for SURE and other applications in high-dimensional inference. Ann Stat 2021. [DOI: 10.1214/20-aos2005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
37
|
Zhou K, Li KC, Zhou Q. Honest Confidence Sets for High-Dimensional Regression by Projection and Shrinkage. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1938581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
Affiliation(s)
- Kun Zhou
- Department of Statistics, University of California, Los Angeles, CA
| | - Ker-Chau Li
- Department of Statistics, University of California, Los Angeles, CA
- Institute of Statistical Science, Academia Sinica, Nangang, Taiwan
| | - Qing Zhou
- Department of Statistics, University of California, Los Angeles, CA
| |
|
38
|
Qiu Y, Zhou XH. Inference on Multi-level Partial Correlations Based on Multi-subject Time Series Data. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1917417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Yumou Qiu
- Department of Statistics, Iowa State University, Ames, IA
| | - Xiao-Hua Zhou
- Beijing International Center for Mathematical Research, Department of Biostatistics, and National Engineering Lab for Big Data Analysis and Applications, Peking University, Beijing, China
| |
|
39
|
Yu Q, Li Y, Wang Y, Yang Y, Zheng Z. Scalable and efficient inference via CPE. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.1936044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Qin Yu
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yang Li
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yumeng Wang
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yachong Yang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Zemin Zheng
- International Institute of Finance, The School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| |
|
40
|
Zhou J, Zheng Z, Zhou H, Dong R. Innovated scalable efficient inference for ultra-large graphical models. Stat Probab Lett 2021. [DOI: 10.1016/j.spl.2021.109085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
41
|
Comminges L, Collier O, Ndaoud M, Tsybakov AB. Adaptive robust estimation in sparse vector model. Ann Stat 2021. [DOI: 10.1214/20-aos2002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- L. Comminges
- CEREMADE, Université Paris-Dauphine, PSL and CREST
| | - O. Collier
- Modal’X, UPL, Université Paris Nanterre and CREST
| | | | | |
|
42
|
Evaluating Visual Properties via Robust HodgeRank. Int J Comput Vis 2021. [DOI: 10.1007/s11263-021-01438-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
|
43
|
Law M, Ritov Y. Inference without compatibility: Using exponential weighting for inference on a parameter of a linear model. BERNOULLI 2021. [DOI: 10.3150/20-bej1280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Affiliation(s)
- Michael Law
- Department of Statistics, University of Michigan, Ann Arbor, USA
| | - Ya’acov Ritov
- Department of Statistics, University of Michigan, Ann Arbor, USA
| |
|
44
|
Li S, Cai TT, Li H. Inference for high-dimensional linear mixed-effects models: A quasi-likelihood approach. J Am Stat Assoc 2021; 117:1835-1846. [PMID: 36793369 PMCID: PMC9928173 DOI: 10.1080/01621459.2021.1888740] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2019] [Revised: 01/15/2021] [Accepted: 02/04/2021] [Indexed: 10/22/2022]
Abstract
Linear mixed-effects models are widely used in analyzing clustered or repeated measures data. We propose a quasi-likelihood approach for estimation and inference of the unknown parameters in linear mixed-effects models with high-dimensional fixed effects. The proposed method is applicable to general settings where the dimension of the random effects and the cluster sizes are possibly large. Regarding the fixed effects, we provide rate optimal estimators and valid inference procedures that do not rely on the structural information of the variance components. We also study the estimation of variance components with high-dimensional fixed effects in general settings. The algorithms are easy to implement and computationally fast. The proposed methods are assessed in various simulation settings and are applied to a real study regarding the associations between body mass index and genetic polymorphic markers in a heterogeneous stock mice population.
Affiliation(s)
- Sai Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| | - T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA 19104
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
| |
|
45
|
Guo X, Cheng G. Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1893177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Affiliation(s)
- Xiao Guo
- International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China
| | - Guang Cheng
- Department of Statistics, Purdue University, West Lafayette, IN
| |
|
46
|
Zheng Z, Liu L, Li Y, Zhao N. High-dimensional statistical inference via DATE. COMMUN STAT-THEOR M 2021. [DOI: 10.1080/03610926.2021.1909733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Zemin Zheng
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Lei Liu
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Yang Li
- School of Management, University of Science and Technology of China, Hefei, Anhui, P. R. China
| | - Ni Zhao
- School of Mathematics and Physics Sciences, Anhui Jianzhu University, Hefei, Anhui, P. R. China
| |
|
47
|
Wang Y, Zhao SD. A nonparametric empirical Bayes approach to large-scale multivariate regression. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107130] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
48
|
Pun CS, Hadimaja MZ. A self-calibrated direct approach to precision matrix estimation and linear discriminant analysis in high dimensions. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2020.107105] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
49
|
Kim J, Zhu H, Wang X, Do K. Scalable network estimation with L0 penalty. Stat Anal Data Min 2021; 14:18-30. [DOI: 10.1002/sam.11483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Affiliation(s)
- Junghi Kim
- Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland, USA
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Xiao Wang
- Department of Statistics, Purdue University, West Lafayette, Indiana, USA
| | - Kim‐Anh Do
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, Texas, USA
| |
|
50
|
Affiliation(s)
- Xiaoou Pan
- Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA
| | - Qiang Sun
- Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada
| | - Wen-Xin Zhou
- Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA
| |
|