1
Yu X, Zhang L, Srinivasan A, Xie MG, Xue L. A unified combination framework for dependent tests with applications to microbiome association studies. Biometrics 2025; 81:ujaf001. [PMID: 39887051 PMCID: PMC11783248 DOI: 10.1093/biomtc/ujaf001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Revised: 10/30/2024] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and use it to synthesize various microbiome association tests computed from the same dataset. Our development builds on the classical meta-analysis methods of aggregating P-values and on a more recent general method of combining confidence distributions, but generalizes them to handle dependent tests. The proposed framework comes with rigorous statistical guarantees, and we provide a comprehensive study comparing it with various existing methods for combining dependent tests. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework offers a remedy when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the components to be combined can lead to severe size distortion. Compared with existing P-value combination methods, including the vanilla Cauchy combination, the proposed framework is flexible: it can be adapted to account for the dependence accurately and uses the available information efficiently, yielding tests with accurate size and enhanced power. We apply the framework to microbiome association studies, aggregating information from multiple existing tests computed on the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternatives, enabling more efficient and meaningful discoveries of important microbiome associations.
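For reference, the vanilla Cauchy combination mentioned in this abstract is commonly written as T = Σ w_i tan{(0.5 − p_i)π}, with nonnegative weights summing to 1, and its combined p-value is read off the standard Cauchy tail, 0.5 − arctan(T)/π. The following is a minimal sketch assuming that standard form; it does not reproduce the authors' unified framework, and the function name is illustrative.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Vanilla Cauchy combination of p-values (standard form, sketch).

    Each p-value p_i is mapped to tan((0.5 - p_i) * pi), which is standard
    Cauchy when p_i is uniform. The weighted sum, with nonnegative weights
    normalized to 1, is compared against the standard Cauchy upper tail.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0] * k
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize so weights sum to 1
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Upper tail of a standard Cauchy variable: P(C > t) = 0.5 - arctan(t) / pi
    return 0.5 - math.atan(stat) / math.pi


if __name__ == "__main__":
    print(cauchy_combination([0.01, 0.01, 0.01]))  # identical inputs give back ~0.01
    print(cauchy_combination([1e-6, 0.5, 0.5]))    # one strong signal dominates
```

Because the transformed p-values are heavy-tailed, the combined p-value is driven by the smallest inputs, and combining identical p-values returns that same value. The abstract notes that this rule depends on distributional assumptions that the proposed framework is designed to relax.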
Affiliation(s)
- Xiufan Yu
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, USA
- Linjun Zhang
- Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Min-ge Xie
- Department of Statistics, Rutgers University, Piscataway, NJ 08854, USA
- Lingzhou Xue
- Department of Statistics, Penn State University, University Park, PA 16802, USA
2
Jiang Y, Wen C, Jiang Y, Wang X, Zhang H. Use of random integration to test equality of high dimensional covariance matrices. Stat Sin 2023; 33:2359-2380. [PMID: 37799490 PMCID: PMC10550010 DOI: 10.5705/ss.202020.0486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/07/2023]
Abstract
Testing the equality of two covariance matrices is a fundamental problem in statistics, and it is especially challenging when the data are high-dimensional. Through a novel use of random integration, we can test the equality of high-dimensional covariance matrices without assuming parametric distributions for the two underlying populations, even if the dimension is much larger than the sample size. The asymptotic properties of our test for an arbitrary number of covariates and sample size are studied in depth under a general multivariate model. The finite-sample performance of our test is evaluated through numerical studies. The empirical results demonstrate that our test is highly competitive with existing tests in a wide range of settings. In particular, the proposed test is notably powerful when the two covariance matrices differ by a few large or many small diagonal disturbances.
Affiliation(s)
- Yunlu Jiang
- Jinan University, University of Science and Technology of China, Sun Yat-Sen University, Yale University
- Canhong Wen
- Jinan University, University of Science and Technology of China, Sun Yat-Sen University, Yale University
- Yukang Jiang
- Jinan University, University of Science and Technology of China, Sun Yat-Sen University, Yale University
- Xueqin Wang
- Jinan University, University of Science and Technology of China, Sun Yat-Sen University, Yale University
- Heping Zhang
- Jinan University, University of Science and Technology of China, Sun Yat-Sen University, Yale University
3
Li J. Finite sample t-tests for high-dimensional means. J Multivariate Anal 2023; 196:105183. [PMID: 37780727 PMCID: PMC10538523 DOI: 10.1016/j.jmva.2023.105183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]
Abstract
When sample sizes are small, it is difficult for an asymptotic test that requires diverging sample sizes to maintain an accurate Type I error rate. In this paper, we consider one-sample, two-sample and ANOVA tests for mean vectors when data are high-dimensional but sample sizes are very small. We establish asymptotic t-distributions of the proposed U-statistics, which require only the data dimensionality to diverge, while the sample sizes stay fixed and no less than 3. The proposed tests maintain accurate Type I error rates for a wide range of sample sizes and data dimensionality. Moreover, the tests are nonparametric and can be applied to data that are normally distributed or heavy-tailed. Simulation studies confirm the theoretical results for the tests. We also apply the proposed tests to an fMRI dataset to demonstrate the practical implementation of the methods.
Affiliation(s)
- Jun Li
- Department of Mathematical Sciences, Kent State University, Kent, OH 44242, USA
4
Dörnemann N. Likelihood ratio tests under model misspecification in high dimensions. J Multivariate Anal 2023. [DOI: 10.1016/j.jmva.2022.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
5
Lai J, Wang X, Zhao K, Zheng S. Block-diagonal test for high-dimensional covariance matrices. TEST 2022. [DOI: 10.1007/s11749-022-00842-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/27/2022]
6
Yu X, Li D, Xue L. Fisher’s combined probability test for high-dimensional covariance matrices. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2126781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Xiufan Yu
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame
- Danning Li
- KLAS and School of Mathematics & Statistics, Northeast Normal University
- Lingzhou Xue
- Department of Statistics, Pennsylvania State University
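Fisher's combined probability test, named in this entry's title, is classically stated for independent p-values: the statistic −2 Σ log p_i is compared against a χ² distribution with 2k degrees of freedom. The following is a minimal pure-Python sketch of that textbook rule only, using the closed-form χ² tail for even degrees of freedom; how the cited paper constructs the statistics whose p-values it combines is not reproduced, and the function name is illustrative.

```python
import math


def fisher_combination(pvalues):
    """Fisher's combined probability test for independent p-values (sketch).

    Under the null, X = -2 * sum(log p_i) is chi-squared with 2k degrees of
    freedom. For even degrees of freedom the survival function has the closed
    form P(X > x) = exp(-x / 2) * sum_{j=0}^{k-1} (x / 2)**j / j!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term = 1.0  # current (x/2)**j / j!, starting at j = 0
    tail = 1.0  # running sum of the series
    for j in range(1, k):
        term *= half_x / j
        tail += term
    return math.exp(-half_x) * tail


if __name__ == "__main__":
    # e.g. p-values from two (asymptotically) independent test statistics
    print(fisher_combination([0.04, 0.20]))
```

The χ² reference distribution relies on the p-values being independent, at least asymptotically; applied to strongly positively dependent p-values, the test can have inflated size.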
7
Huang Y, Li C, Li R, Yang S. An overview of tests on high-dimensional means. J Multivariate Anal 2022. [DOI: 10.1016/j.jmva.2021.104813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
8
Li C, Li R. Linear hypothesis testing in linear models with high-dimensional responses. J Am Stat Assoc 2022; 117:1738-1750. [PMID: 36908313 PMCID: PMC9996668 DOI: 10.1080/01621459.2021.1884561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
In this paper, we propose a new projection test for linear hypotheses on regression coefficient matrices in linear models with high-dimensional responses. We systematically study the theoretical properties of the proposed test. We first derive the optimal projection matrix for any given projection dimension to achieve the best power, and provide an upper bound for the optimal dimension of the projection matrix. We further provide insights into how to construct the optimal projection matrix. One- and two-sample mean problems can be formulated as special cases of the linear hypotheses studied in this paper. We demonstrate, both theoretically and empirically, that the proposed test can outperform existing tests for one- and two-sample mean problems. We conduct Monte Carlo simulations to examine the finite-sample performance and illustrate the proposed test with a real data example.
Affiliation(s)
- Changcheng Li
- Department of Statistics, Pennsylvania State University at University Park
- Runze Li
- Department of Statistics, Pennsylvania State University at University Park
9
Ghosh S, Ayyala DN, Hellebuyck R. Two-sample high dimensional mean test based on prepivots. Comput Stat Data Anal 2021. [DOI: 10.1016/j.csda.2021.107284] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
10
11
Zhang Y, Wang R, Shao X. Adaptive inference for change points in high-dimensional data. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1884562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Yangfan Zhang
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
- Runmin Wang
- Department of Statistical Science, Southern Methodist University, Dallas, TX
- Xiaofeng Shao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
12
Abstract
Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, as in testing mean vectors, covariance matrices and regression coefficients. This paper constructs a family of U-statistics as unbiased estimators of the $\ell_p$-norms of those features. We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed. They are also asymptotically independent of the maximum-type test statistic, whose limiting distribution is an extreme value distribution. Based on this asymptotic independence, we propose an adaptive testing procedure that combines p-values computed from the U-statistics of different orders. We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives.
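To illustrate the idea of an unbiased U-statistic for a norm-type feature, here is a minimal sketch for the simplest feature, a mean vector μ. It assumes the order-a target is Σ_k μ_k^a (which equals the $\ell_a$-norm raised to the a-th power when a is even): because distinct observations are independent, each product over a distinct indices has expectation μ_k^a. This brute-force version is illustrative only; it is not the authors' construction, and the function name is hypothetical.

```python
from itertools import permutations
from math import prod


def u_statistic_power_sum(X, a):
    """Unbiased estimate of sum_k mu_k**a from i.i.d. rows of X (sketch).

    X is a list of n observations, each a list of d coordinates. For each
    coordinate k, average prod_t X[i_t][k] over all ordered tuples of a
    distinct row indices. Requires n >= a. Cost is O(d * n**a), so this
    brute-force form is only meant for small n.
    """
    n, d = len(X), len(X[0])
    if n < a:
        raise ValueError("need at least a observations")
    tuples = list(permutations(range(n), a))
    total = 0.0
    for k in range(d):
        total += sum(prod(X[i][k] for i in idx) for idx in tuples) / len(tuples)
    return total


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
    print(u_statistic_power_sum(X, 2))  # unbiased for mu_1**2 + mu_2**2
```

Excluding repeated indices is what removes the bias: the naive plug-in Σ_k (mean_k)^2 also picks up the variance term Var(X_k)/n, which the distinct-index average avoids.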
Affiliation(s)
- Yinqiu He
- Department of Statistics, University of Michigan
- Gongjun Xu
- Department of Statistics, University of Michigan
- Chong Wu
- Department of Statistics, Florida State University
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota
13
Wu C, Xu G, Shen X, Pan W. A regularization-based adaptive test for high-dimensional generalized linear models. J Mach Learn Res 2020; 21:128. [PMID: 32802002 PMCID: PMC7425805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Despite its importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely understudied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates when the nuisance parameters are high-dimensional. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, the truncated Lasso penalty (TLP), which maintains correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Simulations demonstrate its superior finite-sample performance over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set to detect possible gene-gender interactions for Alzheimer's disease. An R package, aispu, implementing the proposed test is available on GitHub.
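The "sum of powered score" in the aiSPU name refers to a family of statistics indexed by a power γ. A minimal sketch of that family for a given score vector U, assuming the usual definition SPU(γ) = Σ_j U_j^γ for positive integers γ and SPU(∞) = max_j |U_j|. It omits the TLP-penalized nuisance fit, the interaction structure, and the asymptotic null distribution that aiSPU uses to obtain p-values; the function name and default powers are illustrative choices.

```python
import math


def spu_statistics(scores, gammas=(1, 2, 4, 8, math.inf)):
    """Sum-of-powered-score (SPU) statistics for a score vector (sketch).

    SPU(gamma) = sum_j U_j**gamma for a positive integer gamma, and
    SPU(inf) = max_j |U_j|. Small gamma suits many weak signals; large
    gamma emphasizes a few strong ones.
    """
    out = {}
    for g in gammas:
        if math.isinf(g):
            out[g] = max(abs(u) for u in scores)
        else:
            out[g] = sum(u ** g for u in scores)
    return out


if __name__ == "__main__":
    print(spu_statistics([0.3, -1.2, 2.5, 0.1]))
```

An adaptive test in this family would take the minimum p-value across γ, which in turn needs the joint null distribution of the SPU statistics; that step is not sketched here. Note that odd γ keeps the sign of each score, so effects in opposite directions can cancel, while even γ does not.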
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, FL, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, MI, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, MN, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, MN, USA