1
|
Robust estimation of the number of factors for the pair-elliptical factor models. Comput Stat 2021. [DOI: 10.1007/s00180-021-01165-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
2
|
Ghosh A, Thoresen M. A robust variable screening procedure for ultra-high dimensional data. Stat Methods Med Res 2021; 30:1816-1832. [PMID: 34053339 DOI: 10.1177/09622802211017299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and particularly in small samples this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on regulation of lipid metabolism.
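As a point of reference for the screening idea in this abstract, the following is a minimal sketch of classical (non-robust) SIS, which ranks predictors by absolute marginal correlation with the response and keeps the top `d`. It is not the DPD-SIS procedure of the paper: the robust DPD-based marginal estimates are not reproduced here, and the function name `sis_screen` and the parameter `d` are illustrative assumptions.

```python
import numpy as np

def sis_screen(X, y, d):
    """Classical SIS baseline (assumed illustration, not DPD-SIS).

    Returns the indices of the d columns of X with the largest
    absolute marginal Pearson correlation with y.
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / denom  # absolute marginal correlations
    # Sort in decreasing order of correlation and keep the first d indices.
    return np.argsort(corr)[::-1][:d]

# Usage on synthetic data: only columns 7 and 42 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 500))
y = 3.0 * X[:, 7] - 2.0 * X[:, 42] + 0.1 * rng.standard_normal(200)
kept = sis_screen(X, y, d=10)
```

Because the ranking relies on sample correlations, a few gross outliers in `y` or `X` can reorder it, which is the weakness the abstract attributes to SIS.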
Affiliation(s)
- Abhik Ghosh
- Interdisciplinary Statistical Research Unit, Indian Statistical Institute, Kolkata, India
- Magne Thoresen
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
|
3
|
He Y, Liu P, Zhang X, Zhou W. Robust covariance estimation for high-dimensional compositional data with application to microbial communities analysis. Stat Med 2021; 40:3499-3515. [PMID: 33840134 DOI: 10.1002/sim.8979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2020] [Revised: 03/13/2021] [Accepted: 03/25/2021] [Indexed: 11/08/2022]
Abstract
Microbial communities analysis is drawing growing attention due to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional (lying in a simplex), and may even be leptokurtic and highly skewed due to the existence of overly abundant taxa, which makes conventional correlation analysis infeasible for studying the co-occurrence and co-exclusion relationships between microbial taxa. In this article, we address the challenges of covariance estimation for this kind of data. Assuming that the basis covariance matrix lies in a well-recognized class of sparse covariance matrices, we adopt a proxy matrix known in the literature as the centered log-ratio covariance matrix. We construct a Median-of-Means (MOM) estimator for the centered log-ratio covariance matrix and propose a thresholding procedure that is adaptive to the variability of individual entries. By imposing a finite fourth moment condition, which is much weaker than the sub-Gaussianity condition in the literature, we derive the optimal rate of convergence under the spectral norm. In addition, we provide theoretical guarantees on support recovery. The adaptive thresholding procedure of the MOM estimator is easy to implement and gains robustness when outliers or heavy-tailedness exist. Thorough simulation studies are conducted to show the advantages of the proposed procedure over some state-of-the-art methods. Finally, we apply the proposed method to analyze a human gut microbiome dataset.
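The pipeline described in this abstract (centered log-ratio transform, Median-of-Means covariance, thresholding) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' estimator: the block-splitting scheme, the function names, and in particular the single universal threshold `tau` are assumptions, whereas the paper uses a threshold adapted to the variability of each entry.

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform of strictly positive compositions (one per row)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def mom_covariance(z, n_blocks, rng):
    """Entrywise median of sample covariances computed on disjoint random blocks."""
    idx = rng.permutation(z.shape[0])
    blocks = np.array_split(idx, n_blocks)
    covs = np.stack([np.cov(z[b], rowvar=False) for b in blocks])
    return np.median(covs, axis=0)

def hard_threshold(cov, tau):
    """Zero every off-diagonal entry with |value| <= tau; keep the diagonal.

    A universal tau is an assumption made for brevity; the abstract
    describes an entry-adaptive threshold instead.
    """
    out = np.where(np.abs(cov) > tau, cov, 0.0)
    np.fill_diagonal(out, np.diag(cov))
    return out

# Usage on synthetic compositions: positive counts normalized to sum to one.
rng = np.random.default_rng(1)
raw = rng.lognormal(size=(100, 6))
comp = raw / raw.sum(axis=1, keepdims=True)
z = clr(comp)
cov_mom = mom_covariance(z, n_blocks=5, rng=rng)
cov_sparse = hard_threshold(cov_mom, tau=0.1)
```

Taking the median across blocks limits the influence of any block that contains outlying samples, which is the intuition behind the robustness claimed for MOM-type estimators.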
Affiliation(s)
- Yong He
- Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan, Shandong, China
- Pengfei Liu
- School of Mathematics and Statistics and Research Institute of Mathematical Sciences, Jiangsu Normal University, Xuzhou, Jiangsu, China
- Wang Zhou
- Department of Statistics and Applied Probability, National University of Singapore, Singapore
|
4
|
Chen H, Guo Y, He Y, Ji J, Liu L, Shi Y, Wang Y, Yu L, Zhang X. Simultaneous differential network analysis and classification for matrix-variate data with application to brain connectivity. Biostatistics 2021; 23:967-989. [PMID: 33769450 DOI: 10.1093/biostatistics/kxab007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/04/2020] [Revised: 02/20/2021] [Accepted: 02/22/2021] [Indexed: 01/03/2023] Open
Abstract
Growing evidence has shown that the brain connectivity network experiences alterations in complex diseases such as Alzheimer's disease (AD). Network comparison, also known as differential network analysis, is thus particularly powerful for revealing disease pathologies and identifying clinical biomarkers for medical diagnosis (classification). Data from neurophysiological measurements are multidimensional and in matrix form. A naive vectorization method is insufficient because it ignores the structural information within the matrix. In this article, we adopt the Kronecker product covariance matrix framework to capture both spatial and temporal correlations of the matrix-variate data, while the temporal covariance matrix is treated as a nuisance parameter. Recognizing that the strengths of network connections may vary across subjects, we develop an ensemble-learning procedure that identifies the differential interaction patterns of brain regions between the case group and the control group and simultaneously conducts medical diagnosis (classification) of the disease. Simulation studies are conducted to assess the performance of the proposed method. We apply the proposed procedure to the functional connectivity analysis of a functional magnetic resonance imaging study on AD. The hub nodes and differential interaction patterns identified are consistent with existing experimental studies, and satisfactory out-of-sample classification performance is achieved for medical diagnosis of AD.
Affiliation(s)
- Hao Chen
- School of Statistics, Shandong University of Finance and Economics, Jinan, 250014, China
- Ying Guo
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Yong He
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
- Jiadong Ji
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
- Lei Liu
- Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Yufeng Shi
- Institute for Financial Studies, Shandong University, Jinan, 250100, China
- Yikai Wang
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA
- Long Yu
- Department of Statistics, School of Management, Fudan University, Shanghai, 200433, China
- Xinsheng Zhang
- Department of Statistics, School of Management, Fudan University, Shanghai, 200433, China
|
5
|
|