1
Zhang C, Ye J, Wang X. A Computational Perspective on Projection Pursuit in High Dimensions: Feasible or Infeasible Feature Extraction. Int Stat Rev 2022. [DOI: 10.1111/insr.12517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Chunming Zhang
- Department of Statistics, University of Wisconsin‐Madison, Madison, WI 53706, USA
- Jimin Ye
- School of Mathematics and Statistics, Xidian University, Xi'an, Shaanxi 710071, China
- Xiaomei Wang
- School of Management, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China
2
Nordhausen K, Oja H, Tyler DE. Asymptotic and bootstrap tests for subspace dimension. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2021.104830] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
3
4
Loperfido N. Some theoretical properties of two kurtosis matrices, with application to invariant coordinate selection. J Multivar Anal 2021. [DOI: 10.1016/j.jmva.2021.104809] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
5
Abstract
This paper provides a systematic and comprehensive treatment for obtaining general expressions of any order, for the moments and cumulants of spherically and elliptically symmetric multivariate distributions; results for the case of multivariate t-distribution and related skew-t distribution are discussed in some detail.
6
Skewness-Based Projection Pursuit as an Eigenvector Problem in Scale Mixtures of Skew-Normal Distributions. Symmetry (Basel) 2021. [DOI: 10.3390/sym13061056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
This paper addresses the projection pursuit problem assuming that the distribution of the input vector belongs to the flexible and wide family of multivariate scale mixtures of skew normal distributions. Under this assumption, skewness-based projection pursuit is set out as an eigenvector problem, described in terms of the third order cumulant matrix, as well as an eigenvector problem that involves the simultaneous diagonalization of the scatter matrices of the model. Both approaches lead to dominant eigenvectors proportional to the shape parametric vector, which accounts for the multivariate asymmetry of the model; they also shed light on the parametric interpretability of the invariant coordinate selection method and point out some alternatives for estimating the projection pursuit direction. The theoretical findings are further investigated through a simulation study whose results provide insights about the usefulness of skewness model-based projection pursuit in the statistical practice.
7
Rao Jammalamadaka S, Taufer E, Terdik GH. Asymptotic theory for statistics based on cumulant vectors with applications. Scand Stat Theory Appl 2021. [DOI: 10.1111/sjos.12521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Emanuele Taufer
- Department of Economics and Management, University of Trento, Trento, Italy
- György H. Terdik
- Department of Information Technology, University of Debrecen, Debrecen, Hungary
8
Fischer D, Nordhausen K, Oja H. On linear dimension reduction based on diagonalization of scatter matrices for bioinformatics downstream analyses. Heliyon 2021; 6:e05732. [PMID: 33385080 PMCID: PMC7770551 DOI: 10.1016/j.heliyon.2020.e05732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/18/2020] [Revised: 06/01/2020] [Accepted: 12/11/2020] [Indexed: 11/16/2022] Open
Abstract
Dimension reduction is often a preliminary step in the analysis of data sets with a large number of variables. Most classical, both supervised and unsupervised, dimension reduction methods such as principal component analysis (PCA), independent component analysis (ICA) or sliced inverse regression (SIR) can be formulated using one, two or several different scatter matrix functionals. Scatter matrices can be seen as different measures of multivariate dispersion and might highlight different features of the data and when compared might reveal interesting structures. Such analysis then searches for a projection onto an interesting (signal) part of the data, and it is also important to know the correct dimension of the signal subspace. These approaches usually make either no model assumptions or work in wide classes of semiparametric models. Theoretical results in the literature are however limited to the case where the sample size exceeds the number of variables which is hardly ever true for data sets encountered in bioinformatics. In this paper, we briefly review the relevant literature and explore if the dimension reduction tools can be used to find relevant and interesting subspaces for small-n-large-p data sets. We illustrate the methods with a microarray dataset of prostate cancer patients and healthy controls.
Affiliation(s)
- Daniel Fischer
- Natural Resources Institute Finland (Luke), Applied Statistical Methods, Myllytie 1, 31600 Jokionen, Finland
- Klaus Nordhausen
- CSTAT - Computational Statistics, Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Wiedner Hauptstraße 7, A-1040 Vienna, Austria
- Hannu Oja
- Department of Mathematics and Statistics, University of Turku, 20014 Turku, Finland
9
Nelson JT, Motamayor JC, Cornejo OE. Environment and pathogens shape local and regional adaptations to climate change in the chocolate tree, Theobroma cacao L. Mol Ecol 2020; 30:656-669. [PMID: 33247971 DOI: 10.1111/mec.15754] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/30/2019] [Revised: 10/23/2020] [Accepted: 11/13/2020] [Indexed: 12/22/2022]
Abstract
Predicting the potential fate of a species in the face of climate change requires knowing the distribution of molecular adaptations across the geographic range of the species. In this work, we analysed 79 genomes of Theobroma cacao, an Amazonian tree known for the fruit from which chocolate is produced, to evaluate how local and regional molecular signatures of adaptation are distributed across the natural range of the species. We implemented novel techniques that incorporate summary statistics from multiple selection scans to infer selective sweeps. The majority of the molecular adaptations in the genome are not shared among populations. We show that ~71.5% of genes under selection also show significant associations with changes in environmental variables. Our results support the interpretation that these genes contribute to local adaptation of the populations in response to abiotic factors. We also found strong patterns of molecular adaptation in a diverse array of disease resistance genes (6.5% of selective sweeps), suggesting that differential adaptation to pathogens also contributes significantly to local adaptations. Our results are consistent with the interpretation that local selective pressures are more important than regional selective pressures in explaining adaptation across the range of a species.
Affiliation(s)
- Joel T Nelson
- School of Biological Sciences, Washington State University, Pullman, WA, USA
- Omar E Cornejo
- School of Biological Sciences, Washington State University, Pullman, WA, USA
10
Affiliation(s)
- Rob J. Hyndman
- Department of Econometrics and Business Statistics, Monash University, Clayton, VIC, Australia
11
Fischer D, Berro A, Nordhausen K, Ruiz-Gazen A. REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit. Commun Stat Simul Comput 2019. [DOI: 10.1080/03610918.2019.1626880] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Affiliation(s)
- Daniel Fischer
- Applied Statistical Methods, Natural Resources Institute Finland (Luke), Jokioinen, Finland
- Alain Berro
- Institut de Recherche en Informatique de Toulouse, University of Toulouse Capitole, Toulouse, France
- Klaus Nordhausen
- Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology, Wien, Austria
- Anne Ruiz-Gazen
- Toulouse School of Economics, University of Toulouse Capitole, Toulouse, France
12
13
Jung S, Ahn J, Jeon Y. Penalized Orthogonal Iteration for Sparse Estimation of Generalized Eigenvalue Problem. J Comput Graph Stat 2019. [DOI: 10.1080/10618600.2019.1568014] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Affiliation(s)
- Sungkyu Jung
- Department of Statistics, Seoul National University, Seoul, South Korea
- Jeongyoun Ahn
- Department of Statistics, University of Georgia, Athens, GA
- Yongho Jeon
- Department of Applied Statistics, Yonsei University, Seoul, South Korea
14
15
Lietzén N, Pitkäniemi J, Heinävaara S, Ilmonen P. On Exploring Hidden Structures Behind Cervical Cancer Incidence. Cancer Control 2018; 25:1073274818801604. [PMID: 30251557 PMCID: PMC6156216 DOI: 10.1177/1073274818801604] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022] Open
Abstract
Finding new etiological components is of great interest in disease epidemiology. We consider time series version of invariant coordinate selection (tICS) as an exploratory tool in the search of hidden structures in the analysis of population-based registry data. Increasing cancer burden inspired us to consider a case study of age-stratified cervical cancer incidence in Finland between the years 1953 and 2014. The latent components, which we uncover using tICS, show that the etiology of cervical cancer is age dependent. This is in line with recent findings related to the epidemiology of cervical cancer. Furthermore, we are able to explain most of the variation of cervical cancer incidence in different age groups by using only two latent tICS components. The second tICS component, in particular, is interesting since it separates the age groups into three distinct clusters. The factor that separates the three clusters is the median age of menopause occurrence.
Affiliation(s)
- Niko Lietzén
- Department of Mathematics and Systems Analysis, Aalto University School of Science, Espoo, Finland
- Janne Pitkäniemi
- Institute for Statistical and Epidemiological Cancer Research, Finnish Cancer Registry, Helsinki, Finland; Department of Public Health, University of Helsinki, Helsinki, Finland
- Sirpa Heinävaara
- Institute for Statistical and Epidemiological Cancer Research, Finnish Cancer Registry, Helsinki, Finland; Department of Public Health, University of Helsinki, Helsinki, Finland
- Pauliina Ilmonen
- Department of Mathematics and Systems Analysis, Aalto University School of Science, Espoo, Finland
16
17
18
19
Sattler P, Pauly M. Inference for high-dimensional split-plot-designs: A unified approach for small to large numbers of factor levels. Electron J Stat 2018. [DOI: 10.1214/18-ejs1465] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
20
Hubert M, Debruyne M, Rousseeuw PJ. Minimum covariance determinant and extensions. WIREs Comput Stat 2017. [DOI: 10.1002/wics.1421] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/08/2022]
Affiliation(s)
- Mia Hubert
- Department of Mathematics, KU Leuven, Leuven, Belgium
21
Virta J, Li B, Nordhausen K, Oja H. Independent component analysis for tensor-valued data. J Multivar Anal 2017. [DOI: 10.1016/j.jmva.2017.09.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 10/18/2022]
22
Fischer D, Honkatukia M, Tuiskula-Haavisto M, Nordhausen K, Cavero D, Preisinger R, Vilkki J. Subgroup detection in genotype data using invariant coordinate selection. BMC Bioinformatics 2017; 18:173. [PMID: 28302061 PMCID: PMC5356247 DOI: 10.1186/s12859-017-1589-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/09/2016] [Accepted: 03/09/2017] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND The current gold standard in dimension reduction methods for high-throughput genotype data is Principal Component Analysis (PCA). The presence of PCA is so dominant that other methods usually cannot be found in the analyst's toolbox and hence are only rarely applied. RESULTS We present a modern dimension reduction method called 'Invariant Coordinate Selection' (ICS) and its application to high-throughput genotype data. The more commonly known Independent Component Analysis (ICA) is in this framework just a special case of ICS. We use ICS on both a simulated and a real dataset to demonstrate first some deficiencies of PCA and how ICS is capable of recovering the correct subgroups within the simulated data. Second, we apply the ICS method to a chicken dataset and detect two subgroups there as well. These subgroups are then further investigated with respect to their genotype to provide further evidence of the biological relevance of the detected subgroup division. Further, we compare the performance of ICS to five other popular dimension reduction methods. CONCLUSION The ICS method was able to detect subgroups in data where PCA fails to detect anything. Hence, we promote the application of ICS to high-throughput genotype data in addition to the established PCA. Especially in statistical programming environments such as R, its application does not add any computational burden to the analysis pipeline.
Affiliation(s)
- Daniel Fischer
- Natural Resources Institute Finland (LUKE), Myllytie 1, Jokioinen, Finland
- Mervi Honkatukia
- Natural Resources Institute Finland (LUKE), Myllytie 1, Jokioinen, Finland
- Klaus Nordhausen
- Department of Mathematics and Statistics, University of Turku, Turku, Finland
- University of Tampere, School of Health Sciences, Medisiinarinkatu 3, Tampere, 33014 Finland
- David Cavero
- Lohmann Tierzucht GmbH, Am Seedeich 9-11, Cuxhaven, 27454 Germany
- Johanna Vilkki
- Natural Resources Institute Finland (LUKE), Myllytie 1, Jokioinen, Finland
23
Alashwali F, Kent JT. The use of a common location measure in the invariant coordinate selection and projection pursuit. J Multivar Anal 2016. [DOI: 10.1016/j.jmva.2016.08.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]
24
25
26
Dümbgen L, Nordhausen K, Schuhmacher H. New algorithms for M-estimation of multivariate scatter and location. J Multivar Anal 2016. [DOI: 10.1016/j.jmva.2015.11.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/25/2022]
27
Oja H, Paindaveine D, Taskinen S. Affine-invariant rank tests for multivariate independence in independent component models. Electron J Stat 2016. [DOI: 10.1214/16-ejs1174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
28
Miettinen J, Taskinen S, Nordhausen K, Oja H. Fourth Moments and Independent Component Analysis. Stat Sci 2015. [DOI: 10.1214/15-sts520] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Indexed: 11/19/2022]
29
Villa‐Vialaneix N, Ruiz‐Gazen A. Beyond multidimensional data in model visualization: High‐dimensional and complex nonnumeric data. Stat Anal Data Min 2015. [DOI: 10.1002/sam.11274] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Affiliation(s)
- Anne Ruiz‐Gazen
- Toulouse School of Economics, 21 allée de Brienne, 31000 Toulouse, France
30
31
32
33
Yu K, Dang X, Chen Y. Robustness of the Affine Equivariant Scatter Estimator Based on the Spatial Rank Covariance Matrix. Commun Stat Theory Methods 2015. [DOI: 10.1080/03610926.2012.755198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
34
35
Vogel D, Tyler DE. Robust estimators for nondecomposable elliptical graphical models. Biometrika 2014. [DOI: 10.1093/biomet/asu041] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
36
A characterization of elliptical distributions and some optimality properties of principal components for functional data. J Multivar Anal 2014. [DOI: 10.1016/j.jmva.2014.07.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
37
38
39
40
Bookstein FL, Mitteroecker P. Comparing Covariance Matrices by Relative Eigenanalysis, with Applications to Organismal Biology. Evol Biol 2013. [DOI: 10.1007/s11692-013-9260-5] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Indexed: 12/29/2022]
41
Fan J, Liao Y, Mincheva M. Large Covariance Estimation by Thresholding Principal Orthogonal Complements. J R Stat Soc Series B Stat Methodol 2013; 75. [PMID: 24348088 DOI: 10.1111/rssb.12016] [Citation(s) in RCA: 375] [Impact Index Per Article: 34.1] [Indexed: 11/28/2022]
Abstract
This paper deals with the estimation of a high-dimensional covariance with a conditional sparsity structure and fast-diverging eigenvalues. By assuming sparse error covariance matrix in an approximate factor model, we allow for the presence of some cross-sectional correlation even after taking out common but unobservable factors. We introduce the Principal Orthogonal complEment Thresholding (POET) method to explore such an approximate factor structure with sparsity. The POET estimator includes the sample covariance matrix, the factor-based covariance matrix (Fan, Fan, and Lv, 2008), the thresholding estimator (Bickel and Levina, 2008) and the adaptive thresholding estimator (Cai and Liu, 2011) as specific examples. We provide mathematical insights when the factor analysis is approximately the same as the principal component analysis for high-dimensional data. The rates of convergence of the sparse residual covariance matrix and the conditional sparse covariance matrix are studied under various norms. It is shown that the impact of estimating the unknown factors vanishes as the dimensionality increases. The uniform rates of convergence for the unobserved factors and their factor loadings are derived. The asymptotic results are also verified by extensive simulation studies. Finally, a real data application on portfolio allocation is presented.
Affiliation(s)
- Jianqing Fan
- Department of Operations Research and Financial Engineering, Princeton University; Bendheim Center for Finance, Princeton University
- Yuan Liao
- Department of Mathematics, University of Maryland
- Martina Mincheva
- Department of Operations Research and Financial Engineering, Princeton University
42
43
On asymptotic properties of the scatter matrix based estimates for complex valued independent component analysis. Stat Probab Lett 2013. [DOI: 10.1016/j.spl.2013.01.020] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
44
45
Cator EA, Lopuhaä HP. Central limit theorem and influence function for the MCD estimators at general multivariate distributions. Bernoulli 2012. [DOI: 10.3150/11-bej353] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/19/2022]
46
47
Ilmonen P, Nevalainen J, Oja H. Characteristics of multivariate distributions and the invariant coordinate system. Stat Probab Lett 2010. [DOI: 10.1016/j.spl.2010.08.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
48
Taskinen S, Sirkiä S, Oja H. k-Step shape estimators based on spatial signs and ranks. J Stat Plan Inference 2010. [DOI: 10.1016/j.jspi.2010.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
49
Peña D, Prieto FJ, Viladomat J. Eigenvectors of a kurtosis matrix as interesting directions to reveal cluster structure. J Multivar Anal 2010. [DOI: 10.1016/j.jmva.2010.04.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 10/19/2022]