1
Pan Y, Matilainen M, Taskinen S, Nordhausen K. A review of second-order blind identification methods. Wiley Interdisciplinary Reviews. Computational Statistics 2022; 14:e1550. [PMID: 36249858 PMCID: PMC9540980 DOI: 10.1002/wics.1550] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/30/2020] [Revised: 01/06/2021] [Accepted: 01/07/2021] [Indexed: 11/24/2022]
Abstract
Second-order source separation (SOS) is a data analysis tool which can be used for revealing hidden structures in multivariate time series data or as a tool for dimension reduction. Such methods are increasingly important as more and more high-dimensional multivariate time series data are measured in numerous fields of applied science. Dimension reduction is crucial because modeling such high-dimensional data with multivariate time series models is often impractical: the number of parameters describing dependencies between the component time series is usually too high. SOS methods have their roots in the signal processing literature, where they were first used to separate source signals from an observed signal mixture. The SOS model assumes that the observed time series (signals) is a linear mixture of latent time series (sources) with uncorrelated components. The methods make use of second-order statistics, hence the name "second-order source separation." In this review, we discuss the classical SOS methods and their extensions to more complex settings. An example illustrates how SOS can be performed. This article is categorized under: Statistical Models > Time Series Models; Statistical and Graphical Methods of Data Analysis > Dimension Reduction; Data: Types and Structure > Time Series, Stochastic Processes, and Functional Data.
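The model described in this abstract (observed series as a linear mixture of uncorrelated latent series, recovered from second-order statistics) can be illustrated with a minimal sketch. The estimator below follows the AMUSE idea: whiten the data, then eigendecompose one symmetrized lagged autocovariance matrix. AMUSE is not named in the abstract; it is chosen here as one well-known classical SOS estimator. The function names `amuse` and `ar1`, the choice of lag 1, the AR(1) sources, and the mixing matrix are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def amuse(x, lag=1):
    """AMUSE-style unmixing sketch. x has shape (n_samples, n_channels).

    Returns (w, est): the unmixing matrix and the estimated sources.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                      # center each channel
    n = xc.shape[0]

    # Whitening: symmetric inverse square root of the covariance matrix.
    cov = xc.T @ xc / n
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = xc @ whitener                            # whitened data (whitener is symmetric)

    # Symmetrized autocovariance of the whitened data at the chosen lag.
    acov = z[:-lag].T @ z[lag:] / (n - lag)
    acov = (acov + acov.T) / 2

    # Its eigenvectors give the rotation that separates the sources.
    _, u = np.linalg.eigh(acov)
    w = u.T @ whitener
    return w, xc @ w.T


def ar1(phi, n, rng):
    """Simulate an AR(1) series s[t] = phi * s[t-1] + e[t] (hypothetical helper)."""
    e = rng.standard_normal(n)
    s = np.empty(n)
    s[0] = e[0]
    for t in range(1, n):
        s[t] = phi * s[t - 1] + e[t]
    return s


# Toy data: two uncorrelated sources with different lag-1 autocorrelations,
# mixed by an arbitrary (assumed) full-rank matrix.
rng = np.random.default_rng(0)
sources = np.column_stack([ar1(0.9, 5000, rng), ar1(-0.5, 5000, rng)])
mixing = np.array([[1.0, 0.5], [0.3, 1.0]])
mixed = sources @ mixing.T

w, est = amuse(mixed, lag=1)

# Sources are identified only up to order and sign, so compare via
# absolute correlations between estimated and true sources.
corr = np.abs(np.corrcoef(est.T, sources.T)[:2, 2:])
print(corr.round(3))
```

The sketch relies on the sources having distinct autocorrelations at the chosen lag; when they coincide, a single lag cannot separate them, which is one motivation for methods that jointly use several lags.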
Affiliation(s)
- Yan Pan
  - Department of Mathematics and Statistics, University of Jyväskylä, Finland
- Markus Matilainen
  - Turku PET Centre, Turku University Hospital and University of Turku, Finland
- Sara Taskinen
  - Department of Mathematics and Statistics, University of Jyväskylä, Finland
- Klaus Nordhausen
  - Department of Mathematics and Statistics, University of Jyväskylä, Finland
  - Institute of Statistics and Mathematical Methods in Economics, TU Vienna, Austria
2
Virta J, Lietzén N, Ilmonen P, Nordhausen K. Fast tensorial JADE. Scand Stat Theory Appl 2021; 48:164-187. [PMID: 33664538 PMCID: PMC7891388 DOI: 10.1111/sjos.12445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2018] [Revised: 10/24/2019] [Accepted: 01/05/2020] [Indexed: 11/28/2022]
Abstract
We propose a novel method for tensorial independent component analysis. Our approach is based on TJADE and k-JADE, two recently proposed generalizations of the classical JADE algorithm. Our novel method achieves the consistency and the limiting distribution of TJADE under mild assumptions and at the same time offers a notable improvement in computational speed. Detailed mathematical proofs of the statistical properties of our method are given and, as a special case, a conjecture on the properties of k-JADE is resolved. Simulations and timing comparisons demonstrate a remarkable gain in speed. Moreover, the desired efficiency is obtained approximately for finite samples. The method is applied successfully to large-scale video data, for which neither TJADE nor k-JADE is feasible. Finally, an experimental procedure is proposed to select the values of a set of tuning parameters. Supplementary material including the R code for running the examples and the proofs of the theoretical results is available online.
Affiliation(s)
- Joni Virta
  - Department of Mathematics and Systems Analysis, Aalto University School of Science
  - Department of Mathematics and Statistics, University of Turku
- Niko Lietzén
  - Department of Mathematics and Systems Analysis, Aalto University School of Science
- Pauliina Ilmonen
  - Department of Mathematics and Systems Analysis, Aalto University School of Science
- Klaus Nordhausen
  - Institute of Statistics & Mathematical Methods in Economics, Vienna University of Technology
3
Chen YL, Kolar M, Tsay RS. Tensor Canonical Correlation Analysis With Convergence and Statistical Guarantees. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2020.1856118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- You-Lin Chen
  - Department of Statistics, The University of Chicago, Chicago, IL
- Mladen Kolar
  - The University of Chicago Booth School of Business, Chicago, IL
- Ruey S. Tsay
  - The University of Chicago Booth School of Business, Chicago, IL
4
Lee S, Shen H, Truong Y. Sampling Properties of color Independent Component Analysis. J Multivariate Anal 2020; 181. [PMID: 33162620 DOI: 10.1016/j.jmva.2020.104692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Independent Component Analysis (ICA) offers an effective data-driven approach for blind source extraction encountered in many signal and image processing problems. Although many ICA methods have been developed, they have received relatively little attention in the statistics literature, especially in terms of rigorous theoretical investigation for statistical inference. The current paper aims at narrowing this gap and investigates the statistical sampling properties of the colorICA (cICA) method. The cICA incorporates the correlation structure within sources through parametric time series models in the frequency domain and outperforms several existing ICA alternatives numerically. We establish the consistency and asymptotic normality of the cICA estimates, which then enables statistical inference based on the estimates. These asymptotic properties are further validated using simulation studies.
Affiliation(s)
- Seonjoo Lee
  - Department of Psychiatry and Biostatistics, Columbia University, New York, NY, USA
  - Mental Health Data Science, New York State Psychiatric Institute and Research Foundation for Mental Hygiene, Inc., New York, NY, USA
- Haipeng Shen
  - Innovation and Information Management, Faculty of Business and Economics, University of Hong Kong, Hong Kong, China
- Young Truong
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
5
Virta J, Li B, Nordhausen K, Oja H. Independent component analysis for multivariate functional data. J Multivariate Anal 2020. [DOI: 10.1016/j.jmva.2019.104568] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
6
Teschendorff AE, Jing H, Paul DS, Virta J, Nordhausen K. Tensorial blind source separation for improved analysis of multi-omic data. Genome Biol 2018; 19:76. [PMID: 29884221 PMCID: PMC5994057 DOI: 10.1186/s13059-018-1455-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/27/2018] [Accepted: 05/18/2018] [Indexed: 01/24/2023]
Abstract
There is an increased need for integrative analyses of multi-omic data. We present and benchmark a novel tensorial independent component analysis (tICA) algorithm against current state-of-the-art methods. We find that tICA outperforms competing methods in identifying biological sources of data variation at a reduced computational cost. On epigenetic data, tICA can identify methylation quantitative trait loci at high sensitivity. In the cancer context, tICA identifies gene modules whose expression variation across tumours is driven by copy-number or DNA methylation changes, but whose deregulation relative to normal tissue is independent of such alterations, a result we validate by direct analysis of individual data types.
Affiliation(s)
- Andrew E Teschendorff
  - CAS-MPG Partner Institute for Computational Biology, CAS Key Lab of Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - Department of Women's Cancer, UCL Elizabeth Garrett Anderson Institute for Women's Health, University College London, 74 Huntley Street, London, WC1E 6BT, UK
  - UCL Cancer Institute, University College London, 72 Huntley Street, London, WC1E 6BT, UK
- Han Jing
  - CAS-MPG Partner Institute for Computational Biology, CAS Key Lab of Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
  - University of Chinese Academy of Sciences, 19A Yuquan Road, Beijing, 100049, China
- Dirk S Paul
  - Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Strangeways Research Laboratory, Cambridge, CB1 8RN, UK
- Joni Virta
  - University of Turku, Turku, 20014, Finland
- Klaus Nordhausen
  - Vienna University of Technology, Wiedner Hauptstr. 7, Vienna, A-1040, Austria