1
Si T, Wang Y, Zhang L, Richmond E, Ahn TH, Gong H. Multivariate Time Series Change-Point Detection with a Novel Pearson-like Scaled Bregman Divergence. Stats 2024; 7:462-480. [PMID: 38827579 PMCID: PMC11138604 DOI: 10.3390/stats7020028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Change-point detection (CPD) is a challenging problem with applications across many real-world domains. The primary objective of CPD is to identify the time points at which the underlying system transitions between states, each characterized by its own data distribution. Precise identification of change points in time series omics data can provide insights into the dynamic and temporal characteristics inherent to complex biological systems. Many change-point detection methods have traditionally focused on the direct estimation of data distributions. However, these approaches become unrealistic in high-dimensional data analysis. Density ratio methods have emerged as promising approaches for change-point detection, since estimating density ratios is easier than directly estimating individual densities. Nevertheless, the divergence measures used in these methods may suffer from numerical instability during computation. Additionally, the most popular α-relative Pearson divergence does not measure the dissimilarity between the two data distributions themselves, but rather a dissimilarity involving a mixture of the distributions. To overcome the limitations of existing density ratio-based methods, we propose a novel approach called the Pearson-like scaled-Bregman divergence-based (PLsBD) density ratio estimation method for change-point detection. Our theoretical studies derive an analytical expression for the Pearson-like scaled Bregman divergence using a mixture measure. We integrate the PLsBD with a kernel regression model and apply a random sampling strategy to identify change points in both synthetic data and real-world high-dimensional genomics data of Drosophila. Our PLsBD method demonstrates superior performance compared to many other change-point detection methods.
Affiliation(s)
- Tong Si: Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Yunge Wang: Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Lingling Zhang: Department of Mathematics and Statistics, University at Albany SUNY, Albany, NY 12222, USA
- Evan Richmond: Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA
- Tae-Hyuk Ahn: Department of Computer Science, Saint Louis University, St. Louis, MO 63103, USA
- Haijun Gong: Department of Mathematics and Statistics, Saint Louis University, St. Louis, MO 63103, USA

2
Wang J, Li N, Meng Z, Li Q. Change point detection for high dimensional data via kernel measure with application to human aging brain data. Stat Med 2023; 42:4644-4663. [PMID: 37649243 DOI: 10.1002/sim.9881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 08/07/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
Identifying the existence and locations of change points is a broadly encountered task in many statistical application areas. Existing change point detection methods may produce unsatisfactory results for high-dimensional data because they make distributional assumptions that are hard to verify in practice. Moreover, some methods require certain parameters (such as the number of change points) to be estimated beforehand, making their power sensitive to these values. Here, we propose a kernel-based U-statistic to identify change points (KUCP) for high-dimensional data, which is free of distributional assumptions and sup-parameter estimations. Specifically, we employ a kernel function to describe similarities among the subjects and construct a U-statistic to test the existence of a change point at a given location. The asymptotic properties of the U-statistic are deduced. We also develop a procedure to locate the change points sequentially via a dichotomy algorithm. Extensive simulations demonstrate that KUCP has higher sensitivity in identifying the existence of change points and higher accuracy in locating these change points than its counterparts. We further illustrate its practical utility by analyzing gene expression data from the human brain to detect the time point when gene expression profiles begin to change, which has been reported to be closely related to the aging brain.
Affiliation(s)
- Jinjuan Wang: School of Mathematics and Statistics, Beijing Institute of Technology, Beijing, China
- Na Li: School of Applied Science, Beijing Information Science and Technology University, Beijing, China
- Zhen Meng: School of Statistics, Capital University of Economics and Business, Beijing, China
- Qizhai Li: LSC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China

3
Wang X, Liu B, Zhang X, Liu Y. Efficient multiple change point detection for high-dimensional generalized linear models. Can J Stat 2023; 51:596-629. [PMID: 37346756 PMCID: PMC10281755 DOI: 10.1002/cjs.11721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 12/16/2021] [Indexed: 11/11/2022]
Abstract
Change point detection for high-dimensional data is an important yet challenging problem for many applications. In this paper, we consider multiple change point detection in the context of high-dimensional generalized linear models, allowing the covariate dimension p to grow exponentially with the sample size n. The model considered is general and flexible in the sense that it covers various specific models as special cases. It can automatically account for the underlying data generation mechanism without specifying any prior knowledge about the number of change points. Based on dynamic programming and binary segmentation techniques, two algorithms are proposed to detect multiple change points, allowing the number of change points to grow with n. To further improve the computational efficiency, a more efficient algorithm designed for the case of a single change point is proposed. We present theoretical properties of our proposed algorithms, including estimation consistency for the number and locations of change points as well as consistency and asymptotic distributions for the underlying regression coefficients. Finally, extensive simulation studies and application to the Alzheimer's Disease Neuroimaging Initiative data further demonstrate the competitive performance of our proposed methods.
Affiliation(s)
- Xianru Wang: Department of Statistics and Data Science, School of Management at Fudan University, Shanghai, China
- Bin Liu: Department of Statistics and Data Science, School of Management at Fudan University, Shanghai, China
- Xinsheng Zhang: Department of Statistics and Data Science, School of Management at Fudan University, Shanghai, China
- Yufeng Liu: Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, USA

4
Chen Y, Wang T, Samworth RJ. Inference in High-Dimensional Online Changepoint Detection. J Am Stat Assoc 2023; 119:1461-1472. [PMID: 38974186 PMCID: PMC11225951 DOI: 10.1080/01621459.2023.2199962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2021] [Accepted: 04/01/2023] [Indexed: 07/09/2024]
Abstract
We introduce and study two new inferential challenges associated with the sequential detection of change in a high-dimensional mean vector. First, we seek a confidence interval for the changepoint, and second, we estimate the set of indices of coordinates in which the mean changes. We propose an online algorithm that produces an interval with guaranteed nominal coverage, and whose length is, with high probability, of the same order as the average detection delay, up to a logarithmic factor. The corresponding support estimate enjoys control of both false negatives and false positives. Simulations confirm the effectiveness of our methodology, and we also illustrate its applicability on the U.S. excess deaths data from 2017 to 2020. The supplementary material, which contains the proofs of our theoretical results, is available online.
Affiliation(s)
- Yudong Chen: Statistical Laboratory, University of Cambridge, Cambridge, UK; London School of Economics and Political Science, London, UK
- Tengyao Wang: London School of Economics and Political Science, London, UK

5
Ryan S, Killick R. Detecting changes in covariance via random matrix theory. Technometrics 2023. [DOI: 10.1080/00401706.2023.2183261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]

6
Cappello L, Madrid Padilla OH, Palacios JA. Bayesian change point detection with spike and slab priors. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2182312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023]
Affiliation(s)
- Julia A. Palacios: Departments of Statistics and Biomedical Data Science, Stanford University

7
Cui J, Wang G, Zou C, Wang Z. Change-point testing for parallel data sets with FDR control. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2023.107705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]

8
Cai H, Wang T. Estimation of high-dimensional change-points under a group sparsity structure. Electron J Stat 2023. [DOI: 10.1214/23-ejs2116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Affiliation(s)
- Hanqing Cai: Department of Statistical Sciences, 1–19 Torrington Place, London WC1E 7HB, United Kingdom
- Tengyao Wang: Department of Statistics, London School of Economics, Columbia House, 69 Aldwych, London WC2B 4RR, United Kingdom

9
Shi X, Wang XS, Reid N. A New Class of Weighted CUSUM Statistics. Entropy (Basel) 2022; 24:1652. [PMID: 36421507 PMCID: PMC9689417 DOI: 10.3390/e24111652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2022] [Revised: 11/09/2022] [Accepted: 11/11/2022] [Indexed: 06/16/2023]
Abstract
A change point is a location or time at which observations or data obey two different models: before and after. In real problems, we may know some prior information about the location of the change point, say at the right or left tail of the sequence. How does one incorporate the prior information into the current cumulative sum (CUSUM) statistics? We propose a new class of weighted CUSUM statistics with three different types of quadratic weights accounting for different prior positions of the change points. One interpretation of the weights is the mean duration in a random walk. Under the normal model with known variance, the exact distributions of these statistics are explicitly expressed in terms of eigenvalues. Theoretical results about the explicit difference of the distributions are valuable. The expansions of asymptotic distributions are compared with the expansion of the limit distributions of the Cramér-von Mises statistic and the Anderson and Darling statistic. We provide some extensions from independent normal responses to more interesting models, such as graphical models, the mixture of normals, Poisson, and weakly dependent models. Simulations suggest that the proposed test statistics have better power than the graph-based statistics. We illustrate their application to a detection problem with video data.
Affiliation(s)
- Xiaoping Shi: Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Kelowna, BC V1V 1V7, Canada
- Xiang-Sheng Wang: Department of Mathematics, University of Louisiana at Lafayette, Lafayette, LA 70503, USA
- Nancy Reid: Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada

10
Teng HY, Zhang Z. Two-way Truncated Linear Regression Models with Extremely Thresholding Penalization. J Am Stat Assoc 2022. [DOI: 10.1080/01621459.2022.2147074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Affiliation(s)
- Hao Yang Teng: Department of Mathematics and Statistics, Arkansas State University
- Zhengjun Zhang: Department of Statistics, University of Wisconsin-Madison

11
Guo Y, Gao M, Lu X. Multivariate change point detection for heterogeneous series. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.09.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

12
Robust inference for change points in high dimension. J Multivar Anal 2022. [DOI: 10.1016/j.jmva.2022.105114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

13
Wang X, Liu B, Zhang X. A computationally efficient and flexible algorithm for high dimensional mean and covariance matrix change point models. J Korean Stat Soc 2022. [DOI: 10.1007/s42952-022-00183-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]

14
Follain B, Wang T, Samworth RJ. High-dimensional changepoint estimation with heterogeneous missingness. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Bertille Follain: Statistical Laboratory, University of Cambridge, Cambridge, Cambridgeshire, UK; Ecole Normale Supérieure, PSL Research University, INRIA, Paris, France
- Tengyao Wang: Department of Statistics, London School of Economics and Political Science, London, UK; Department of Statistical Science, University College London, London, UK

15
Tveten M, Eckley IA, Fearnhead P. Scalable change-point and anomaly detection in cross-correlated data with an application to condition monitoring. Ann Appl Stat 2022. [DOI: 10.1214/21-aoas1508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

16
Wang R, Zhu C, Volgushev S, Shao X. Inference for change points in high-dimensional data via selfnormalization. Ann Stat 2022. [DOI: 10.1214/21-aos2127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Runmin Wang: Department of Statistical Science, Southern Methodist University
- Changbo Zhu: Department of Statistics, University of California at Davis
- Xiaofeng Shao: Department of Statistics, University of Illinois at Urbana–Champaign

17
Li J, Li Y, Hsing T. On functional processes with multiple discontinuities. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- Jialiang Li: Department of Statistics and Data Science, National University of Singapore, Singapore
- Yaguang Li: School of Management, University of Science and Technology of China, Hefei, China
- Tailen Hsing: Department of Statistics, University of Michigan, Ann Arbor, Michigan, USA

18
Liu B, Zhang X, Liu Y. High Dimensional Change Point Inference: Recent Developments and Extensions. J Multivar Anal 2022; 188:104833. [PMID: 35177873 PMCID: PMC8846568 DOI: 10.1016/j.jmva.2021.104833] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
Abstract
Change point analysis aims to detect structural changes in a data sequence. It has been an active research area since it was introduced in the 1950s. In modern statistical applications, however, high-throughput data with increasing dimensions are ubiquitous in fields ranging from economics and finance to genetics and engineering. For those problems, the earlier works are typically no longer applicable. As a result, testing for a change point in high dimensional data sequences has become an important yet challenging task. In this paper, we first focus on models for at most one change point, review recent state-of-the-art techniques for change point testing of high dimensional mean vectors, and compare their theoretical properties. Based on that, we provide a survey of some extensions to general high dimensional parameters beyond mean vectors, as well as strategies for testing multiple change points in high dimensions. Finally, we discuss some open problems for possible future research directions.
Affiliation(s)
- Bin Liu: School of Management, Fudan University, Shanghai, 200433, China
- Xinsheng Zhang: School of Management, Fudan University, Shanghai, 200433, China
- Yufeng Liu (corresponding author): Department of Statistics and Operations Research, Department of Genetics, and Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, USA

19
Hahn G. Online multivariate changepoint detection with type I error control and constant time/memory updates per series. Stat Probab Lett 2022. [DOI: 10.1016/j.spl.2021.109258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

20
Chen Y, Wang T, Samworth RJ. High-dimensional, multiscale online changepoint detection. J R Stat Soc Series B Stat Methodol 2022. [DOI: 10.1111/rssb.12447] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Affiliation(s)
- Yudong Chen: University of Cambridge, Cambridge, Cambridgeshire, UK
- Tengyao Wang: London School of Economics and Political Science, London, UK; University College London, London, UK

21
Yu M, Chen X. A robust bootstrap change point test for high-dimensional location parameter. Electron J Stat 2022. [DOI: 10.1214/21-ejs1915] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Mengjia Yu: Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright Street, Champaign, IL 61820, USA
- Xiaohui Chen: Department of Statistics, University of Illinois at Urbana-Champaign, 725 S. Wright Street, Champaign, IL 61820, USA

22
Gösmann J, Stoehr C, Heiny J, Dette H. Sequential change point detection in high dimensional time series. Electron J Stat 2022. [DOI: 10.1214/22-ejs2027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Holger Dette: Department of Mathematics, Ruhr University Bochum

23
K. J. P, Singh N, Dayama P, Agarwal A, Pandit V. Change point detection for compositional multivariate data. Appl Intell 2022. [DOI: 10.1007/s10489-021-02321-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]

24
Yu Y, Chatterjee S, Xu H. Localising change points in piecewise polynomials of general degrees. Electron J Stat 2022. [DOI: 10.1214/21-ejs1963] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
Affiliation(s)
- Yi Yu: Department of Statistics, University of Warwick, Coventry CV4 7AL, UK
- Sabyasachi Chatterjee: Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Haotian Xu: Department of Statistics, University of Warwick, Coventry CV4 7AL, UK

25
Bian L, Cui T, Thomas Yeo BT, Fornito A, Razi A, Keith J. Identification of community structure-based brain states and transitions using functional MRI. Neuroimage 2021; 244:118635. [PMID: 34624503 PMCID: PMC8905300 DOI: 10.1016/j.neuroimage.2021.118635] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/18/2021] [Revised: 09/29/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022]
Abstract
- Community-based detection of discrete brain states using stochastic latent block model.
- Bayesian change-point detection and model selection via posterior predictive discrepancy.
- Markov chain Monte Carlo methods for estimation of community memberships.
- Distinctive brain states for varying task demands in working memory task fMRI.
Brain function relies on a precisely coordinated and dynamic balance between the functional integration and segregation of distinct networks. Characterizing the way in which brain regions reconfigure their interactions to give rise to distinct but hidden brain states remains an open challenge. In this paper, we propose a Bayesian method for characterizing community structure-based latent brain states and showcase a novel strategy based on posterior predictive discrepancy using the latent block model to detect transitions between community structures in blood oxygen level-dependent (BOLD) time series. The set of estimated parameters in the model includes a latent label vector that assigns network nodes to communities, and also block model parameters that reflect the weighted connectivity within and between communities. Besides extensive in-silico model evaluation, we also provide empirical validation (and replication) using the Human Connectome Project (HCP) dataset of 100 healthy adults. Our results obtained through an analysis of task-fMRI data during working memory performance show appropriate lags between external task demands and change-points between brain states, with distinctive community patterns distinguishing fixation, low-demand and high-demand task conditions.
Affiliation(s)
- Lingbin Bian: School of Mathematics, Monash University, Australia; Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Australia
- Tiangang Cui: School of Mathematics, Monash University, Australia
- B T Thomas Yeo: Department of Electrical and Computer Engineering, National University of Singapore, Singapore
- Alex Fornito: Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Australia; Monash Biomedical Imaging, Monash University, Australia
- Adeel Razi: Turner Institute for Brain and Mental Health, School of Psychological Sciences, Monash University, Australia; Monash Biomedical Imaging, Monash University, Australia; Wellcome Centre for Human Neuroimaging, University College London, United Kingdom; CIFAR Azrieli Global Scholars Program, CIFAR, Toronto, Canada

26
Fisch ATM, Eckley IA, Fearnhead P. Subset Multivariate Collective and Point Anomaly Detection. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1987257] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/20/2022]
Affiliation(s)
- Idris A. Eckley: Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
- Paul Fearnhead: Department of Mathematics and Statistics, Lancaster University, Lancaster, UK

27
Song H, Chen H. Asymptotic distribution-free change-point detection for data with repeated observations. Biometrika 2021. [DOI: 10.1093/biomet/asab048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Summary
In the regime of change-point detection, a nonparametric framework based on scan statistics utilizing graphs that represent similarities among observations is gaining attention due to its flexibility and good performance for high-dimensional and non-Euclidean data sequences, which are ubiquitous in this big data era. However, this graph-based framework encounters problems when there are repeated observations in the sequence, which often happens for discrete data, such as network data. In this work, we extend the graph-based framework to solve this problem by averaging or taking the union of all possible optimal graphs resulting from repeated observations. We consider both the single change-point alternative and the changed-interval alternative, and derive analytic formulas to control the Type I error for the new methods, making them quickly applicable to large datasets. The extended methods are illustrated on an application in detecting changes in a sequence of dynamic networks over time. All proposed methods are implemented in an R package gSeg available on CRAN.
Affiliation(s)
- Hoseung Song: Department of Statistics, University of California, Davis, Davis, California 95616, USA
- Hao Chen: Department of Statistics, University of California, Davis, Davis, California 95616, USA

28
Cho H, Kirch C. Two-stage data segmentation permitting multiscale change points, heavy tails and dependence. Ann Inst Stat Math 2021. [DOI: 10.1007/s10463-021-00811-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/06/2023]

29
Chen YT, Chiou JM, Huang TM. Greedy Segmentation for a Functional Data Sequence. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1963261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Yu-Ting Chen: Department of Statistics, National Cheng Chi University, Taiwan, R.O.C.
- Jeng-Min Chiou: Institute of Statistical Science, Academia Sinica, Taiwan, R.O.C.
- Tzee-Ming Huang: Department of Statistics, National Cheng Chi University, Taiwan, R.O.C.

30
Chen H, Xia Y. A Normality Test for High-dimensional Data Based on the Nearest Neighbor Approach. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1953507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Hao Chen: Department of Statistics, University of California at Davis, CA
- Yin Xia: Department of Statistics, School of Management, Fudan University

31
Liu Y, Gao Y, Fang R, Cao H, Sa J, Wang J, Liu H, Wang T, Cui Y. Identifying complex gene-gene interactions: a mixed kernel omnibus testing approach. Brief Bioinform 2021; 22:6346804. [PMID: 34373892 DOI: 10.1093/bib/bbab305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2021] [Revised: 07/01/2021] [Accepted: 07/17/2021] [Indexed: 11/12/2022]
Abstract
Genes do not function independently; rather, they interact with each other to fulfill their joint tasks. Identification of gene-gene interactions has been critically important in elucidating the molecular mechanisms responsible for the variation of a phenotype. Regression models are commonly used to model the interaction between two genes with a linear product term. The interaction effect of two genes can be linear or nonlinear, depending on the true nature of the data. When nonlinear interactions exist, the linear interaction model may not be able to detect such interactions; hence, it suffers from substantial power loss. Because the true interaction mechanism (linear or nonlinear) is generally unknown in practice, it is critical to develop statistical methods that are flexible enough to capture the underlying interaction mechanism without imposing a specific model assumption. In this study, we develop a mixed kernel function which combines both linear and Gaussian kernels with different weights to capture the linear or nonlinear interaction of two genes. Instead of optimizing the weight function, we propose a grid search strategy and use a Cauchy transformation of the P-values obtained under different weights to aggregate the P-values. We further extend the two-gene interaction model to a high-dimensional setup using a de-biased LASSO algorithm. Extensive simulation studies are conducted to verify the performance of the proposed method. Application to two case studies further demonstrates the utility of the model. Our method provides a flexible and computationally efficient tool for disentangling complex gene-gene interactions associated with complex traits.
Affiliation(s)
- Yan Liu: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Yuzhao Gao: School of Statistics, Shanxi University of Finance and Economics, Taiyuan, PR China
- Ruiling Fang: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Hongyan Cao: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Jian Sa: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Jianrong Wang: Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, MI, USA
- Hongqi Liu: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Tong Wang: Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, PR China
- Yuehua Cui: Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA

32
|
Safikhani A, Bai Y, Michailidis G. Fast and Scalable Algorithm for Detection of Structural Breaks in Big VAR Models. J Comput Graph Stat 2021. [DOI: 10.1080/10618600.2021.1950005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Abolfazl Safikhani: Department of Statistics and Informatics Institute, University of Florida, Gainesville, FL
- Yue Bai: Department of Statistics and Informatics Institute, University of Florida, Gainesville, FL
- George Michailidis: Department of Statistics and Informatics Institute, University of Florida, Gainesville, FL
|
33
|
Zhu X, Pang T. Inference on a structural break in trend with mildly integrated errors. J Korean Stat Soc 2021. [DOI: 10.1007/s42952-021-00140-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
34
|
Affiliation(s)
- Likai Chen: Department of Mathematics and Statistics, Washington University in St. Louis, MO
- Weining Wang: Department of Economics and Related Studies, University of York, New York
- Wei Biao Wu: Department of Statistics, University of Chicago, Chicago, IL
|
35
|
Zhang Y, Wang R, Shao X. Adaptive Inference for Change Points in High-Dimensional Data. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1884562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Affiliation(s)
- Yangfan Zhang: Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
- Runmin Wang: Department of Statistical Science, Southern Methodist University, Dallas, TX
- Xiaofeng Shao: Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
|
36
|
Korkas KK. Ensemble binary segmentation for irregularly spaced data with change-points. J Korean Stat Soc 2021. [DOI: 10.1007/s42952-021-00120-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
We propose a new technique for consistent estimation of the number and locations of the change-points in the structure of an irregularly spaced time series. The core of the segmentation procedure is the ensemble binary segmentation method (EBS), a technique in which a large number of multiple change-point detection tasks using the binary segmentation method are applied on sub-samples of the data of differing lengths, and then the results are combined to create an overall answer. We do not restrict the total number of change-points a time series can have; therefore, our proposed method works well when the spacings between change-points are short. Our main change-point detection statistic is the time-varying autoregressive conditional duration model on which we apply a transformation process in order to decorrelate it. To examine the performance of EBS we provide a simulation study for various types of scenarios. A proof of consistency is also provided. Our methodology is implemented in the R package , available to download from CRAN.
|
37
|
Liu H, Gao C, Samworth RJ. Minimax rates in sparse, high-dimensional change point detection. Ann Stat 2021. [DOI: 10.1214/20-aos1994] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
Affiliation(s)
- Haoyang Liu: Department of Statistics, University of Chicago
- Chao Gao: Department of Statistics, University of Chicago
- Richard J. Samworth: Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge
|
38
|
Wang D, Yu Y, Rinaldo A. Optimal change point detection and localization in sparse dynamic networks. Ann Stat 2021. [DOI: 10.1214/20-aos1953] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/19/2022]
|
39
|
Wang D, Yu Y, Rinaldo A. Optimal covariance change point localization in high dimensions. Bernoulli 2021. [DOI: 10.3150/20-bej1249] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/19/2022]
|
40
|
Kaul A, Fotopoulos SB, Jandhyala VK, Safikhani A. Inference on the change point under a high dimensional sparse mean shift. Electron J Stat 2021. [DOI: 10.1214/20-ejs1791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]
|
41
|
Madrid Padilla OH, Yu Y, Wang D, Rinaldo A. Optimal nonparametric change point analysis. Electron J Stat 2021. [DOI: 10.1214/21-ejs1809] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Yi Yu: Department of Statistics, University of Warwick, Coventry CV4 7AL, U.K.
- Daren Wang: Department of ACMS, University of Notre Dame, Notre Dame, IN 46556, USA
- Alessandro Rinaldo: Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, U.S.A.
|
42
|
Yu M, Chen X. Finite sample change point inference and identification for high‐dimensional mean vectors. J R Stat Soc Series B Stat Methodol 2020. [DOI: 10.1111/rssb.12406] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/27/2022]
Affiliation(s)
- Mengjia Yu: Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
- Xiaohui Chen: Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL, USA
|
43
|
|
44
|
Ren H, Zou C, Chen N, Li R. Large-Scale Datastreams Surveillance via Pattern-Oriented-Sampling. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1819295] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
Affiliation(s)
- Haojie Ren: School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China; Department of Statistics, The Pennsylvania State University at University Park, State College, PA
- Changliang Zou: School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin, China
- Nan Chen: Department of Industrial Systems Engineering and Management, National University of Singapore, Singapore
- Runze Li: Department of Statistics, The Pennsylvania State University at University Park, State College, PA
|
45
|
Gao X, Liu Q. Sparsity identification in ultra-high dimensional quantile regression models with longitudinal data. Commun Stat Theory Methods 2020. [DOI: 10.1080/03610926.2019.1604966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Xianli Gao: School of Statistics, Capital University of Economics and Business, Beijing, China
- Qiang Liu: School of Statistics, Capital University of Economics and Business, Beijing, China; Beijing Key Laboratory of Megaregions Sustainable Development Modelling, Beijing, China
|
46
|
Eckley I, Kirch C, Weber S. A novel change-point approach for the detection of gas emission sources using remotely contained concentration data. Ann Appl Stat 2020. [DOI: 10.1214/20-aoas1345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
47
|
|
48
|
Dette H, Pan G, Yang Q. Estimating a Change Point in a Sequence of Very High-Dimensional Covariance Matrices. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1785477] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/23/2022]
Affiliation(s)
- Holger Dette: Fakultät für Mathematik, Ruhr-Universität Bochum, Bochum, Germany
- Guangming Pan: School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Qing Yang: International Institute of Finance, School of Management, University of Science and Technology of China, China
|
49
|
Liu B, Zhou C, Zhang X, Liu Y. A unified data‐adaptive framework for high dimensional change point detection. J R Stat Soc Series B Stat Methodol 2020. [DOI: 10.1111/rssb.12375] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
Affiliation(s)
- Bin Liu: Fudan University, Shanghai, People's Republic of China
- Cheng Zhou: Robotics X Lab, Tencent, People's Republic of China
- Yufeng Liu: University of North Carolina at Chapel Hill, USA
|
50
|
Steland A. Testing and estimating change-points in the covariance matrix of a high-dimensional time series. J Multivar Anal 2020. [DOI: 10.1016/j.jmva.2019.104582] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
|