1
|
Wai Tsang K, Tsung F, Xu Z. Knockoff procedure for false discovery rate control in high-dimensional data streams. J Appl Stat 2023; 50:2970-2983. [PMID: 37808615 PMCID: PMC10557548 DOI: 10.1080/02664763.2023.2200496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 04/03/2023] [Indexed: 10/10/2023]
Abstract
Motivated by applications to root-cause identification of faults in high-dimensional data streams that may have very limited samples after faults are detected, we consider multiple testing in models for multivariate statistical process control (SPC). With quick fault detection, it can be assumed that only a small portion of the data streams are out-of-control (OC). Identifying those OC data streams while controlling the number of false discoveries is a long-standing problem. It is challenging because only a limited number of OC samples are available once the process is terminated upon fault detection. Although several false discovery rate (FDR) controlling methods have been proposed, people may prefer other methods for quick detection. Using a recently developed method called knockoff filtering, we propose a knockoff procedure that can be combined with other fault detection methods, in the sense that the knockoff procedure does not change the stopping time but may identify another set of faults to control FDR. A theorem for the FDR control of the proposed procedure is provided. Simulation studies show that the proposed procedure can control FDR while maintaining high power. We also illustrate the performance in an application to semiconductor manufacturing processes that motivated this development.
Affiliation(s)
- Ka Wai Tsang
  - School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong 518172, People's Republic of China
- Fugee Tsung
  - Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Hong Kong
- Zhihao Xu
  - Department of Statistics, University of Michigan, Ann Arbor, MI, USA
|
2
|
Xiang D, Qiu P, Wang D, Li W. Reliable Post-Signal Fault Diagnosis for Correlated High-Dimensional Data Streams. Technometrics 2021. [DOI: 10.1080/00401706.2021.1979100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Affiliation(s)
- Dongdong Xiang
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
- Peihua Qiu
  - Department of Biostatistics, University of Florida, Gainesville, USA
- Dezhi Wang
  - KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai, China
  - School of Mathematics and Statistics, Lanzhou University, Lanzhou, China
- Wendong Li
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
|
3
|
Process monitoring using inflated beta regression control chart. PLoS One 2020; 15:e0236756. [PMID: 32730316 PMCID: PMC7392223 DOI: 10.1371/journal.pone.0236756] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/08/2020] [Accepted: 07/11/2020] [Indexed: 11/23/2022] [Open Access]
Abstract
This paper provides a general framework for controlling quality characteristics related to control variables and limited to the intervals (0, 1], [0, 1), or [0, 1]. The proposed control chart is based on the inflated beta regression model considering a reparametrization of the inflated beta distribution indexed by the response mean, which is useful for modeling fractions and proportions. The contribution of the paper is twofold. First, we extend the inflated beta regression model by allowing a regression structure for the precision parameter. We also present closed-form expressions for the score vector and Fisher’s information matrix. Second, based on the proposed regression model, we introduce a new model-based control chart. The control limits are obtained considering the estimates of the inflated beta regression model parameters. We conduct a Monte Carlo simulation study to evaluate the performance of the proposed regression model estimators, and the performance of the proposed control chart is evaluated in terms of run length distribution. Finally, we present and discuss an empirical application to show the applicability of the proposed regression control chart.
|
4
|
Affiliation(s)
- Peihua Qiu
  - Department of Biostatistics, University of Florida, Gainesville, FL
|
5
|
Li W, Xiang D, Tsung F, Pu X. A Diagnostic Procedure for High-Dimensional Data Streams via Missed Discovery Rate Control. Technometrics 2019. [DOI: 10.1080/00401706.2019.1575284] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
Affiliation(s)
- Wendong Li
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
- Dongdong Xiang
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
- Fugee Tsung
  - Department of Industrial Engineering and Decision Analytics, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Xiaolong Pu
  - Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, School of Statistics, East China Normal University, Shanghai, China
|
6
|
|
7
|
Koeman M, Engel J, Jansen J, Buydens L. Critical comparison of methods for fault diagnosis in metabolomics data. Sci Rep 2019; 9:1123. [PMID: 30718783 PMCID: PMC6362212 DOI: 10.1038/s41598-018-37494-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/23/2018] [Accepted: 11/20/2018] [Indexed: 11/09/2022] [Open Access]
Abstract
Platforms like metabolomics provide an unprecedented view on the chemical versatility in biomedical samples. Many diseases reflect themselves as perturbations in specific metabolite combinations. Multivariate analyses are essential to detect such combinations and associate them with specific diseases. For this, usually targeted discriminations of samples associated with a specific disease from non-diseased control samples are used. Such targeted data interpretation may not respect the heterogeneity of metabolic responses, both between diseases and within diseases. Here we show that multivariate methods that find any set of perturbed metabolites in a single patient may be employed in combination with data collected with a single metabolomics technology to simultaneously investigate a large array of diseases. Several such untargeted data analysis approaches have already been proposed in other fields to find both expected and unexpected perturbations, e.g. in Statistical Process Control. We have critically compared several of these approaches for their sensitivity and their correct identification of the specifically perturbed metabolites. A new approach is also introduced for this purpose. The newly introduced Sparse Mean approach, which we find here to be the most sensitive and best able to identify the specifically perturbed metabolites, turns metabolomics into an untargeted diagnostic platform. Aside from metabolomics, the proposed approach may greatly benefit fault diagnosis with untargeted analyses in many other fields, such as Industrial Process Control, Food Adulteration Detection, and Intrusion Detection.
Affiliation(s)
- M Koeman
  - Radboud University, Institute for Molecules and Materials (IMM), Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
- J Engel
  - Radboud University, Institute for Molecules and Materials (IMM), Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
  - Biometris, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
- J Jansen
  - Radboud University, Institute for Molecules and Materials (IMM), Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
- L Buydens
  - Radboud University, Institute for Molecules and Materials (IMM), Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
|
8
|
Affiliation(s)
- Peihua Qiu
  - Department of Biostatistics, University of Florida, Gainesville, FL
- Xuemin Zi
  - School of Science, Tianjin University of Technology and Education, Tianjin, China
- Changliang Zou
  - Institute of Statistics and LPMC, Nankai University, Nankai Qu, China
|
9
|
Affiliation(s)
- Giovanna Capizzi
  - Department of Statistical Sciences, University of Padua, Padua, Italy
- Guido Masarotto
  - Department of Statistical Sciences, University of Padua, Padua, Italy
|
10
|
Ing CK, Lai TL, Shen M, Tsang K, Yu SH. Multiple Testing in Regression Models With Applications to Fault Diagnosis in the Big Data Era. Technometrics 2017. [DOI: 10.1080/00401706.2016.1236755] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
Affiliation(s)
- Ching-Kang Ing
  - Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Tze Leung Lai
  - Department of Statistics, Stanford University, Stanford, California
- Milan Shen
  - Department of Statistics, Stanford University, Stanford, California
- Ka Wai Tsang
  - Department of Statistics, Stanford University, Stanford, California
- Shu-Hui Yu
  - Institute of Statistics, National Kaohsiung University, Kaohsiung, Taiwan
|
11
|
Cheng Y, Mukherjee A, Xie M. Simultaneously monitoring frequency and magnitude of events based on bivariate gamma distribution. J Stat Comput Simul 2017. [DOI: 10.1080/00949655.2017.1284846] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 10/20/2022]
Affiliation(s)
- Yuan Cheng
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon Tong, Hong Kong
  - Shenzhen Research Institute, Shenzhen, People’s Republic of China
- Amitava Mukherjee
  - XLRI Jamshedpur, XLRI – Xavier School of Management, Production, Operations and Decision Sciences Area, Jamshedpur, Jharkhand, India
- Min Xie
  - Department of Systems Engineering and Engineering Management, City University of Hong Kong, Kowloon Tong, Hong Kong
  - Shenzhen Research Institute, Shenzhen, People’s Republic of China
|
12
|
Nie B, Du M. Identifying change-point in polynomial profiles based on data-segmentation. Commun Stat Simul Comput 2016. [DOI: 10.1080/03610918.2015.1053917] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
Affiliation(s)
- Bin Nie
  - The College of Management and Economics, Tianjin University, Tianjin, Nankai, P. R. China
- Mengying Du
  - The College of Management and Economics, Tianjin University, Tianjin, Nankai, P. R. China
|
13
|
Paynabar K, Zou C, Qiu P. A Change-Point Approach for Phase-I Analysis in Multivariate Profile Monitoring and Diagnosis. Technometrics 2016. [DOI: 10.1080/00401706.2015.1042168] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Indexed: 10/23/2022]
Affiliation(s)
- Kamran Paynabar
  - H. Milton Stewart School of Industrial & Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332
- Changliang Zou
  - Institute of Statistics and LPMC, Nankai University, 300071, Nankai, Tianjin, China
- Peihua Qiu
  - Department of Biostatistics, University of Florida, Gainesville, FL 32611
|
14
|
Qiu P, Xiang D. Surveillance of cardiovascular diseases using a multivariate dynamic screening system. Stat Med 2015; 34:2204-21. [PMID: 25757653 DOI: 10.1002/sim.6477] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 12/11/2013] [Revised: 10/11/2014] [Accepted: 02/24/2015] [Indexed: 11/12/2022]
Abstract
In the SHARe Framingham Heart Study of the National Heart, Lung and Blood Institute, one major task is to monitor several health variables (e.g., blood pressure and cholesterol level) so that their irregular longitudinal patterns can be detected as soon as possible and medical treatments can be applied in a timely manner to avoid deadly cardiovascular diseases (e.g., stroke). To handle this kind of application effectively, we propose a new statistical methodology called multivariate dynamic screening system (MDySS) in this paper. The MDySS method combines the major strengths of multivariate longitudinal data analysis and multivariate statistical process control, and it makes decisions about the longitudinal pattern of a subject both by comparing it with other subjects cross-sectionally and by monitoring it sequentially. Numerical studies show that MDySS works well in practice.
Affiliation(s)
- Peihua Qiu
  - Department of Biostatistics, University of Florida, Gainesville, FL 32611, U.S.A.
- Dongdong Xiang
  - School of Finance and Statistics, East China Normal University, Shanghai, 200241, China
|
15
|
Zou C, Wang Z, Zi X, Jiang W. An Efficient Online Monitoring Method for High-Dimensional Data Streams. Technometrics 2014. [DOI: 10.1080/00401706.2014.940089] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 10/25/2022]
|
16
|
Tan MHY, Shi J. A Bayesian Approach for Interpreting Mean Shifts in Multivariate Quality Control. Technometrics 2012. [DOI: 10.1080/00401706.2012.694789] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 10/28/2022]
|