1
Sofikitou EM, Markatou M, Koutras MV. Multivariate semiparametric control charts for mixed-type data. Stat Methods Med Res 2023; 32:671-690. [PMID: 36788007 DOI: 10.1177/09622802221142528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
The control chart is a tool that has gained popularity in quality control: it monitors a process over time, identifies potential changes, helps characterize variation, and ultimately improves the quality and performance of the process. This article introduces a new class of multivariate semiparametric control charts for monitoring multivariate mixed-type data, which comprise both continuous and discrete random variables (rvs). Our methodology leverages ideas from clustering and Statistical Process Control to develop control charts for MIxed-type data. We propose four control chart schemes based on modified versions of the KAy-means for MIxed LArge data (KAMILA) clustering algorithm, where we assume that the two existing clusters represent the reference sample and the test sample. The charts are semiparametric: the continuous rvs are assumed to follow a distribution from the class of elliptical distributions, and the categorical-scale rvs follow a multinomial distribution. We present the algorithmic procedures and study the characteristics of the new control charts. The performance of the proposed schemes is evaluated in terms of the False Alarm Rate and the in-control Average Run Length. Finally, we demonstrate the effectiveness and applicability of the proposed methods on real-world data.
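The two performance criteria named above are linked in a simple way when in-control signals occur independently with a fixed per-point false alarm probability alpha: the in-control run length is then geometric, so ARL0 = 1/alpha. The following is a minimal Monte Carlo sketch of how both quantities can be estimated. It assumes a generic chi-square (Hotelling-type) chart on bivariate standard normal data with known parameters; it is not the authors' KAMILA-based scheme, and the function name `in_control_performance`, the value of alpha, and the sample size are hypothetical choices for illustration.

```python
import numpy as np

def in_control_performance(alpha=0.005, n_obs=1_000_000, seed=1):
    """Monte Carlo estimate of the False Alarm Rate (FAR) and the in-control
    Average Run Length (ARL0) of a chi-square chart.

    Illustrative assumptions: bivariate standard normal in-control data with
    known mean and covariance; this is NOT the KAMILA-based scheme.
    """
    rng = np.random.default_rng(seed)
    # For p = 2 the chi-square survival function is exp(-x / 2), so this
    # upper control limit gives a per-point false alarm probability of alpha.
    ucl = -2.0 * np.log(alpha)
    x = rng.standard_normal((n_obs, 2))
    t2 = np.einsum("ij,ij->i", x, x)  # squared Mahalanobis distance (identity covariance)
    alarms = t2 > ucl
    far = alarms.mean()  # fraction of in-control points that signal
    # Restart the chart after every alarm; a run length is the gap between alarms.
    alarm_pos = np.flatnonzero(alarms) + 1
    run_lengths = np.diff(np.concatenate(([0], alarm_pos)))
    return far, run_lengths.mean()

far, arl0 = in_control_performance()
print(f"FAR ~ {far:.4f}, ARL0 ~ {arl0:.1f}, 1/alpha = {1 / 0.005:.0f}")
```

With these settings the two estimates should land near 0.005 and 200. When chart statistics are serially dependent, or control limits are estimated from a reference sample, ARL0 can depart from 1/FAR, which is one reason to report both criteria.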
Affiliation(s)
- Elisavet M Sofikitou
- Department of Biostatistics, School of Public Health & Health Professions, State University of New York at Buffalo, Buffalo, NY, USA
- Marianthi Markatou
- Department of Biostatistics, School of Public Health & Health Professions, State University of New York at Buffalo, Buffalo, NY, USA
- Markos V Koutras
- Department of Statistics & Insurance Science, School of Finance & Statistics, University of Piraeus, Piraeus, Greece
3
Rosenblatt JD, Benjamini Y, Gilron R, Mukamel R, Goeman JJ. Better-than-chance classification for signal detection. Biostatistics 2019; 22:365-380. [PMID: 31612223 DOI: 10.1093/biostatistics/kxz035] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/09/2018] [Revised: 08/09/2019] [Accepted: 08/14/2019] [Indexed: 11/13/2022]
Abstract
The estimated accuracy of a classifier is a random quantity with variability. A common practice in supervised machine learning is thus to test whether the estimated accuracy is significantly better than chance level. This method of signal detection is particularly popular in neuroimaging and genetics. We provide evidence that using a classifier's accuracy as a test statistic can be an underpowered strategy for finding differences between populations, compared with a bona fide statistical test. It is also computationally more demanding than a statistical test. Via simulation, we compare test statistics based on classification accuracy with others based on multivariate test statistics. We find that the probability of detecting differences between two distributions is lower for accuracy-based statistics. We examine several candidate causes for the low power of accuracy tests, including the discrete nature of the accuracy test statistic, the type of signal accuracy tests are designed to detect, their inefficient use of the data, and their suboptimal regularization. When the purpose of the analysis is the evaluation of a particular classifier rather than signal detection, we suggest several improvements to increase power; in particular, replacing V-fold cross-validation with the Leave-One-Out Bootstrap.
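To make the two kinds of test concrete, here is a minimal sketch that runs two permutation tests on the same two-sample data: one uses the leave-one-out accuracy of a nearest-centroid classifier as the test statistic, the other uses the squared Euclidean distance between group means. The classifier, the statistics, the helper names (`loo_centroid_accuracy`, `mean_difference`, `permutation_pvalue`), the sample sizes and the effect size are all hypothetical choices for illustration, not the simulation design of Rosenblatt et al. The sketch only shows how each test is built and that the accuracy statistic can take just n + 1 distinct values.

```python
import numpy as np

def loo_centroid_accuracy(x, y):
    """Leave-one-out accuracy of a nearest-centroid classifier (labels 0/1).
    Hypothetical accuracy-based statistic; only n + 1 values are possible."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        xt, yt = x[mask], y[mask]
        c0 = xt[yt == 0].mean(axis=0)
        c1 = xt[yt == 1].mean(axis=0)
        pred = int(np.sum((x[i] - c1) ** 2) < np.sum((x[i] - c0) ** 2))
        correct += int(pred == y[i])
    return correct / n

def mean_difference(x, y):
    """Squared Euclidean distance between the two group means."""
    d = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return float(d @ d)

def permutation_pvalue(stat, x, y, n_perm=200, seed=0):
    """One-sided permutation p-value: shuffle labels, recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = stat(x, y)
    count = sum(stat(x, rng.permutation(y)) >= observed for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)

# Hypothetical data: two groups of 20 points in 5 dimensions, mean shift 1.5.
rng = np.random.default_rng(42)
n_per_group, dim = 20, 5
x = np.vstack([rng.standard_normal((n_per_group, dim)),
               rng.standard_normal((n_per_group, dim)) + 1.5])
y = np.repeat([0, 1], n_per_group)

p_acc = permutation_pvalue(loo_centroid_accuracy, x, y)
p_mean = permutation_pvalue(mean_difference, x, y)
print(f"accuracy-based p = {p_acc:.4f}, mean-difference p = {p_mean:.4f}")
```

The accuracy-based test also refits the classifier n times per permutation, whereas the mean-difference statistic is computed once per permutation, which illustrates the extra computational cost mentioned in the abstract. With a shift this large both tests should reject; power differences of the kind the authors report appear only with weaker signals and many repetitions.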
Affiliation(s)
- Jonathan D Rosenblatt
- Department of IE&M and Zlotowsky Center for Neuroscience, Ben Gurion University of the Negev, P.O. 653, Beer Sheva, 84105 Israel
- Yuval Benjamini
- Department of Statistics, Hebrew University, Mount Scopus, Jerusalem 9190501, Israel
- Roee Gilron
- Movement Disorders and Neuromodulation Center, University of California, 1635 Divisadero St, San Francisco, CA 94115, USA
- Roy Mukamel
- School of Psychological Sciences, and Sagol School of Neuroscience, Tel-Aviv University, Tel-Aviv 69978, Israel
- Jelle J Goeman
- Department of Biomedical Data Sciences, Leiden University Medical Center, Postbus 9600, 2300 RC Leiden, The Netherlands
4
Zhang H, Wheeler W, Wang Z, Taylor PR, Yu K. A fast and powerful tree-based association test for detecting complex joint effects in case-control studies. Bioinformatics 2014; 30:2171-8. [PMID: 24794927 DOI: 10.1093/bioinformatics/btu186] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
Abstract
MOTIVATION: Multivariate tests derived from the logistic regression model are widely used to assess the joint effect of multiple predictors on a disease outcome in case-control studies. These tests become suboptimal when the joint effect cannot be adequately approximated by an additive model. The tree-structure model is an attractive alternative because it is better able to capture non-additive effects. However, tree models are used mostly for prediction and seldom for hypothesis testing, mainly because of the computational burden of the resampling-based procedure required to estimate the significance level.
RESULTS: We designed a fast algorithm for building the tree-structure model and proposed a robust TREe-based Association Test (TREAT) that incorporates an adaptive model selection procedure to identify the optimal tree model representing the joint effect. We applied TREAT as a multilocus association test to more than 20 000 genes/regions in a study of esophageal squamous cell carcinoma (ESCC) and detected a highly significant novel association between the gene CDKN2B and ESCC ([Formula: see text]). Through simulation studies, we also demonstrated the power advantage of TREAT over other commonly used tests.
AVAILABILITY AND IMPLEMENTATION: The TREAT package is freely available for download at http://www.hanzhang.name/softwares/treat; it is implemented in C++ and R and supported on 64-bit Linux and 64-bit MS Windows.
CONTACT: yuka@mail.nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Han Zhang
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA, Information Management Services, Inc., Silver Spring, Maryland 20904, USA, and Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland 20877, USA
- William Wheeler
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA, Information Management Services, Inc., Silver Spring, Maryland 20904, USA, and Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland 20877, USA
- Zhaoming Wang
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA, Information Management Services, Inc., Silver Spring, Maryland 20904, USA, and Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland 20877, USA
- Philip R Taylor
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA, Information Management Services, Inc., Silver Spring, Maryland 20904, USA, and Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland 20877, USA
- Kai Yu
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA, Information Management Services, Inc., Silver Spring, Maryland 20904, USA, and Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland 20877, USA
5
Chan CC, Fisson S, Bodaghi B. The future of primary intraocular lymphoma (retinal lymphoma). Ocul Immunol Inflamm 2010; 17:375-9. [PMID: 20001255 DOI: 10.3109/09273940903434804] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/13/2022]
Abstract
Basic science and clinical investigations in cancer research have contributed to our understanding of the genetic causes of various neoplasms and to the discovery of novel therapeutic interventions against malignancies such as lymphoma. During this exciting time, we have witnessed the advent of new technologies to further characterize primary intraocular lymphoma (PIOL), or retinal lymphoma, which has been selected as the first "Disease of the Year" by Ocular Immunology and Inflammation. Comprehensive aspects of PIOL, including epidemiology, clinical manifestations, diagnosis, pathophysiology, therapy, and animal models, are discussed. The future of PIOL research holds an opportunity to understand its unique cytologic, histopathologic, physiological, and immunologic features, as well as its genotypic traits (gene expression, interaction, polymorphism, epigenetics, etc.) and epidemiology. This information will empower us to make a real difference in the management of patients with this devastating disease. Although most of the required technology already exists, much work remains to be done to make translational therapy a reality for PIOL patients.
Affiliation(s)
- Chi-Chao Chan
- Immunopathology Section, Laboratory of Immunology, National Eye Institute, National Institutes of Health, Bethesda, MD 20895, USA.
6
Yu K, Wheeler W, Li Q, Bergen AW, Caporaso N, Chatterjee N, Chen J. A partially linear tree-based regression model for multivariate outcomes. Biometrics 2009; 66:89-96. [PMID: 19432770 DOI: 10.1111/j.1541-0420.2009.01235.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
In the genetic study of complex traits, especially behavior-related ones such as smoking and alcoholism, several phenotypic measurements are usually obtained to describe the trait, but no single measurement can fully quantify the complicated characteristics of the symptom because the underlying etiology is not well understood. If those phenotypes share a common genetic mechanism, it is more advantageous to analyze them jointly as a multivariate trait, rather than studying each phenotype separately, to enhance the power to identify associated genes. We propose a multilocus association test for the study of multivariate traits. The test is derived from a partially linear tree-based regression model for multiple outcomes. This novel tree-based model provides a formal statistical testing framework for evaluating the association between a multivariate outcome and a set of candidate predictors, such as markers within a gene or pathway, while accommodating adjustment for other covariates. Through simulation studies, we show that the proposed method has an acceptable type I error rate and improved power over univariate outcome analysis, which studies each component of the complex trait separately with multiple-comparison adjustment. A candidate gene association study of multiple smoking-related phenotypes is used to demonstrate the application and advantages of this new method. The proposed method is general enough to assess the joint effect of a set of multiple risk factors on a multivariate outcome in other biomedical research settings.
Affiliation(s)
- Kai Yu
- Division of Cancer Epidemiology and Genetics, NCI, Rockville, Maryland 20892, USA.