1
|
Chang CL, Cai Q. Stock return anomalies identification during the Covid-19 with the application of a grouped multiple comparison procedure. ECONOMIC ANALYSIS AND POLICY 2023; 79:168-183. [PMID: 37346281 PMCID: PMC10261139 DOI: 10.1016/j.eap.2023.06.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 10/18/2022] [Revised: 01/26/2023] [Accepted: 06/08/2023] [Indexed: 06/23/2023]
Abstract
This study investigates the impact of COVID-19 pandemic on the Chinese stock market in 2020. Using daily data of three industries, this study addresses the identification of abnormal stock returns as a multiple hypothesis testing problem and proposes to apply a grouped comparison procedure for better detection. By comparing the numbers of daily signals and numbers of stocks with abnormal positive and negative returns, the empirical result shows that the three industries perform differently under the pandemic. Compared to the non-grouped testing procedure, the signals found by the grouped procedure are more prominent, which is advantageous for some situations when there tends to be abnormal performance clustering at the occurrence of major event. This paper on stock return anomalies gives a new perspective on the impact of major events to the stock market, like the global outbreak disease.
Collapse
Affiliation(s)
- Chiu-Lan Chang
- School of Finance and Accounting, Fuzhou University of International Studies and Trade, Fuzhou, 350202, China
| | - Qingyun Cai
- Overseas Education College, Xiamen University, Xiamen 361102, China
| |
Collapse
|
2
|
Shao Z, Wang T, Zhang M, Jiang Z, Huang S, Zeng P. IUSMMT: Survival mediation analysis of gene expression with multiple DNA methylation exposures and its application to cancers of TCGA. PLoS Comput Biol 2021; 17:e1009250. [PMID: 34464378 PMCID: PMC8437300 DOI: 10.1371/journal.pcbi.1009250] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2021] [Revised: 09/13/2021] [Accepted: 07/06/2021] [Indexed: 02/07/2023] Open
Abstract
Effective and powerful survival mediation models are currently lacking. To partly fill such knowledge gap, we particularly focus on the mediation analysis that includes multiple DNA methylations acting as exposures, one gene expression as the mediator and one survival time as the outcome. We proposed IUSMMT (intersection-union survival mixture-adjusted mediation test) to effectively examine the existence of mediation effect by fitting an empirical three-component mixture null distribution. With extensive simulation studies, we demonstrated the advantage of IUSMMT over existing methods. We applied IUSMMT to ten TCGA cancers and identified multiple genes that exhibited mediating effects. We further revealed that most of the identified regions, in which genes behaved as active mediators, were cancer type-specific and exhibited a full mediation from DNA methylation CpG sites to the survival risk of various types of cancers. Overall, IUSMMT represents an effective and powerful alternative for survival mediation analysis; our results also provide new insights into the functional role of DNA methylation and gene expression in cancer progression/prognosis and demonstrate potential therapeutic targets for future clinical practice.
Collapse
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Meng Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
| |
Collapse
|
3
|
Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J 2021; 19:3209-3224. [PMID: 34141140 PMCID: PMC8187160 DOI: 10.1016/j.csbj.2021.05.042] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2021] [Revised: 05/21/2021] [Accepted: 05/21/2021] [Indexed: 12/12/2022] Open
Abstract
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. Mediation analysis is becoming increasingly popular in high-throughput genomics studies where a common goal is to identify molecular-level traits, such as gene expression or methylation, which actively mediate the genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, thanks to the large number of potential mediators measured in these studies as well as the composite null nature of the mediation effect hypothesis. Indeed, while the standard univariate and multivariate mediation methods have been well-established for analyzing one or multiple mediators, they are not well-suited for genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, as well as future directions. We hope our review will serve as a useful guidance for statisticians and computational biologists who develop methods of high-dimensional mediation analysis as well as for analysts who apply mediation methods to high-throughput genomics studies.
Collapse
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Zhonghe Shao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor 48109, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor 48109, MI, USA
| |
Collapse
|
4
|
Cheng D, He Z, Schwartzman A. Multiple testing of local extrema for detection of change points. Electron J Stat 2020. [DOI: 10.1214/20-ejs1751] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
5
|
Cai Q. A scoring criterion for rejection of clustered p-values. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2016.02.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
6
|
Wang C, Gevertz JL. Finding causative genes from high-dimensional data: an appraisal of statistical and machine learning approaches. Stat Appl Genet Mol Biol 2017; 15:321-47. [PMID: 27226102 DOI: 10.1515/sagmb-2015-0072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Modern biological experiments often involve high-dimensional data with thousands or more variables. A challenging problem is to identify the key variables that are related to a specific disease. Confounding this task is the vast number of statistical methods available for variable selection. For this reason, we set out to develop a framework to investigate the variable selection capability of statistical methods that are commonly applied to analyze high-dimensional biological datasets. Specifically, we designed six simulated cancers (based on benchmark colon and prostate cancer data) where we know precisely which genes cause a dataset to be classified as cancerous or normal - we call these causative genes. We found that not one statistical method tested could identify all the causative genes for all of the simulated cancers, even though increasing the sample size does improve the variable selection capabilities in most cases. Furthermore, certain statistical tools can classify our simulated data with a low error rate, yet the variables being used for classification are not necessarily the causative genes.
Collapse
|
7
|
A Double Application of the Benjamini-Hochberg Procedure for Testing Batched Hypotheses. Methodol Comput Appl Probab 2017. [DOI: 10.1007/s11009-016-9491-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
8
|
Ignatiadis N, Klaus B, Zaugg J, Huber W. Data-driven hypothesis weighting increases detection power in genome-scale multiple testing. Nat Methods 2016; 13:577-80. [PMID: 27240256 PMCID: PMC4930141 DOI: 10.1038/nmeth.3885] [Citation(s) in RCA: 316] [Impact Index Per Article: 39.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2015] [Accepted: 05/03/2016] [Indexed: 11/25/2022]
Abstract
Hypothesis weighting improves the power of large-scale multiple testing. We describe independent hypothesis weighting (IHW), a method that assigns weights using covariates independent of the P-values under the null hypothesis but informative of each test's power or prior probability of the null hypothesis (http://www.bioconductor.org/packages/IHW). IHW increases power while controlling the false discovery rate and is a practical approach to discovering associations in genomics, high-throughput biology and other large data sets.
Collapse
Affiliation(s)
| | - Bernd Klaus
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Judith Zaugg
- European Molecular Biology Laboratory, Heidelberg, Germany
| | - Wolfgang Huber
- European Molecular Biology Laboratory, Heidelberg, Germany
| |
Collapse
|
9
|
Hu J, Zhang L, Wang HJ. Sequential model selection-based segmentation to detect DNA copy number variation. Biometrics 2016; 72:815-26. [PMID: 26954760 DOI: 10.1111/biom.12478] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2014] [Revised: 08/01/2015] [Accepted: 09/01/2015] [Indexed: 12/16/2022]
Abstract
Array-based CGH experiments are designed to detect genomic aberrations or regions of DNA copy-number variation that are associated with an outcome, typically a state of disease. Most of the existing statistical methods target on detecting DNA copy number variations in a single sample or array. We focus on the detection of group effect variation, through simultaneous study of multiple samples from multiple groups. Rather than using direct segmentation or smoothing techniques, as commonly seen in existing detection methods, we develop a sequential model selection procedure that is guided by a modified Bayesian information criterion. This approach improves detection accuracy by accumulatively utilizing information across contiguous clones, and has computational advantage over the existing popular detection methods. Our empirical investigation suggests that the performance of the proposed method is superior to that of the existing detection methods, in particular, in detecting small segments or separating neighboring segments with differential degrees of copy-number variation.
Collapse
Affiliation(s)
- Jianhua Hu
- Department of Biostatistics, UT M. D. Anderson Cancer Center, Houston, Texas 77030, U.S.A..
| | - Liwen Zhang
- School of Economics, Shanghai University, Shanghai 200444, China.
| | - Huixia Judy Wang
- Department of Statistics, George Washington University, Washington D.C. 20052, U.S.A..
| |
Collapse
|
10
|
|
11
|
|
12
|
|
13
|
Zhang Z, Lange K, Sabatti C. Reconstructing DNA copy number by joint segmentation of multiple sequences. BMC Bioinformatics 2012; 13:205. [PMID: 22897923 PMCID: PMC3534631 DOI: 10.1186/1471-2105-13-205] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2012] [Accepted: 07/27/2012] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. RESULTS We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. CONCLUSIONS The flexibility of our framework makes it applicable to data obtained with a wide range of technology. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
Collapse
Affiliation(s)
- Zhongyang Zhang
- Department of Statistics, University of California, Los Angeles, CA, USA
| | - Kenneth Lange
- Department of Human Genetics, Biomathematics and Statistics, University of California, Los Angeles, CA, USA
| | - Chiara Sabatti
- Department of Health Research and Policy and Statistics, Stanford University, Stanford, CA, USA
| |
Collapse
|
14
|
Breheny P, Chalise P, Batzler A, Wang L, Fridley BL. Genetic association studies of copy-number variation: should assignment of copy number states precede testing? PLoS One 2012; 7:e34262. [PMID: 22493684 PMCID: PMC3320903 DOI: 10.1371/journal.pone.0034262] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2011] [Accepted: 02/24/2012] [Indexed: 11/18/2022] Open
Abstract
Recently, structural variation in the genome has been implicated in many complex diseases. Using genomewide single nucleotide polymorphism (SNP) arrays, researchers are able to investigate the impact not only of SNP variation, but also of copy-number variants (CNVs) on the phenotype. The most common analytic approach involves estimating, at the level of the individual genome, the underlying number of copies present at each location. Once this is completed, tests are performed to determine the association between copy number state and phenotype. An alternative approach is to carry out association testing first, between phenotype and raw intensities from the SNP array at the level of the individual marker, and then aggregate neighboring test results to identify CNVs associated with the phenotype. Here, we explore the strengths and weaknesses of these two approaches using both simulations and real data from a pharmacogenomic study of the chemotherapeutic agent gemcitabine. Our results indicate that pooled marker-level testing is capable of offering a dramatic increase in power (> 12-fold) over CNV-level testing, particularly for small CNVs. However, CNV-level testing is superior when CNVs are large and rare; understanding these tradeoffs is an important consideration in conducting association studies of structural variation.
Collapse
Affiliation(s)
- Patrick Breheny
- Department of Biostatistics, University of Kentucky, Lexington, Kentucky, United States of America.
| | | | | | | | | |
Collapse
|
15
|
|