Reference Citation Analysis

For: Sun W, Wei Z. Multiple Testing for Pattern Identification, With Applications to Microarray Time-Course Experiments. J Am Stat Assoc 2011. [DOI: 10.1198/jasa.2011.ap09587] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/21/2022]

Cited by Other Article(s)

Rastaghi S, Saki A, Tabesh H. Modifying the false discovery rate procedure based on the information theory under arbitrary correlation structure and its performance in high-dimensional genomic data. BMC Bioinformatics 2024;25:57. [PMID: 38317067 PMCID: PMC10840263 DOI: 10.1186/s12859-024-05678-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2023] [Accepted: 01/26/2024] [Indexed: 02/07/2024] Open

Abstract

BACKGROUND

Controlling the False Discovery Rate (FDR) in Multiple Comparison Procedures (MCPs) is widely applied in many scientific fields. Previous studies show that the correlation structure between test statistics increases the variance and bias of the FDR. The objective of this study is to modify the effect of correlation in MCPs based on information theory. We proposed three modified procedures (M1, M2, and M3) under strong, moderate, and mild assumptions, based on the conditional Fisher information of the consecutive sorted test statistics, for controlling the FDR under an arbitrary correlation structure. The performance of the proposed procedures was compared with that of the Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) procedures in a simulation study and on real high-dimensional colorectal cancer gene expression data. In the simulation study, we generated 1000 differential multivariate Gaussian features with different levels of correlation and screened the significant features using the FDR-controlling procedures, with strong control of the family-wise error rate.

RESULTS

When there was no correlation among the 1000 simulated features, the BH procedure performed similarly to the three proposed procedures. Under low to medium correlation, the BY procedure was too conservative. The BH procedure was too liberal, and the mean number of screened features was constant across the different levels of correlation between features. The mean number of features screened by the proposed procedures was between those of the BY and BH procedures and decreased as the correlations increased. When the features were highly correlated, the number of features screened by the proposed procedures reached that of the Bonferroni (BF) procedure, as expected. In the real data analysis, the BY, BH, M1, M2, and M3 procedures were used to screen colorectal cancer gene expressions. The Efficient Bayesian Logistic Regression (EBLR) model was used to fit a predictive model based on the screened features. The EBLR models fitted on the features screened by the M1 and M2 procedures had minimum entropies and were more efficient than those based on the BY and BH procedures.

CONCLUSION

The modified procedures based on information theory are much more flexible than the BH and BY procedures with respect to the amount of correlation between test statistics. They avoided screening non-informative features, so the number of screened features decreased as the level of correlation increased.
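For readers unfamiliar with the two baseline procedures named in this abstract, the following is a minimal sketch of the standard Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) step-up rules. It does not implement the M1, M2, or M3 procedures, whose details are not given here. The function name `step_up_reject`, its parameters, and the example p-values are illustrative assumptions, not taken from the cited article.

```python
import numpy as np

def step_up_reject(pvals, alpha=0.05, method="bh"):
    """Boolean rejection mask for the BH or BY step-up procedure.

    Hypothetical helper for illustration only.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)              # indices that sort p-values ascending
    ranks = np.arange(1, m + 1)
    # BY divides the BH thresholds by the harmonic sum c(m) = sum_{i=1}^{m} 1/i,
    # which makes it valid under arbitrary dependence but more conservative.
    c_m = np.sum(1.0 / ranks) if method == "by" else 1.0
    thresholds = ranks * alpha / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        reject[order[: k + 1]] = True      # reject that hypothesis and all smaller p-values
    return reject

# Made-up p-values, used only to show the call pattern.
example_pvals = [0.001, 0.008, 0.039, 0.041, 0.042,
                 0.060, 0.074, 0.205, 0.212, 0.216]
bh_mask = step_up_reject(example_pvals, alpha=0.05, method="bh")
by_mask = step_up_reject(example_pvals, alpha=0.05, method="by")
print(int(bh_mask.sum()), int(by_mask.sum()))
```

On the same input BY can never reject more hypotheses than BH, which is consistent with the abstract's remark that BY is the more conservative of the two.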

Hahn G, Novak T, Crawford JC, Randolph AG, Lange C. Longitudinal Analysis of Contrasts in Gene Expression Data. Genes (Basel) 2023;14:1134. [PMID: 37372314 DOI: 10.3390/genes14061134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 05/19/2023] [Accepted: 05/21/2023] [Indexed: 06/29/2023] Open

Sarkar SK, Zhao Z. Local false discovery rate based methods for multiple testing of one-way classified hypotheses. Electron J Stat 2022. [DOI: 10.1214/22-ejs2080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]

Du L, Guo X, Sun W, Zou C. False Discovery Rate Control Under General Dependence By Symmetrized Data Aggregation. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1945459] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

Tian T, Cheng R, Wei Z. An empirical Bayes change-point model for transcriptome time-course data. Ann Appl Stat 2021. [DOI: 10.1214/20-aoas1403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

Fu L, Gang B, James GM, Sun W. Heteroscedasticity-Adjusted Ranking and Thresholding for Large-Scale Multiple Testing. J Am Stat Assoc 2020. [DOI: 10.1080/01621459.2020.1840992] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]

Heller R, Rosset S. Optimal control of false discovery criteria in the two‐group model. J R Stat Soc Series B Stat Methodol 2020. [DOI: 10.1111/rssb.12403] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022]

Adaptive controls of FWER and FDR under block dependence. J Stat Plan Inference 2020. [DOI: 10.1016/j.jspi.2018.03.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]

Zhao H, Cui X. Constructing confidence intervals for selected parameters. Biometrics 2020;76:1098-1108. [PMID: 31975369 DOI: 10.1111/biom.13222] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/22/2017] [Revised: 12/28/2019] [Accepted: 01/08/2020] [Indexed: 11/27/2022]

Banerjee T, Mukherjee G, Sun W. Adaptive Sparse Estimation With Side Information. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1679639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]

Xia Y, Cai TT, Sun W. GAP: A General Framework for Information Pooling in Two-Sample Sparse Inference. J Am Stat Assoc 2019. [DOI: 10.1080/01621459.2019.1611585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]

Bhattacharjee A, Vishwakarma GK. Time-course data prediction for repeatedly measured gene expression. Int J Biomath 2019. [DOI: 10.1142/s1793524519500335] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]

Cai TT, Sun W, Wang W. Covariate‐assisted ranking and screening for large‐scale two‐sample inference. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12304] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/17/2023]

Bogomolov M, Heller R. Assessing replicability of findings across two studies of multiple features. Biometrika 2018. [DOI: 10.1093/biomet/asy029] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open

Sun J, Herazo-Maya JD, Kaminski N, Zhao H, Warren JL. A Dirichlet process mixture model for clustering longitudinal gene expression data. Stat Med 2017;36:3495-3506. [PMID: 28620908 PMCID: PMC5583037 DOI: 10.1002/sim.7374] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/03/2017] [Revised: 04/15/2017] [Accepted: 05/23/2017] [Indexed: 12/27/2022]

Zhao H, Fung WK. A powerful FDR control procedure for multiple hypotheses. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2015.12.013] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/25/2022]

Zhao H, Zhang J. Weighted p-value procedures for controlling FDR of grouped hypotheses. J Stat Plan Inference 2014. [DOI: 10.1016/j.jspi.2014.04.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]

Martini P, Sales G, Calura E, Cagnin S, Chiogna M, Romualdi C. timeClip: pathway analysis for time course data without replicates. BMC Bioinformatics 2014;15 Suppl 5:S3. [PMID: 25077979 PMCID: PMC4095003 DOI: 10.1186/1471-2105-15-s5-s3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 01/23/2023] Open

Abstract

Background

Time-course gene expression experiments are useful tools for exploring biological processes. In this type of experiment, gene expression changes are monitored over time. Unfortunately, replication of time series is still costly, and long time courses usually do not have replicates. Many approaches have been proposed to deal with this data structure, but none of them in the field of pathway analysis. Pathway analyses have acquired great relevance in helping the interpretation of gene expression data. Several methods have been proposed to this aim, from classical enrichment to the more complex topological analysis that gains power from the topology of the pathway. None of them was devised to identify temporal variations in time-course data.

Results

Here we present timeClip, a topology-based pathway analysis specifically tailored to long time series without replicates. timeClip combines dimension reduction techniques and graph decomposition theory to explore and identify the portion of pathways that is most time-dependent. In the first step, timeClip selects the time-dependent pathways; in the second step, the most time-dependent portions of these pathways are highlighted. We used timeClip on simulated data and on a benchmark dataset from a mouse muscle regeneration model. Our approach shows good performance in different simulated settings. On the real dataset, we identify 76 time-dependent pathways, most of which are known to be involved in the regeneration process. Focusing on the 'mTOR signaling pathway', we highlight the timing of key processes of muscle regeneration: from the early pathway activation through growth factor signals to the late burst of protein production needed for fiber regeneration.

Conclusions

timeClip represents an improvement in the field of time-dependent pathway analysis. It allows users to isolate and dissect pathways characterized by time-dependent components. Furthermore, using timeClip on a mouse muscle regeneration dataset, we were able to characterize the process of muscle fiber regeneration with its correct timing.

Li Y, Ghosh D. A two-step hierarchical hypothesis set testing framework, with applications to gene expression data on ordered categories. BMC Bioinformatics 2014;15:108. [PMID: 24731138 PMCID: PMC4000433 DOI: 10.1186/1471-2105-15-108] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/25/2013] [Accepted: 04/09/2014] [Indexed: 11/10/2022] Open

Abstract

Background

In complex large-scale experiments, in addition to simultaneously considering a large number of features, multiple hypotheses are often tested for each feature. This leads to a problem of multi-dimensional multiple testing. For example, in gene expression studies over ordered categories (such as time-course or dose-response experiments), interest is often in testing differential expression across several categories for each gene. In this paper, we consider a framework for testing multiple sets of hypotheses, which can be applied to a wide range of problems.

Results

We adopt the concept of the overall false discovery rate (OFDR) for controlling false discoveries on the hypothesis set level. Based on an existing procedure for identifying differentially expressed gene sets, we discuss a general two-step hierarchical hypothesis set testing procedure, which controls the overall false discovery rate under independence across hypothesis sets. In addition, we discuss the concept of the mixed-directional false discovery rate (mdFDR), and extend the general procedure to enable directional decisions for two-sided alternatives. We applied the framework to the case of microarray time-course/dose-response experiments, and proposed three procedures for testing differential expression and making multiple directional decisions for each gene. Simulation studies confirm the control of the OFDR and mdFDR by the proposed procedures under independence and positive correlations across genes. Simulation results also show that two of our new procedures achieve higher power than previous methods. Finally, the proposed methodology is applied to a microarray dose-response study, to identify 17 β-estradiol sensitive genes in breast cancer cells that are induced at low concentrations.

Conclusions

The framework we discuss provides a platform for multiple testing procedures covering situations involving two (or potentially more) sources of multiplicity. The framework is easy to use and adaptable to various practical settings that frequently occur in large-scale experiments. Procedures generated from the framework are shown to maintain control of the OFDR and mdFDR, quantities that are especially relevant in the case of multiple hypothesis set testing. The procedures work well in both simulations and real datasets, and are shown to have better power than existing methods.

Heller R, Yekutieli D. Replicability analysis for genome-wide association studies. Ann Appl Stat 2014. [DOI: 10.1214/13-aoas697] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/19/2022]

Zhao Z, Wang W, Wei Z. An empirical Bayes testing procedure for detecting variants in analysis of next generation sequencing data. Ann Appl Stat 2013. [DOI: 10.1214/13-aoas660] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/19/2022]

Benjamini Y, Bogomolov M. Selective inference on multiple families of hypotheses. J R Stat Soc Series B Stat Methodol 2013. [DOI: 10.1111/rssb.12028] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Indexed: 11/28/2022]

Wu S, Wu H. More powerful significant testing for time course gene expression data using functional principal component analysis approaches. BMC Bioinformatics 2013;14:6. [PMID: 23323795 PMCID: PMC3617096 DOI: 10.1186/1471-2105-14-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 08/07/2012] [Accepted: 11/07/2012] [Indexed: 11/24/2022] Open

Wang K, Ng SK, McLachlan GJ. Clustering of time-course gene expression profiles using normal mixture models with autoregressive random effects. BMC Bioinformatics 2012;13:300. [PMID: 23151154 PMCID: PMC3574839 DOI: 10.1186/1471-2105-13-300] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 01/30/2012] [Accepted: 11/07/2012] [Indexed: 11/26/2022] Open

Abstract

Background

Time-course gene expression data, such as yeast cell cycle data, may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found inadequate to model the complexity of the time-course data, partly because they ignore the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models.

Results

We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models, providing more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases.

Conclusions

Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data.
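As a rough illustration of what a first-order autoregressive correlation structure over time points looks like, the sketch below builds the textbook AR(1) correlation matrix, in which the correlation between time points i and j is rho raised to |i - j|. This is an assumption-laden illustration and not code from the EMMIX-WIRE package or the cited article; the function name `ar1_correlation` and the values of `n_times` and `rho` are hypothetical.

```python
import numpy as np

def ar1_correlation(n_times, rho):
    """Textbook AR(1) correlation matrix: entry (i, j) equals rho ** |i - j|.

    Hypothetical helper for illustration; equally spaced time points are assumed.
    """
    idx = np.arange(n_times)
    lags = np.abs(idx[:, None] - idx[None, :])   # matrix of time lags |i - j|
    return rho ** lags

# Illustrative values only: 4 time points with lag-1 correlation 0.5.
R = ar1_correlation(4, 0.5)

# Draw one random-effect trajectory with this correlation structure.
rng = np.random.default_rng(0)
random_effect = rng.multivariate_normal(mean=np.zeros(4), cov=R)
print(R)
print(random_effect.shape)
```

Correlation decays geometrically with the time lag, so measurements at neighbouring time points are modelled as more alike than measurements far apart.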

Wei Z, Wang W, Hu P, Lyon GJ, Hakonarson H. SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res 2011;39:e132. [PMID: 21813454 PMCID: PMC3201884 DOI: 10.1093/nar/gkr599] [Citation(s) in RCA: 176] [Impact Index Per Article: 13.5] [Indexed: 11/12/2022] Open