1
|
Li Y, Zhou X, Chen R, Zhang X, Cao H. STAREG: Statistical replicability analysis of high throughput experiments with applications to spatial transcriptomic studies. PLoS Genet 2024; 20:e1011423. [PMID: 39361716 DOI: 10.1371/journal.pgen.1011423] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 10/15/2024] [Accepted: 09/10/2024] [Indexed: 10/05/2024] Open
Abstract
Replicable signals from different yet conceptually related studies provide stronger scientific evidence and more powerful inference. We introduce STAREG, a statistical method for replicability analysis of high throughput experiments, and apply it to analyze spatial transcriptomic studies. STAREG uses summary statistics from multiple studies of high throughput experiments and models the joint distribution of p-values accounting for the heterogeneity of different studies. It effectively controls the false discovery rate (FDR) and has higher power by information borrowing. Moreover, it provides different rankings of important genes. With the EM algorithm in combination with the pool-adjacent-violator algorithm (PAVA), STAREG is scalable to datasets with millions of genes without any tuning parameters. Analyzing two pairs of spatially resolved transcriptomic datasets, we are able to make biological discoveries that otherwise cannot be obtained by using existing methods.
Affiliation(s)
- Yan Li
  - School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin, China
  - School of Mathematics, Jilin University, Changchun, Jilin, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Rui Chen
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Xianyang Zhang
  - Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Hongyuan Cao
  - Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
|
2
|
Dreyfuss JM, Djordjilović V, Pan H, Bussberg V, MacDonald AM, Narain NR, Kiebish MA, Blüher M, Tseng YH, Lynes MD. ScreenDMT reveals DiHOMEs are replicably inversely associated with BMI and stimulate adipocyte calcium influx. Commun Biol 2024; 7:996. [PMID: 39143411 PMCID: PMC11324735 DOI: 10.1038/s42003-024-06646-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 07/29/2024] [Indexed: 08/16/2024] Open
Abstract
Activating brown adipose tissue (BAT) improves systemic metabolism, making it a promising target for metabolic syndrome. BAT is activated by 12,13-dihydroxy-9Z-octadecenoic acid (12,13-diHOME), which we previously identified to be inversely associated with BMI and which directly improves metabolism in multiple tissues. Here we profile plasma lipidomics from 83 people and test which lipids' association with BMI replicates in a concordant direction using our novel tool ScreenDMT, whose power and validity we demonstrate via mathematical proofs and simulations. We find that the linoleic acid diols 12,13-diHOME and 9,10-diHOME are both replicably inversely associated with BMI and mechanistically activate calcium influx in mouse brown and white adipocytes in vitro, which implicates this signaling pathway and 9,10-diHOME as candidate therapeutic targets. ScreenDMT can be applied to test directional mediation, directional replication, and qualitative interactions, such as identifying biomarkers whose association is shared (replication) or opposite (qualitative interaction) across diverse populations.
Affiliation(s)
- Jonathan M Dreyfuss
  - Bioinformatics & Biostatistics Core, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
- Vera Djordjilović
  - Department of Economics, Ca' Foscari University of Venice, Cannaregio 873, Venice, Italy
- Hui Pan
  - Bioinformatics & Biostatistics Core, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
- Matthias Blüher
  - Helmholtz Institute for Metabolic, Obesity and Vascular Research (HI-MAG) of the Helmholtz Zentrum München at the University of Leipzig and University Hospital, Leipzig, Germany
- Yu-Hua Tseng
  - Integrative Physiology and Metabolism, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
- Matthew D Lynes
  - Center for Molecular Medicine, MaineHealth Institute for Research, Scarborough, ME, USA
  - Department of Medicine, MaineHealth, Portland, ME, USA
  - Roux Institute at Northeastern University, Portland, ME, USA
|
3
|
Amar D, Gay NR, Jimenez-Morales D, Jean Beltran PM, Ramaker ME, Raja AN, Zhao B, Sun Y, Marwaha S, Gaul DA, Hershman SG, Ferrasse A, Xia A, Lanza I, Fernández FM, Montgomery SB, Hevener AL, Ashley EA, Walsh MJ, Sparks LM, Burant CF, Rector RS, Thyfault J, Wheeler MT, Goodpaster BH, Coen PM, Schenk S, Bodine SC, Lindholm ME. The mitochondrial multi-omic response to exercise training across rat tissues. Cell Metab 2024; 36:1411-1429.e10. [PMID: 38701776 PMCID: PMC11152996 DOI: 10.1016/j.cmet.2023.12.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 09/27/2023] [Accepted: 12/15/2023] [Indexed: 05/05/2024]
Abstract
Mitochondria have diverse functions critical to whole-body metabolic homeostasis. Endurance training alters mitochondrial activity, but systematic characterization of these adaptations is lacking. Here, the Molecular Transducers of Physical Activity Consortium mapped the temporal, multi-omic changes in mitochondrial analytes across 19 tissues in male and female rats trained for 1, 2, 4, or 8 weeks. Training elicited substantial changes in the adrenal gland, brown adipose, colon, heart, and skeletal muscle. The colon showed non-linear response dynamics, whereas mitochondrial pathways were downregulated in brown adipose and adrenal tissues. Protein acetylation increased in the liver, with a shift in lipid metabolism, whereas oxidative proteins increased in striated muscles. Exercise-upregulated networks were downregulated in human diabetes and cirrhosis. Knockdown of the central network protein 17-beta-hydroxysteroid dehydrogenase 10 (HSD17B10) elevated oxygen consumption, indicative of metabolic stress. We provide a multi-omic, multi-tissue, temporal atlas of the mitochondrial response to exercise training and identify candidates linked to mitochondrial dysfunction.
Affiliation(s)
- David Amar
  - Stanford University, Stanford, CA, USA; Insitro, San Francisco, CA, USA
- Yifei Sun
  - Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- David A Gaul
  - Georgia Institute of Technology, Atlanta, GA, USA
- Ashley Xia
  - National Institutes of Health, Bethesda, MD, USA
- Martin J Walsh
  - Icahn School of Medicine at Mount Sinai, New York City, NY, USA
- Lauren M Sparks
  - Translational Research Institute AdventHealth, Orlando, FL, USA
- John Thyfault
  - University of Kansas Medical Center, Kansas City, KS, USA
- Paul M Coen
  - Translational Research Institute AdventHealth, Orlando, FL, USA
- Simon Schenk
  - University of California, San Diego, La Jolla, CA, USA
- Sue C Bodine
  - Oklahoma Medical Research Foundation, Oklahoma City, OK, USA
|
4
|
Li Y, Lei H, Wen X, Cao H. A powerful approach to identify replicable variants in genome-wide association studies. Am J Hum Genet 2024; 111:966-978. [PMID: 38701746 PMCID: PMC11080610 DOI: 10.1016/j.ajhg.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 04/04/2024] [Accepted: 04/04/2024] [Indexed: 05/05/2024] Open
Abstract
Replicability is the cornerstone of modern scientific research. Reliable identifications of genotype-phenotype associations that are significant in multiple genome-wide association studies (GWASs) provide stronger evidence for the findings. Current replicability analysis relies on the independence assumption among single-nucleotide polymorphisms (SNPs) and ignores the linkage disequilibrium (LD) structure. We show that such a strategy may produce either overly liberal or overly conservative results in practice. We develop an efficient method, ReAD, to detect replicable SNPs associated with the phenotype from two GWASs accounting for the LD structure. The local dependence structure of SNPs across two heterogeneous studies is captured by a four-state hidden Markov model (HMM) built on two sequences of p values. By incorporating information from adjacent locations via the HMM, our approach provides more accurate SNP significance rankings. ReAD is scalable, platform independent, and more powerful than existing replicability analysis methods with effective false discovery rate control. Through analysis of datasets from two asthma GWASs and two ulcerative colitis GWASs, we show that ReAD can identify replicable genetic loci that existing methods might otherwise miss.
Affiliation(s)
- Yan Li
  - School of Computer Science and Technology, Changchun University of Science and Technology, Changchun, Jilin 130022, China; School of Mathematics, Jilin University, Changchun, Jilin 130012, China
- Haochen Lei
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
- Xiaoquan Wen
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Hongyuan Cao
  - Department of Statistics, Florida State University, Tallahassee, FL 32306, USA
|
5
|
Vetr NG, Gay NR, Montgomery SB. The impact of exercise on gene regulation in association with complex trait genetics. Nat Commun 2024; 15:3346. [PMID: 38693125 PMCID: PMC11063075 DOI: 10.1038/s41467-024-45966-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 02/01/2024] [Indexed: 05/03/2024] Open
Abstract
Endurance exercise training is known to reduce risk for a range of complex diseases. However, the molecular basis of this effect has been challenging to study and largely restricted to analyses of either few or easily biopsied tissues. Extensive transcriptome data collected across 15 tissues during exercise training in rats as part of the Molecular Transducers of Physical Activity Consortium has provided a unique opportunity to clarify how exercise can affect tissue-specific gene expression and further suggest how exercise adaptation may impact complex disease-associated genes. To build this map, we integrate this multi-tissue atlas of gene expression changes with gene-disease targets, genetic regulation of expression, and trait relationship data in humans. Consensus from multiple approaches prioritizes specific tissues and genes where endurance exercise impacts disease-relevant gene expression. Specifically, we identify a total of 5523 trait-tissue-gene triplets to serve as a valuable starting point for future investigations [Exercise; Transcription; Human Phenotypic Variation].
|
6
|
Dai R, Zheng C. False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach. Biometrics 2023; 79:3497-3509. [PMID: 36854821 PMCID: PMC10460825 DOI: 10.1111/biom.13848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Accepted: 02/17/2023] [Indexed: 03/02/2023]
Abstract
False discovery rate (FDR) controlling procedures provide important statistical guarantees for replicability in signal identification based on multiple hypotheses testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are independently collected in multiple different studies. For example, gene expression data are collected at different facilities and with different cohorts, to identify the genetic biomarkers of multiple types of cancers. These studies provide us with opportunities to identify signals by considering information from different sources (with potential heterogeneity) jointly. This paper is about how to provide FDR control guarantees for the tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (Simultaneous knockoffs) to identify mutual signals from multiple independent datasets, providing exact FDR control guarantees under finite sample settings. This method can work with very general model settings and test statistics. We demonstrate the performance of this method with extensive numerical studies and two real-data examples.
Affiliation(s)
- Ran Dai
  - Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, U.S.A
|
7
|
Dreyfuss JM, Djordjilovic V, Pan H, Bussberg V, MacDonald AM, Narain NR, Kiebish MA, Blüher M, Tseng YH, Lynes MD. ScreenDMT reveals linoleic acid diols replicably associate with BMI and stimulate adipocyte calcium fluxes. bioRxiv 2023:2023.07.12.548737. [PMID: 37503007 PMCID: PMC10369939 DOI: 10.1101/2023.07.12.548737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Activating brown adipose tissue (BAT) improves systemic metabolism, making it a promising target for metabolic syndrome. BAT is activated by 12,13-dihydroxy-9Z-octadecenoic acid (12,13-diHOME), which we previously identified to be inversely associated with BMI and which directly improves metabolism in multiple tissues. Here we profile plasma lipidomics from a cohort of 83 people and test which lipids' association with BMI replicates in a concordant direction using our novel tool ScreenDMT, whose power and validity we demonstrate via mathematical proofs and simulations. We find that the linoleic acid diols 12,13-diHOME and 9,10-diHOME both replicably inversely associate with BMI and mechanistically activate calcium fluxes in mouse brown and white adipocytes in vitro, which implicates this pathway and 9,10-diHOME as candidate therapeutic targets. ScreenDMT can be applied to test directional mediation, directional replication, and qualitative interactions, such as identifying biomarkers whose association is shared (replication) or opposite (qualitative interaction) across diverse populations.
Affiliation(s)
- Jonathan M. Dreyfuss
  - Bioinformatics & Biostatistics Core, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
- Vera Djordjilovic
  - Department of Economics, Ca’ Foscari University of Venice, Cannaregio 873, Venice, Italy
- Hui Pan
  - Bioinformatics & Biostatistics Core, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
- Matthias Blüher
  - Helmholtz Institute for Metabolic, Obesity and Vascular Research (HI-MAG) of the Helmholtz Zentrum München at the University of Leipzig and University Hospital, Leipzig, Germany
- Yu-Hua Tseng
  - Integrative Physiology and Metabolism, Joslin Diabetes Center, Harvard Medical School, Boston, MA, USA
  - Harvard Stem Cell Institute, Harvard University, Cambridge, MA, USA
- Matthew D. Lynes
  - Center for Molecular Medicine, MaineHealth Institute for Research, Scarborough, ME, USA
  - Department of Medicine, MaineHealth, Portland, ME, USA
  - Roux Institute at Northeastern University, Portland, ME, USA
|
8
|
Bogomolov M. Testing partial conjunction hypotheses under dependency, with applications to meta-analysis. Electron J Stat 2023. [DOI: 10.1214/22-ejs2100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Marina Bogomolov
  - Faculty of Data and Decision Sciences, Technion - Israel Institute of Technology, Haifa 3200003, Israel
|
9
|
Lee W, Lee D, Pawitan Y. Overall assessment for selected markers from high-throughput data. Stat Med 2022; 41:5830-5843. [PMID: 36270585 DOI: 10.1002/sim.9596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2021] [Revised: 07/31/2022] [Accepted: 10/04/2022] [Indexed: 12/15/2022]
Abstract
Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies where a large number of biomarkers is measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to get the statistical measures of overall assessment for the selected markers, by integrating the information in both the training and validation studies. Naive statistical measures, such as the combined P-value by conventional meta-analysis, that ignore the non-random selection are clearly biased, producing over-optimistic significance. We use the false-discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and other concepts such as replicability and meta-analysis. Some simulation studies and two real metabolomic datasets are considered to illustrate the application of sFDR in high-throughput data analyses.
Affiliation(s)
- Woojoo Lee
  - Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Donghwan Lee
  - Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Yudi Pawitan
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|
10
|
CLIMB: High-dimensional association detection in large scale genomic data. Nat Commun 2022; 13:6874. [PMID: 36371401 PMCID: PMC9653391 DOI: 10.1038/s41467-022-34360-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/05/2020] [Accepted: 10/21/2022] [Indexed: 11/14/2022] Open
Abstract
Joint analyses of genomic datasets obtained in multiple different conditions are essential for understanding the biological mechanism that drives tissue-specificity and cell differentiation, but they still remain computationally challenging. To address this we introduce CLIMB (Composite LIkelihood eMpirical Bayes), a statistical methodology that learns patterns of condition-specificity present in genomic data. CLIMB provides a generic framework facilitating a host of analyses, such as clustering genomic features sharing similar condition-specific patterns and identifying which of these features are involved in cell fate commitment. We apply CLIMB to three sets of hematopoietic data, which examine CTCF ChIP-seq measured in 17 different cell populations, RNA-seq measured across constituent cell populations in three committed lineages, and DNase-seq in 38 cell populations. Our results show that CLIMB improves upon existing alternatives in statistical precision, while capturing interpretable and biologically relevant clusters in the data.
|
11
|
Wang J, Gui L, Su WJ, Sabatti C, Owen AB. Detecting multiple replicating signals using adaptive filtering procedures. Ann Stat 2022. [DOI: 10.1214/21-aos2139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Jingshu Wang
  - Department of Statistics, The University of Chicago
- Lin Gui
  - Department of Statistics, The University of Chicago
- Weijie J. Su
  - Department of Statistics and Data Science, University of Pennsylvania
- Art B. Owen
  - Department of Statistics, Stanford University
|
12
|
Roquain E, Verzelen N. False discovery rate control with unknown null distribution: Is it possible to mimic the oracle? Ann Stat 2022. [DOI: 10.1214/21-aos2141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Etienne Roquain
  - Université de Paris and Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation
|
13
|
Mary D, Roquain E. Semi-supervised multiple testing. Electron J Stat 2022. [DOI: 10.1214/22-ejs2050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- David Mary
  - Université Côte d’Azur, Observatoire de la Côte d’Azur, CNRS, Laboratoire Lagrange, Bd de l’Observatoire, CS 34229, 06304, Nice cedex 4, France
- Etienne Roquain
  - Laboratoire de Probabilités, Statistique et Modélisation, Sorbonne Université, Université de Paris & CNRS, 4, place Jussieu, 75005 Paris, France
|
14
|
Generalizing research findings for enhanced reproducibility: an approach based on verbal alternative representations. Scientometrics 2021. [DOI: 10.1007/s11192-021-03914-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
|
15
|
Hoang AT, Dickhaus T. Randomized p-values for multiple testing and their application in replicability analysis. Biom J 2021; 64:384-409. [PMID: 33464615 DOI: 10.1002/bimj.202000155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/22/2020] [Revised: 10/01/2020] [Accepted: 11/02/2020] [Indexed: 11/11/2022]
Abstract
We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional p-values, which are computed under least favorable parameter configurations (LFCs), are over-conservative in the case of composite null hypotheses. As demonstrated in prior work, this poses severe challenges in the multiple testing context, especially when one goal of the statistical analysis is to estimate the proportion π₀ of true null hypotheses. Randomized p-values have been proposed to remedy this issue. In the present work, we discuss the application of randomized p-values in replicability analysis. In particular, we introduce a general class of statistical models for which valid, randomized p-values can be calculated easily. By means of computer simulations, we demonstrate that their usage typically leads to a much more accurate estimation of π₀ than the LFC-based approach. Finally, we apply our proposed methodology to a real data example from genomics.
Affiliation(s)
- Anh-Tuan Hoang
  - Institute for Statistics, University of Bremen, Bremen, Germany
|
16
|
Heller R. Comments on: Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput Stat 2020. [DOI: 10.1007/s00180-019-00942-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
17
|
Rejoinder on: Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput Stat 2020. [DOI: 10.1007/s00180-019-00948-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
18
|
Zhao SD, Nguyen YT. Nonparametric false discovery rate control for identifying simultaneous signals. Electron J Stat 2020. [DOI: 10.1214/19-ejs1663] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
|
19
|
Xiang D, Zhao SD, Cai TT. Signal classification for the integrative analysis of multiple sequences of large-scale multiple tests. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
Affiliation(s)
- Dongdong Xiang
  - East China Normal University; Shanghai People's Republic of China
- T. Tony Cai
  - University of Pennsylvania; Philadelphia USA
|
20
|
Cai TT, Sun W, Wang W. Covariate-assisted ranking and screening for large-scale two-sample inference. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12304] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/17/2023]
Affiliation(s)
- Wenguang Sun
  - University of Southern California Los Angeles USA
- Weinan Wang
  - University of Southern California Los Angeles USA
|
21
|
Wang P, Zhu W. Replicability analysis in genome-wide association studies via Cartesian hidden Markov models. BMC Bioinformatics 2019; 20:146. [PMID: 30885122 PMCID: PMC6423849 DOI: 10.1186/s12859-019-2707-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/02/2019] [Accepted: 02/27/2019] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND: Replicability analysis, which aims to detect replicated signals, is attracting more and more attention in modern scientific applications. For example, in genome-wide association studies (GWAS), it is more convincing to detect an association that can be replicated in more than one study. Since neighboring single nucleotide polymorphisms (SNPs) often exhibit high correlation, it is desirable to exploit the dependency information among adjacent SNPs properly in replicability analysis. In this paper, we propose a novel multiple testing procedure based on the Cartesian hidden Markov model (CHMM), called the repLIS procedure, for replicability analysis across two studies, which can characterize the local dependence structure among adjacent SNPs via a four-state Markov chain.
RESULTS: Theoretical results show that the repLIS procedure can control the false discovery rate (FDR) at the nominal level α and is optimal in the sense that it has the smallest false non-discovery rate (FNR) among all α-level multiple testing procedures. We carry out simulation studies to compare our repLIS procedure with existing methods, including the Benjamini-Hochberg (BH) procedure and the empirical Bayes approach called repfdr. Finally, we apply our repLIS procedure and the repfdr procedure in replicability analyses of psychiatric disorder data sets collected by the Psychiatric Genomics Consortium (PGC) and the Wellcome Trust Case Control Consortium (WTCCC). Both the simulation studies and the real data analysis show that the repLIS procedure is valid and achieves higher efficiency than its competitors.
CONCLUSIONS: In replicability analysis, our repLIS procedure controls the FDR at the pre-specified level α and can achieve more efficiency by exploiting the dependency information among adjacent SNPs.
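As a point of reference for the BH baseline named in this abstract, here is a minimal Python sketch of a conventional two-study replicability screen: take the larger of the two p-values for each SNP and apply the Benjamini-Hochberg step-up procedure to those maxima. This is not the repLIS procedure and it ignores dependence between adjacent SNPs. Whether the paper applies BH in exactly this way is not stated in the abstract, and the function names and toy p-values are illustrative assumptions.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])


def max_p_replicability(p_study1, p_study2, alpha=0.05):
    """Hypothetical helper: BH applied to the per-feature maximum p-value.

    The maximum of two p-values is a valid p-value for the composite null
    "the feature is null in at least one of the two studies".
    """
    if len(p_study1) != len(p_study2):
        raise ValueError("both studies must test the same features")
    p_max = [max(a, b) for a, b in zip(p_study1, p_study2)]
    return benjamini_hochberg(p_max, alpha)


# Toy example (made-up p-values for four SNPs in two studies).
p1 = [0.001, 0.002, 0.5, 0.0001]
p2 = [0.003, 0.8, 0.004, 0.0002]
print(max_p_replicability(p1, p2))  # → [0, 3]
```

Only SNPs 0 and 3 are small in both studies, so only they survive. SNPs 1 and 2 are significant in one study each and are screened out.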
Affiliation(s)
- Pengfei Wang
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, China
- Wensheng Zhu
  - Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, China
|
22
|
Affiliation(s)
- Jingshu Wang
  - Department of Statistics, University of Pennsylvania, Philadelphia, PA
- Art B. Owen
  - Department of Statistics, Stanford University, Stanford, CA
|
23
|
Bogomolov M, Heller R. Assessing replicability of findings across two studies of multiple features. Biometrika 2018. [DOI: 10.1093/biomet/asy029] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Marina Bogomolov
  - The William Davidson Faculty of Industrial Engineering and Management, Technion–Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Ruth Heller
  - Department of Statistics and Operations Research, Tel-Aviv University, P.O. Box 39040, Tel-Aviv 6997801, Israel
|
24
|
Amar D, Shamir R, Yekutieli D. Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate. PLoS Comput Biol 2017; 13:e1005700. [PMID: 28821015 PMCID: PMC5576761 DOI: 10.1371/journal.pcbi.1005700] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/09/2017] [Revised: 08/30/2017] [Accepted: 07/24/2017] [Indexed: 12/03/2022] Open
Abstract
In almost every field in genomics, large-scale biomedical datasets are used to report associations. Extracting associations that recur across multiple studies while controlling the false discovery rate is a fundamental challenge. Here, we propose a new method to allow joint analysis of multiple studies. Given a set of p-values obtained from each study, the goal is to identify associations that recur in at least k > 1 studies while controlling the false discovery rate. We propose several new algorithms that differ in how the study dependencies are modeled, and compare them and extant methods under various simulated scenarios. The top algorithm, SCREEN (Scalable Cluster-based REplicability ENhancement), is our new algorithm that works in three stages: (1) clustering an estimated correlation network of the studies, (2) learning replicability (e.g., of genes) within clusters, and (3) merging the results across the clusters. When we applied SCREEN to two real datasets it greatly outperformed the results obtained via standard meta-analysis. First, on a collection of 29 case-control gene expression cancer studies, we detected a large set of consistently up-regulated genes related to proliferation and cell cycle regulation. These genes are both consistently up-regulated across many cancer studies, and are well connected in known gene networks. Second, on a recent pan-cancer study that examined the expression profiles of patients with and without mutations in the HLA complex, we detected a large active module of up-regulated genes that are both related to immune responses and are well connected in known gene networks. This module covers three times more genes than the original study at a similar false discovery rate, demonstrating the high power of SCREEN. An implementation of SCREEN is available in the supplement.
When analyzing results from multiple studies, extracting replicated associations is the first step towards making new discoveries. The standard approach for this task is to use meta-analysis methods, which usually assume an underlying null hypothesis that a gene has no effect in all studies. On the other hand, in replicability analysis we explicitly require that the gene will manifest a recurring pattern of effects. In this study we develop new algorithms for replicability analysis that are both scalable (i.e., can handle many studies) and allow controlling the false discovery rate. We show that our main algorithm called SCREEN (Scalable Cluster-based REplicability ENhancement) outperforms the other methods in simulated scenarios. Moreover, when applied to real datasets, SCREEN greatly extended the results of the meta-analysis, and can even facilitate detection of new biological results.
Affiliation(s)
- David Amar
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Yekutieli
- Department of Statistics and OR, Tel Aviv University, Tel Aviv, Israel
|
25
|
Zhao SD, Cai TT, Li H. Optimal detection of weak positive latent dependence between two sequences of multiple tests. J Multivariate Anal 2017; 160:169-184. [PMID: 29203948 PMCID: PMC5711487 DOI: 10.1016/j.jmva.2017.06.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
It is frequently of interest to jointly analyze two paired sequences of multiple tests. This paper studies the problem of detecting whether there are more pairs of tests that are significant in both sequences than would be expected by chance. The asymptotic detection boundary is derived in terms of parameters such as the sparsity of non-null cases in each sequence, the effect sizes of the signals, and the magnitude of the dependence between the two sequences. A new test for detecting weak dependence is also proposed, shown to be asymptotically adaptively optimal, studied in simulations, and applied to study genetic pleiotropy in 10 pediatric autoimmune diseases.
Affiliation(s)
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, IL, United States
- T. Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, United States
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|
26
|
Lee D, Ganna A, Pawitan Y, Lee W. Nonparametric estimation of the rediscovery rate. Stat Med 2016; 35:3203-12. [PMID: 26910365 DOI: 10.1002/sim.6915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/07/2015] [Revised: 12/15/2015] [Accepted: 01/28/2016] [Indexed: 11/09/2022]
Abstract
Validation studies have been used to increase the reliability of the statistical conclusions for scientific discoveries; such studies improve the reproducibility of the findings and reduce the possibility of false positives. Here, one of the important roles of statistics is to quantify reproducibility rigorously. Two concepts were recently defined for this purpose: (i) rediscovery rate (RDR), which is the expected proportion of statistically significant findings in a study that can be replicated in the validation study and (ii) false discovery rate in the validation study (vFDR). In this paper, we aim to develop a nonparametric approach to estimate the RDR and vFDR and show an explicit link between the RDR and the FDR. Among other things, the link explains why reproducing statistically significant results even with low FDR level may be difficult. Two metabolomics datasets are considered to illustrate the application of the RDR and vFDR concepts in high-throughput data analysis. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Donghwan Lee
- Department of Statistics, Ewha Womans University, Seoul, Korea
- Andrea Ganna
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, U.S.A.; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, U.S.A.
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Woojoo Lee
- Department of Statistics, Inha University, Incheon, Korea
|
27
|
|
28
|
Heller R, Yaacoby S, Yekutieli D. repfdr: a tool for replicability analysis for genome-wide association studies. Bioinformatics 2014; 30:2971-2. [PMID: 25012182 DOI: 10.1093/bioinformatics/btu434] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Identification of single nucleotide polymorphisms that are associated with a phenotype in more than one study is of great scientific interest in genome-wide association studies (GWAS) research. The empirical Bayes approach for discovering whether results have been replicated across studies was shown to be a reliable method, and close to optimal in terms of power.
RESULTS: The R package repfdr provides a flexible implementation of the empirical Bayes approach for replicability analysis and meta-analysis, to be used when several studies examine the same set of null hypotheses. The usefulness of the package for the GWAS community is discussed.
AVAILABILITY AND IMPLEMENTATION: The R package repfdr can be downloaded from CRAN.
CONTACT: ruheller@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ruth Heller
- Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv 6997801, Israel
- Shay Yaacoby
- Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv 6997801, Israel
- Daniel Yekutieli
- Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv 6997801, Israel
|
29
|
|