1
|
Held L. Beyond the two-trials rule. Stat Med 2024; 43:5023-5042. [PMID: 38573319 DOI: 10.1002/sim.10055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 11/10/2023] [Accepted: 12/10/2023] [Indexed: 04/05/2024]
Abstract
The two-trials rule for drug approval requires "at least two adequate and well-controlled studies, each convincing on its own, to establish effectiveness." This is usually implemented by requiring two significant pivotal trials and is the standard regulatory requirement to provide evidence for a new drug's efficacy. However, there is need to develop suitable alternatives to this rule for a number of reasons, including the possible availability of data from more than two trials. I consider the case of up to three studies and stress the importance to control the partial Type-I error rate, where only some studies have a true null effect, while maintaining the overall Type-I error rate of the two-trials rule, where all studies have a null effect. Some less-known P-value combination methods are useful to achieve this: Pearson's method, Edgington's method and the recently proposed harmonic mean χ²-test. I study their properties and discuss how they can be extended to a sequential assessment of success while still ensuring overall Type-I error control. I compare the different methods in terms of partial Type-I error rate, project power and the expected number of studies required. Edgington's method is eventually recommended as it is easy to implement and communicate, has only moderate partial Type-I error rate inflation but substantially increased project power.
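As a rough illustration of the comparison described in this abstract, the sketch below contrasts the two-trials rule with Edgington's method, which combines studies through the sum of their P-values. It is not taken from the paper: the function names are hypothetical, one-sided P-values at level 0.025 are assumed, and the overall level 0.025² = 0.000625 is assumed to be the target that the combined P-value must meet.

```python
from math import comb, factorial, floor


def irwin_hall_cdf(s: float, n: int) -> float:
    """P(U1 + ... + Un <= s) for n independent Uniform(0, 1) variables."""
    if s <= 0:
        return 0.0
    if s >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (s - k) ** n for k in range(floor(s) + 1))
    return total / factorial(n)


def edgington_p(p_values):
    """Combined P-value from the sum of independent P-values (Edgington's method)."""
    return irwin_hall_cdf(sum(p_values), len(p_values))


def two_trials_rule(p_values, alpha=0.025):
    """Success only if every study is significant at level alpha on its own."""
    return all(p <= alpha for p in p_values)


ALPHA = 0.025               # assumed one-sided level per study
OVERALL_LEVEL = ALPHA ** 2  # overall Type-I error rate of the two-trials rule

p = [0.005, 0.028]          # hypothetical one-sided P-values from two trials
print(two_trials_rule(p, ALPHA))        # second trial misses 0.025
print(edgington_p(p) <= OVERALL_LEVEL)  # the sum of P-values is still small enough
```

With these hypothetical inputs the second trial narrowly misses the per-study threshold, so the two-trials rule fails, while the sum-based combination still meets the same overall level.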
Affiliation(s)
- Leonhard Held
- Epidemiology, Biostatistics and Prevention Institute (EBPI) and Center for Reproducible Science (CRS), University of Zurich, Zurich, Switzerland
|
2
|
Dai R, Zheng C. False discovery rate-controlled multiple testing for union null hypotheses: a knockoff-based approach. Biometrics 2023; 79:3497-3509. [PMID: 36854821 PMCID: PMC10460825 DOI: 10.1111/biom.13848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Accepted: 02/17/2023] [Indexed: 03/02/2023]
Abstract
False discovery rate (FDR) controlling procedures provide important statistical guarantees for replicability in signal identification based on multiple hypotheses testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are independently collected in multiple different studies. For example, gene expression data are collected at different facilities and with different cohorts, to identify the genetic biomarkers of multiple types of cancers. These studies provide us with opportunities to identify signals by considering information from different sources (with potential heterogeneity) jointly. This paper is about how to provide FDR control guarantees for the tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (Simultaneous knockoffs) to identify mutual signals from multiple independent datasets, providing exact FDR control guarantees under finite sample settings. This method can work with very general model settings and test statistics. We demonstrate the performance of this method with extensive numerical studies and two real-data examples.
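For readers unfamiliar with knockoff-based selection, the following sketch shows the standard single-study knockoff+ selection step, assuming the feature statistics W_j (large positive values favor a true signal, signs are symmetric under the null) have already been computed. It is not the Simultaneous knockoffs procedure of this paper; the function names and the example statistics are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q."""
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)
        positives = sum(1 for x in w if x >= t)
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")  # no threshold meets the target FDR level


def knockoff_select(w, q):
    """Indices of the features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Hypothetical feature statistics for eight candidate features.
w = [5.0, 4.0, 3.5, 3.0, -0.5, 2.0, 0.2, -0.1]
print(knockoff_plus_threshold(w, q=0.25))
print(knockoff_select(w, q=0.25))
```

The count of large negative statistics serves as an estimate of the number of false positives among the large positive ones, which is what makes the ratio a conservative estimate of the false discovery proportion.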
Affiliation(s)
- Ran Dai
- Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, USA
- Cheng Zheng
- Department of Biostatistics, University of Nebraska Medical Center, Omaha, Nebraska, USA
|
3
|
Haftorn KL, Denault WRP, Lee Y, Page CM, Romanowska J, Lyle R, Næss ØE, Kristjansson D, Magnus PM, Håberg SE, Bohlin J, Jugessur A. Nucleated red blood cells explain most of the association between DNA methylation and gestational age. Commun Biol 2023; 6:224. [PMID: 36849614 PMCID: PMC9971030 DOI: 10.1038/s42003-023-04584-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Accepted: 02/13/2023] [Indexed: 03/01/2023] Open
Abstract
Determining if specific cell type(s) are responsible for an association between DNA methylation (DNAm) and a given phenotype is important for understanding the biological mechanisms underlying the association. Our EWAS of gestational age (GA) in 953 newborns from the Norwegian MoBa study identified 13,660 CpGs significantly associated with GA (pBonferroni<0.05) after adjustment for cell type composition. When the CellDMC algorithm was applied to explore cell-type specific effects, 2,330 CpGs were significantly associated with GA, mostly in nucleated red blood cells [nRBCs; n = 2,030 (87%)]. Similar patterns were found in another dataset based on a different array and when applying an alternative algorithm to CellDMC called Tensor Composition Analysis (TCA). Our findings point to nRBCs as the main cell type driving the DNAm-GA association, implicating an epigenetic signature of erythropoiesis as a likely mechanism. They also explain the poor correlation observed between epigenetic age clocks for newborns and those for adults.
Affiliation(s)
- Kristine L Haftorn
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Institute of Health and Society, University of Oslo, Oslo, Norway
- William R P Denault
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Human Genetics, University of Chicago, Chicago, IL, 60637, USA
- Yunsung Lee
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Christian M Page
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Physical Health and Ageing, Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Julia Romanowska
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Robert Lyle
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital and University of Oslo, Oslo, Norway
- Øyvind E Næss
- Institute of Health and Society, University of Oslo, Oslo, Norway
- Division of Mental and Physical Health, Norwegian Institute of Public Health, Oslo, Norway
- Dana Kristjansson
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
- Per M Magnus
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Siri E Håberg
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Jon Bohlin
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Division for Infection Control and Environmental Health, Department of Infectious Disease Epidemiology and Modelling, Norwegian Institute of Public Health, Oslo, Norway
- Astanand Jugessur
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
|
4
|
Bogomolov M. Testing partial conjunction hypotheses under dependency, with applications to meta-analysis. Electron J Stat 2023. [DOI: 10.1214/22-ejs2100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Marina Bogomolov
- Faculty of Data and Decision Sciences, Technion - Israel Institute of Technology, Haifa 3200003, Israel
|
5
|
Lee W, Lee D, Pawitan Y. Overall assessment for selected markers from high-throughput data. Stat Med 2022; 41:5830-5843. [PMID: 36270585 DOI: 10.1002/sim.9596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2021] [Revised: 07/31/2022] [Accepted: 10/04/2022] [Indexed: 12/15/2022]
Abstract
Reproducibility, a hallmark of science, is typically assessed in validation studies. We focus on high-throughput studies where a large number of biomarkers is measured in a training study, but only a subset of the most significant findings is selected and re-tested in a validation study. Our aim is to get the statistical measures of overall assessment for the selected markers, by integrating the information in both the training and validation studies. Naive statistical measures, such as the combined P-value by conventional meta-analysis, that ignore the non-random selection are clearly biased, producing over-optimistic significance. We use the false-discovery rate (FDR) concept to develop a selection-adjusted FDR (sFDR) as an overall assessment measure. We describe the link between the overall assessment and other concepts such as replicability and meta-analysis. Some simulation studies and two real metabolomic datasets are considered to illustrate the application of sFDR in high-throughput data analyses.
Affiliation(s)
- Woojoo Lee
- Department of Public Health Science, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Donghwan Lee
- Department of Statistics, Ewha Womans University, Seoul, Republic of Korea
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
|
6
|
Wang J, Gui L, Su WJ, Sabatti C, Owen AB. Detecting multiple replicating signals using adaptive filtering procedures. Ann Stat 2022; 50:1890-1909. [PMID: 39421244 PMCID: PMC11486506 DOI: 10.1214/21-aos2139] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 10/19/2024]
Abstract
Replicability is a fundamental quality of scientific discoveries: we are interested in those signals that are detectable in different laboratories, different populations, across time etc. Unlike meta-analysis which accounts for experimental variability but does not guarantee replicability, testing a partial conjunction (PC) null aims specifically to identify the signals that are discovered in multiple studies. In many contemporary applications, for example, comparing multiple high-throughput genetic experiments, a large number M of PC nulls need to be tested simultaneously, calling for a multiple comparisons correction. However, standard multiple testing adjustments on the M PC p-values can be severely conservative, especially when M is large and the signals are sparse. We introduce AdaFilter, a new multiple testing procedure that increases power by adaptively filtering out unlikely candidates of PC nulls. We prove that AdaFilter can control FWER and FDR as long as data across studies are independent, and has much higher power than other existing methods. We illustrate the application of AdaFilter with three examples: microarray studies of Duchenne muscular dystrophy, single-cell RNA sequencing of T cells in lung cancer tumors and GWAS for metabolomics.
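To make the notion of a PC p-value concrete, here is a minimal sketch of the classical Bonferroni-style partial conjunction p-value for the null "fewer than r of the n studies have a true signal": the r-th smallest p-value multiplied by (n - r + 1). This is the kind of standard PC p-value the abstract calls conservative when many hypotheses are tested, not the AdaFilter procedure itself; the function name and the example p-values are hypothetical.

```python
def partial_conjunction_p(p_values, r):
    """Bonferroni-style PC p-value for the null that fewer than r of the
    n studies carry a signal: (n - r + 1) times the r-th smallest p-value."""
    n = len(p_values)
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= number of studies")
    p_sorted = sorted(p_values)
    return min(1.0, (n - r + 1) * p_sorted[r - 1])


# Hypothetical p-values for one feature measured in four studies.
p = [0.001, 0.2, 0.004, 0.03]
for r in range(1, len(p) + 1):
    print(r, partial_conjunction_p(p, r))
```

Raising r asks for evidence in more studies, so the PC p-value is driven by weaker and weaker individual results, which is one reason naive corrections over many features lose power.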
Affiliation(s)
- Jingshu Wang
- Department of Statistics, The University of Chicago
- Lin Gui
- Department of Statistics, The University of Chicago
- Weijie J. Su
- Department of Statistics and Data Science, University of Pennsylvania
- Art B. Owen
- Department of Statistics, Stanford University
|
7
|
Hoang AT, Dickhaus T. Randomized p-values for multiple testing and their application in replicability analysis. Biom J 2021; 64:384-409. [PMID: 33464615 DOI: 10.1002/bimj.202000155] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/22/2020] [Revised: 10/01/2020] [Accepted: 11/02/2020] [Indexed: 11/11/2022]
Abstract
We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional p-values, which are computed under least favorable parameter configurations (LFCs), are over-conservative in the case of composite null hypotheses. As demonstrated in prior work, this poses severe challenges in the multiple testing context, especially when one goal of the statistical analysis is to estimate the proportion π₀ of true null hypotheses. Randomized p-values have been proposed to remedy this issue. In the present work, we discuss the application of randomized p-values in replicability analysis. In particular, we introduce a general class of statistical models for which valid, randomized p-values can be calculated easily. By means of computer simulations, we demonstrate that their usage typically leads to a much more accurate estimation of π₀ than the LFC-based approach. Finally, we apply our proposed methodology to a real data example from genomics.
Affiliation(s)
- Anh-Tuan Hoang
- Institute for Statistics, University of Bremen, Bremen, Germany
|
8
|
Holers VM. Challenges and Opportunities: Using Omics to Generate Testable Insights Into Pathogenic Mechanisms in Preclinical Seropositive Rheumatoid Arthritis. Arthritis Rheumatol 2020; 73:1-4. [DOI: 10.1002/art.41479] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2020] [Accepted: 08/04/2020] [Indexed: 12/19/2022]
|
9
|
Quantify and control reproducibility in high-throughput experiments. Nat Methods 2020; 17:1207-1213. [PMID: 33046893 DOI: 10.1038/s41592-020-00978-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/09/2020] [Accepted: 09/14/2020] [Indexed: 11/09/2022]
Abstract
Ensuring reproducibility of results in high-throughput experiments is crucial for biomedical research. Here, we propose a set of computational methods, INTRIGUE, to evaluate and control reproducibility in high-throughput settings. Our approaches are built on a new definition of reproducibility that emphasizes directional consistency when experimental units are assessed with signed effect size estimates. The proposed methods are designed to (1) assess the overall reproducible quality of multiple studies and (2) evaluate reproducibility at the individual experimental unit levels. We demonstrate the proposed methods in detecting unobserved batch effects via simulations. We further illustrate the versatility of the proposed methods in transcriptome-wide association studies: in addition to reproducible quality control, they are also suited to investigating genuine biological heterogeneity. Finally, we discuss the potential extensions of the proposed methods in other vital areas of reproducible research (for example, publication bias and conceptual replications).
|
10
|
Influence of multiple hypothesis testing on reproducibility in neuroimaging research: A simulation study and Python-based software. J Neurosci Methods 2020; 337:108654. [PMID: 32114144 DOI: 10.1016/j.jneumeth.2020.108654] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 09/13/2019] [Revised: 02/26/2020] [Accepted: 02/26/2020] [Indexed: 11/24/2022]
Abstract
BACKGROUND Reproducibility of research findings has been recently questioned in many fields of science, including psychology and neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which entails false positive findings unless the analyzed p-values are carefully corrected. While this multiple testing problem is well known and studied, it continues to be both a theoretical and practical problem. NEW METHOD Here we assess reproducibility in simulated experiments in the context of multiple testing. We consider methods that control either the family-wise error rate (FWER) or false discovery rate (FDR), including techniques based on random field theory (RFT), cluster-mass based permutation testing, and adaptive FDR. Several classical methods are also considered. The performance of these methods is investigated under two different models. RESULTS We found that permutation testing is the most powerful method among the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced most reproducible outcomes. COMPARISON WITH EXISTING METHOD(S) We have extended the use of two-group and separate-classes models for analyzing reproducibility and provide a new open-source software "MultiPy" for multiple hypothesis testing. CONCLUSIONS Our simulations suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit "MultiPy" and we aim this study to help in improving statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.
|
11
|
Zhao SD, Nguyen YT. Nonparametric false discovery rate control for identifying simultaneous signals. Electron J Stat 2020. [DOI: 10.1214/19-ejs1663] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/19/2022]
|
12
|
|
13
|
Xiang D, Zhao SD, Tony Cai T. Signal classification for the integrative analysis of multiple sequences of large-scale multiple tests. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12323] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
Affiliation(s)
- Dongdong Xiang
- East China Normal University; Shanghai People's Republic of China
- T. Tony Cai
- University of Pennsylvania; Philadelphia USA
|
14
|
Tony Cai T, Sun W, Wang W. Covariate‐assisted ranking and screening for large‐scale two‐sample inference. J R Stat Soc Series B Stat Methodol 2019. [DOI: 10.1111/rssb.12304] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/17/2023]
Affiliation(s)
- Wenguang Sun
- University of Southern California Los Angeles USA
- Weinan Wang
- University of Southern California Los Angeles USA
|
15
|
Wang P, Zhu W. Replicability analysis in genome-wide association studies via Cartesian hidden Markov models. BMC Bioinformatics 2019; 20:146. [PMID: 30885122 PMCID: PMC6423849 DOI: 10.1186/s12859-019-2707-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/02/2019] [Accepted: 02/27/2019] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Replicability analysis, which aims to detect replicated signals, is attracting increasing attention in modern scientific applications. For example, in genome-wide association studies (GWAS), it is more convincing to detect an association that can be replicated in more than one study. Since neighboring single nucleotide polymorphisms (SNPs) often exhibit high correlation, it is desirable to exploit the dependency information among adjacent SNPs properly in replicability analysis. In this paper, we propose a novel multiple testing procedure based on the Cartesian hidden Markov model (CHMM), called the repLIS procedure, for replicability analysis across two studies, which can characterize the local dependence structure among adjacent SNPs via a four-state Markov chain. RESULTS Theoretical results show that the repLIS procedure can control the false discovery rate (FDR) at the nominal level α and is optimal in the sense that it has the smallest false non-discovery rate (FNR) among all α-level multiple testing procedures. We carry out simulation studies to compare our repLIS procedure with existing methods, including the Benjamini-Hochberg (BH) procedure and the empirical Bayes approach called repfdr. Finally, we apply the repLIS and repfdr procedures in replicability analyses of psychiatric disorders data sets collected by the Psychiatric Genomics Consortium (PGC) and the Wellcome Trust Case Control Consortium (WTCCC). Both the simulation studies and the real data analysis show that the repLIS procedure is valid and achieves a higher efficiency compared with its competitors. CONCLUSIONS In replicability analysis, our repLIS procedure controls the FDR at the pre-specified level α and can achieve more efficiency by exploiting the dependency information among adjacent SNPs.
Affiliation(s)
- Pengfei Wang
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, China
- Wensheng Zhu
- Key Laboratory for Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, 5268 Renmin Street, Changchun, 130024, China
|
16
|
Affiliation(s)
- Qingyuan Zhao
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA
|
17
|
Bogomolov M, Heller R. Assessing replicability of findings across two studies of multiple features. Biometrika 2018. [DOI: 10.1093/biomet/asy029] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Marina Bogomolov
- The William Davidson Faculty of Industrial Engineering and Management, Technion–Israel Institute of Technology, Technion City, Haifa 3200003, Israel
- Ruth Heller
- Department of Statistics and Operations Research, Tel-Aviv University, P.O. Box 39040, Tel-Aviv 6997801, Israel
|
18
|
Abstract
This article considers replicability of the performance of predictors across studies. We suggest a general approach to investigating this issue, based on ensembles of prediction models trained on different studies. We quantify how the common practice of training on a single study accounts in part for the observed challenges in replicability of prediction performance. We also investigate whether ensembles of predictors trained on multiple studies can be combined, using unique criteria, to design robust ensemble learners trained upfront to incorporate replicability into different contexts and populations.
Affiliation(s)
- Prasad Patil
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
- Giovanni Parmigiani
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA 02215
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
|
19
|
Kafkafi N, Agassi J, Chesler EJ, Crabbe JC, Crusio WE, Eilam D, Gerlai R, Golani I, Gomez-Marin A, Heller R, Iraqi F, Jaljuli I, Karp NA, Morgan H, Nicholson G, Pfaff DW, Richter SH, Stark PB, Stiedl O, Stodden V, Tarantino LM, Tucci V, Valdar W, Williams RW, Würbel H, Benjamini Y. Reproducibility and replicability of rodent phenotyping in preclinical studies. Neurosci Biobehav Rev 2018; 87:218-232. [PMID: 29357292 PMCID: PMC6071910 DOI: 10.1016/j.neubiorev.2018.01.003] [Citation(s) in RCA: 116] [Impact Index Per Article: 19.3] [Received: 10/25/2016] [Revised: 12/13/2017] [Accepted: 01/11/2018] [Indexed: 12/15/2022]
Abstract
The scientific community is increasingly concerned with the proportion of published “discoveries” that are not replicated in subsequent studies. The field of rodent behavioral phenotyping was one of the first to raise this concern, and to relate it to other methodological issues: the complex interaction between genotype and environment; the definitions of behavioral constructs; and the use of laboratory mice and rats as model species for investigating human health and disease mechanisms. In January 2015, researchers from various disciplines gathered at Tel Aviv University to discuss these issues. The general consensus was that the issue is prevalent and of concern, and should be addressed at the statistical, methodological and policy levels, but is not so severe as to call into question the validity and the usefulness of model organisms as a whole. Well-organized community efforts, coupled with improved data and metadata sharing, have a key role in identifying specific problems and promoting effective solutions. Replicability is closely related to validity, may affect generalizability and translation of findings, and has important ethical implications.
Affiliation(s)
- John C Crabbe
- Oregon Health & Science University, and VA Portland Health Care System, United States
- Natasha A Karp
- Discovery Sciences, IMED Biotech Unit, AstraZeneca, Cambridge, UK
- William Valdar
- University of North Carolina at Chapel Hill, United States
|
20
|
Popovic M, Fasanelli F, Fiano V, Biggeri A, Richiardi L. Increased correlation between methylation sites in epigenome-wide replication studies: impact on analysis and results. Epigenomics 2017; 9:1489-1502. [PMID: 29106300 DOI: 10.2217/epi-2017-0073] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/14/2023] Open
Abstract
AIM To show that an increased correlation between CpGs after selection through an epigenome-wide association study (EWAS) might translate into biased replication results. METHODS Pairwise correlation coefficients between CpGs selected in two published EWAS, the top hits replication, Bonferroni p-values, Benjamini-Hochberg (BH) false discovery rate (FDR) and directional FDR r-values were calculated in the NINFEA cohort data. Random permutations of the exposures were performed to show the empirical p-value distributions. RESULTS The average pairwise correlation coefficients between CpGs were enhanced after selection for the replication (e.g., from 0.12 at genome-wide level to 0.26 among the selected CpGs), affecting the empirical p-value distributions and the usual multiple testing control. CONCLUSION Bonferroni and Benjamini-Hochberg FDR are inappropriate for the EWAS replication phase, and methods that account for the underlying correlation need to be used.
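For reference, the two standard adjustments named in this abstract can be sketched as follows. This is a generic illustration with hypothetical p-values, not the analysis code of the study; note that the abstract's conclusion is precisely that these adjustments can mislead when the selected CpGs are strongly correlated.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (step-up), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four CpGs carried forward to replication.
p = [0.01, 0.04, 0.03, 0.20]
print(bonferroni_adjust(p))
print(bh_adjust(p))
```

A CpG is declared replicated at level α when its adjusted p-value is at most α; the BH values are never larger than the Bonferroni ones, which is why BH is the less conservative of the two.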
Affiliation(s)
- Maja Popovic
- Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy
- Francesca Fasanelli
- Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy
- Valentina Fiano
- Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy
- Annibale Biggeri
- Department of Statistics, Computer Science, Applications «G. Parenti», University of Florence, Florence, Italy
- Lorenzo Richiardi
- Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy
|
21
|
Amar D, Shamir R, Yekutieli D. Extracting replicable associations across multiple studies: Empirical Bayes algorithms for controlling the false discovery rate. PLoS Comput Biol 2017; 13:e1005700. [PMID: 28821015 PMCID: PMC5576761 DOI: 10.1371/journal.pcbi.1005700] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 02/09/2017] [Revised: 08/30/2017] [Accepted: 07/24/2017] [Indexed: 12/03/2022] Open
Abstract
In almost every field in genomics, large-scale biomedical datasets are used to report associations. Extracting associations that recur across multiple studies while controlling the false discovery rate is a fundamental challenge. Here, we propose a new method to allow joint analysis of multiple studies. Given a set of p-values obtained from each study, the goal is to identify associations that recur in at least k > 1 studies while controlling the false discovery rate. We propose several new algorithms that differ in how the study dependencies are modeled, and compare them and extant methods under various simulated scenarios. The top algorithm, SCREEN (Scalable Cluster-based REplicability ENhancement), is our new algorithm that works in three stages: (1) clustering an estimated correlation network of the studies, (2) learning replicability (e.g., of genes) within clusters, and (3) merging the results across the clusters. When we applied SCREEN to two real datasets it greatly outperformed the results obtained via standard meta-analysis. First, on a collection of 29 case-control gene expression cancer studies, we detected a large set of consistently up-regulated genes related to proliferation and cell cycle regulation. These genes are both consistently up-regulated across many cancer studies, and are well connected in known gene networks. Second, on a recent pan-cancer study that examined the expression profiles of patients with and without mutations in the HLA complex, we detected a large active module of up-regulated genes that are both related to immune responses and are well connected in known gene networks. This module covers thrice more genes as compared to the original study at a similar false discovery rate, demonstrating the high power of SCREEN. An implementation of SCREEN is available in the supplement.
When analyzing results from multiple studies, extracting replicated associations is the first step towards making new discoveries. The standard approach for this task is to use meta-analysis methods, which usually make an underlying null hypothesis that a gene has no effect in all studies. On the other hand, in replicability analysis we explicitly require that the gene will manifest a recurring pattern of effects. In this study we develop new algorithms for replicability analysis that are both scalable (i.e., can handle many studies) and allow controlling the false discovery rate. We show that our main algorithm called SCREEN (Scalable Cluster-based REplicability ENhancement) outperforms the other methods in simulated scenarios. Moreover, when applied to real datasets, SCREEN greatly extended the results of the meta-analysis, and can even facilitate detection of new biological results.
Affiliation(s)
- David Amar
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Daniel Yekutieli
- Department of Statistics and OR, Tel Aviv University, Tel Aviv, Israel
|
22
|
Zhao SD, Cai TT, Li H. Optimal detection of weak positive latent dependence between two sequences of multiple tests. J Multivariate Anal 2017; 160:169-184. [PMID: 29203948 PMCID: PMC5711487 DOI: 10.1016/j.jmva.2017.06.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
Abstract
It is frequently of interest to jointly analyze two paired sequences of multiple tests. This paper studies the problem of detecting whether there are more pairs of tests that are significant in both sequences than would be expected by chance. The asymptotic detection boundary is derived in terms of parameters such as the sparsity of non-null cases in each sequence, the effect sizes of the signals, and the magnitude of the dependence between the two sequences. A new test for detecting weak dependence is also proposed, shown to be asymptotically adaptively optimal, studied in simulations, and applied to study genetic pleiotropy in 10 pediatric autoimmune diseases.
Affiliation(s)
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, IL, United States
- T. Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, United States
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
|
23
|
Kent ST, Rosenson RS, Avery CL, Chen YDI, Correa A, Cummings SR, Cupples LA, Cushman M, Evans DS, Gudnason V, Harris TB, Howard G, Irvin MR, Judd SE, Jukema JW, Lange L, Levitan EB, Li X, Liu Y, Post WS, Postmus I, Psaty BM, Rotter JI, Safford MM, Sitlani CM, Smith AV, Stewart JD, Trompet S, Sun F, Vasan RS, Woolley JM, Whitsel EA, Wiggins KL, Wilson JG, Muntner P. PCSK9 Loss-of-Function Variants, Low-Density Lipoprotein Cholesterol, and Risk of Coronary Heart Disease and Stroke: Data From 9 Studies of Blacks and Whites. Circulation. Cardiovascular Genetics 2017; 10:e001632. [PMID: 28768753 PMCID: PMC5729040 DOI: 10.1161/circgenetics.116.001632] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 09/27/2016] [Accepted: 06/06/2017] [Indexed: 11/16/2022]
Abstract
BACKGROUND: PCSK9 loss-of-function (LOF) variants allow for the examination of the effects of lifetime reduced low-density lipoprotein cholesterol (LDL-C) on cardiovascular events. We examined the association of PCSK9 LOF variants with LDL-C and incident coronary heart disease and stroke through a meta-analysis of data from 8 observational cohorts and 1 randomized trial of statin therapy. METHODS AND RESULTS: These 9 studies together included 17 459 blacks with 403 (2.3%) having at least 1 Y142X or C679X variant and 31 306 whites with 955 (3.1%) having at least 1 R46L variant. Unadjusted odds ratios for associations between PCSK9 LOF variants and incident coronary heart disease (851 events in blacks and 2662 events in whites) and stroke (523 events in blacks and 1660 events in whites) were calculated using pooled Mantel-Haenszel estimates with continuity correction factors. Pooling results across studies using fixed-effects inverse-variance-weighted models, PCSK9 LOF variants were associated with 35 mg/dL (95% confidence interval [CI], 32-39) lower LDL-C in blacks and 13 mg/dL (95% CI, 11-16) lower LDL-C in whites. PCSK9 LOF variants were associated with a pooled odds ratio for coronary heart disease of 0.51 (95% CI, 0.28-0.92) in blacks and 0.82 (95% CI, 0.63-1.06) in whites. PCSK9 LOF variants were not associated with incident stroke (odds ratio, 0.84; 95% CI, 0.48-1.47 in blacks and odds ratio, 1.06; 95% CI, 0.80-1.41 in whites). CONCLUSIONS: PCSK9 LOF variants were associated with lower LDL-C and coronary heart disease incidence. PCSK9 LOF variants were not associated with stroke risk.
Affiliation(s)
- For the author affiliations, please see the Appendix.
|
24
|
Heit JA, Armasu SM, McCauley BM, Kullo IJ, Sicotte H, Pathak J, Chute CG, Gottesman O, Bottinger EP, Denny JC, Roden DM, Li R, Ritchie MD, de Andrade M. Identification of unique venous thromboembolism-susceptibility variants in African-Americans. Thromb Haemost 2017; 117:758-768. [PMID: 28203683 DOI: 10.1160/th16-08-0652] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/19/2016] [Accepted: 01/12/2017] [Indexed: 12/30/2022]
Abstract
To identify novel single nucleotide polymorphisms (SNPs) associated with venous thromboembolism (VTE) in African-Americans (AAs), we performed a genome-wide association study (GWAS) of VTE in AAs using the Electronic Medical Records and Genomics (eMERGE) Network, comprised of seven sites each with DNA biobanks (total ~39,200 unique DNA samples) with genome-wide SNP data (imputed to 1000 Genomes Project cosmopolitan reference panel) and linked to electronic health records (EHRs). Using a validated EHR-driven phenotype extraction algorithm, we identified VTE cases and controls and tested for an association between each SNP and VTE using unconditional logistic regression, adjusted for age, sex, stroke, site-platform combination and sickle cell risk genotype. Among 393 AA VTE cases and 4,941 AA controls, three intragenic SNPs reached genome-wide significance: LEMD3 rs138916004 (OR=3.2; p=1.3E-08), LY86 rs3804476 (OR=1.8; p=2E-08) and LOC100130298 rs142143628 (OR=4.5; p=4.4E-08); all three SNPs validated using internal cross-validation, parametric bootstrap and meta-analysis methods. LEMD3 rs138916004 and LOC100130298 rs142143628 are only present in Africans (1000G data). LEMD3 showed a significant differential expression in both NCBI Gene Expression Omnibus (GEO) and the Mayo Clinic gene expression data, LOC100130298 showed a significant differential expression only in the GEO expression data, and LY86 showed a significant differential expression only in the Mayo expression data. LEMD3 encodes for an antagonist of TGF-β-induced cell proliferation arrest. LY86 encodes for MD-1 which down-regulates the pro-inflammatory response to lipopolysaccharide; LY86 variation was previously associated with VTE in white women; LOC100130298 is a non-coding RNA gene with unknown regulatory activity in gene expression and epigenetics.
Affiliation(s)
- John A Heit
- John A. Heit, MD, Stabile 6-Hematology Research, Mayo Clinic, 200 First Street, SW, Rochester, MN 55905, USA, Tel.: +1 507 284 4634, Fax: +1 507 266 9302
|
25
|
Sofer T, Heller R, Bogomolov M, Avery CL, Graff M, North KE, Reiner AP, Thornton TA, Rice K, Benjamini Y, Laurie CC, Kerr KF. A powerful statistical framework for generalization testing in GWAS, with application to the HCHS/SOL. Genet Epidemiol 2017; 41:251-258. [PMID: 28090672 DOI: 10.1002/gepi.22029] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 06/29/2016] [Revised: 09/26/2016] [Accepted: 10/17/2016] [Indexed: 01/04/2023]
Abstract
In genome-wide association studies (GWAS), "generalization" is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. Current practices for declaring generalizations rely on testing associations while controlling the family-wise error rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. This approach does not guarantee control over the FWER or false discovery rate (FDR) of the generalization null hypotheses. It also fails to leverage the two-stage design to increase power for detecting generalized associations. We provide a formal statistical framework for quantifying the evidence of generalization that accounts for the (in)consistency between the directions of associations in the discovery and follow-up studies. We develop the directional generalization FWER (FWER_g) and FDR (FDR_g) controlling r-values, which are used to declare associations as generalized. This framework extends to generalization testing when applied to a published list of single nucleotide polymorphism (SNP)-trait associations. Our methods control FWER_g or FDR_g under various SNP selection rules based on P-values in the discovery study. We find that it is often beneficial to use a more lenient P-value threshold than the genome-wide significance threshold. In a GWAS of total cholesterol in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), when testing all SNPs with P-values < 5 × 10⁻⁸ (15 genomic regions) for generalization in a large GWAS of whites, we generalized SNPs from 15 regions. But when testing all SNPs with P-values < 6.6 × 10⁻⁵ (89 regions), we generalized SNPs from 27 regions.
Affiliation(s)
- Tamar Sofer
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Ruth Heller
- Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel
- Marina Bogomolov
- Faculty of Industrial Engineering and Management, Technion-Israel Institute of Technology, Haifa, Israel
- Christy L Avery
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Mariaelisa Graff
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Alex P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Kenneth Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Yoav Benjamini
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Kathleen F Kerr
- Department of Biostatistics, University of Washington, Seattle, WA, USA
|
26
|
Lee D, Ganna A, Pawitan Y, Lee W. Nonparametric estimation of the rediscovery rate. Stat Med 2016; 35:3203-12. [PMID: 26910365 DOI: 10.1002/sim.6915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/07/2015] [Revised: 12/15/2015] [Accepted: 01/28/2016] [Indexed: 11/09/2022]
Abstract
Validation studies have been used to increase the reliability of the statistical conclusions for scientific discoveries; such studies improve the reproducibility of the findings and reduce the possibility of false positives. Here, one of the important roles of statistics is to quantify reproducibility rigorously. Two concepts were recently defined for this purpose: (i) rediscovery rate (RDR), which is the expected proportion of statistically significant findings in a study that can be replicated in the validation study and (ii) false discovery rate in the validation study (vFDR). In this paper, we aim to develop a nonparametric approach to estimate the RDR and vFDR and show an explicit link between the RDR and the FDR. Among other things, the link explains why reproducing statistically significant results even with low FDR level may be difficult. Two metabolomics datasets are considered to illustrate the application of the RDR and vFDR concepts in high-throughput data analysis.
Affiliation(s)
- Donghwan Lee
- Department of Statistics, Ewha Womans University, Seoul, Korea
- Andrea Ganna
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, U.S.A.; Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, U.S.A.
- Yudi Pawitan
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Woojoo Lee
- Department of Statistics, Inha University, Incheon, Korea
|
27
|
Simonti CN, Vernot B, Bastarache L, Bottinger E, Carrell DS, Chisholm RL, Crosslin DR, Hebbring SJ, Jarvik GP, Kullo IJ, Li R, Pathak J, Ritchie MD, Roden DM, Verma SS, Tromp G, Prato JD, Bush WS, Akey JM, Denny JC, Capra JA. The phenotypic legacy of admixture between modern humans and Neandertals. Science 2016; 351:737-41. [PMID: 26912863 DOI: 10.1126/science.aad2149] [Citation(s) in RCA: 163] [Impact Index Per Article: 20.4] [Indexed: 12/12/2022]
Abstract
Many modern human genomes retain DNA inherited from interbreeding with archaic hominins, such as Neandertals, yet the influence of this admixture on human traits is largely unknown. We analyzed the contribution of common Neandertal variants to over 1000 electronic health record (EHR)-derived phenotypes in ~28,000 adults of European ancestry. We discovered and replicated associations of Neandertal alleles with neurological, psychiatric, immunological, and dermatological phenotypes. Neandertal alleles together explained a significant fraction of the variation in risk for depression and skin lesions resulting from sun exposure (actinic keratosis), and individual Neandertal alleles were significantly associated with specific human phenotypes, including hypercoagulation and tobacco use. Our results establish that archaic admixture influences disease risk in modern humans, provide hypotheses about the effects of hundreds of Neandertal haplotypes, and demonstrate the utility of EHR data in evolutionary analyses.
Affiliation(s)
- Corinne N Simonti
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA
- Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lisa Bastarache
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- David S Carrell
- Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Rex L Chisholm
- Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- David R Crosslin
- Department of Genome Sciences, University of Washington, Seattle, WA, USA. Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Scott J Hebbring
- Center for Human Genetics, Marshfield Clinic, Marshfield, WI, USA
- Gail P Jarvik
- Department of Genome Sciences, University of Washington, Seattle, WA, USA. Department of Medicine (Medical Genetics), University of Washington Medical Center, Seattle, WA, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
- Rongling Li
- Division of Genomic Medicine, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jyotishman Pathak
- Division of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA. Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA
- Dan M Roden
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA. Department of Medicine, Vanderbilt University, Nashville, TN, USA. Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Shefali S Verma
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Gerard Tromp
- Weis Center for Research, Geisinger Health System, Danville, PA, USA. Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Health Science, Stellenbosch University, Tygerberg, South Africa
- Jeffrey D Prato
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- William S Bush
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joshua C Denny
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA. Department of Medicine, Vanderbilt University, Nashville, TN, USA
- John A Capra
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA. Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA. Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA. Center for Quantitative Sciences, Vanderbilt University, Nashville, TN, USA
|
28
|
Schick UM, Jain D, Hodonsky CJ, Morrison JV, Davis JP, Brown L, Sofer T, Conomos MP, Schurmann C, McHugh CP, Nelson SC, Vadlamudi S, Stilp A, Plantinga A, Baier L, Bien SA, Gogarten SM, Laurie CA, Taylor KD, Liu Y, Auer PL, Franceschini N, Szpiro A, Rice K, Kerr KF, Rotter JI, Hanson RL, Papanicolaou G, Rich SS, Loos RJF, Browning BL, Browning SR, Weir BS, Laurie CC, Mohlke KL, North KE, Thornton TA, Reiner AP. Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans. Am J Hum Genet 2016; 98:229-42. [PMID: 26805783 PMCID: PMC4746331 DOI: 10.1016/j.ajhg.2015.12.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 10/01/2015] [Accepted: 12/07/2015] [Indexed: 12/23/2022]
Abstract
Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10⁻²⁸) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.
Affiliation(s)
- Ursula M Schick
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98195, USA; Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Deepti Jain
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Chani J Hodonsky
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27514, USA
- Jean V Morrison
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- James P Davis
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Lisa Brown
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Tamar Sofer
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Claudia Schurmann
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Caitlin P McHugh
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sarah C Nelson
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Anna Plantinga
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Leslie Baier
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Disease, NIH, 445 North 5th Street, Phoenix, AZ 85004, USA
- Stephanie A Bien
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98195, USA
- Cecelia A Laurie
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Kent D Taylor
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Harbor-UCLA Medical Center, Torrance, CA 90502, USA; Department of Pediatrics, Los Angeles Biomedical Research Institute, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Yongmei Liu
- School of Medicine, Wake Forest University, Winston-Salem, NC 27157, USA
- Paul L Auer
- Joseph J. Zilber School of Public Health, University of Wisconsin Milwaukee, Milwaukee, WI 53201, USA
- Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27514, USA
- Adam Szpiro
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Ken Rice
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Kathleen F Kerr
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Robert L Hanson
- Phoenix Epidemiology and Clinical Research Branch, National Institute of Diabetes and Digestive and Kidney Disease, NIH, 445 North 5th Street, Phoenix, AZ 85004, USA
- George Papanicolaou
- Division of Cardiovascular Sciences, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD 20892, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA; Division of Endocrinology, Department of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Ruth J F Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Brian L Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Bruce S Weir
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- Kari E North
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27514, USA
- Timothy A Thornton
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Alex P Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98195, USA
|
29
|
|
30
|
Bretz F, Westfall PH. Multiplicity and replicability: two sides of the same coin. Pharm Stat 2014; 13:343-4. [DOI: 10.1002/pst.1648] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/01/2014] [Revised: 08/24/2014] [Accepted: 09/03/2014] [Indexed: 11/11/2022]
|