Reference Citation Analysis

Cited by Other Article(s)

Coombes B, Basu S, Guha S, Schork N. Weighted Score Tests Implementing Model-Averaging Schemes in Detection of Rare Variants in Case-Control Studies. PLoS One 2015;10:e0139355. [PMID: 26436424 PMCID: PMC4593572 DOI: 10.1371/journal.pone.0139355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/11/2015] [Accepted: 09/11/2015] [Indexed: 12/04/2022]

The power of gene-based rare variant methods to detect disease-associated variation and test hypotheses about complex disease. PLoS Genet 2015;11:e1005165. [PMID: 25906071 PMCID: PMC4407972 DOI: 10.1371/journal.pgen.1005165] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 10/08/2014] [Accepted: 03/20/2015] [Indexed: 01/09/2023]

Abstract

Genome and exome sequencing in large cohorts enables characterization of the role of rare variation in complex diseases. Success in this endeavor, however, requires investigators to test a diverse array of genetic hypotheses which differ in the number, frequency and effect sizes of underlying causal variants. In this study, we evaluated the power of gene-based association methods to interrogate such hypotheses, and examined the implications for study design. We developed a flexible simulation approach, using 1000 Genomes data, to (a) generate sequence variation at human genes in up to 10K case-control samples, and (b) quantify the statistical power of a panel of widely used gene-based association tests under a variety of allelic architectures, locus effect sizes, and significance thresholds. For loci explaining ~1% of phenotypic variance underlying a common dichotomous trait, we find that all methods have low absolute power to achieve exome-wide significance (~5-20% power at α=2.5×10^-6) in 3K individuals; even in 10K samples, power is modest (~60%). The combined application of multiple methods increases sensitivity, but does so at the expense of a higher false positive rate. MiST, SKAT-O, and KBAC have the highest individual mean power across simulated datasets, but we observe wide architecture-dependent variability in the individual loci detected by each test, suggesting that inferences about disease architecture from analysis of sequencing studies can differ depending on which methods are used. Our results imply that tens of thousands of individuals, extensive functional annotation, or highly targeted hypothesis testing will be required to confidently detect or exclude rare variant signals at complex disease loci.

Re-sequencing technologies allow for a more complete interrogation of the role of human variation in complex disease. The inadequate power of single variant methods to assess the role of less common variation has led to the development of numerous statistical methods for testing aggregate groups of variants for association with disease. Such endeavors pose substantial analytical challenges, however, due to the diverse array of genetic hypotheses that need to be considered. In this work, we systematically quantify and compare the performance of a panel of commonly used gene-based association methods under a range of allelic architectures, significance thresholds, locus effect sizes, sample sizes, and filters for neutral variation. We find that MiST, SKAT-O, and KBAC have the highest mean power across simulated datasets. Across all methods, however, the power to detect even loci of relatively large effect is very low at exome-wide significance thresholds for sample sizes comparable with those of ongoing sequencing studies; as such, the absence of signal in studies of a few thousand individuals does not exclude a role for rare variation in complex traits. Finally, we directly compare the results reported by different gene-based methods in order to identify their comparative advantages and disadvantages under distinct locus architectures. Our findings have implications for meaningful interpretation of both positive and negative findings in ongoing and future sequencing studies.
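To give a feel for why exome-wide significance is so hard to reach, here is a minimal Monte Carlo sketch of power for one simple gene-based test. It is not the study's simulation framework, which drew on 1000 Genomes data and a panel of methods including MiST, SKAT-O and KBAC. Instead it assumes a carrier-based burden test in which each individual independently carries at least one qualifying rare variant with a fixed, hypothetical frequency in cases and in controls, and the two groups are compared with a pooled two-proportion z-test. Reading α = 2.5×10^-6 as 0.05/20,000, a Bonferroni correction over roughly 20,000 genes, is likewise an assumption about how that threshold is usually motivated. The names EXOME_WIDE_ALPHA, two_proportion_p_value and simulate_burden_power are illustrative only.

```python
import math
import random

# Assumption: the exome-wide threshold is a Bonferroni correction over ~20,000 genes.
EXOME_WIDE_ALPHA = 0.05 / 20_000


def two_proportion_p_value(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no carriers (or only carriers) in both groups: nothing to contrast
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_burden_power(n_cases: int, n_controls: int,
                          carrier_freq_cases: float, carrier_freq_controls: float,
                          alpha: float, n_reps: int = 200, seed: int = 1) -> float:
    """Fraction of simulated studies in which the carrier-count test reaches alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        # Each individual independently carries >= 1 qualifying rare variant.
        x1 = sum(rng.random() < carrier_freq_cases for _ in range(n_cases))
        x2 = sum(rng.random() < carrier_freq_controls for _ in range(n_controls))
        if two_proportion_p_value(x1, n_cases, x2, n_controls) < alpha:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    for n in (1_500, 5_000):  # hypothetical per-group sample sizes
        power = simulate_burden_power(n, n, 0.03, 0.015, EXOME_WIDE_ALPHA)
        print(f"{n} cases + {n} controls: estimated power {power:.2f}")
```

Varying the sample sizes or carrier frequencies shows how strongly power at this threshold depends on n. With small carrier counts the normal approximation is rough, so an exact or permutation-based test is the safer choice for real data.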

Norden-Krichmar TM, Gizer IR, Wilhelmsen KC, Schork NJ, Ehlers CL. Protective variant associated with alcohol dependence in a Mexican American cohort. BMC Med Genet 2014;15:136. [PMID: 25527893 PMCID: PMC4337107 DOI: 10.1186/s12881-014-0136-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/18/2014] [Accepted: 12/08/2014] [Indexed: 01/11/2023]

Kim JH, Song P, Lim H, Lee JH, Lee JH, Park SA. Gene-based rare allele analysis identified a risk gene of Alzheimer's disease. PLoS One 2014;9:e107983. [PMID: 25329708 PMCID: PMC4203677 DOI: 10.1371/journal.pone.0107983] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 04/07/2014] [Accepted: 08/25/2014] [Indexed: 12/17/2022]

King CR, Nicolae DL. GWAS to Sequencing: Divergence in Study Design and Analysis. Genes (Basel) 2014;5:460-76. [PMID: 24879455 PMCID: PMC4094943 DOI: 10.3390/genes5020460] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 12/27/2013] [Revised: 05/13/2014] [Accepted: 05/15/2014] [Indexed: 12/03/2022]

Xu C, Ciampi A, Greenwood CMT. Exploring the potential benefits of stratified false discovery rates for region-based testing of association with rare genetic variation. Front Genet 2014;5:11. [PMID: 24523729 PMCID: PMC3905218 DOI: 10.3389/fgene.2014.00011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/19/2013] [Accepted: 01/13/2014] [Indexed: 01/13/2023]

Abstract

When analyzing the data that arise from exome or whole-genome sequencing studies, window-based tests (i.e., tests that jointly analyze all genetic data in a small genomic region) are very popular. However, power is known to be quite low for finding associations with phenotypes using these tests, and therefore a variety of analytic strategies may be employed to potentially improve power. Using sequencing data of all of chromosome 3 from an interim release of data on 2432 individuals from the UK10K project, we simulated phenotypes associated with rare genetic variation, and used the results to explore the power of window-based tests. We asked two specific questions: first, whether there could be substantial benefits associated with incorporating information from external annotation on the genetic variants, and second, whether the false discovery rate (FDR) would be a useful metric for assessing significance. Although, as expected, there are benefits to using additional information (such as annotation) when it is associated with causality, we confirmed the general pattern of low sensitivity and power for window-based tests. For our chosen example, even when power is high to detect some of the associations, many of the regions containing causal variants are not detectable, despite using lax significance thresholds and optimal analytic methods. Furthermore, our estimated FDR values tended to be much smaller than the true FDRs. Long-range correlations between variants, due to linkage disequilibrium, likely explain some of this bias. A more sophisticated approach to using the annotation information may improve power; however, many causal variants of realistic effect sizes may simply be undetectable, at least with this sample size. Perhaps annotation information could assist in distinguishing windows containing causal variants from windows that are merely correlated with causal variants.
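The abstract does not spell out how the stratified FDR is computed. One simple version, assumed here for illustration, runs the Benjamini-Hochberg step-up procedure separately within each annotation stratum (for example, windows with versus without variants flagged by external annotation), so that a small, enriched stratum is not diluted by a large uninformative one. The function names and stratum labels are hypothetical, and the authors' estimator may differ.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: one rejection flag per p-value at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis whose p-value ranks at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def stratified_bh(p_values: Sequence[float], strata: Sequence[str], q: float) -> List[bool]:
    """Apply BH independently within each stratum (hypothetical stratified-FDR sketch)."""
    groups: Dict[str, List[int]] = {}
    for i, label in enumerate(strata):
        groups.setdefault(label, []).append(i)
    rejected = [False] * len(p_values)
    for indices in groups.values():
        flags = benjamini_hochberg([p_values[i] for i in indices], q)
        for i, flag in zip(indices, flags):
            rejected[i] = flag
    return rejected


if __name__ == "__main__":
    p = [0.01, 0.02, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
    strata = ["coding"] * 2 + ["noncoding"] * 8  # hypothetical annotation labels
    print("pooled BH:    ", benjamini_hochberg(p, 0.05))
    print("stratified BH:", stratified_bh(p, strata, 0.05))
```

In this toy input, pooled BH at q = 0.05 rejects nothing, whereas stratified BH rejects both "coding" windows (p = 0.01 and 0.02). The gain only materializes when the annotation really does enrich for causal windows, which matches the abstract's caveat.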

Chen YC, Carter H, Parla J, Kramer M, Goes FS, Pirooznia M, Zandi PP, McCombie WR, Potash JB, Karchin R. A hybrid likelihood model for sequence-based disease association studies. PLoS Genet 2013;9:e1003224. [PMID: 23358228 PMCID: PMC3554549 DOI: 10.1371/journal.pgen.1003224] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/11/2012] [Accepted: 11/21/2012] [Indexed: 11/18/2022]

Abstract

In the past few years, case-control studies of common diseases have shifted their focus from single genes to whole exomes. New sequencing technologies now routinely detect hundreds of thousands of sequence variants in a single study, many of which are rare or even novel. The limitations of classical single-marker association analysis for rare variants have been a challenge in such studies. A new generation of statistical methods for case-control association studies has been developed to meet this challenge. A common approach to association analysis of rare variants is burden-style collapsing, which combines rare variant data within individuals across or within genes. Here, we propose a new hybrid likelihood model that combines a burden test with a test of the position distribution of variants. In extensive simulations and on empirical data from the Dallas Heart Study, the new model demonstrates consistently good power, in particular when applied to a gene set (e.g., multiple candidate genes with shared biological function or pathway), when rare variants cluster in key functional regions of a gene, and when protective variants are present. When applied to data from an ongoing sequencing study of bipolar disorder (191 cases, 107 controls), the model identifies seven gene sets with nominal p-values ≤ 0.05, one of which, the MAPK signaling pathway (KEGG), reaches trend-level significance after correcting for multiple testing.

Inexpensive, high-throughput sequencing has transformed the field of case-control association studies. For the first time, it may be possible to identify the genetic underpinnings of complex diseases by sequencing the DNA of hundreds (even thousands) of cases and controls and comparing patterns of DNA sequence variation. However, complex diseases are likely to be caused by many variants, some of which are very rare. When variants are tested one at a time, the association between an individual variant and the disease phenotype may not be detectable by current statistical methods. One strategy is to identify regions where important variants occur by "collapsing" variants into groups. Here, we present a new collapsing approach, capable of detecting subtle genetic differences between cases and controls. We show, in extensive simulations and using a benchmark set of genes involved in human triglyceride levels, that the approach is potentially more powerful than existing methods. We apply the new method to an ongoing sequencing study of bipolar cases and controls and identify a set of genes found in neuronal synapses, which may be implicated in bipolar disorder.

Chung D, Zhang Q, Kraja AT, Borecki IB, Province MA. Distance-based phenotypic association analysis of DNA sequence data. BMC Proc 2011;5 Suppl 9:S54. [PMID: 22373107 PMCID: PMC3287892 DOI: 10.1186/1753-6561-5-s9-s54] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]

Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol 2011;12:227. [PMID: 21920052 PMCID: PMC3308043 DOI: 10.1186/gb-2011-12-9-227] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.6] [Indexed: 01/30/2023]

Bansal V, Libiger O, Torkamani A, Schork NJ. Statistical analysis strategies for association studies involving rare variants. Nat Rev Genet 2010;11:773-85. [PMID: 20940738 PMCID: PMC3743540 DOI: 10.1038/nrg2867] [Citation(s) in RCA: 381] [Impact Index Per Article: 27.2] [Indexed: 12/14/2022]