Spencer C, Hechter E, Vukcevic D, Donnelly P. Quantifying the underestimation of relative risks from genome-wide association studies.
PLoS Genet 2011;7:e1001337. [PMID: 21437273 PMCID: PMC3060077 DOI: 10.1371/journal.pgen.1001337]
[Received: 06/20/2010] [Accepted: 02/15/2011]
Abstract
Genome-wide association studies (GWAS) have identified hundreds of associated loci across many common diseases. Most risk variants identified by GWAS will merely be tags for as-yet-unknown causal variants. It is therefore possible that identification of the causal variant, by fine mapping, will identify alleles with larger effects on genetic risk than those currently estimated from GWAS replication studies. We show that under plausible assumptions, whilst the majority of the per-allele relative risks (RR) estimated from GWAS data will be close to the true risk at the causal variant, some could be considerable underestimates. For example, for an estimated RR in the range 1.2–1.3, there is approximately a 38% chance that the true RR at the causal variant exceeds 1.4 and a 10% chance that it is over 2. We show how these probabilities can vary depending on the true effects associated with low-frequency variants and on the minor allele frequency (MAF) of the most associated SNP. We investigate the consequences of the underestimation of effect sizes for predictions of an individual's disease risk and interpret our results for the design of fine mapping experiments. Although these effects mean that the amount of heritability explained by known GWAS loci is expected to be larger than current projections, this increase is likely to explain a relatively small amount of the so-called “missing” heritability.
Author Summary
Genome-wide association studies (GWAS) exploit the correlation in genetic diversity along chromosomes in order to detect effects on disease risk without having to type causal loci directly. The inevitable downside of this approach is that, when the correlation between the marker and the causal variant is imperfect, the risk associated with carrying the predisposing allele is diluted and its effect is underestimated. Using simulations, where we know the true risk at the causal locus, we quantify the extent of this underestimation. We show that, for loci which have a modest effect on disease risk and are common in the population, the risk estimated from the most associated SNP is very close to the truth approximately two thirds of the time. Although the extent of the underestimation depends on assumptions about the frequency and strength of the risk allele, we predict that fine mapping of GWAS loci will, in rare cases, identify causal variants with considerably higher risk. Using three common diseases as examples, we investigate the expected cumulative effects of underestimation at multiple loci on our ability to stratify individuals by disease risk and to explain disease heritability.
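The dilution described above can be illustrated with a small calculation. What follows is a minimal sketch, not the authors' simulation procedure. It assumes a single causal variant, a multiplicative risk model across an individual's two haplotypes, and random mating. The function name `marker_relative_risk` and its parameters are hypothetical and chosen only for this illustration. Under those assumptions, the per-allele RR seen at a tag SNP is the ratio of the mean risk multipliers of haplotypes carrying each marker allele.

```python
import math


def marker_relative_risk(rr_causal, p_causal, q_marker, r):
    """Per-allele relative risk observed at a tag SNP (illustrative sketch).

    Assumptions (not taken from the paper): one causal variant, multiplicative
    risk across the two haplotypes, random mating.

    rr_causal: true per-allele RR at the causal variant
    p_causal:  frequency of the causal risk allele
    q_marker:  frequency of the marker allele being tested
    r:         signed correlation between the two alleles (LD)
    """
    # LD coefficient D recovered from the allelic correlation r.
    d = r * math.sqrt(p_causal * (1 - p_causal) * q_marker * (1 - q_marker))

    # Joint frequency of the causal risk allele with each marker allele.
    causal_and_m1 = p_causal * q_marker + d
    causal_and_m0 = p_causal * (1 - q_marker) - d

    # Chance that a haplotype carries the causal allele, given its marker allele.
    carry_given_m1 = causal_and_m1 / q_marker
    carry_given_m0 = causal_and_m0 / (1 - q_marker)

    # Reject values of r that the allele frequencies cannot support.
    tol = 1e-12
    for prob in (carry_given_m1, carry_given_m0):
        if prob < -tol or prob > 1 + tol:
            raise ValueError("r is not attainable for these allele frequencies")

    # Mean risk multiplier of haplotypes carrying each marker allele.
    risk_m1 = carry_given_m1 * rr_causal + (1 - carry_given_m1)
    risk_m0 = carry_given_m0 * rr_causal + (1 - carry_given_m0)
    return risk_m1 / risk_m0


if __name__ == "__main__":
    # A common causal variant tagged at decreasing levels of correlation.
    for r in (1.0, 0.8, 0.5):
        print(r, round(marker_relative_risk(2.0, 0.2, 0.2, r), 3))
    # A low-frequency causal variant tagged by a common marker.
    print(round(marker_relative_risk(3.0, 0.02, 0.2, 0.25), 3))
```

With perfect correlation and matching allele frequencies the marker recovers the causal RR exactly. As the correlation weakens, the marker RR shrinks towards 1, and a low-frequency causal allele with a large effect can appear as a modest effect at a common tag SNP.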