1
Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet 2018; 19:491-504. [PMID: 29844615 PMCID: PMC6050137 DOI: 10.1038/s41576-018-0016-z] [Citation(s) in RCA: 490] [Impact Index Per Article: 81.7] [Indexed: 12/13/2022]
Abstract
Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.
Affiliation(s)
- Daniel J Schaid
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
- Wenan Chen
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Nicholas B Larson
  - Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, USA
2
Mijuskovic M, Saunders EJ, Leongamornlert DA, Wakerell S, Whitmore I, Dadaev T, Cieza-Borrella C, Govindasami K, Brook MN, Haiman CA, Conti DV, Eeles RA, Kote-Jarai Z. Rare germline variants in DNA repair genes and the angiogenesis pathway predispose prostate cancer patients to develop metastatic disease. Br J Cancer 2018; 119:96-104. [PMID: 29915322 PMCID: PMC6035259 DOI: 10.1038/s41416-018-0141-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 11/30/2017] [Revised: 05/01/2018] [Accepted: 05/17/2018] [Indexed: 12/14/2022]
Abstract
BACKGROUND Prostate cancer (PrCa) demonstrates a heterogeneous clinical presentation ranging from largely indolent to lethal. We sought to identify a signature of rare inherited variants that distinguishes between these two extreme phenotypes. METHODS We sequenced germline whole exomes from 139 aggressive (metastatic, age of diagnosis < 60) and 141 non-aggressive (low clinical grade, age of diagnosis ≥60) PrCa cases. We conducted rare variant association analyses at gene and gene set levels using SKAT and Bayesian risk index techniques. GO term enrichment analysis was performed for genes with the highest differential burden of rare disruptive variants. RESULTS Protein truncating variants (PTVs) in specific DNA repair genes were significantly overrepresented among patients with the aggressive phenotype, with BRCA2, ATM and NBN the most frequently mutated genes. Differential burden of rare variants was identified between metastatic and non-aggressive cases for several genes implicated in angiogenesis, conferring both deleterious and protective effects. CONCLUSIONS Inherited PTVs in several DNA repair genes distinguish aggressive from non-aggressive PrCa cases. Furthermore, inherited variants in genes with roles in angiogenesis may be potential predictors for risk of metastases. If validated in a larger dataset, these findings have potential for future clinical application.
Affiliation(s)
- Martina Mijuskovic
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Edward J Saunders
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Daniel A Leongamornlert
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Sarah Wakerell
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Ian Whitmore
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Tokhir Dadaev
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Clara Cieza-Borrella
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Koveela Govindasami
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Mark N Brook
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
- Christopher A Haiman
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90015, USA
- David V Conti
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90015, USA
- Rosalind A Eeles
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
  - The Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK
- Zsofia Kote-Jarai
  - Oncogenetics, Division of Genetics and Epidemiology, The Institute of Cancer Research, London, SW7 3RP, UK
3
McAllister K, Mechanic LE, Amos C, Aschard H, Blair IA, Chatterjee N, Conti D, Gauderman WJ, Hsu L, Hutter CM, Jankowska MM, Kerr J, Kraft P, Montgomery SB, Mukherjee B, Papanicolaou GJ, Patel CJ, Ritchie MD, Ritz BR, Thomas DC, Wei P, Witte JS. Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 2017; 186:753-761. [PMID: 28978193 PMCID: PMC5860428 DOI: 10.1093/aje/kwx227] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 11/07/2016] [Revised: 03/14/2017] [Accepted: 03/16/2017] [Indexed: 12/25/2022]
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
Affiliation(s)
- Leah E. Mechanic
  - Correspondence to Dr. Leah E. Mechanic, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, 9609 Medical Center Drive, Room 4E104, MSC 9763, Bethesda, MD 20892
4
Pereira M, Thompson JR, Weichenberger CX, Thomas DC, Minelli C. Inclusion of biological knowledge in a Bayesian shrinkage model for joint estimation of SNP effects. Genet Epidemiol 2017; 41:320-331. [PMID: 28393391 DOI: 10.1002/gepi.22038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/02/2016] [Revised: 12/18/2016] [Accepted: 12/26/2016] [Indexed: 01/04/2023]
Abstract
With the aim of improving detection of novel single-nucleotide polymorphisms (SNPs) in genetic association studies, we propose a method of including prior biological information in a Bayesian shrinkage model that jointly estimates SNP effects. We assume that the SNP effects follow a normal distribution centered at zero with variance controlled by a shrinkage hyperparameter. We use biological information to define the amount of shrinkage applied on the SNP effects distribution, so that the effects of SNPs with more biological support are less shrunk toward zero, thus being more likely detected. The performance of the method was tested in a simulation study (1,000 datasets, 500 subjects with ∼200 SNPs in 10 linkage disequilibrium (LD) blocks) using a continuous and a binary outcome. It was further tested in an empirical example on body mass index (continuous) and overweight (binary) in a dataset of 1,829 subjects and 2,614 SNPs from 30 blocks. Biological knowledge was retrieved using the bioinformatics tool Dintor, which queried various databases. The joint Bayesian model with inclusion of prior information outperformed the standard analysis: in the simulation study, the mean ranking of the true LD block was 2.8 for the Bayesian model versus 3.6 for the standard analysis of individual SNPs; in the empirical example, the mean ranking of the six true blocks was 8.5 versus 9.3 in the standard analysis. These results suggest that our method is more powerful than the standard analysis. We expect its performance to improve further as more biological information about SNPs becomes available.
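The shrinkage idea in this abstract can be sketched for a continuous outcome as follows. This is only a minimal illustration, assuming a conjugate normal model with known noise variance; the function names, the linear mapping from a biological-support score to a prior variance, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def prior_variances(support, min_var=0.01, max_var=1.0):
    """Map a biological-support score in [0, 1] to a prior variance per SNP.

    The linear mapping is an illustrative assumption, not the paper's rule.
    """
    support = np.clip(np.asarray(support, dtype=float), 0.0, 1.0)
    return min_var + support * (max_var - min_var)


def posterior_mean_effects(X, y, prior_var, noise_var=1.0):
    """Posterior mean of beta for y ~ N(X beta, noise_var * I), beta_j ~ N(0, prior_var[j])."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_var = np.asarray(prior_var, dtype=float)
    # Ridge-type normal equations with a SNP-specific penalty noise_var / prior_var[j].
    A = X.T @ X + noise_var * np.diag(1.0 / prior_var)
    return np.linalg.solve(A, X.T @ y)


# Two SNPs with identical data but different (hypothetical) biological support.
X = np.eye(2)
y = np.array([1.0, 1.0])
beta = posterior_mean_effects(X, y, prior_variances([0.0, 1.0]))
print(beta)
```

In this toy setup both SNPs carry the same evidence in the data, so any difference between the two estimates comes only from the prior: the SNP with more support gets a larger prior variance and is shrunk less toward zero.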
Affiliation(s)
- Miguel Pereira
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- John R Thompson
  - Department of Health Sciences, University of Leicester, Leicester, United Kingdom
- Christian X Weichenberger
  - Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), Bolzano, Italy, Affiliated to the University of Lübeck, Lübeck, Germany
- Duncan C Thomas
  - Biostatistics Division, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cosetta Minelli
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
5
Abstract
Background Recent advances in next-generation sequencing technologies have made it possible to generate large amounts of sequence data with rare variants in a cost-effective way. Yet, the statistical aspect of testing disease association of rare variants is quite challenging as the typical assumptions fail to hold owing to low minor allele frequency (<0.5% or 1%). Methods I present a Bayesian variable selection approach to detect associations with both rare and common genetic variants for quantitative traits simultaneously. In my model, I frame the problem of identifying disease-associated variants as a problem of variable selection in a sparse space, that is, how best to model the relationship between phenotypes and a set of genetic variants. By constructing a risk index score for a group of rare variants, my method can effectively consider all variants in a multivariate model. I also use a within-chain permutation to generate the empirical thresholds to detect true-positive variants. Results I apply my method to study the association between increases in baseline systolic and diastolic blood pressure (SBP and DBP, respectively) and genetic variants in the data from Genetic Analysis Workshop 19 unrelated samples. I identify several rare and common variants in the gene MAP4 that are potentially associated with SBP and DBP. Conclusions The application shows that my method is powerful in identifying disease-associated variants even when they are extremely rare.
6
Stell L, Sabatti C. Genetic Variant Selection: Learning Across Traits and Sites. Genetics 2016; 202:439-55. [PMID: 26680660 PMCID: PMC4788227 DOI: 10.1534/genetics.115.184572] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/06/2015] [Accepted: 11/30/2015] [Indexed: 11/18/2022]
Abstract
We consider resequencing studies of associated loci and the problem of prioritizing sequence variants for functional follow-up. Working within the multivariate linear regression framework helps us to account for the joint effects of multiple genes; and adopting a Bayesian approach leads to posterior probabilities that coherently incorporate all information about the variants' function. We describe two novel prior distributions that facilitate learning the role of each variable site by borrowing evidence across phenotypes and across mutations in the same gene. We illustrate their potential advantages with simulations and reanalyzing a data set of sequencing variants.
Affiliation(s)
- Laurel Stell
  - Department of Health Research and Policy, Stanford University, Stanford, California 94305
- Chiara Sabatti
  - Department of Health Research and Policy, Stanford University, Stanford, California 94305
  - Department of Statistics, Stanford University, Stanford, California 94305
7
Genetic Contribution of Variants near SORT1 and APOE on LDL Cholesterol Independent of Obesity in Children. PLoS One 2015; 10:e0138064. [PMID: 26375028 PMCID: PMC4573320 DOI: 10.1371/journal.pone.0138064] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/19/2014] [Accepted: 08/25/2015] [Indexed: 11/19/2022]
Abstract
Objective To assess potential effects of variants in six lipid modulating genes (SORT1, HMGCR, MLXIPL, FADS2, APOE and MAFB) on early development of dyslipidemia independent of the degree of obesity in children, we investigated their association with total (TC), low density lipoprotein (LDL-C), high density lipoprotein (HDL-C) cholesterol and triglyceride (TG) levels in 594 children. Furthermore, we evaluated the expression profile of the candidate genes during human adipocyte differentiation. Results Expression of selected genes increased 10^1 to >10^4 fold during human adipocyte differentiation, suggesting a potential link with adipogenesis. In genetic association studies adjusted for age, BMI SDS and sex, we identified significant associations for rs599839 near SORT1 with TC and LDL-C and for rs4420638 near APOE with TC and LDL-C. We performed Bayesian modelling of the combined lipid phenotype of HDL-C, LDL-C and TG to identify potentially causal polygenic effects on this multi-dimensional phenotype and considering obesity, age and sex as a-priori modulating factors. This analysis confirmed that rs599839 and rs4420638 affect LDL-C. Conclusion We show that lipid modulating genes are dynamically regulated during adipogenesis and that variants near SORT1 and APOE influence lipid levels independent of obesity in children. Bayesian modelling suggests causal effects of these variants.
8
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics. Genetics 2015; 200:719-36. [PMID: 25948564 DOI: 10.1534/genetics.115.176107] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Received: 03/06/2015] [Accepted: 05/04/2015] [Indexed: 01/08/2023]
Abstract
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
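To illustrate how marginal test statistics alone can drive Bayesian fine mapping, the sketch below computes per-SNP Bayes factors and posterior inclusion probabilities from z-scores. It is not the CAVIARBF algorithm: it assumes exactly one causal variant in the region and equal prior probabilities across SNPs, in which case each Bayes factor depends only on that SNP's own z-score; allowing several causal variants would also require the SNP correlation (LD) matrix. The function names and the prior-variance value `w` are hypothetical.

```python
import numpy as np


def log_bayes_factors(z, w):
    """Log Bayes factor per SNP for 'this SNP is causal' versus the null.

    z : marginal z-scores.
    w : prior variance of the z-score non-centrality (roughly sample size
        times prior effect variance); an assumed tuning value.
    """
    z = np.asarray(z, dtype=float)
    # Null: z_j ~ N(0, 1); alternative: z_j ~ N(0, 1 + w).
    return -0.5 * np.log1p(w) + 0.5 * z**2 * w / (1.0 + w)


def posterior_inclusion_probabilities(z, w=10.0):
    """Posterior probability that each SNP is the single causal variant."""
    log_bf = log_bayes_factors(z, w)
    log_bf = log_bf - log_bf.max()  # subtract the maximum for numerical stability
    bf = np.exp(log_bf)
    return bf / bf.sum()


# Hypothetical z-scores for four SNPs in one region.
z_scores = [5.2, 4.9, 1.0, -0.3]
pip = posterior_inclusion_probabilities(z_scores)
print(pip.round(3))
```

Ranking SNPs by these probabilities gives the same ordering as ranking by absolute z-score under this single-variant assumption; the value of the Bayesian framing is that the probabilities can be summed to form a credible set.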
9
Capanu M, Seshan VE. False discovery rates for rare variants from sequenced data. Genet Epidemiol 2015; 39:65-76. [PMID: 25556339 PMCID: PMC4711769 DOI: 10.1002/gepi.21880] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 07/28/2014] [Revised: 10/23/2014] [Accepted: 11/11/2014] [Indexed: 01/22/2023]
Abstract
The detection of rare deleterious variants is the preeminent current technical challenge in statistical genetics. Sorting the deleterious from neutral variants at a disease locus is challenging because of the sparseness of the evidence for each individual variant. Hierarchical modeling and Bayesian model uncertainty are two techniques that have been shown to be promising in pinpointing individual rare variants that may be driving the association. Interpreting the results from these techniques from the perspective of multiple testing is a challenge and the goal of this article is to better understand their false discovery properties. Using simulations, we conclude that accurate false discovery control cannot be achieved in this framework unless the magnitude of the variants' risk is large and the hierarchical characteristics have high accuracy in distinguishing deleterious from neutral variants.
Affiliation(s)
- Marinela Capanu
  - Memorial Sloan-Kettering Cancer Center, 307 E 63rd St, 3rd Floor, New York, NY 10021
- Venkatraman E. Seshan
  - Memorial Sloan-Kettering Cancer Center, 307 E 63rd St, 3rd Floor, New York, NY 10021
10
Marjoram P, Thomas DC. Next-Generation Sequencing Studies: Optimal Design and Analysis, Missing Heritability and Rare Variants. Curr Epidemiol Rep 2014. [DOI: 10.1007/s40471-014-0022-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/05/2023]
11
Logsdon BA, Dai JY, Auer PL, Johnsen JM, Ganesh SK, Smith NL, Wilson JG, Tracy RP, Lange LA, Jiao S, Rich SS, Lettre G, Carlson CS, Jackson RD, O'Donnell CJ, Wurfel MM, Nickerson DA, Tang H, Reiner AP, Kooperberg C. A variational Bayes discrete mixture test for rare variant association. Genet Epidemiol 2014; 38:21-30. [PMID: 24482836 DOI: 10.1002/gepi.21772] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023]
Abstract
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variations within a predefined region, such as a gene. Although there is evidence that "aggregate" tests are more powerful than the single marker test, these tests generally ignore neutral variants and therefore are unable to identify specific variants driving the association with phenotype. We propose a novel aggregate rare-variant test that explicitly models a fraction of variants as neutral, tests associations at the gene-level, and infers the rare-variants driving the association. Simulations show that in the practical scenario where there are many variants within a given region of the genome with only a fraction causal our approach has greater power compared to other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome-wide analyses, a significant computational advantage over exact inference model selection methodologies. To demonstrate the efficacy of our methodology we test for associations between von Willebrand Factor (VWF) levels and VWF missense rare-variants imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing project into 2,487 African Americans within the VWF gene. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
12
Duan L, Thomas DC. A Bayesian Hierarchical Model for Relating Multiple SNPs within Multiple Genes to Disease Risk. Int J Genomics 2013; 2013:406217. [PMID: 24490143 PMCID: PMC3892936 DOI: 10.1155/2013/406217] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/30/2013] [Revised: 09/03/2013] [Accepted: 09/09/2013] [Indexed: 11/18/2022]
Abstract
A variety of methods have been proposed for studying the association of multiple genes thought to be involved in a common pathway for a particular disease. Here, we present an extension of a Bayesian hierarchical modeling strategy that allows for multiple SNPs within each gene, with external prior information at either the SNP or gene level. The model involves variable selection at the SNP level through latent indicator variables and Bayesian shrinkage at the gene level towards a prior mean vector and covariance matrix that depend on external information. The entire model is fitted using Markov chain Monte Carlo methods. Simulation studies show that the approach is capable of recovering many of the truly causal SNPs and genes, depending upon their frequency and size of their effects. The method is applied to data on 504 SNPs in 38 candidate genes involved in DNA damage response in the WECARE study of second breast cancers in relation to radiotherapy exposure.
Affiliation(s)
- Lewei Duan
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California (USC), 2001 N. Soto Street, Los Angeles, CA, USA
- Duncan C. Thomas
  - Division of Biostatistics, Department of Preventive Medicine, University of Southern California (USC), 2001 N. Soto Street, Los Angeles, CA, USA
13
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022]
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved, and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Zhao Yang
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Fan Yang
  - Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
14
Marjoram P, Zubair A, Nuzhdin SV. Post-GWAS: where next? More samples, more SNPs or more biology? Heredity (Edinb) 2013; 112:79-88. [PMID: 23759726 DOI: 10.1038/hdy.2013.52] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Received: 10/11/2012] [Revised: 03/19/2013] [Accepted: 04/09/2013] [Indexed: 11/09/2022]
Abstract
The power of genome-wide association studies (GWAS) rests on several foundations: (i) there is a significant amount of additive genetic variation, (ii) individual causal polymorphisms often have sizable effects and (iii) they segregate at moderate-to-intermediate frequencies, or will be effectively 'tagged' by polymorphisms that do. Each of these assumptions has recently been questioned. (i) Why should genetic variation appear additive given that the underlying molecular networks are highly nonlinear? (ii) A new generation of relatedness-based analyses directs us back to the nearly infinitesimal model for effect sizes that quantitative genetics was long based upon. (iii) Larger effect causal polymorphisms are often low frequency, as selection might lead us to expect. Here, we review these issues and other findings that appear to question many of the foundations of the optimism GWAS prompted. We then present a roadmap emerging as one possible future for quantitative genetics. We argue that, in future, GWAS should move beyond purely statistical grounds. One promising approach is to build upon the combination of population genetic models and molecular biological knowledge. This combined treatment, however, requires fitting experimental data to models that are very complex, as well as accurately capturing the uncertainty of the resulting inference. This problem can be resolved through Bayesian analysis and tools such as approximate Bayesian computation, a method growing in popularity in population genetic analysis. We show a case example of anterior-posterior segmentation in Drosophila, and argue that similar approaches will be helpful as a GWAS augmentation in human and agricultural research.
Affiliation(s)
- P Marjoram
  - [1] Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; [2] Program in Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
15
Thomas DC. Some surprising twists on the road to discovering the contribution of rare variants to complex diseases. Hum Hered 2013; 74:113-7. [PMID: 23594489 DOI: 10.1159/000347020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022]