1
Ali F, Zhang J. Mixture model-based association analysis with case-control data in genome wide association studies. Stat Appl Genet Mol Biol 2017; 16:173-187. [PMID: 28723613 DOI: 10.1515/sagmb-2016-0022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Multilocus haplotype analysis of candidate variants with genome wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease-penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated disease penetrances. A theoretical justification of the above model is provided. Furthermore, we introduce a hypothesis test for haplotype inheritance patterns which underpin this model. The performance of the proposed approach is evaluated by simulations and real data analysis. The results show that the proposed approach outperforms an existing multiple testing method.
2
Shi H, Pasaniuc B, Lange KL. A multivariate Bernoulli model to predict DNaseI hypersensitivity status from haplotype data. Bioinformatics 2015; 31:3514-21. [PMID: 26139633 DOI: 10.1093/bioinformatics/btv397] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/17/2015] [Accepted: 06/24/2015] [Indexed: 01/02/2023] Open
Abstract
MOTIVATION: Haplotype models enjoy a wide range of applications in population inference and disease gene discovery. The hidden Markov models traditionally used for haplotypes are hindered by the dubious assumption that dependencies occur only between consecutive pairs of variants. In this article, we apply the multivariate Bernoulli (MVB) distribution to model haplotype data. The MVB distribution relies on interactions among all sets of variants, thus allowing for the detection and exploitation of long-range and higher-order interactions. We discuss penalized estimation and present an efficient algorithm for fitting sparse versions of the MVB distribution to haplotype data. Finally, we showcase the benefits of the MVB model in predicting DNaseI hypersensitivity (DH) status, an epigenetic mark describing chromatin accessibility, from population-scale haplotype data. RESULTS: We fit the MVB model to real data from 59 individuals on whom both haplotypes and DH status in lymphoblastoid cell lines are publicly available. The model allows prediction of DH status from genetic data (prediction R² = 0.12 in cross-validations). Comparisons of prediction under the MVB model with prediction under linear regression (best linear unbiased prediction) and logistic regression demonstrate that the MVB model achieves about 10% higher prediction R² than the two competing methods in empirical data. AVAILABILITY AND IMPLEMENTATION: Software implementing the method described can be downloaded at http://bogdan.bioinformatics.ucla.edu/software/. CONTACT: shihuwenbo@ucla.edu or pasaniuc@ucla.edu.
Affiliation(s)
- Huwenbo Shi
- Bioinformatics Interdepartmental Program, University of California, Los Angeles
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Department of Pathology and Laboratory Medicine, Department of Human Genetics and
- Kenneth L Lange
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Department of Human Genetics and Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90024, USA
3
Wallace C, Cutler AJ, Pontikos N, Pekalski ML, Burren OS, Cooper JD, García AR, Ferreira RC, Guo H, Walker NM, Smyth DJ, Rich SS, Onengut-Gumuscu S, Sawcer SJ, Ban M, Richardson S, Todd JA, Wicker LS. Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping. PLoS Genet 2015; 11:e1005272. [PMID: 26106896 PMCID: PMC4481316 DOI: 10.1371/journal.pgen.1005272] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 10/02/2014] [Accepted: 05/12/2015] [Indexed: 12/15/2022] Open
Abstract
Identification of candidate causal variants in regions associated with risk of common diseases is complicated by linkage disequilibrium (LD) and multiple association signals. Nonetheless, accurate maps of these variants are needed, both to fully exploit detailed cell specific chromatin annotation data to highlight disease causal mechanisms and cells, and for design of the functional studies that will ultimately be required to confirm causal mechanisms. We adapted a Bayesian evolutionary stochastic search algorithm to the fine mapping problem, and demonstrated its improved performance over conventional stepwise and regularised regression through simulation studies. We then applied it to fine map the established multiple sclerosis (MS) and type 1 diabetes (T1D) associations in the IL-2RA (CD25) gene region. For T1D, both stepwise and stochastic search approaches identified four T1D association signals, with the major effect tagged by the single nucleotide polymorphism, rs12722496. In contrast, for MS, the stochastic search found two distinct competing models: a single candidate causal variant, tagged by rs2104286 and reported previously using stepwise analysis; and a more complex model with two association signals, one of which was tagged by the major T1D-associated rs12722496 and the other by rs56382813. There is low to moderate LD between rs2104286 and both rs12722496 and rs56382813 (r² ≃ 0.3) and our two SNP model could not be recovered through a forward stepwise search after conditioning on rs2104286. Both signals in the two variant model for MS affect CD25 expression on distinct subpopulations of CD4+ T cells, which are key cells in the autoimmune process. The results support a shared causal variant for T1D and MS. Our study illustrates the benefit of using a purposely designed model search strategy for fine mapping and the advantage of combining disease and protein expression data.
Genetic association studies have identified many DNA sequence variants that associate with disease risk. By exploiting the known correlation that exists between neighbouring variants in the genome, inference can be extended beyond those individual variants tested to identify sets within which a causal variant is likely to reside. However, this correlation, particularly in the presence of multiple disease causing variants in relative proximity, makes disentangling the specific causal variants difficult. Statistical approaches to this fine mapping problem have traditionally taken a stepwise search approach, beginning with the most associated variant in a region, then iteratively attempting to find additional associated variants. We adapted a stochastic search approach that avoids this stepwise process and is explicitly designed for dealing with highly correlated predictors to the fine mapping problem. We showed in simulated data that it outperforms its stepwise counterpart and other variable selection strategies such as the lasso. We applied our approach to understand the association of two immune-mediated diseases to a region on chromosome 10p15. We identified a model for multiple sclerosis containing two variants, neither of which was found through a stepwise search, and functionally linked both of these to the neighbouring candidate gene, IL2RA, in independent data. Our approach can be used to aid fine mapping of other disease-associated regions, which is critical for design of functional follow-up studies required to understand the mechanisms through which genetic variants influence disease.
Affiliation(s)
- Chris Wallace
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom; MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
- Antony J Cutler
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Nikolas Pontikos
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Marcin L Pekalski
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Oliver S Burren
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Jason D Cooper
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Arcadio Rubio García
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Ricardo C Ferreira
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Hui Guo
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom; Centre for Biostatistics Institute of Population Health, The University of Manchester Manchester, United Kingdom
- Neil M Walker
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Deborah J Smyth
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Medicine, Division of Endocrinology, University of Virginia, Charlottesville, Virginia, United States of America
- Suna Onengut-Gumuscu
- Center for Public Health Genomics, University of Virginia, Charlottesville, Virginia, United States of America; Department of Public Health Sciences, Division of Biostatistics and Epidemiology, University of Virginia, Charlottesville, Virginia, United States of America
- Stephen J Sawcer
- University of Cambridge, Department of Clinical Neurosciences, Cambridge, United Kingdom
- Maria Ban
- University of Cambridge, Department of Clinical Neurosciences, Cambridge, United Kingdom
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, Cambridge, United Kingdom
- John A Todd
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
- Linda S Wicker
- JDRF/Wellcome Trust Diabetes and Inflammation Laboratory, Department of Medical Genetics, NIHR Biomedical Research Centre, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, United Kingdom
4
Howey R, Cordell HJ. Imputation without doing imputation: a new method for the detection of non-genotyped causal variants. Genet Epidemiol 2014; 38:173-90. [PMID: 24535679 PMCID: PMC4150535 DOI: 10.1002/gepi.21792] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/22/2013] [Revised: 12/30/2013] [Accepted: 12/31/2013] [Indexed: 01/22/2023]
Abstract
Genome-wide association studies allow detection of non-genotyped disease-causing variants through testing of nearby genotyped SNPs. This approach may fail when there are no genotyped SNPs in strong LD with the causal variant. Several genotyped SNPs in weak LD with the causal variant may, however, considered together, provide equivalent information. This observation motivates popular but computationally intensive approaches based on imputation or haplotyping. Here we present a new method and accompanying software designed for this scenario. Our approach proceeds by selecting, for each genotyped "anchor" SNP, a nearby genotyped "partner" SNP, chosen via a specific algorithm we have developed. These two SNPs are used as predictors in linear or logistic regression analysis to generate a final significance test. In simulations, our method captures much of the signal captured by imputation, while taking a fraction of the time and disc space, and generating a smaller number of false-positives. We apply our method to a case/control study of severe malaria genotyped using the Affymetrix 500K array. Previous analysis showed that fine-scale sequencing of a Gambian reference panel in the region of the known causal locus, followed by imputation, increased the signal of association to genome-wide significance levels. Our method also increases the signal of association from P ≈ 2 × 10⁻⁶ to P ≈ 6 × 10⁻¹¹. Our method thus, in some cases, eliminates the need for more complex methods such as sequencing and imputation, and provides a useful additional test that may be used to identify genetic regions of interest.
Affiliation(s)
- Richard Howey
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
- Heather J Cordell
- Institute of Genetic Medicine, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne, United Kingdom
5
Delaneau O, Howie B, Cox AJ, Zagury JF, Marchini J. Haplotype estimation using sequencing reads. Am J Hum Genet 2013; 93:687-96. [PMID: 24094745 DOI: 10.1016/j.ajhg.2013.09.002] [Citation(s) in RCA: 267] [Impact Index Per Article: 24.3] [Received: 05/17/2013] [Revised: 08/19/2013] [Accepted: 09/04/2013] [Indexed: 12/20/2022] Open
Abstract
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved.
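As a rough illustration of the switch-error metric that this abstract reports, the sketch below counts switch errors between a known haplotype and an inferred one. It is not the SHAPEIT2 implementation; the function name `count_switch_errors` and the input layout (one 0/1 allele per heterozygous site, for one haplotype of the pair) are assumptions made for the example.

```python
def count_switch_errors(true_hap, inferred_hap):
    """Count switch errors between a true and an inferred haplotype.

    Both inputs are assumed to list the 0/1 allele carried by ONE haplotype
    of the pair at each heterozygous site, in genomic order. A switch error
    is counted whenever the inferred phase changes orientation relative to
    the truth between two consecutive heterozygous sites.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    # True where the inferred allele matches the true allele at that site.
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A change in agreement between neighbouring sites is one switch.
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


print(count_switch_errors([0, 1, 0, 1, 1, 0], [0, 1, 1, 0, 0, 0]))  # 2
```

Under this definition a fully complemented haplotype has zero switch errors, because it is simply the other member of the same pair. Dividing the genomic span covered by the number of switches would give one possible mean distance between switch errors, though the abstract does not state exactly how that figure was computed.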
Affiliation(s)
- Olivier Delaneau
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
6
Wan X, Yang C, Yang Q, Zhao H, Yu W. HapBoost: a fast approach to boosting haplotype association analyses in genome-wide association studies. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:207-212. [PMID: 23702557 DOI: 10.1109/tcbb.2013.6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/02/2023]
Abstract
Genome-wide association study (GWAS) has been successful in identifying genetic variants that are associated with complex human diseases. In GWAS, multilocus association analyses through linkage disequilibrium (LD), named haplotype-based analyses, may have greater power than single-locus analyses for detecting disease susceptibility loci. However, the large number of SNPs genotyped in GWAS poses great computational challenges in the detection of haplotype associations. We present a fast method named HapBoost for finding haplotype associations, which can be applied to quickly screen the whole genome. The effectiveness of HapBoost is demonstrated by using both synthetic and real data sets. The experimental results show that the proposed approach can achieve comparably accurate results while it performs much faster than existing methods.
Affiliation(s)
- Xiang Wan
- Department of Computer Science and Institute of Computational and Theoretical Studies, Hong Kong Baptist University, Hong Kong.
7
Fang M. A fast expectation-maximum algorithm for fine-scale QTL mapping. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2012; 125:1727-1734. [PMID: 22865126 DOI: 10.1007/s00122-012-1949-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/14/2012] [Accepted: 07/15/2012] [Indexed: 06/01/2023]
Abstract
The recent technology of the single-nucleotide-polymorphism (SNP) array makes it possible to genotype millions of SNP markers across the genome, which in turn requires the development of fast and efficient methods for fine-scale quantitative trait loci (QTL) mapping. The single-marker association (SMA) is the simplest method for fine-scale QTL mapping, but it usually shows many false-positive signals and has low QTL-detection power. Compared with SMA, the haplotype-based method of Meuwissen and Goddard, which assumes the QTL effect to be random and estimates variance components (VC) with identity-by-descent (IBD) matrices inferred from an unknown historic population, is more powerful for fine-scale QTL mapping; furthermore, their method also tends to show a continuous QTL-detection profile, which diminishes many false-positive signals. However, variance component estimation is usually very time consuming and difficult to converge. Thus, an extremely fast EMF (Expectation-Maximization algorithm under Fixed effect model) is proposed in this research, which assumes a biallelic QTL and uses an expectation-maximization (EM) algorithm to solve for the model effects. The results of simulation experiments showed that (1) EMF was computationally much faster than the VC method; (2) EMF and VC performed similarly in QTL detection power and parameter estimation, and both outperformed the paired-marker analysis and SMA. However, the power of EMF would be lower than that of VC if the QTL was multiallelic.
Affiliation(s)
- Ming Fang
- Life Science College, Heilongjiang Bayi Agricultural University, Daqing 163319, People's Republic of China.
8
Fine-scale mapping of disease susceptibility locus with Bayesian partition model. Genes Genomics 2012. [DOI: 10.1007/s13258-011-0220-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
9
Barendse W. Haplotype analysis improved evidence for candidate genes for intramuscular fat percentage from a genome wide association study of cattle. PLoS One 2011; 6:e29601. [PMID: 22216329 PMCID: PMC3247274 DOI: 10.1371/journal.pone.0029601] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Received: 08/28/2011] [Accepted: 12/01/2011] [Indexed: 11/23/2022] Open
Abstract
In genome wide association studies (GWAS), haplotype analyses of SNP data are neglected in favour of single point analysis of associations. In a recent GWAS, we found that none of the known candidate genes for intramuscular fat (IMF) had been identified. In this study, data from the GWAS for these candidate genes were re-analysed as haplotypes. First, we confirmed that the methodology would find evidence for association between haplotypes in candidate genes of the calpain-calpastatin complex and musculus longissimus lumborum peak force (LLPF), because these genes had been confirmed through single point analysis in the GWAS. Then, for intramuscular fat percent (IMF), we found significant partial haplotype substitution effects for the genes ADIPOQ and CXCR4, as well as suggestive associations to the genes CEBPA, FASN, and CAPN1. Haplotypes for these genes explained 80% more of the phenotypic variance compared to the best single SNP. For some genes, the analyses suggested that there was more than one causative mutation, or confirmed that some causative mutations are limited to particular subgroups of a species. Fitting the SNPs and their interactions simultaneously explained a similar amount of the phenotypic variance compared to haplotype analyses. Haplotype analysis is a neglected part of the suite of tools used to analyse GWAS data, would be a useful method to extract more information from these data sets, and may contribute to reducing the missing heritability problem.
Affiliation(s)
- William Barendse
- Cooperative Research Centre for Beef Genetic Technologies, Commonwealth Scientific and Industrial Research Organization, St. Lucia, Queensland, Australia.
10
Wason JMS, Dudbridge F. Comparison of multimarker logistic regression models, with application to a genomewide scan of schizophrenia. BMC Genet 2010; 11:80. [PMID: 20828390 PMCID: PMC2949738 DOI: 10.1186/1471-2156-11-80] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/15/2010] [Accepted: 09/09/2010] [Indexed: 11/29/2022] Open
Abstract
Background: Genome-wide association studies (GWAS) are a widely used study design for detecting genetic causes of complex diseases. Current studies provide good coverage of common causal SNPs, but not rare ones. A popular method to detect rare causal variants is haplotype testing. A disadvantage of this approach is that many parameters are estimated simultaneously, which can mean a loss of power and slower fitting to large datasets. Haplotype testing effectively tests both the allele frequencies and the linkage disequilibrium (LD) structure of the data. LD has previously been shown to be mostly attributable to LD between adjacent SNPs. We propose a generalised linear model (GLM) which models the effects of each SNP in a region as well as the statistical interactions between adjacent pairs. This is compared to two other commonly used multimarker GLMs: one with a main-effect parameter for each SNP; one with a parameter for each haplotype. Results: We show the haplotype model has higher power for rare untyped causal SNPs, the main-effects model has higher power for common untyped causal SNPs, and the proposed model generally has power in between the two others. We show that the relative power of the three methods is dependent on the number of marker haplotypes the causal allele is present on, which depends on the age of the mutation. Except in the case of a common causal variant in high LD with markers, all three multimarker models are superior in power to single-SNP tests. Including the adjacent statistical interactions results in lower inflation in test statistics when a realistic level of population stratification is present in a dataset. Using the multimarker models, we analyse data from the Molecular Genetics of Schizophrenia study. The multimarker models find potential associations that are not found by single-SNP tests. However, multimarker models also require stricter control of data quality since biases can have a larger inflationary effect on multimarker test statistics than on single-SNP test statistics. Conclusions: Analysing a GWAS with multimarker models can yield candidate regions which may contain rare untyped causal variants. This is useful for increasing prior odds of association in future whole-genome sequence analyses.
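The model structure described in this abstract (one main effect per SNP plus interactions between adjacent pairs) can be sketched as a design matrix. This is only an illustration of the parameterisation, not the authors' code; the function name `adjacent_interaction_design` and the additive 0/1/2 genotype coding are assumptions that the abstract does not specify.

```python
def adjacent_interaction_design(genotypes):
    """Build design-matrix rows for a main-effects-plus-adjacent-pairs GLM.

    genotypes: list of per-individual lists of minor-allele counts (0, 1, 2),
    with SNPs in genomic order. The additive coding is an assumption.
    Each output row is: [intercept, SNP_1..SNP_m, SNP_1*SNP_2, ..., SNP_(m-1)*SNP_m].
    """
    rows = []
    for g in genotypes:
        main = [float(x) for x in g]
        # Interaction term for each SNP and its right-hand neighbour only.
        adjacent = [main[j] * main[j + 1] for j in range(len(main) - 1)]
        rows.append([1.0] + main + adjacent)
    return rows


toy = [[0, 1, 2],
       [1, 1, 0],
       [2, 0, 1]]
X = adjacent_interaction_design(toy)
print(len(X[0]))  # 6 columns: 1 intercept + 3 main effects + 2 adjacent interactions
```

For m SNPs this gives 2m parameters including the intercept, which sits between the m + 1 parameters of a main-effects model and the potentially much larger number in a model with one parameter per haplotype. The resulting matrix could be passed to any logistic regression routine.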
Affiliation(s)
- James M S Wason
- MRC Biostatistics Unit, Institute of Public Health, Cambridge CB2 0SR, UK.
11
Maestrini E, Pagnamenta AT, Lamb JA, Bacchelli E, Sykes NH, Sousa I, Toma C, Barnby G, Butler H, Winchester L, Scerri TS, Minopoli F, Reichert J, Cai G, Buxbaum JD, Korvatska O, Schellenberg GD, Dawson G, Bildt AD, Minderaa RB, Mulder EJ, Morris AP, Bailey AJ, Monaco AP. High-density SNP association study and copy number variation analysis of the AUTS1 and AUTS5 loci implicate the IMMP2L-DOCK4 gene region in autism susceptibility. Mol Psychiatry 2010; 15:954-68. [PMID: 19401682 PMCID: PMC2934739 DOI: 10.1038/mp.2009.34] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.9] [Received: 10/20/2008] [Revised: 02/19/2009] [Accepted: 04/02/2009] [Indexed: 01/02/2023]
Abstract
Autism spectrum disorders are a group of highly heritable neurodevelopmental disorders with a complex genetic etiology. The International Molecular Genetic Study of Autism Consortium previously identified linkage loci on chromosomes 7 and 2, termed AUTS1 and AUTS5, respectively. In this study, we performed a high-density association analysis in AUTS1 and AUTS5, testing more than 3000 single nucleotide polymorphisms (SNPs) in all known genes in each region, as well as SNPs in non-genic highly conserved sequences. SNP genotype data were also used to investigate copy number variation within these regions. The study sample consisted of 127 and 126 families, showing linkage to the AUTS1 and AUTS5 regions, respectively, and 188 gender-matched controls. Further investigation of the strongest association results was conducted in an independent European family sample containing 390 affected individuals. Association and copy number variant analysis highlighted several genes that warrant further investigation, including IMMP2L and DOCK4 on chromosome 7. Evidence for the involvement of DOCK4 in autism susceptibility was supported by independent replication of association at rs2217262 and the finding of a deletion segregating in a sib-pair family.
Affiliation(s)
- E Maestrini
- Department of Biology, University of Bologna, Bologna, Italy
| | - A T Pagnamenta
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - J A Lamb
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK
| | - E Bacchelli
- Department of Biology, University of Bologna, Bologna, Italy
| | - N H Sykes
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - I Sousa
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - C Toma
- Department of Biology, University of Bologna, Bologna, Italy
| | - G Barnby
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - H Butler
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - L Winchester
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - T S Scerri
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - F Minopoli
- Department of Biology, University of Bologna, Bologna, Italy
| | - J Reichert
- Department of Psychiatry, Seaver Autism Research Center, Mount Sinai School of Medicine, New York, NY, USA
| | - G Cai
- Department of Psychiatry, Seaver Autism Research Center, Mount Sinai School of Medicine, New York, NY, USA
| | - J D Buxbaum
- Department of Psychiatry, Seaver Autism Research Center, Mount Sinai School of Medicine, New York, NY, USA
| | - O Korvatska
- Geriatric Research Education and Clinical Centre, Veterans Affairs Puget Sound Health Care System, Seattle Division, Seattle, WA, USA
| | - G D Schellenberg
- Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
| | - G Dawson
- Autism Speaks, New York, NY, USA
- Department of Psychology, University of Washington, Seattle, WA, USA
| | - A de Bildt
- Department of Psychiatry, Child and Adolescent Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
| | - R B Minderaa
- Department of Psychiatry, Child and Adolescent Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
| | - E J Mulder
- Department of Psychiatry, Child and Adolescent Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
| | - A P Morris
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - A J Bailey
- University Department of Psychiatry, Warneford Hospital, Oxford, UK
| | - A P Monaco
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - IMGSAC12
- Department of Biology, University of Bologna, Bologna, Italy
- The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Centre for Integrated Genomic Medical Research, University of Manchester, Manchester, UK
- Department of Psychiatry, Seaver Autism Research Center, Mount Sinai School of Medicine, New York, NY, USA
- Geriatric Research Education and Clinical Centre, Veterans Affairs Puget Sound Health Care System, Seattle Division, Seattle, WA, USA
- Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA, USA
- Autism Speaks, New York, NY, USA
- Department of Psychology, University of Washington, Seattle, WA, USA
- Department of Psychiatry, Child and Adolescent Psychiatry, University Medical Center Groningen, Groningen, The Netherlands
- University Department of Psychiatry, Warneford Hospital, Oxford, UK
|
12
|
Barnes KC. An update on the genetics of atopic dermatitis: scratching the surface in 2009. J Allergy Clin Immunol 2010; 125:16-29.e1-11; quiz 30-1. [PMID: 20109730 DOI: 10.1016/j.jaci.2009.11.008] [Citation(s) in RCA: 228] [Impact Index Per Article: 16.3] [Received: 07/28/2009] [Revised: 11/06/2009] [Accepted: 11/09/2009] [Indexed: 12/27/2022]
Abstract
A genetic basis for atopic dermatitis (AD) has long been recognized. Historic documents allude to family history of disease as a risk factor. Before characterization of the human genome, heritability studies combined with family-based linkage studies supported the definition of AD as a complex trait in that interactions between genes and environmental factors and the interplay between multiple genes contribute to disease manifestation. A summary of more than 100 published reports on genetic association studies through mid-2009 implicates 81 genes, in 46 of which at least 1 positive association with AD has been demonstrated. Of these, the gene encoding filaggrin (FLG) has been most consistently replicated. Most candidate gene studies to date have focused on adaptive and innate immune response genes, but there is increasing interest in skin barrier dysfunction genes. This review examines the methods that have been used to identify susceptibility genes for AD and how the underlying pathology of this disease has been used to select candidate genes. Current challenges and the potential effect of new technologies are discussed.
Affiliation(s)
- Kathleen C Barnes
- Johns Hopkins Asthma & Allergy Center, 5501 Hopkins Bayview Circle, Room 3A.62, Baltimore, MD 21224, USA.
|
13
|
Tönjes A, Zeggini E, Kovacs P, Böttcher Y, Schleinitz D, Dietrich K, Morris AP, Enigk B, Rayner NW, Koriath M, Eszlinger M, Kemppinen A, Prokopenko I, Hoffmann K, Teupser D, Thiery J, Krohn K, McCarthy MI, Stumvoll M. Association of FTO variants with BMI and fat mass in the self-contained population of Sorbs in Germany. Eur J Hum Genet 2010; 18:104-10. [PMID: 19584900 DOI: 10.1038/ejhg.2009.107] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Indexed: 12/18/2022] Open
Abstract
The association between common variants in the FTO gene with weight, adiposity and body mass index (BMI) has now been widely replicated. Although the causal variant has yet to be identified, it most likely maps within a 47 kb region of intron 1 of FTO. We performed a genome-wide association study in the Sorbian population and evaluated the relationships between FTO variants and BMI and fat mass in this isolate of Slavonic origin resident in Germany. In a sample of 948 Sorbs, we could replicate the earlier reported associations of intron 1 SNPs with BMI (eg, P-value=0.003, beta=0.02 for rs8050136). However, using genome-wide association data, we also detected a second independent signal mapping to a region in intron 2/3 about 40-60 kb away from the originally reported SNPs (eg, for rs17818902 association with BMI P-value=0.0006, beta=-0.03 and with fat mass P-value=0.0018, beta=-0.079). Both signals remain independently associated in the conditioned analyses. In conclusion, we extend the evidence that FTO variants are associated with BMI by putatively identifying a second susceptibility allele independent of that described earlier. Although further statistical analysis of these findings is hampered by the finite size of the Sorbian isolate, these findings should encourage other groups to seek alternative susceptibility variants within FTO (and other established susceptibility loci) using the opportunities afforded by analyses in populations with divergent mutational and/or demographic histories.
Affiliation(s)
- Anke Tönjes
- Department of Medicine, Coordination Centre for Clinical Trials, University of Leipzig, Leipzig, Germany
|
14
|
The diverse applications of cladistic analysis of molecular evolution, with special reference to nested clade analysis. Int J Mol Sci 2010; 11:124-39. [PMID: 20162005 PMCID: PMC2820993 DOI: 10.3390/ijms11010124] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/25/2009] [Revised: 01/06/2010] [Accepted: 01/06/2010] [Indexed: 11/17/2022] Open
Abstract
The genetic variation found in small regions of the genomes of many species can be arranged into haplotype trees that reflect the evolutionary genealogy of the DNA lineages found in that region and the accumulation of mutations on those lineages. This review demonstrates some of the many ways in which clades (branches) of haplotype trees have been applied in recent years, including the study of genotype/phenotype associations at candidate loci and in genome-wide association studies, the phylogeographic history of species, human evolution, the conservation of endangered species, and the identification of species.
|
15
|
Abstract
We describe a fast hierarchical Bayesian method for mapping quantitative trait loci by haplotype-based association, applicable when haplotypes are not observed directly but are inferred from multiple marker genotypes. The method avoids the use of a Monte Carlo Markov chain by employing priors for which the likelihood factorizes completely. It is parameterized by a single hyperparameter, the fraction of variance explained by the quantitative trait locus, compared to the frequentist fixed-effects model, which requires a parameter for the phenotypic effect of each combination of haplotypes; nevertheless it still provides estimates of haplotype effects. We use simulation to show that the method matches the power of the frequentist regression model and, when the haplotypes are inferred, exceeds it for small QTL effect sizes. The Bayesian estimates of the haplotype effects are more accurate than the frequentist estimates, for both known and inferred haplotypes, which indicates that this advantage is independent of the effect of uncertainty in haplotype inference and will hold in comparison with frequentist methods in general. We apply the method to data from a panel of recombinant inbred lines of Arabidopsis thaliana, descended from 19 inbred founders.
|
16
|
Carpenter D, Ringrose C, Leo V, Morris A, Robinson RL, Halsall PJ, Hopkins PM, Shaw MA. The role of CACNA1S in predisposition to malignant hyperthermia. BMC Med Genet 2009; 10:104. [PMID: 19825159 PMCID: PMC2770053 DOI: 10.1186/1471-2350-10-104] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Received: 04/28/2009] [Accepted: 10/13/2009] [Indexed: 01/19/2023]
Abstract
BACKGROUND Malignant hyperthermia (MH) is an inherited pharmacogenetic disorder of skeletal muscle, characterised by an elevated calcium release from the skeletal muscle sarcoplasmic reticulum. The dihydropyridine receptor (DHPR) plays an essential role in excitation-contraction coupling and calcium homeostasis in skeletal muscle. This study focuses on the gene CACNA1S which encodes the alpha1 subunit of the DHPR, in order to establish whether CACNA1S plays a major role in MH susceptibility in the UK. METHODS We investigate the CACNA1S locus in detail in 50 independent MH patients, the largest study to date, to identify novel variants that may predispose to disease and also to characterise the haplotype structure across CACNA1S. RESULTS We present CACNA1S cDNA sequencing data from 50 MH patients in whom RYR1 mutations have been excluded, and subsequent mutation screening analysis. Furthermore we present haplotype analysis of unphased CACNA1S SNPs to (1) assess CACNA1S haplotype frequency differences between susceptible MH cases and a European control group and (2) analyse population-based association via clustering of CACNA1S haplotypes based on disease risk. CONCLUSION The study identified a single potentially pathogenic change in CACNA1S (p.Arg174Trp), and highlights that the haplotype structure across CACNA1S is diverse, with a high degree of variability.
Affiliation(s)
- Danielle Carpenter
- MH Investigation Unit, Academic Unit of Anaesthesia, St James's University Hospital, Leeds, LS9 7TF, UK.
|
17
|
Igo RP, Li J, Goddard KAB. Association mapping by generalized linear regression with density-based haplotype clustering. Genet Epidemiol 2009; 33:16-26. [PMID: 18561202 DOI: 10.1002/gepi.20352] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 01/04/2023]
Abstract
Haplotypes of closely linked single-nucleotide polymorphisms (SNPs) potentially offer greater power than individual SNPs to detect association between genetic variants and disease. We present a novel approach for association mapping in which density-based clustering of haplotypes reduces the dimensionality of the general linear model (GLM)-based score test of association implemented in the HaploStats software (Schaid et al. [2002] Am. J. Hum. Genet. 70:425-434). A flexible haplotype similarity score, a generalization of previously used measures, forms the basis for grouping haplotypes of probable recent common ancestry. All haplotypes within a cluster are assigned the same regression coefficient within the GLM, and evidence for association is assessed with a score statistic. The approach is applicable to both binary and continuous trait data, and does not require prior phase information. Results of simulation studies demonstrated that clustering enhanced the power of the score test to detect association, under a variety of conditions, while preserving valid Type-I error. Improvement in performance was most dramatic in the presence of extreme haplotype diversity, while a slight improvement was observed even at low diversity. Our method also offers, for binary traits, a slight advantage in power over a similar approach based on an evolutionary model (Tzeng et al. [2006] Am. J. Hum. Genet. 78:231-242).
Affiliation(s)
- Robert P Igo
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, USA
|
18
|
Dai JY, Leblanc M, Smith NL, Psaty B, Kooperberg C. SHARE: an adaptive algorithm to select the most informative set of SNPs for candidate genetic association. Biostatistics 2009; 10:680-93. [PMID: 19605740 PMCID: PMC2742496 DOI: 10.1093/biostatistics/kxp023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Association studies have been widely used to identify genetic liability variants for complex diseases. While scanning the chromosomal region one single nucleotide polymorphism (SNP) at a time may not fully explore linkage disequilibrium, haplotype analyses tend to require a fairly large number of parameters, thus potentially losing power. Clustering algorithms, such as the cladistic approach, have been proposed to reduce the dimensionality, yet they have important limitations. We propose a SNP-Haplotype Adaptive REgression (SHARE) algorithm that seeks the most informative set of SNPs for genetic association in a targeted candidate region by growing and shrinking haplotypes with one more or one fewer SNP in a stepwise fashion, and comparing prediction errors of different models via cross-validation. Depending on the evolutionary history of the disease mutations and the markers, this set may contain a single SNP or several SNPs that lay a foundation for haplotype analyses. Haplotype phase ambiguity is effectively accounted for by treating haplotype reconstruction as a part of the learning procedure. Simulations and a data application show that our method has improved power over existing methodologies and that the results are informative in the search for disease-causal loci.
Affiliation(s)
- James Y Dai
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue N, M2-C200, Seattle, WA 98109, USA.
|
19
|
Carpenter D, Morris A, Robinson RL, Booms P, Iles D, Halsall PJ, Steele D, Hopkins PM, Shaw MA. Analysis of RYR1 Haplotype Profile in Patients with Malignant Hyperthermia. Ann Hum Genet 2009; 73:10-8. [DOI: 10.1111/j.1469-1809.2008.00482.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
|
20
|
Hosking FJ, Sterne JAC, Smith GD, Green PJ. Inference from genome-wide association studies using a novel Markov model. Genet Epidemiol 2008; 32:497-504. [PMID: 18383184 DOI: 10.1002/gepi.20322] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.
Affiliation(s)
- Fay J Hosking
- Department of Mathematics, University of Bristol, Bristol, UK.
|
21
|
Szumska D, Pieles G, Essalmani R, Bilski M, Mesnard D, Kaur K, Franklyn A, El Omari K, Jefferis J, Bentham J, Taylor JM, Schneider JE, Arnold SJ, Johnson P, Tymowska-Lalanne Z, Stammers D, Clarke K, Neubauer S, Morris A, Brown SD, Shaw-Smith C, Cama A, Capra V, Ragoussis J, Constam D, Seidah NG, Prat A, Bhattacharya S. VACTERL/caudal regression/Currarino syndrome-like malformations in mice with mutation in the proprotein convertase Pcsk5. Genes Dev 2008; 22:1465-77. [PMID: 18519639 DOI: 10.1101/gad.479408] [Citation(s) in RCA: 105] [Impact Index Per Article: 6.6] [Indexed: 11/24/2022]
Abstract
We have identified an ethylnitrosourea (ENU)-induced recessive mouse mutation (Vcc) with a pleiotropic phenotype that includes cardiac, tracheoesophageal, anorectal, anteroposterior patterning defects, exomphalos, hindlimb hypoplasia, a presacral mass, renal and palatal agenesis, and pulmonary hypoplasia. It results from a C470R mutation in the proprotein convertase PCSK5 (PC5/6). Compound mutants (Pcsk5(Vcc/null)) completely recapitulate the Pcsk5(Vcc/Vcc) phenotype, as does an epiblast-specific conditional deletion of Pcsk5. The C470R mutation ablates a disulfide bond in the P domain, and blocks export from the endoplasmic reticulum and proprotein convertase activity. We show that GDF11 is cleaved and activated by PCSK5A, but not by PCSK5A-C470R, and that Gdf11-deficient embryos, in addition to having anteroposterior patterning defects and renal and palatal agenesis, also have a presacral mass, anorectal malformation, and exomphalos. Pcsk5 mutation results in abnormal expression of several paralogous Hox genes (Hoxa, Hoxc, and Hoxd), and of Mnx1 (Hlxb9). These include known Gdf11 targets, and are necessary for caudal embryo development. We identified nonsynonymous mutations in PCSK5 in patients with VACTERL (vertebral, anorectal, cardiac, tracheoesophageal, renal, limb malformation OMIM 192350) and caudal regression syndrome, the phenotypic features of which resemble the mouse mutation. We propose that Pcsk5, at least in part via GDF11, coordinately regulates caudal Hox paralogs, to control anteroposterior patterning, nephrogenesis, skeletal, and anorectal development.
Affiliation(s)
- Dorota Szumska
- Department of Cardiovascular Medicine and Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
|
22
|
Tachmazidou I, Verzilli CJ, De Iorio M. Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genet 2007; 3:e111. [PMID: 17616979 PMCID: PMC1913101 DOI: 10.1371/journal.pgen.0030111] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 11/21/2006] [Accepted: 05/21/2007] [Indexed: 11/19/2022] Open
Abstract
Multilocus analysis of single nucleotide polymorphism haplotypes is a promising approach to dissecting the genetic basis of complex diseases. We propose a coalescent-based model for association mapping that potentially increases the power to detect disease-susceptibility variants in genetic association studies. The approach uses Bayesian partition modelling to cluster haplotypes with similar disease risks by exploiting evolutionary information. We focus on candidate gene regions with densely spaced markers and model chromosomal segments in high linkage disequilibrium therein assuming a perfect phylogeny. To make this assumption more realistic, we split the chromosomal region of interest into sub-regions or windows of high linkage disequilibrium. The haplotype space is then partitioned into disjoint clusters, within which the phenotype-haplotype association is assumed to be the same. For example, in case-control studies, we expect chromosomal segments bearing the causal variant on a common ancestral background to be more frequent among cases than controls, giving rise to two separate haplotype clusters. The novelty of our approach arises from the fact that the distance used for clustering haplotypes has an evolutionary interpretation, as haplotypes are clustered according to the time to their most recent common ancestor. Our approach is fully Bayesian and we develop a Markov Chain Monte Carlo algorithm to sample efficiently over the space of possible partitions. We compare the proposed approach to both single-marker analyses and recently proposed multi-marker methods and show that the Bayesian partition modelling performs similarly in localizing the causal allele while yielding lower false-positive rates. Also, the method is computationally quicker than other multi-marker approaches. We present an application to real genotype data from the CYP2D6 gene region, which has a confirmed role in drug metabolism, where we succeed in mapping the location of the susceptibility variant within a small error.
Affiliation(s)
- Ioanna Tachmazidou
- Department of Epidemiology and Public Health, Imperial College London, United Kingdom.
|
23
|
Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9:356-69. [PMID: 18398418 DOI: 10.1038/nrg2344] [Citation(s) in RCA: 1861] [Impact Index Per Article: 116.3] [Indexed: 12/13/2022]
Abstract
The past year has witnessed substantial advances in understanding the genetic basis of many common phenotypes of biomedical importance. These advances have been the result of systematic, well-powered, genome-wide surveys exploring the relationships between common sequence variation and disease predisposition. This approach has revealed over 50 disease-susceptibility loci and has provided insights into the allelic architecture of multifactorial traits. At the same time, much has been learned about the successful prosecution of association studies on such a scale. This Review highlights the knowledge gained, defines areas of emerging consensus, and describes the challenges that remain as researchers seek to obtain more complete descriptions of the susceptibility architecture of biomedical traits of interest and to translate the information gathered into improvements in clinical management.
|
24
|
Abstract
Our aim is to review methods to optimize detection of all disease genes in a genetic region. As a starting point, we assume there is sufficient evidence from linkage and/or association studies, based on significance levels or replication studies, for the involvement in disease risk of the genetic region under study. For closely linked markers, there will often be multiple associations with disease, and linkage analyses identify a region rather than the specific disease-predisposing gene. Hence, the first task is to identify the primary (major) disease-predisposing gene or genes in a genetic region, and single nucleotide polymorphisms thereof, that is, how to distinguish true associations from those that are just due to linkage disequilibrium with the actual disease-predisposing variants. Then, how do we detect additional disease genes in this genetic region? These two issues are of course very closely interrelated. No existing programs, either individually or in aggregate, can handle the magnitude and complexity of the analyses needed using currently available methods. Further, even with modern computers, one cannot study every possible combination of genetic markers and their haplotypes across the genome, or even within a genetic region. Although we must rely heavily on computers, in the final analysis of multiple effects in a genetic region and/or interaction or independent effects between unlinked genes, manipulation of the data by the individual investigator will play a crucial role. We recommend a multistrategy approach using a variety of complementary methods described below.
|
25
|
Servin B, Stephens M. Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet 2007; 3:e114. [PMID: 17676998 PMCID: PMC1934390 DOI: 10.1371/journal.pgen.0030114] [Citation(s) in RCA: 378] [Impact Index Per Article: 22.2] [Received: 12/14/2006] [Accepted: 05/30/2007] [Indexed: 11/18/2022] Open
Abstract
We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
Affiliation(s)
- Bertrand Servin
- Department of Statistics, University of Washington, Seattle, Washington, United States of America.
|