251
Beaudoin M, Lo KS, N'Diaye A, Rivas MA, Dubé MP, Laplante N, Phillips MS, Rioux JD, Tardif JC, Lettre G. Pooled DNA resequencing of 68 myocardial infarction candidate genes in French Canadians. Circ Cardiovasc Genet 2012; 5:547-54. [PMID: 22923420 DOI: 10.1161/circgenetics.112.963165] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
BACKGROUND Family history is a strong risk factor for coronary artery disease (CAD), especially for early-onset myocardial infarction (MI). Several genes and chromosomal regions have been implicated in the genetic cause of CAD/MI, mostly through the discovery of familial mutations implicated in hyper-/hypocholesterolemia by linkage studies and of single nucleotide polymorphisms by genome-wide association studies. Except for a few examples (eg, PCSK9), the role of low-frequency genetic variation (minor allele frequency [MAF] ≈0.1%-5%) in MI/CAD predisposition has not been extensively investigated. METHODS AND RESULTS We selected 68 candidate genes and sequenced their exons (394 kb) in 500 early-onset MI cases and 500 matched controls, all of French-Canadian ancestry, using solution-based capture in pools of nonindexed DNA samples. In these regions, we identified 1852 single nucleotide variants (695 novel) and captured 85% of the variants with MAF ≥1% found by the 1000 Genomes Project in individuals of European ancestry. Using gene-based association testing, we prioritized 29 low-frequency variants in 8 genes for follow-up and attempted to genotype them for replication in 1594 MI cases and 2988 controls from 2 French-Canadian panels. Our pilot association analysis of low-frequency variants in 68 candidate genes did not identify genes with large effects on MI risk in French Canadians. CONCLUSIONS We have optimized a strategy, applicable to all complex diseases and traits, to discover DNA sequence variants efficiently and cost-effectively in large populations. Resequencing efforts to find low-frequency variants implicated in common human diseases are likely to require very large sample sizes.
Affiliation(s)
- Mélissa Beaudoin
- Montreal Heart Institute, 5000 Rue Bélanger, Montreal, Québec, Canada
252
Altmann A, Weber P, Bader D, Preuß M, Binder EB, Müller-Myhsok B. A beginners guide to SNP calling from high-throughput DNA-sequencing data. Hum Genet 2012; 131:1541-54. [DOI: 10.1007/s00439-012-1213-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Received: 01/31/2012] [Accepted: 07/31/2012] [Indexed: 01/02/2023]
253
Epstein M, Duncan R, Jiang Y, Conneely K, Allen A, Satten G. A permutation procedure to correct for confounders in case-control studies, including tests of rare variation. Am J Hum Genet 2012; 91:215-23. [PMID: 22818855 DOI: 10.1016/j.ajhg.2012.06.004] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 02/25/2012] [Revised: 05/03/2012] [Accepted: 06/05/2012] [Indexed: 01/30/2023]
Abstract
Many case-control tests of rare variation are implemented in statistical frameworks that make correction for confounders like population stratification difficult. Simple permutation of disease status is unacceptable for resolving this issue because the replicate data sets do not have the same confounding as the original data set. These limitations make it difficult to apply rare-variant tests to samples in which confounding most likely exists, e.g., samples collected from admixed populations. To enable the use of such rare-variant methods in structured samples, as well as to facilitate permutation tests for any situation in which case-control tests require adjustment for confounding covariates, we propose to establish the significance of a rare-variant test via a modified permutation procedure. Our procedure uses Fisher's noncentral hypergeometric distribution to generate permuted data sets with the same structure present in the actual data set such that inference is valid in the presence of confounding factors. We use simulated sequence data based on coalescent models to show that our permutation strategy corrects for confounding due to population stratification that, if ignored, would otherwise inflate the size of a rare-variant test. We further illustrate the approach by using sequence data from the Dallas Heart Study of energy metabolism traits. Researchers can implement our permutation approach by using the R package BiasedUrn.
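The idea of a confounder-aware permutation can be illustrated with a minimal Python sketch, assuming NumPy. It is not the BiasedUrn code and not necessarily the authors' exact procedure: it draws independent Bernoulli(p_i) case labels and keeps only draws with the observed number of cases, which, conditioned on that total, follows Fisher's multivariate noncentral hypergeometric distribution with odds p_i / (1 - p_i). The burden statistic, the helper names, and the stratum-only model used to obtain p_i are illustrative assumptions; in practice p_i would come from a regression of disease status on the confounders.

```python
import numpy as np

def burden_stat(genotypes, y):
    """Hypothetical test statistic: |mean rare-allele burden in cases - in controls|."""
    burden = genotypes.sum(axis=1)
    return abs(burden[y == 1].mean() - burden[y == 0].mean())

def biased_permutations(p_case, n_cases, n_perm, rng, max_tries=1_000_000):
    """Draw label vectors with exactly n_cases cases, individual i weighted by p_case[i].

    Independent Bernoulli draws conditioned on their sum follow Fisher's
    multivariate noncentral hypergeometric distribution; rejection sampling
    is exact but slow when the acceptance rate is low.
    """
    perms = []
    for _ in range(max_tries):
        y = (rng.random(p_case.size) < p_case).astype(int)
        if y.sum() == n_cases:
            perms.append(y)
            if len(perms) == n_perm:
                return np.array(perms)
    raise RuntimeError("acceptance rate too low; an exact sampler is needed")

def permutation_pvalue(genotypes, y, p_case, n_perm=199, seed=0):
    """Permutation P-value using replicate data sets that preserve confounding."""
    rng = np.random.default_rng(seed)
    observed = burden_stat(genotypes, y)
    perms = biased_permutations(p_case, int(y.sum()), n_perm, rng)
    null = np.array([burden_stat(genotypes, yp) for yp in perms])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Toy data: two strata with different case rates AND different allele
# frequencies, so simple permutation of disease status would be confounded.
rng = np.random.default_rng(1)
stratum = np.repeat([0, 1], 100)
y = np.where(stratum == 0, rng.random(200) < 0.2, rng.random(200) < 0.6).astype(int)
maf = np.where(stratum == 0, 0.01, 0.05)[:, None]
genotypes = rng.binomial(2, maf, size=(200, 10))
# Fitted P(case | stratum) from a stratum-only model = case fraction per stratum.
p_case = np.array([y[stratum == s].mean() for s in stratum])
print(permutation_pvalue(genotypes, y, p_case))
```

Because every replicate keeps the same number of cases and the same tilt toward high-risk strata, the null distribution of the statistic carries the same stratification as the observed data.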
254
Silversides CK, Lionel AC, Costain G, Merico D, Migita O, Liu B, Yuen T, Rickaby J, Thiruvahindrapuram B, Marshall CR, Scherer SW, Bassett AS. Rare copy number variations in adults with tetralogy of Fallot implicate novel risk gene pathways. PLoS Genet 2012; 8:e1002843. [PMID: 22912587 PMCID: PMC3415418 DOI: 10.1371/journal.pgen.1002843] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Received: 02/21/2012] [Accepted: 05/29/2012] [Indexed: 12/03/2022]
Abstract
Structural genetic changes, especially copy number variants (CNVs), represent a major source of genetic variation contributing to human disease. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease, but to date little is known about the role of CNVs in the etiology of TOF. Using high-resolution genome-wide microarrays and stringent calling methods, we investigated rare CNVs in a prospectively recruited cohort of 433 unrelated adults with TOF and/or pulmonary atresia at a single centre. We excluded those with recognized syndromes, including 22q11.2 deletion syndrome. We identified candidate genes for TOF based on converging evidence between rare CNVs that overlapped the same gene in unrelated individuals and from pathway analyses comparing rare CNVs in TOF cases to those in epidemiologic controls. Even after excluding the 53 (10.7%) subjects with 22q11.2 deletions, we found that adults with TOF had a greater burden of large rare genic CNVs compared to controls (8.82% vs. 4.33%, p = 0.0117). Six loci showed evidence for recurrence in TOF or related congenital heart disease, including typical 1q21.1 duplications in four (1.18%) of 340 Caucasian probands. The rare CNVs implicated novel candidate genes of interest for TOF, including PLXNA2, a gene involved in semaphorin signaling. Independent pathway analyses highlighted developmental processes as potential contributors to the pathogenesis of TOF. These results indicate that individually rare CNVs are collectively significant contributors to the genetic burden of TOF. Further, the data provide new evidence for dosage-sensitive genes in PLXNA2-semaphorin signaling and related developmental processes in human cardiovascular development, consistent with previous animal models.
Congenital heart disease affects nearly 1% of all live births. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease. This condition is associated with hemizygous deletions of chromosome 22q11.2 and chromosomal trisomies, but little else is known about the genetic heterogeneity of this complex disease. We used high-resolution microarrays and stringent methods to study structural (copy number) variants in a systematically phenotyped cohort of unrelated adults with TOF. We found that individually rare genic copy number variants (CNVs) were collectively significant contributors to the genetic burden in TOF. Among CNVs that implicated candidate genes of interest were loss CNVs overlapping the PLXNA2 gene that codes for plexin A2. This is the first study to show a role for this semaphorin receptor in human congenital heart disease, consistent with a Plxna2 mouse knockout phenotype. Pathway analyses comparing rare exonic loss CNVs in the TOF sample to controls implicated other novel gene sets that suggest new pathogenetic mechanisms.
Affiliation(s)
- Candice K. Silversides
- Toronto Congenital Cardiac Centre for Adults, Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Division of Cardiology, Mount Sinai Hospital, Toronto, Ontario, Canada
- Anath C. Lionel
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular Genetics and the McLaughlin Centre, University of Toronto, Ontario, Canada
- Gregory Costain
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Daniele Merico
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Ohsuke Migita
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Ben Liu
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Tracy Yuen
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Jessica Rickaby
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Bhooma Thiruvahindrapuram
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Christian R. Marshall
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular Genetics and the McLaughlin Centre, University of Toronto, Ontario, Canada
- Stephen W. Scherer
- The Centre for Applied Genomics and Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Department of Molecular Genetics and the McLaughlin Centre, University of Toronto, Ontario, Canada
- Anne S. Bassett
- Toronto Congenital Cardiac Centre for Adults, Peter Munk Cardiac Centre, University Health Network, Toronto, Ontario, Canada
- Clinical Genetics Research Program, Centre for Addiction and Mental Health, Toronto, Ontario, Canada
- Department of Psychiatry, University of Toronto, Ontario, Canada
255
Xu C, Ladouceur M, Dastani Z, Richards JB, Ciampi A, Greenwood CMT. Multiple regression methods show great potential for rare variant association tests. PLoS One 2012; 7:e41694. [PMID: 22916111 PMCID: PMC3420665 DOI: 10.1371/journal.pone.0041694] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 03/16/2012] [Accepted: 06/25/2012] [Indexed: 01/08/2023]
Abstract
The investigation of associations between rare genetic variants and diseases or phenotypes has two goals: first, to identify which genes or genomic regions are associated, and second, to discriminate associated variants from background noise within each region. Over the last few years, many new methods have been developed to associate genomic regions with phenotypes. However, classical methods for high-dimensional data have received little attention. Here we investigate whether several classical statistical methods for high-dimensional data, namely ridge regression (RR), principal components regression (PCR), partial least squares regression (PLS), a sparse version of PLS (SPLS), and the LASSO, are able to detect associations with rare genetic variants. These approaches have been used extensively in statistics to identify true associations in data sets containing many predictor variables. Using genetic variants identified in three genes that were Sanger sequenced in 1998 individuals, we simulated continuous phenotypes under several different models, and we show that these feature selection and feature extraction methods can substantially outperform several popular methods for rare variant analysis. Furthermore, these approaches can identify which variants contribute most to the model fit, so both goals of rare variant analysis can be achieved simultaneously with regression regularization methods. These methods are briefly illustrated with an analysis of adiponectin levels and variants in the ADIPOQ gene.
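As a rough illustration of how one of these methods, ridge regression, serves both goals at once, the Python sketch below (assuming NumPy) fits the closed-form ridge estimate on centred data and ranks variants by the magnitude of their shrunken coefficients. The simulated genotypes, effect sizes, and penalty `lam` are arbitrary assumptions for demonstration and do not reproduce the paper's simulation design.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate on centred data: (Xc'Xc + lam*I)^-1 Xc'yc.

    The penalty keeps the system solvable even when rare-variant columns are
    all zero or nearly collinear, which ordinary least squares cannot handle.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

# Toy data (assumed): 30 rare variants, three of which affect a continuous trait.
rng = np.random.default_rng(0)
n, p = 500, 30
mafs = rng.uniform(0.002, 0.02, size=p)
X = rng.binomial(2, mafs, size=(n, p)).astype(float)
causal = [0, 3, 7]
y = X[:, causal] @ np.array([1.5, 1.5, -1.5]) + rng.normal(size=n)

beta = ridge_fit(X, y, lam=5.0)
# Second goal: rank variants by the size of their (shrunken) effects.
ranked = np.argsort(-np.abs(beta))
print("top-ranked variants:", ranked[:5])
```

A region-level test (the first goal) could then be built on the fitted coefficients, for example by permuting `y`; that step is left out here.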
Affiliation(s)
- ChangJiang Xu
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
256
Cheung YH, Wang G, Leal SM, Wang S. A fast and noise-resilient approach to detect rare-variant associations with deep sequencing data for complex disorders. Genet Epidemiol 2012; 36:675-85. [PMID: 22865616 DOI: 10.1002/gepi.21662] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/18/2012] [Accepted: 06/14/2012] [Indexed: 11/11/2022]
Abstract
Next generation sequencing technology has enabled a paradigm shift in genetic association studies from the common disease/common variant hypothesis to the common disease/rare variant hypothesis. Analyzing individual rare variants is known to be underpowered; therefore, association methods have been developed that aggregate variants across a genetic region, which for exome sequencing is usually a gene. The foreseeable widespread use of whole genome sequencing poses new challenges in statistical analysis. It calls for new rare-variant association methods that are statistically powerful, robust against high levels of noise due to the inclusion of noncausal variants, and yet computationally efficient. We propose a simple and powerful statistic that combines the disease-associated P-values of individual variants using a weight that is the inverse of the expected standard deviation of the allele frequencies under the null. This approach, dubbed the Sigma-P method, is extremely robust to the inclusion of a high proportion of noncausal variants and is also powerful when both detrimental and protective variants are present within a genetic region. The performance of the Sigma-P method was tested using simulated data based on realistic population demographic and disease models, and its power was compared with that of several previously published methods. The results demonstrate that this method generally outperforms other rare-variant association methods over a wide range of models. Additionally, sequence data on the ANGPTL family of genes from the Dallas Heart Study were tested for associations with nine metabolic traits, and both known and novel putative associations were uncovered using the Sigma-P method.
Affiliation(s)
- Yee Him Cheung
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York 10032, USA
257
Nilsson D, Andiappan AK, Halldén C, De Yun W, Säll T, Tim CF, Cardell LO. Toll-like receptor gene polymorphisms are associated with allergic rhinitis: a case control study. BMC Med Genet 2012; 13:66. [PMID: 22857391 PMCID: PMC3459792 DOI: 10.1186/1471-2350-13-66] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 03/30/2012] [Accepted: 07/25/2012] [Indexed: 12/26/2022]
Abstract
Background The Toll-like receptor proteins are important in host defense and in initiating the innate and adaptive immune responses. A number of studies have identified associations between genetic variation in the Toll-like receptor genes and allergic disorders such as asthma and allergic rhinitis. The present study aims to search for genetic variation in the Toll-like receptor genes associated with allergic rhinitis. Methods A first association analysis genotyped 73 SNPs in 182 cases and 378 controls from a Swedish population. Based on these results, an additional 24 SNPs were analyzed in one Swedish population with 352 cases and 709 controls and one Chinese population with 948 cases and 580 controls. Results The first association analysis identified 4 allergic rhinitis-associated SNPs in the TLR7-TLR8 gene region. Subsequent analysis of 24 SNPs from this region identified 7 and 5 significant SNPs in the Swedish and Chinese populations, respectively. The corresponding risk-associated haplotypes are significant after Bonferroni correction and are the most common haplotypes in both populations. The associations are primarily detected in females in the Swedish population, whereas they are seen in males in the Chinese population. Further independent support for the involvement of this region in allergic rhinitis was obtained from quantitative skin prick test data generated in both populations. Conclusions Haplotypes in the TLR7-TLR8 gene region were associated with allergic rhinitis in one Swedish and one Chinese population. Since this region has previously been associated with asthma and allergic rhinitis in a Danish linkage study, this speaks strongly in favour of this region being truly involved in the development of this disease.
Affiliation(s)
- Daniel Nilsson
- Division of ENT Diseases, Department of Clinical Sciences, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden.
258
Foo JN, Liu JJ, Tan EK. Whole-genome and whole-exome sequencing in neurological diseases. Nat Rev Neurol 2012; 8:508-17. [PMID: 22847385 DOI: 10.1038/nrneurol.2012.148] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Indexed: 11/09/2022]
Abstract
Genetic risk factors that underlie many rare and common neurological disorders remain poorly understood because of the multifactorial and heterogeneous nature of these complex traits. With the decreasing cost of massively parallel sequencing technologies, whole-genome and whole-exome sequencing will soon allow the characterization of the full spectrum of sequence and structural variants present in each individual. Methods are being developed to parse the huge amount of genomic data and to sift out which variants are associated with diseases. Numerous challenges are inherent in the identification of rare and common variants that have a role in complex neurological diseases, and tools are being developed to overcome these challenges. Given that genomic data will soon be the main driver towards the goal of personalized medicine, future developments in the production and interpretation of data, as well as in ethics and counselling, will be needed for whole-genome and whole-exome sequencing to be used as informative tools in a clinical setting.
Affiliation(s)
- Jia-Nee Foo
- Human Genetics, Genome Institute of Singapore, A*STAR, 60 Biopolis Street, Genome #02-01, Singapore 138672
259
Hendricks AE, Dupuis J, Gupta M, Logue MW, Lunetta KL. A comparison of gene region simulation methods. PLoS One 2012; 7:e40925. [PMID: 22815869 PMCID: PMC3399793 DOI: 10.1371/journal.pone.0040925] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/22/2012] [Accepted: 06/15/2012] [Indexed: 11/19/2022]
Abstract
Background Accurately modeling LD in simulations is essential to correctly evaluate new and existing association methods. At present, there has been minimal research comparing how well existing gene region simulation methods reproduce the LD structure of an existing gene region. Here we compare the ability of three approaches to accurately simulate the LD within a gene region: HapSim (2005), Hapgen (2009), and a minor extension to simple haplotype resampling. Methodology/Principal Findings In order to observe the variation and bias for each method, we compare the simulated pairwise LD measures and minor allele frequencies to the original HapMap data in an extensive simulation study. When possible, we also evaluate the effects of changing parameters. HapSim produces samples of haplotypes with lower LD, on average, compared to the original haplotype set, while neither our resampling method nor Hapgen introduces this bias. The variation introduced across the replicates by our resampling method is quite small and may not provide enough sampling variability to make a generalizable simulation study. Conclusion We recommend using Hapgen to simulate replicate haplotypes from a gene region. Hapgen produces moderate sampling variation between the replicates while retaining the overall unique LD structure of the gene region.
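The comparison above rests on two summaries computed from a haplotype set: minor allele frequencies and pairwise LD. A minimal Python sketch (assuming NumPy and a haplotypes-by-SNPs matrix of 0/1 alleles) computes both, using r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B, plus plain resampling of haplotypes with replacement. This resampling is only the simple baseline; the authors' extension, HapSim, and Hapgen are not implemented here, and the toy input is an assumption standing in for phased HapMap data.

```python
import numpy as np

def allele_freqs(H):
    """Frequency of allele '1' at each SNP; H is a haplotypes x SNPs 0/1 matrix."""
    return H.mean(axis=0)

def minor_allele_freqs(H):
    f = allele_freqs(H)
    return np.minimum(f, 1 - f)

def pairwise_r2(H):
    """r^2 = D^2 / (pA(1-pA) pB(1-pB)) for every SNP pair; NaN if a SNP is monomorphic."""
    p = allele_freqs(H)
    pab = (H.T @ H) / H.shape[0]          # joint frequency of the '1' alleles
    D = pab - np.outer(p, p)
    denom = np.outer(p * (1 - p), p * (1 - p))
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, D**2 / denom, np.nan)

def resample_haplotypes(H, n_out, rng):
    """Simple haplotype resampling with replacement (baseline only)."""
    idx = rng.integers(0, H.shape[0], size=n_out)
    return H[idx]

# Toy usage: compare LD in one resampled replicate with the original set.
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=(120, 8))
sim = resample_haplotypes(H, 200, rng)
print("mean |r2 difference|:", np.nanmean(np.abs(pairwise_r2(sim) - pairwise_r2(H))))
print("MAF shift:", minor_allele_freqs(sim) - minor_allele_freqs(H))
```

Repeating the last two lines over many replicates gives the kind of bias and variance summaries the study reports for each method.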
Affiliation(s)
- Audrey E Hendricks
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America.
260
Smoothed functional principal component analysis for testing association of the entire allelic spectrum of genetic variation. Eur J Hum Genet 2012; 21:217-24. [PMID: 22781089 DOI: 10.1038/ejhg.2012.141] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
Faster and cheaper next-generation sequencing technologies will generate unprecedentedly massive, high-dimensional genetic variation data that allow nearly complete evaluation of genetic variation, including both common and rare variants. There are two types of association tests: variant-by-variant tests and group tests. The variant-by-variant test is designed to test the association of common variants, while the group test is suitable for collectively testing the association of multiple rare variants. We propose here a smoothed functional principal component analysis (SFPCA) statistic as a general approach for testing association of the entire allelic spectrum of genetic variation (both common and rare variants), which combines the merits of both variant-by-variant analysis and group tests. Through intensive simulations, we demonstrate that the SFPCA statistic has correct type 1 error rates and much higher power than existing methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants and (4) variants with opposite directions of effects. To further evaluate its performance, the SFPCA statistic is applied to ANGPTL4 sequence data and six continuous phenotypes from the Dallas Heart Study as an example of testing association of rare variants, and to schizophrenia GWAS data as an example of testing association of common variants. The results show that the SFPCA statistic yields much smaller P-values than many existing statistics in both real data examples.
261
Integration of biological networks and pathways with genetic association studies. Hum Genet 2012; 131:1677-86. [PMID: 22777728 DOI: 10.1007/s00439-012-1198-7] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/09/2012] [Accepted: 06/27/2012] [Indexed: 12/13/2022]
Abstract
Millions of genetic variants have been assessed for their effects on traits of interest in genome-wide association studies (GWAS). Complex traits are affected by sets of inter-related genes. However, typical GWAS examine the association of only a single genetic variant at a time. The individual genetic effects on a complex trait are usually small, and the simple sum of these individual effects may not reflect the holistic effect of the genetic system. High-throughput methods enable genomic studies to produce large amounts of data that expand the knowledge base of biological systems. Biological networks and pathways are built to represent the functional or physical connectivity among genes. Integrated with GWAS data, network- and pathway-based methods complement single-variant analysis and may improve the power to identify trait-associated genes. By taking advantage of biological knowledge, these approaches are valuable for interpreting the functional role of genetic variants and for further understanding the molecular mechanisms influencing the traits. Network- and pathway-based methods have demonstrated their utility and will be increasingly important in addressing a number of challenges facing mainstream GWAS.
262
Chang D, Keinan A. Predicting signatures of "synthetic associations" and "natural associations" from empirical patterns of human genetic variation. PLoS Comput Biol 2012; 8:e1002600. [PMID: 22792059 PMCID: PMC3390358 DOI: 10.1371/journal.pcbi.1002600] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/15/2011] [Accepted: 05/23/2012] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have in recent years discovered thousands of associated markers for hundreds of phenotypes. However, associated loci often explain only a relatively small fraction of heritability, and the link between association and causality has yet to be uncovered for most loci. Rare causal variants have been suggested as one scenario that may partially explain these shortcomings. Specifically, Dickson et al. recently reported simulations of rare causal variants that lead to association signals at common tag single nucleotide polymorphisms, dubbed "synthetic associations". However, an open question is what practical implications synthetic associations have for GWAS. Here, we explore the signatures exhibited by such "synthetic associations" and their implications based on patterns of genetic variation observed in human populations, thus accounting for human evolutionary history, a force disregarded in previous simulation studies. This is made possible by human population genetic data from HapMap 3 consisting of both resequencing and array-based genotyping data for the same set of individuals from multiple populations. We report that synthetic associations tend to be further away from the underlying risk alleles compared to "natural associations" (i.e. associations due to underlying common causal variants), but to a much lesser extent than previously predicted, with both the age and the effect size of the risk allele playing a part in this phenomenon. We find that while a synthetic association has a lower probability of capturing causal variants within its linkage disequilibrium block, sequencing around the associated variant need not extend substantially to have a high probability of capturing at least one causal variant. We also show that the minor allele frequency of synthetic associations is lower than that of natural associations for most, but not all, loci that we explored.
Finally, we find the variance in associated allele frequency to be a potential indicator of synthetic associations.
Affiliation(s)
- Diana Chang
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Program in Computational Biology and Medicine, Cornell University, Ithaca, New York, United States of America
- Alon Keinan
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
263
Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. Stat Biosci 2012; 5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
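To make the genotype-calling step concrete, here is a minimal Python sketch of a simple probabilistic caller at one biallelic site, using only the standard library. It is a textbook-style simplification under stated assumptions, not any specific method reviewed in the paper: reads are independent, all bases share one error rate `err`, an error turns one allele into the other, and the prior on genotypes follows Hardy-Weinberg proportions for an assumed alt-allele frequency `alt_freq`.

```python
import math

def genotype_log_likelihoods(n_ref, n_alt, err=0.01):
    """log P(read counts | g) for g = 0, 1, 2 copies of the alt allele.

    A read shows the alt allele with probability q = (g/2)(1-err) + (1-g/2)err.
    The binomial coefficient is omitted because it is the same for every g.
    """
    logs = []
    for g in (0, 1, 2):
        q = (g / 2) * (1 - err) + (1 - g / 2) * err
        logs.append(n_alt * math.log(q) + n_ref * math.log(1 - q))
    return logs

def call_genotype(n_ref, n_alt, alt_freq=0.1, err=0.01):
    """Return (most probable genotype, posterior probabilities) under an HWE prior."""
    f = alt_freq
    log_prior = [2 * math.log(1 - f), math.log(2 * f * (1 - f)), 2 * math.log(f)]
    log_post = [l + p for l, p in zip(genotype_log_likelihoods(n_ref, n_alt, err), log_prior)]
    m = max(log_post)                      # subtract the max to avoid underflow
    weights = [math.exp(x - m) for x in log_post]
    total = sum(weights)
    post = [w / total for w in weights]
    best = max(range(3), key=lambda g: post[g])
    return best, post

print(call_genotype(9, 7))   # a site covered by 9 reference and 7 alternate reads
```

Working in log space keeps the computation stable at high depth; real callers add per-base quality scores, mapping quality, and population-level SNP detection on top of this basic likelihood.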
264
Xiong M. Genetic Studies of Complex Diseases in the Sequence Era. Journal of Genetic Disorders & Disease Information 2012; 1:e102. [PMID: 27441202 PMCID: PMC4948187 DOI: 10.4172/2327-5790.1000e102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Affiliation(s)
- Momiao Xiong
- Division of Biostatistics, the University of Texas School of Public Health, Houston, TX 77030
265
Wang K, Fingert JH. Statistical tests for detecting rare variants using variance-stabilising transformations. Ann Hum Genet 2012; 76:402-9. [PMID: 22724536 DOI: 10.1111/j.1469-1809.2012.00718.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Next generation sequencing holds great promise for detecting rare variants underlying complex human traits. Because these variants have extremely low allele frequencies, the normal approximation for a proportion no longer works well. Fisher's exact test appears to be suitable, but it is conservative. We investigate the utility of various variance-stabilising transformations in single-marker association analysis of rare variants. Unlike that of a proportion itself, the variance of the transformed proportion no longer depends on the proportion, making such transformations extremely appealing for rare variant association analysis. Simulation studies demonstrate that tests based on these transformations are more powerful than Fisher's exact test while controlling the type I error rate. Based on theoretical considerations and results from simulation studies, we recommend the test based on the Anscombe transformation over tests with other transformations.
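A single-marker test built on the Anscombe transformation might look like the Python sketch below (standard library only). It assumes allele counts out of 2N chromosomes per group, Anscombe's transform arcsin(sqrt((x + 3/8)/(n + 3/4))), and its approximate variance 1/(4n + 2), combined into a two-sided z test. This is an illustrative form under those assumptions; the exact statistic in the paper may differ.

```python
import math

def anscombe(x, n):
    """Anscombe's variance-stabilising transform of a binomial count x out of n."""
    return math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4)))

def anscombe_test(x_case, n_case, x_ctrl, n_ctrl):
    """Two-sided z test of equal allele frequency in cases and controls.

    Each transformed proportion has approximate variance 1/(4n + 2),
    whatever the underlying allele frequency.
    """
    diff = anscombe(x_case, n_case) - anscombe(x_ctrl, n_ctrl)
    se = math.sqrt(1 / (4 * n_case + 2) + 1 / (4 * n_ctrl + 2))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal P-value
    return z, p

# Hypothetical counts: 1000 cases and 1000 controls (2000 chromosomes each).
z, p = anscombe_test(12, 2000, 2, 2000)
print(f"z = {z:.2f}, p = {p:.3g}")
```

Because the standard error depends only on the sample sizes, the test stays well defined even when one group carries zero copies of the rare allele, where a variance estimate based on the observed proportion would collapse.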
Affiliation(s)
- Kai Wang
- Department of Biostatistics, College of Public Health, The University of Iowa, Iowa City, IA 52242, USA.
266
Sha Q, Wang X, Wang X, Zhang S. Detecting association of rare and common variants by testing an optimally weighted combination of variants. Genet Epidemiol 2012; 36:561-71. [PMID: 22714994 DOI: 10.1002/gepi.21649] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 02/11/2012] [Revised: 04/13/2012] [Accepted: 05/09/2012] [Indexed: 11/07/2022]
Abstract
Next-generation sequencing technology will soon allow sequencing the whole genome of large groups of individuals, and thus will make directly testing rare variants possible. Currently, most existing methods for rare variant association studies essentially test the effect of a weighted combination of variants, differing in their weighting schemes. The performance of these methods depends on the weights used, and no optimal weights are available. By putting large weights on rare variants and small weights on common variants, these methods target rare variants only, although increasing evidence shows that complex diseases are caused by both common and rare variants. In this paper, we analytically derive optimal weights under a certain criterion. Based on the optimal weights, we propose a Variable Weight Test for testing the effect of an Optimally Weighted combination of variants (VW-TOW). VW-TOW aims to test the effects of both rare and common variants. VW-TOW is applicable to both quantitative and qualitative traits, allows covariates, can control for population stratification, and is robust to the directions of effects of causal variants. Extensive simulation studies and application to the Genetic Analysis Workshop 17 (GAW17) data show that VW-TOW is more powerful than existing methods either for testing effects of both rare and common variants or for testing effects of rare variants only.
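The abstract describes a shared structure: a test of a weighted combination of variants. A generic version can be sketched as follows. The sketch assumes user-supplied weights, genotypes coded as minor-allele counts (0/1/2), and a permutation p-value. It does not reproduce the optimal weights derived for VW-TOW, and all function names are hypothetical.

```python
import random

def weighted_score(genotypes, weights):
    """Per-individual combined score: sum_j w_j * g_ij (g_ij = minor-allele count)."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def score_statistic(scores, phenotype):
    """Squared centred cross-product between the combined score and the phenotype."""
    n = len(scores)
    s_bar = sum(scores) / n
    y_bar = sum(phenotype) / n
    cross = sum((s - s_bar) * (y - y_bar) for s, y in zip(scores, phenotype))
    return cross * cross

def permutation_p(genotypes, phenotype, weights, n_perm=1000, seed=1):
    """Permutation p-value: shuffle phenotypes, keep genotypes fixed."""
    rng = random.Random(seed)
    scores = weighted_score(genotypes, weights)
    observed = score_statistic(scores, phenotype)
    y = list(phenotype)  # copy so the caller's list is not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if score_statistic(scores, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The combined score sums signed contributions, so risk and protective variants can cancel and reduce power. The abstract claims that VW-TOW is robust to the direction of effects, which is the property this plain weighted sum lacks.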
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan 49931, USA
267
Lipman PJ, Yip WK, AlChawa T, Ludwig KU, Mangold E, Lange C. On the analysis of sequence data: testing for disease susceptibility loci using patterns of linkage disequilibrium. Genet Epidemiol 2012; 35:880-6. [PMID: 22125225 DOI: 10.1002/gepi.20638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Despite the numerous successful applications of genome-wide association studies (GWASs), discovering disease susceptibility loci (DSLs) has proved difficult. This is because the GWAS approach is an indirect mapping technique that often identifies markers associated with a DSL rather than the DSL itself. Identifying DSLs, which is required for understanding the genetic pathways of complex diseases, requires sequencing data that examine every genetic locus directly. Yet there is currently a lack of methodology targeted at the identification of DSLs in sequencing data: existing methods localize the causal variant to a region but not to a single variant, and therefore do not allow one to identify the unique loci that cause the phenotype association. Here, we have developed such a method to determine whether there is evidence that an individual locus affects case/control status in sequencing data. This methodology differs from other rare variant approaches: rather than testing an entire region comprising many loci for association with the phenotype, we can identify the individual genetic locus that causes the association between the phenotype and the genetic region. For each variant, the test determines whether the pattern of linkage disequilibrium (LD) across the other variants coincides with the pattern expected if that variant were a DSL. Power simulations show that the method successfully detects the causal variant, distinguishing it from other nearby variants (in high LD with the causal variant), and outperforms the standard tests. The efficiency of the method is especially apparent with small samples, which are currently realistic for studies given the cost of sequencing. The practical relevance of the approach is illustrated by an application to a sequencing dataset for nonsyndromic cleft lip with or without cleft palate. The proposed method implicated one variant (P = 0.002, 0.062 after Bonferroni correction), which was not found by standard analyses.
Code for implementation is available.
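For each variant, the method compares the observed pattern of LD with the other variants against the pattern expected if that variant were the DSL. Only the first ingredient, the observed LD profile, is sketched here. The sketch uses the squared correlation of genotype dosages as an assumed, phase-free proxy for r². The expected-pattern comparison and the authors' test statistic are not reproduced, and the function names are hypothetical.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2).

    A phase-free proxy for pairwise LD (genotypic r^2).
    """
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic variant: correlation undefined, report 0
    return cov * cov / (v1 * v2)

def ld_profile(genotype_matrix, index):
    """r^2 between variant `index` and every other variant.

    genotype_matrix has one row per individual and one column per variant.
    """
    columns = list(zip(*genotype_matrix))
    target = columns[index]
    return [r_squared(target, col) for j, col in enumerate(columns) if j != index]
```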
Affiliation(s)
- Peter J Lipman
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA.
268
Guo W, Shugart YY. Detecting rare variants for quantitative traits using nuclear families. Hum Hered 2012; 73:148-58. [PMID: 22699804 DOI: 10.1159/000338439] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 12/08/2011] [Accepted: 03/30/2012] [Indexed: 01/17/2023]
Abstract
With the advent of sequencing technology opening up a new era of personal genome sequencing, huge amounts of rare variant data have suddenly become available to researchers seeking genetic variants related to human complex disorders. There is an urgent need for the development of novel statistical methods to analyze rare variants in a statistically powerful manner. While a number of statistical tests have already been developed to analyze collapsed rare variants identified by association tests in case-control studies, to date, only two FBAT tests-for-rare (described in the updated FBAT version v2.0.4) have applied collapsing methods analogously in family-based designs. For further research in this area, this study aims to introduce three new beta-determined weight tests for detecting rare variants for quantitative traits in nuclear families. In addition to evaluating the performance of these new methods, it also evaluates that of the two FBAT tests-for-rare, using extensive simulations of situations with and without linkage disequilibrium. Results from these simulations suggest that the four tests using beta-determined weights outperform the two collapsing methods used in FBAT (-v0 and -v1). In addition, both the linear combination method (detailed in the FBAT menu v2.0.4) and the multiple regression method (mixing LASSO and Ridge penalties) performed better than the other two beta-determined weight tests we proposed. Following testing and evaluation, we submitted four new beta-determined weight methods of statistical analysis in a computer program to the Comprehensive R Archive Network (CRAN) for general use.
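Beta-determined weights give each variant a weight equal to a Beta density evaluated at its minor allele frequency, so the shape parameters control how strongly rare variants are up-weighted. A minimal sketch follows. The default shape parameters (1, 25) are an assumption taken from common practice in other rare-variant tests such as SKAT, not the values used by Guo and Shugart. The function names are hypothetical.

```python
import math

def beta_weight(maf: float, a: float = 1.0, b: float = 25.0) -> float:
    """Beta(a, b) density at the minor allele frequency (requires 0 < maf < 1)."""
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(maf) + (b - 1) * math.log1p(-maf) - log_beta)

def weighted_burden(genotypes, mafs, a=1.0, b=25.0):
    """Per-individual score sum_j w_j * g_ij with Beta-density weights w_j."""
    weights = [beta_weight(f, a, b) for f in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

With a = 1 the density reduces to b(1 - maf)^(b - 1), so the weight decreases monotonically with frequency. Setting a = b = 1 gives equal weights, which corresponds to an unweighted burden score.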
Affiliation(s)
- Wei Guo
- Division of Intramural Division Program, National Institute of Mental Health, National Institute of Health, Bethesda, MD 20892, USA
269
Abstract
Sudden cardiac death (SCD), a sudden pulseless condition due to cardiac arrhythmia, remains a major public health problem despite recent progress in the treatment and prevention of overall coronary heart disease. In this review, we examine the evidence for genetic susceptibility to SCD in order to provide biological insight into the pathogenesis of this devastating disease and to explore the potential for genetics to impact clinical management of SCD risk. Both candidate gene approaches and unbiased genome-wide scans have identified novel biological pathways contributing to SCD risk. Although risk stratification in the general population remains an elusive goal, several studies point to the potential utility of these common genetic variants in high-risk individuals. Finally, we highlight novel methodological approaches to deciphering the molecular mechanisms involved in arrhythmogenesis. Although further epidemiological and clinical applications research is needed, it is increasingly clear that genetic approaches are yielding important insights into SCD that may impact the public health burden imposed by SCD and its associated outcomes.
Affiliation(s)
- Dan E Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21209, USA.
270
Zhu Y, Xiong M. Family-based association studies for next-generation sequencing. Am J Hum Genet 2012; 90:1028-45. [PMID: 22682329 DOI: 10.1016/j.ajhg.2012.04.022] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 10/22/2011] [Revised: 04/19/2012] [Accepted: 04/28/2012] [Indexed: 12/31/2022]
Abstract
An individual's disease risk is determined by the compounded action of both common variants, inherited from remote ancestors, that segregated within the population and rare variants, inherited from recent ancestors, that segregated mainly within pedigrees. Next-generation sequencing (NGS) technologies generate high-dimensional data that allow a nearly complete evaluation of genetic variation. Despite their promise, NGS technologies also suffer from remarkable limitations: high error rates, enrichment of rare variants, and a large proportion of missing values, as well as the fact that most current analytical methods are designed for population-based association studies. To meet the analytical challenges raised by NGS, we propose a general framework for sequence-based association studies that can use various types of family and unrelated-individual data sampled from any population structure and a universal procedure that can transform any population-based association test statistic for use in family-based association tests. We develop family-based functional principal-component analysis (FPCA) with or without smoothing, a generalized T(2), a combined multivariate and collapsing (CMC) method, and single-marker association test statistics. Through intensive simulations, we demonstrate that the family-based smoothed FPCA (SFPCA) has the correct type I error rates and much more power than other population-based or family-based association analysis methods to detect association of (1) common variants, (2) rare variants, (3) both common and rare variants, and (4) variants with opposite directions of effect. The proposed statistics are applied to two data sets with pedigree structures. The results show that the smoothed FPCA yields much smaller p values than the other statistics.
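The CMC method (Li and Leal's combined multivariate and collapsing approach) works in two steps. It first collapses rare variants into a single carrier indicator while keeping common variants as separate columns. It then applies a multivariate test such as Hotelling's T² to the resulting design matrix. A minimal sketch of the collapsing step follows, for unrelated individuals only. The 1% MAF threshold is an assumption, the family-based transformation proposed in the paper is not shown, and the function name is hypothetical.

```python
def cmc_collapse(genotypes, mafs, threshold=0.01):
    """Build a CMC-style design matrix, one row per individual.

    Variants with MAF below `threshold` are collapsed into one indicator
    (1 if the individual carries any rare minor allele, else 0); common
    variants are kept as separate genotype columns, in their original order.
    """
    rare = [j for j, f in enumerate(mafs) if f < threshold]
    common = [j for j, f in enumerate(mafs) if f >= threshold]
    rows = []
    for g in genotypes:
        indicator = 1 if any(g[j] > 0 for j in rare) else 0
        # The indicator column is appended only when rare variants exist.
        rows.append([g[j] for j in common] + ([indicator] if rare else []))
    return rows
```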
271
Fang S, Sha Q, Zhang S. Two adaptive weighting methods to test for rare variant associations in family-based designs. Genet Epidemiol 2012; 36:499-507. [PMID: 22674630 DOI: 10.1002/gepi.21646] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 11/17/2011] [Revised: 04/26/2012] [Accepted: 04/26/2012] [Indexed: 11/06/2022]
Abstract
Although next-generation DNA sequencing technologies have made rare variant association studies feasible and affordable, the development of powerful statistical methods for rare variant association studies is still under way. Most of the existing methods for rare variant association studies compare the number of rare mutations in a group of rare variants (in a gene or a pathway) between cases and controls. However, these methods assume that all causal variants increase disease risk. Recently, several methods that are robust to the direction and magnitude of effects of causal variants have been proposed. However, they are applicable to unrelated individuals only, whereas family data have been shown to improve power to detect rare variants. In this article, we propose two adaptive weighting methods for rare variant association studies based on family data for quantitative traits. Using extensive simulation studies, we evaluate and compare our proposed methods with two methods based on the weights proposed by Madsen and Browning. Our results show that both proposed methods are robust to population stratification, robust to the direction and magnitude of the effects of causal variants, and more powerful than the methods using weights suggested by Madsen and Browning, especially when both risk and protective variants are present.
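For reference, the Madsen and Browning weighting used as the comparator down-weights common variants. Each weight is an estimate of the standard deviation of the allele count, with the allele frequency estimated in controls. A minimal sketch follows. It assumes no missing genotypes, genotypes coded 0/1/2, and unrelated individuals. In the original weighted-sum test, the resulting scores are compared between cases and controls with a rank-sum statistic and permutation; that step is omitted here. The function names are hypothetical.

```python
import math

def madsen_browning_weights(control_genotypes, n_total):
    """Weights w_i = sqrt(n_total * q_i * (1 - q_i)), one per variant.

    q_i = (m_i + 1) / (2 * n_u + 2), where m_i is the minor-allele count in
    controls and n_u the number of controls. Assumes no missing genotypes.
    """
    n_u = len(control_genotypes)
    n_variants = len(control_genotypes[0])
    weights = []
    for i in range(n_variants):
        m_i = sum(row[i] for row in control_genotypes)
        q_i = (m_i + 1) / (2 * n_u + 2)
        weights.append(math.sqrt(n_total * q_i * (1 - q_i)))
    return weights

def genetic_scores(genotypes, weights):
    """Weighted-sum score for each individual: sum_i g_ij / w_i."""
    return [sum(g / w for g, w in zip(row, weights)) for row in genotypes]
```

Dividing by the weight means a variant that is rare in controls (small q_i, small w_i) contributes more to an individual's score than a common one.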
Affiliation(s)
- Shurong Fang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan 49931, USA
272
Abstract
Imputation of genome-wide single-nucleotide polymorphism (SNP) arrays to a larger known reference panel of SNPs has become a standard and essential part of genome-wide association studies. However, little is known about the behavior of imputation in African Americans with respect to the different imputation algorithms, the reference population(s) and the reference SNP panels used. Genome-wide SNP data (Affymetrix 6.0) from 3207 African American samples in the Atherosclerosis Risk in Communities Study (ARIC) were used to systematically evaluate imputation quality and yield. Imputation was performed with the imputation algorithms MACH, IMPUTE and BEAGLE using several combinations of three reference panels of HapMap III (ASW, YRI and CEU) and 1000 Genomes Project (pilot 1 YRI June 2010 release, EUR and AFR August 2010 and June 2011 releases) panels with SNP data on chromosomes 18, 20 and 22. About 10% of the directly genotyped SNPs from each chromosome were masked, and SNPs common between the reference panels were used for evaluating the imputation quality using two statistical metrics: concordance accuracy and Cohen's kappa (κ) coefficient. The dependencies of these metrics on the minor allele frequencies (MAF) and specific genotype categories (minor allele homozygotes, heterozygotes and major allele homozygotes) were thoroughly investigated to determine the best panel and method for imputation in African Americans. In addition, the power to detect imputed SNPs associated with simulated phenotypes was studied using the mean genotype of each masked SNP in the imputed data. Our results indicate that the genotype concordances after stratification into each genotype category and Cohen's κ coefficient are considerably better at differentiating imputation performance than the traditionally used total concordance statistic, and both statistics improved with increasing MAF irrespective of the imputation method.
We also found that MACH and IMPUTE performed equally well and consistently better than BEAGLE, irrespective of the reference panel used. Of the various combinations of reference panels, for both HapMap III and 1000 Genomes Project reference panels, the multi-ethnic panels had better imputation accuracy than those containing samples of only a single ethnicity. The most recent 1000 Genomes Project release (June 2011) had a substantially higher number of imputed SNPs than HapMap III and performed as well as or better than the best combined HapMap III reference panels and previous releases of the 1000 Genomes Project.
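The contrast the authors draw between total concordance and the stratified concordances or Cohen's κ is easiest to see on a rare SNP. A minimal sketch follows. It assumes masked true genotypes and best-guess imputed genotypes coded 0 (major homozygote), 1 (heterozygote) and 2 (minor homozygote). The function names are hypothetical.

```python
from collections import Counter

def concordance(true_g, imputed_g):
    """Fraction of masked genotypes that imputation recovered exactly."""
    return sum(t == i for t, i in zip(true_g, imputed_g)) / len(true_g)

def per_category_concordance(true_g, imputed_g):
    """Concordance stratified by the true genotype category."""
    result = {}
    for cat in sorted(set(true_g)):
        pairs = [(t, i) for t, i in zip(true_g, imputed_g) if t == cat]
        result[cat] = sum(t == i for t, i in pairs) / len(pairs)
    return result

def cohens_kappa(true_g, imputed_g):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(true_g)
    p_obs = concordance(true_g, imputed_g)
    true_counts = Counter(true_g)
    imp_counts = Counter(imputed_g)
    p_exp = sum(true_counts[c] * imp_counts[c] for c in true_counts) / (n * n)
    if p_exp == 1.0:
        return 1.0  # degenerate case: a single category in both; kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)
```

Suppose 5 of 100 masked genotypes are heterozygotes and imputation calls every genotype a major homozygote. Total concordance is then 0.95, yet κ is 0 and concordance among heterozygotes is 0. This is why the stratified statistics separate good from poor imputation more sharply.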
273
Luo L, Zhu Y, Xiong M. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data. J Comput Biol 2012; 19:731-44. [PMID: 22651812 DOI: 10.1089/cmb.2012.0035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/16/2023]
Abstract
The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low-frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests, which analyze the collective frequency differences of variants between cases and controls, have shifted the paradigm from the variant-by-variant analysis used in GWAS of common variants to a collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with disease. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole-genome low-coverage pilot data from the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T(2), collapsing method, combined multivariate and collapsing (CMC) method, individual χ(2) test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to a published resequencing dataset from the ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.
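Of the seven statistics compared, the individual χ² test is the simplest: a per-variant Pearson test on a 2×2 table of allele counts. It is sketched below under the assumption of an allelic (not genotypic) test. With rare variants, small expected counts make the χ² approximation unreliable, which is part of the motivation for collective tests. The genome-information content-based statistic itself is not reproduced, and the function names are hypothetical.

```python
import math

def allelic_chi2(case_minor, case_total, ctrl_minor, ctrl_total):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    case_total and ctrl_total are total allele counts (2 x individuals).
    """
    table = [
        [case_minor, case_total - case_minor],
        [ctrl_minor, ctrl_total - ctrl_minor],
    ]
    n = case_total + ctrl_total
    col = [table[0][k] + table[1][k] for k in range(2)]
    row = [case_total, ctrl_total]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            if expected > 0:  # skip empty margins (e.g. monomorphic variant)
                chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2

def chi2_1df_p(chi2):
    """Upper-tail p-value for 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))
```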
Affiliation(s)
- Li Luo
- Human Genetics Center, School of Public Health, University of Texas, Houston, TX, USA
274
275
Abstract
Many common human diseases are complex and are expected to be highly heterogeneous, with multiple causative loci and multiple rare and common variants at some of the causative loci contributing to the risk of these diseases. Data from the genome-wide association studies (GWAS) and metadata such as known gene functions and pathways provide the possibility of identifying genetic variants, genes and pathways that are associated with complex phenotypes. Single-marker-based tests have been very successful in identifying thousands of genetic variants for hundreds of complex phenotypes. However, these variants explain only a very small percentage of the heritability. To account for locus and allelic heterogeneity, gene-based and pathway-based tests can be very useful in the next stage of the analysis of GWAS data. U-statistics, which summarize the genomic similarity between pairs of individuals and link the genomic similarity to phenotypic similarity, have proved to be very useful for testing the associations between a set of single nucleotide polymorphisms and the phenotypes. Compared to single-marker analysis, the advantages afforded by the U-statistics-based methods are large when the number of markers involved is large. We review several formulations of U-statistics in genetic association studies and point out the links of these statistics with other similarity-based tests of genetic association. Finally, potential applications of U-statistics in the analysis of next-generation sequencing data and rare variant association studies are discussed.
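One simple instance of a similarity-based U-statistic is sketched below. It assumes identity-by-state (IBS) similarity for genotypes, the product of centred trait values for phenotypic similarity, and a one-sided permutation p-value. The review covers several formulations, and this sketch is not claimed to match any particular one. The function names are hypothetical.

```python
import random

def ibs_similarity(g1, g2):
    """Identity-by-state similarity of two genotype vectors (0/1/2), in [0, 1]."""
    return sum(2 - abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))

def u_statistic(genotypes, phenotype):
    """Average over pairs i < j of genomic similarity x phenotypic similarity.

    Phenotypic similarity is the product of centred trait values.
    """
    n = len(phenotype)
    mean = sum(phenotype) / n
    centred = [y - mean for y in phenotype]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += ibs_similarity(genotypes[i], genotypes[j]) * centred[i] * centred[j]
            pairs += 1
    return total / pairs

def u_permutation_p(genotypes, phenotype, n_perm=500, seed=7):
    """One-sided permutation p-value: large U means genetically similar
    individuals tend to have similar trait values."""
    rng = random.Random(seed)
    observed = u_statistic(genotypes, phenotype)
    y = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if u_statistic(genotypes, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The pairwise loop is O(n² × markers). This is fine for illustration, but a practical implementation would vectorise the similarity matrix.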
Affiliation(s)
- Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
276
Qiao D, Yip WK, Lange C. Handling the data management needs of high-throughput sequencing data: SpeedGene, a compression algorithm for the efficient storage of genetic data. BMC Bioinformatics 2012; 13:100. [PMID: 22591016 PMCID: PMC3434015 DOI: 10.1186/1471-2105-13-100] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/26/2011] [Accepted: 05/16/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND As next-generation sequencing data become available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data due to their enormous size. This is, and will remain, a problem that researchers working on genetic data encounter every day. There are some options available for compressing and storing such data, such as general-purpose compression software and the PBAT/PLINK binary format. However, these currently available methods either do not offer sufficient compression rates or require a great amount of CPU time for decompression and loading every time the data are accessed. RESULTS Here, we propose a novel and simple algorithm for storing such sequencing data. We show that the compression factor of the algorithm ranges from 16 to several hundred, which potentially allows SNP data of hundreds of gigabytes to be stored in hundreds of megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm gives a greater compression rate than the commonly used compression methods, and the data-loading process takes less time. Also, the C++ library provides direct data-retrieval functions, which allow the compressed information to be easily accessed by other C++ programs. CONCLUSIONS The SpeedGene algorithm enables the storage and the analysis of next-generation sequencing data in current hardware environments, making system upgrades unnecessary.
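The abstract does not describe SpeedGene's storage format, so the sketch below only illustrates two generic ideas for compact genotype storage. The first is packing four 2-bit genotype codes per byte, the same basic idea as PLINK's binary format. The second, for rare variants, is storing only the positions of non-zero genotypes and choosing whichever representation is smaller. The size model (a 4-byte index plus 1 byte per stored genotype) and the function names are assumptions for illustration, not SpeedGene's algorithm.

```python
def pack_2bit(genotypes):
    """Pack genotype codes 0/1/2 (3 = missing) four per byte, low bits first."""
    out = bytearray((len(genotypes) + 3) // 4)
    for k, g in enumerate(genotypes):
        out[k // 4] |= (g & 0b11) << (2 * (k % 4))
    return bytes(out)

def unpack_2bit(data, n):
    """Inverse of pack_2bit for the first n genotypes."""
    return [(data[k // 4] >> (2 * (k % 4))) & 0b11 for k in range(n)]

def sparse_encode(genotypes):
    """For rare variants: keep only (index, genotype) pairs for non-zero codes."""
    return [(k, g) for k, g in enumerate(genotypes) if g != 0]

def choose_encoding(genotypes, index_bytes=4):
    """Return the smaller representation and its approximate size in bytes."""
    dense = (len(genotypes) + 3) // 4
    sparse = len(sparse_encode(genotypes)) * (index_bytes + 1)
    return ("sparse", sparse) if sparse < dense else ("dense", dense)
```

For a variant carried by one individual out of 1,001, the dense form needs 251 bytes while the sparse form needs 5. This shows how the achievable compression factor depends on allele frequency.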
Affiliation(s)
- Dandi Qiao
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA.
277
Naidoo N, Pawitan Y, Soong R, Cooper DN, Ku CS. Human genetics and genomics a decade after the release of the draft sequence of the human genome. Hum Genomics 2012; 5:577-622. [PMID: 22155605 PMCID: PMC3525251 DOI: 10.1186/1479-7364-5-6-577] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Indexed: 02/06/2023]
Abstract
Substantial progress has been made in human genetics and genomics research in the ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments include the cataloguing of genetic variation, the International HapMap Project, and advances in genotyping technologies. These developments were vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Affiliation(s)
- Nasheen Naidoo
- Centre for Molecular Epidemiology, Department of Epidemiology and Public Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
278
Ameur A, Enroth S, Johansson Å, Zaboli G, Igl W, Johansson A, Rivas M, Daly M, Schmitz G, Hicks A, Meitinger T, Feuk L, van Duijn C, Oostra B, Pramstaller P, Rudan I, Wright A, Wilson J, Campbell H, Gyllensten U. Genetic adaptation of fatty-acid metabolism: a human-specific haplotype increasing the biosynthesis of long-chain omega-3 and omega-6 fatty acids. Am J Hum Genet 2012; 90:809-20. [PMID: 22503634 DOI: 10.1016/j.ajhg.2012.03.014] [Citation(s) in RCA: 164] [Impact Index Per Article: 12.6] [Received: 11/11/2011] [Revised: 02/03/2012] [Accepted: 03/15/2012] [Indexed: 10/28/2022]
Abstract
Omega-3 and omega-6 long-chain polyunsaturated fatty acids (LC-PUFAs) are essential for the development and function of the human brain. They can be obtained directly from food, e.g., fish, or synthesized from precursor molecules found in vegetable oils. To determine the importance of genetic variability to fatty-acid biosynthesis, we studied FADS1 and FADS2, which encode rate-limiting enzymes for fatty-acid conversion. We performed genome-wide genotyping (n = 5,652 individuals) and targeted resequencing (n = 960 individuals) of the FADS region in five European population cohorts. We also analyzed available genomic data from human populations, archaic hominins, and more distant primates. Our results show that present-day humans have two common FADS haplotypes, defined by 28 closely linked SNPs across 38.9 kb, that differ dramatically in their ability to generate LC-PUFAs. No independent effects on FADS activity were seen for rare SNPs detected by targeted resequencing. The more efficient, evolutionarily derived haplotype appeared after the lineage split leading to modern humans and Neanderthals and shows evidence of positive selection. This human-specific haplotype increases the efficiency of synthesizing essential long-chain fatty acids from precursors and thereby might have provided an advantage in environments with limited access to dietary LC-PUFAs. In the modern world, this haplotype has been associated with lifestyle-related diseases, such as coronary artery disease.
279
Coppola G, Chinnathambi S, Lee JJ, Dombroski BA, Baker MC, Soto-Ortolaza AI, Lee SE, Klein E, Huang AY, Sears R, Lane JR, Karydas AM, Kenet RO, Biernat J, Wang LS, Cotman CW, Decarli CS, Levey AI, Ringman JM, Mendez MF, Chui HC, Le Ber I, Brice A, Lupton MK, Preza E, Lovestone S, Powell J, Graff-Radford N, Petersen RC, Boeve BF, Lippa CF, Bigio EH, Mackenzie I, Finger E, Kertesz A, Caselli RJ, Gearing M, Juncos JL, Ghetti B, Spina S, Bordelon YM, Tourtellotte WW, Frosch MP, Vonsattel JPG, Zarow C, Beach TG, Albin RL, Lieberman AP, Lee VM, Trojanowski JQ, Van Deerlin VM, Bird TD, Galasko DR, Masliah E, White CL, Troncoso JC, Hannequin D, Boxer AL, Geschwind MD, Kumar S, Mandelkow EM, Wszolek ZK, Uitti RJ, Dickson DW, Haines JL, Mayeux R, Pericak-Vance MA, Farrer LA, Ross OA, Rademakers R, Schellenberg GD, Miller BL, Mandelkow E, Geschwind DH. Evidence for a role of the rare p.A152T variant in MAPT in increasing the risk for FTD-spectrum and Alzheimer's diseases. Hum Mol Genet 2012; 21:3500-12. [PMID: 22556362 DOI: 10.1093/hmg/dds161] [Citation(s) in RCA: 186] [Impact Index Per Article: 14.3] [Indexed: 01/15/2023]
Abstract
Rare mutations in the gene encoding tau (MAPT, microtubule-associated protein tau) cause frontotemporal dementia-spectrum (FTD-s) disorders, including FTD, progressive supranuclear palsy (PSP) and corticobasal syndrome, and a common extended haplotype spanning the MAPT locus is associated with increased risk of PSP and Parkinson's disease. We identified a rare tau variant (p.A152T) in a patient with a clinical diagnosis of PSP and assessed its frequency in multiple independent series of patients with neurodegenerative conditions and controls, in a total of 15 369 subjects. Tau p.A152T significantly increases the risk for both FTD-s (n = 2139, OR = 3.0, CI: 1.6-5.6, P = 0.0005) and Alzheimer's disease (AD) (n = 3345, OR = 2.3, CI: 1.3-4.2, P = 0.004) compared with 9047 controls. Functionally, p.A152T (i) decreases the binding of tau to microtubules and therefore promotes microtubule assembly less efficiently; and (ii) reduces the tendency to form abnormal fibers. However, there is a pronounced increase in the formation of tau oligomers. Importantly, these findings suggest that other regions of the tau protein may be crucial in regulating normal function, as the p.A152 residue is distal to the domains considered responsible for microtubule interactions or aggregation. These data provide both the first genetic evidence and functional studies supporting the role of MAPT p.A152T as a rare risk factor for both FTD-s and AD and the concept that rare variants can increase the risk for relatively common, complex neurodegenerative diseases, but since no clear significance threshold for rare genetic variation has been established, some caution is warranted until the findings are further replicated.
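Odds ratios with confidence intervals, such as those reported for p.A152T carriers above, are commonly computed from a 2×2 table of carriers and non-carriers. The abstract does not state the method used. The sketch below assumes Woolf's log-scale (Wald) interval, and any counts passed to it are the user's own, not the study's. The function name is hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf (log-scale Wald) 95% confidence interval.

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls; all cells must be > 0.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi
```

With a rare variant, the carrier cells are small. The 1/a and 1/c terms then dominate the standard error, which is why intervals for rare variants are wide even in large samples.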
Affiliation(s)
- Giovanni Coppola
- Department of Neurology, University of California, Los Angeles, CA, USA
280
Service SK, Verweij KJH, Lahti J, Congdon E, Ekelund J, Hintsanen M, Räikkönen K, Lehtimäki T, Kähönen M, Widen E, Taanila A, Veijola J, Heath AC, Madden PAF, Montgomery GW, Sabatti C, Järvelin MR, Palotie A, Raitakari O, Viikari J, Martin NG, Eriksson JG, Keltikangas-Järvinen L, Wray NR, Freimer NB. A genome-wide meta-analysis of association studies of Cloninger's Temperament Scales. Transl Psychiatry 2012; 2:e116. [PMID: 22832960 PMCID: PMC3365256 DOI: 10.1038/tp.2012.37] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Indexed: 12/20/2022]
Abstract
Temperament has a strongly heritable component, yet multiple independent genome-wide studies have failed to identify significant genetic associations. We have assembled the largest sample to date of persons with genome-wide genotype data, who have been assessed with Cloninger's Temperament and Character Inventory. Sum scores for novelty seeking, harm avoidance, reward dependence and persistence have been measured in over 11,000 persons collected in four different cohorts. Our study had >80% power to identify genome-wide significant loci (P<1.25 × 10(-8), with correction for testing four scales) accounting for ≥0.4% of the phenotypic variance in temperament scales. Using meta-analysis techniques, gene-based tests and pathway analysis we have tested over 1.2 million single-nucleotide polymorphisms (SNPs) for association to each of the four temperament dimensions. We did not discover any SNPs, genes, or pathways to be significantly related to the four temperament dimensions, after correcting for multiple testing. Less than 1% of the variability in any temperament dimension appears to be accounted for by a risk score derived from the SNPs showing strongest association to the temperament dimensions. Elucidation of genetic loci significantly influencing temperament and personality will require potentially very large samples, and/or a more refined phenotype. Item response theory methodology may be a way to incorporate data from cohorts assessed with multiple personality instruments, and might be a method by which a large sample of a more refined phenotype could be acquired.
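The significance threshold quoted above corresponds to a Bonferroni correction of the conventional genome-wide level of 5 × 10^-8 for the four temperament scales tested:

```latex
\alpha_{\text{scale}} = \frac{\alpha_{\text{GW}}}{4} = \frac{5 \times 10^{-8}}{4} = 1.25 \times 10^{-8}
```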
Affiliation(s)
- S K Service
- Center for Neurobehavioral Genetics, University of California, Los Angeles, CA, USA
- K J H Verweij
- Genetic Epidemiology, Molecular Epidemiology and Psychiatric Genetics Laboratories, Queensland Institute of Medical Research, Brisbane, QLD, Australia
- School of Psychology, University of Queensland, Brisbane, QLD, Australia
- J Lahti
- Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
- E Congdon
- Center for Neurobehavioral Genetics, University of California, Los Angeles, CA, USA
- J Ekelund
- Department of Psychiatry, University of Helsinki and Finland National Public Health Institute, Helsinki, Finland
- Finland Vaasa Hospital District, Vaasa, Finland
- M Hintsanen
- Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
- Helsinki Collegium for Advanced Studies, University of Helsinki, Helsinki, Finland
- K Räikkönen
- Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland
- T Lehtimäki
- Department of Clinical Chemistry, Fimlab Laboratories, Tampere University Hospital, Tampere, Finland
- University of Tampere School of Medicine, Tampere, Finland
- M Kähönen
- University of Tampere School of Medicine, Tampere, Finland
- Department of Clinical Physiology, Tampere University Hospital, Tampere, Finland
- E Widen
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- A Taanila
- Institute of Health Sciences, Public Health and General Practice, University of Oulu, Oulu, Finland
- J Veijola
- Department of Psychiatry, Institute of Clinical Medicine, University of Oulu, Oulu, Finland
- A C Heath
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
- P A F Madden
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
- G W Montgomery
- Genetic Epidemiology, Molecular Epidemiology and Psychiatric Genetics Laboratories, Queensland Institute of Medical Research, Brisbane, QLD, Australia
- C Sabatti
- Department of Health and Research Policy, Stanford University, Stanford, CA, USA
- Department of Statistics, Stanford University, Stanford, CA, USA
- M-R Järvelin
- Department of Epidemiology and Biostatistics, School of Public Health, MRC-HPA Centre for Environment and Health, Imperial College London, London, UK
- Institute of Health Sciences, University of Oulu, Oulu, Finland
- Biocenter Oulu, University of Oulu, Oulu, Finland
- Department of Lifecourse and Services, National Institute of Health and Welfare, Oulu, Finland
- A Palotie
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
- Department of Medical Genetics, University of Helsinki, Helsinki, Finland
- University Central Hospital, Helsinki, Finland
- O Raitakari
- Department of Clinical Physiology and Nuclear Medicine, Turku University Hospital, Turku, Finland
- Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland
- J Viikari
- Department of Medicine, Turku University Hospital, Turku, Finland
- University of Turku, Turku, Finland
- N G Martin
- Genetic Epidemiology, Molecular Epidemiology and Psychiatric Genetics Laboratories, Queensland Institute of Medical Research, Brisbane, QLD, Australia
- J G Eriksson
- Finland Vaasa Hospital District, Vaasa, Finland
- National Institute for Health and Welfare, Helsinki, Finland
- Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland
- Helsinki University Central Hospital, Unit of General Practice, Helsinki, Finland
- Folkhalsan Research Centre, Helsinki, Finland
- N R Wray
- Genetic Epidemiology, Molecular Epidemiology and Psychiatric Genetics Laboratories, Queensland Institute of Medical Research, Brisbane, QLD, Australia
- N B Freimer
- Center for Neurobehavioral Genetics, University of California, Los Angeles, CA, USA
- The Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Psychiatry, University of California, Los Angeles, Los Angeles, CA, USA
- Center for Neurobehavioral Genetics, University of California, Gonda Center Room 3506, 695 Charles E Young Dr South, Box 951761, Los Angeles, CA 90095, USA.
281
Kazma R, Bailey JN. Population-based and family-based designs to analyze rare variants in complex diseases. Genet Epidemiol 2012; 35 Suppl 1:S41-7. [PMID: 22128057 DOI: 10.1002/gepi.20648] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Genotyping of rare variants on a large scale is now possible using next-generation sequencing. Sample selection is a crucial step in designing the genetic study of a complex disease, and knowledge of the efficiency and limitations of population-based and family-based designs can help researchers make the appropriate choice. The nine contributions to Group 5 of Genetic Analysis Workshop 17 evaluate population-based and family-based designs by comparing the results obtained with various methods applied to the mini-exome simulations. These simulations consisted of 200 replicates composed of unrelated individuals and eight extended pedigrees with genotypes and various phenotypes. The methods tested for association with a population-based and/or a family-based design, tested for linkage with a family-based design, or estimated heritability. We summarize the strengths and weaknesses of both designs. Although population-based designs seem more suitable for detecting the effect of multiple rare variants, family-based designs can potentially enrich the sample in rare variants, for which the effect would be concealed at the population level. However, as of today, the main limitation is still the high cost of next-generation sequencing.
Affiliation(s)
- Rémi Kazma
- Department of Epidemiology and Biostatistics and Institute for Human Genetics, University of California, San Francisco, CA 94143-3110, USA.
282
Dering C, Hemmelmann C, Pugh E, Ziegler A. Statistical analysis of rare sequence variants: an overview of collapsing methods. Genet Epidemiol 2012; 35 Suppl 1:S12-7. [PMID: 22128052 DOI: 10.1002/gepi.20643] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.8] [Indexed: 01/20/2023]
Abstract
With the advent of novel sequencing technologies, interest in the identification of rare variants that influence common traits has increased rapidly. Standard statistical methods, such as the Cochran-Armitage trend test or logistic regression, fail in this setting for the analysis of unrelated subjects because of the rareness of the variants. Recently, various alternative approaches have been proposed that circumvent the rareness problem by collapsing rare variants in a defined genetic region or sets of regions. We provide an overview of these collapsing methods for association analysis and discuss the use of permutation approaches for significance testing of the data-adaptive methods.
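As an illustration of the collapsing idea described in this overview, the following is a minimal sketch of one simple variant: a CAST-style carrier indicator compared between cases and controls, with a permutation p-value. It assumes genotypes are already restricted to the rare variants of one region and coded as minor-allele counts; the function names and toy data are hypothetical, and this is not the implementation of any specific method reviewed in the paper.

```python
import random
from typing import List, Sequence


def collapse_carriers(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Collapse a region's rare variants into one indicator per subject:
    1 if the subject carries at least one minor allele at any variant, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def carrier_difference(carriers: Sequence[int], is_case: Sequence[int]) -> float:
    """Carrier proportion in cases minus carrier proportion in controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    case_carriers = sum(c for c, y in zip(carriers, is_case) if y == 1)
    ctrl_carriers = sum(c for c, y in zip(carriers, is_case) if y == 0)
    return case_carriers / n_case - ctrl_carriers / n_ctrl


def permutation_p_value(genotypes, is_case, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the collapsed carrier difference."""
    rng = random.Random(seed)
    carriers = collapse_carriers(genotypes)
    observed = abs(carrier_difference(carriers, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link
        if abs(carrier_difference(carriers, labels)) >= observed:
            hits += 1
    # Add-one estimate so the p-value is never reported as exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: rows are subjects, columns are rare variants.
genotypes = [[0, 1, 0]] * 8 + [[0, 0, 0]] * 12
is_case = [1] * 10 + [0] * 10
print(permutation_p_value(genotypes, is_case, n_perm=2000))
```

Permutation is used here because asymptotic tests are unreliable when carrier counts are small; for data-adaptive statistics (for example, ones that choose variant weights or frequency thresholds from the data) the whole selection step would need to be repeated inside each permutation.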
Affiliation(s)
- Carmen Dering
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
283
Cole JW, Stine OC, Liu X, Pratap A, Cheng Y, Tallon LJ, Sadzewicz LK, Dueker N, Wozniak MA, Stern BJ, Meschia JF, Mitchell BD, Kittner SJ, O'Connell JR. Rare variants in ischemic stroke: an exome pilot study. PLoS One 2012; 7:e35591. [PMID: 22536414 PMCID: PMC3334983 DOI: 10.1371/journal.pone.0035591] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 12/14/2011] [Accepted: 03/18/2012] [Indexed: 11/18/2022]
Abstract
The genetic architecture of ischemic stroke is complex and is likely to include rare or low frequency variants with high penetrance and large effect sizes. Compared with common variants of small effect, such variants are likely to provide important insights into disease pathogenesis. Because a significant portion of human functional variation may derive from the protein-coding portion of genes, we undertook a pilot study to identify variation across the human exome (i.e., the coding exons across the entire human genome) in 10 ischemic stroke cases. Our efforts focused on evaluating the feasibility and identifying the difficulties of this type of research as it applies to ischemic stroke. The cases included 8 African-Americans and 2 Caucasians selected on the basis of similar stroke subtypes and by implementing a case selection algorithm that emphasized the genetic contribution of stroke risk. Following construction of paired-end sequencing libraries, all predicted human exons in each sample were captured and sequenced. Sequencing generated an average of 25.5 million read pairs (75 bp×2) and 3.8 Gbp per sample. After quality filtering, screening the exomes against dbSNP revealed an average of 2839 novel SNPs among African-Americans and 1105 among Caucasians. In an aggregate analysis, 48 genes were identified to have at least one rare variant across all stroke cases. One gene, CSN3, identified by screening our prior GWAS results in conjunction with our exome results, was found to contain an interesting coding polymorphism as well as excess rare variation compared with the other genes evaluated. In conclusion, while rare coding variants may predispose to ischemic stroke, this has yet to be definitively proven. Our study demonstrates the complexities of such research and highlights that while exome data can be obtained, the optimal analytical methods have yet to be determined.
Affiliation(s)
- John W Cole
- Veterans Administration Medical Center, Baltimore, Maryland, United States of America.
284
Properties and power of the Drosophila Synthetic Population Resource for the routine dissection of complex traits. Genetics 2012; 191:935-49. [PMID: 22505626 DOI: 10.1534/genetics.112.138537] [Citation(s) in RCA: 130] [Impact Index Per Article: 10.0] [Indexed: 12/18/2022]
Abstract
The Drosophila Synthetic Population Resource (DSPR) is a newly developed multifounder advanced intercross panel consisting of >1600 recombinant inbred lines (RILs) designed for the genetic dissection of complex traits. Here, we describe the inference of the underlying mosaic founder structure for the full set of RILs from a dense set of semicodominant restriction-site-associated DNA (RAD) markers and use simulations to explore how variation in marker density and sequencing coverage affects inference. For a given sequencing effort, marker density is more important than sequence coverage per marker in terms of the amount of genetic information we can infer. We also assessed the power of the DSPR by assigning genotypes at a hidden QTL to each RIL on the basis of the inferred founder state and simulating phenotypes for different experimental designs, different genetic architectures, different sample sizes, and QTL of varying effect sizes. We found the DSPR has both high power (e.g., 84% power to detect a 5% QTL) and high mapping resolution (e.g., ∼1.5 cM for a 5% QTL).
285
Ross S, Anand SS, Joseph P, Paré G. Promises and challenges of pharmacogenetics: an overview of study design, methodological and statistical issues. JRSM Cardiovasc Dis 2012; 1:10.1258_cvd.2012.012001. [PMID: 24175062 PMCID: PMC3738322 DOI: 10.1258/cvd.2012.012001] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
Pharmacogenetics is the study of inherited variation in drug response. The goal of pharmacogenetics is to develop novel ways of maximizing drug efficacy and minimizing toxicity for individual patients. Personalized medicine has the potential to use a patient's genetic information to predict the optimal dosage of a drug with a narrow therapeutic index, to select the most appropriate pharmacological agent for a given patient and to develop cost-effective treatments. Although there is supporting evidence in favour of pharmacogenetics, its adoption in clinical practice has been slow because findings have sometimes conflicted among studies. This failure to replicate findings may result from a lack of high-quality pharmacogenetic studies, as well as unresolved methodological and statistical issues. The objective of this review is to discuss the benefits of incorporating pharmacogenetics into clinical practice. We will also address outstanding methodological and statistical issues that may lead to heterogeneity among reported pharmacogenetic studies and how they may be addressed.
Affiliation(s)
- Stephanie Ross
- Population Health Research Institute, Hamilton Health Sciences, McMaster University, Hamilton, Ontario L8L 2X2, Canada
286
Rubio JP, Topp S, Warren L, St Jean PL, Wegmann D, Kessner D, Novembre J, Shen J, Fraser D, Aponte J, Nangle K, Cardon LR, Ehm MG, Chissoe SL, Whittaker JC, Nelson MR, Mooser VE. Deep sequencing of the LRRK2 gene in 14,002 individuals reveals evidence of purifying selection and independent origin of the p.Arg1628Pro mutation in Europe. Hum Mutat 2012; 33:1087-98. [PMID: 22415848 DOI: 10.1002/humu.22075] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 07/11/2011] [Accepted: 02/24/2012] [Indexed: 12/12/2022]
Abstract
Genetic variation in LRRK2 predisposes to Parkinson disease (PD), which underpins its development as a therapeutic target. Here, we aimed to identify novel genotype-phenotype associations that might support developing LRRK2 therapies for other conditions. We sequenced the 51 exons of LRRK2 in cases comprising 12 common diseases (n = 9,582), and in 4,420 population controls. We identified 739 single-nucleotide variants, 62% of which were observed in only one person, including 316 novel exonic variants. We found evidence of purifying selection for the LRRK2 gene and a trend suggesting that this is more pronounced in the central (ROC-COR-kinase) core protein domains of LRRK2 than in the flanking domains. Population genetic analyses revealed that LRRK2 is not especially polymorphic or differentiated in comparison to 201 other drug target genes. Among Europeans, we identified 17 carriers (0.13%) of pathogenic LRRK2 mutations that were not significantly enriched within any disease or in those reporting a family history of PD. Analysis of pathogenic mutations within Europe reveals that the p.Arg1628Pro (c.4883G>C) mutation arose independently in Europe and Asia. Taken together, these findings demonstrate how targeted deep sequencing can help to reveal fundamental characteristics of clinically important loci.
Affiliation(s)
- Justin P Rubio
- Quantitative Sciences, Research and Development, GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire, England, United Kingdom.
287
Abstract
PURPOSE OF REVIEW We review the main findings from genome-wide association studies (GWAS) for levels of HDL-cholesterol, LDL-cholesterol and triglycerides, including approaches to identify the functional variant(s) or gene(s). We discuss study design and challenges related to whole genome or exome sequencing to identify novel genes and variants. RECENT FINDINGS GWAS have detected approximately 100 loci associated with one or more lipid traits. Fine mapping of several loci for LDL-cholesterol demonstrated that the trait variance explained may double when the functional variants responsible for the association signals are identified. Experimental follow-up of three loci identified by GWAS has implicated the functional genes GALNT2, TRIB1, and SORT1, and a functional variant at SORT1. SUMMARY The goal of genetic studies for lipid levels is to improve treatment and ultimately reduce the prevalence of heart disease. Many signals identified by GWAS have modest effect sizes, useful for identifying novel biologically relevant genes, but less useful for personalized medicine. Whole genome or exome sequencing studies may fill this gap by identifying rare variants of larger effect associated with lipid levels and heart disease.
Affiliation(s)
- Cristen J Willer
- Division of Cardiovascular Medicine, Departments of Internal Medicine and Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
288
Abstract
The individual human genome and epigenome are being defined at unprecedented resolution by current advances in sequencing technologies with important implications for human disease. This review uses examples relevant to clinical practice to illustrate the functional consequences of genetic and epigenetic variation. The insights gained from genome-wide association studies are described together with current efforts to understand the role of rare variants in common disease, set in the context of recent successes in Mendelian traits through the application of whole exome sequencing. The application of functional genomics to interrogate the genome and epigenome, build up an integrated picture of the regulatory genomic landscape and inform disease association studies is discussed, together with the role of expression quantitative trait mapping and analysis of allele-specific gene expression.
Affiliation(s)
- J C Knight
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK.
289
Yang HC, Liang YJ, Chen JW, Chiang KM, Chung CM, Ho HY, Ting CT, Lin TH, Sheu SH, Tsai WC, Chen JH, Leu HB, Yin WH, Chiu TY, Chern CI, Lin SJ, Tomlinson B, Guo Y, Sham PC, Cherny SS, Lam TH, Thomas GN, Pan WH. Identification of IGF1, SLC4A4, WWOX, and SFMBT1 as hypertension susceptibility genes in Han Chinese with a genome-wide gene-based association study. PLoS One 2012; 7:e32907. [PMID: 22479346 PMCID: PMC3315540 DOI: 10.1371/journal.pone.0032907] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 02/10/2011] [Accepted: 02/07/2012] [Indexed: 01/11/2023]
Abstract
Hypertension is a complex disorder with high prevalence rates all over the world. We conducted the first genome-wide gene-based association scan for hypertension in a Han Chinese population. By analyzing genome-wide single-nucleotide-polymorphism data of 400 matched pairs of young-onset hypertensive patients and normotensive controls genotyped with the Illumina HumanHap550-Duo BeadChip, we identified 100 susceptibility genes for hypertension and validated them with permutation tests. Seventeen of the 100 genes exhibited differential allelic and expression distributions between patient and control groups. These genes provided a good molecular signature for classifying hypertensive patients and normotensive controls. Among the 17 genes, IGF1, SLC4A4, WWOX, and SFMBT1 were not only identified by our gene-based association scan and gene expression analysis but were also replicated by a gene-based association analysis of the Hong Kong Hypertension Study. Moreover, cis-acting expression quantitative trait loci associated with the differentially expressed genes were found and linked to hypertension. IGF1, which encodes insulin-like growth factor 1, is associated with cardiovascular disorders, metabolic syndrome, decreased body weight/size, and changes in insulin levels in mice. SLC4A4, which encodes the electrogenic sodium bicarbonate cotransporter 1, is associated with decreased body weight/size and abnormal ion homeostasis in mice. WWOX, which encodes the WW domain-containing oxidoreductase, is related to hypoglycemia and hyperphosphatemia. SFMBT1, which encodes the scm-like with four MBT domains protein 1, is a novel hypertension gene. GRB14, TMEM56 and KIAA1797 exhibited highly significant differential allelic and expression distributions between hypertensive patients and normotensive controls. GRB14 was also found relevant to blood pressure in a previous genetic association study in East Asian populations. TMEM56 and KIAA1797 may be specific to Taiwanese populations, because they were not validated by the two replication studies. Identification of these genes enriches the collection of hypertension susceptibility genes, thereby shedding light on the etiology of hypertension in Han Chinese populations.
Affiliation(s)
- Hsin-Chou Yang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Yu-Jen Liang
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- Jaw-Wen Chen
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
- Kuang-Mao Chiang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- School of Public Health, National Medical Defense Center, Taipei, Taiwan
- Chia-Min Chung
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hung-Yun Ho
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
- Chih-Tai Ting
- Cardiovascular Center, Taichung Veterans General Hospital, Taichung, Taiwan
- Tsung-Hsien Lin
- Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Sheng-Hsiung Sheu
- Department of Internal Medicine, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Wei-Chuan Tsai
- Department of Internal Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Jyh-Hong Chen
- Department of Internal Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Hsin-Bang Leu
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
- Wei-Hsian Yin
- Division of Cardiology, Cheng-Hsin Rehabilitation Medical Center, Taipei, Taiwan
- Ting-Yu Chiu
- Division of Cardiology, Min-Sheng General Hospital, Taoyuan, Taiwan
- Ching-Iuan Chern
- Division of Cardiology, Min-Sheng General Hospital, Taoyuan, Taiwan
- Shing-Jong Lin
- National Yang-Ming University School of Medicine and Taipei Veterans General Hospital, Taipei, Taiwan
- Brian Tomlinson
- Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Hong Kong, China
- Youling Guo
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- Pak C. Sham
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- The State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China
- Stacey S. Cherny
- Department of Psychiatry, The University of Hong Kong, Hong Kong, China
- The State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China
- Tai Hing Lam
- School of Public Health, The University of Hong Kong, Hong Kong, China
- G. Neil Thomas
- Public Health, Epidemiology and Biostatistics, School of Health and Population Sciences, University of Birmingham, Birmingham, United Kingdom
- Wen-Harn Pan
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Division of Preventive Medicine and Health Services Research, Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
290
Meulenbelt I. Osteoarthritis year 2011 in review: genetics. Osteoarthritis Cartilage 2012; 20:218-22. [PMID: 22261407 DOI: 10.1016/j.joca.2012.01.007] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 11/01/2011] [Revised: 12/27/2011] [Accepted: 01/04/2012] [Indexed: 02/02/2023]
Abstract
Over the last decades, many researchers have sought to identify causal genetic variants by means of candidate gene analyses and genome-wide linkage and association studies, in order to elucidate the underlying mechanisms of osteoarthritis (OA). Although several consistent genetic variants were identified, successes have been limited. This review focuses on studies published up to mid-2011, and on data presented at the Osteoarthritis Research Society International (OARSI) 2011 meeting in San Diego, that apply genetic study designs to elucidate the primary molecular and cellular events initiating OA onset in humans.
Affiliation(s)
- I Meulenbelt
- Department of Molecular Epidemiology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands.
291
Moraes CF, Lins TC, Carmargos EF, Naves JOS, Pereira RW, Nóbrega OT. Lessons from genome-wide association studies findings in Alzheimer's disease. Psychogeriatrics 2012; 12:62-73. [PMID: 22416831 DOI: 10.1111/j.1479-8301.2011.00378.x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 01/14/2023]
Abstract
Alzheimer's disease (AD) is the most common neurodegenerative disorder with a complex genetic background. Recent genome-wide association studies (GWAS) have placed important new contributors into the genetic framework of early- and late-onset forms of this dementia. Besides confirming the major role of classic allelic variants (e.g. apolipoprotein E) in the development of AD, GWAS have thus far implicated over 20 single nucleotide polymorphisms in AD. In this review, we summarize the findings of 16 AD-based GWAS performed to date whose public registries are available at the National Human Genome Research Institute, with an emphasis on understanding whether the polymorphic markers under consideration support functional implications to the pathophysiological role of the major genetic risk factors unraveled by GWAS.
Affiliation(s)
- Clayton F Moraes
- Geriatric Service, Hospital of the Catholic University of Brasília, Graduate Program in Medical Sciences, University of Brasília, Brasília - DF, Brazil
292
Zhi D, Chen R. Statistical guidance for experimental design and data analysis of mutation detection in rare monogenic mendelian diseases by exome sequencing. PLoS One 2012; 7:e31358. [PMID: 22348076 PMCID: PMC3277495 DOI: 10.1371/journal.pone.0031358] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 08/05/2011] [Accepted: 01/06/2012] [Indexed: 01/19/2023]
Abstract
Recently, whole-genome sequencing, especially exome sequencing, has successfully led to the identification of causal mutations for rare monogenic Mendelian diseases. However, it is unclear whether this approach can be generalized and effectively applied to other Mendelian diseases with high locus heterogeneity. Moreover, the current exome sequencing approach has limitations, such as false-positive and false-negative rates of mutation detection due to sequencing errors and other artifacts, but the impact of these limitations on experimental design has not been systematically analyzed. To address these questions, we present a statistical modeling framework to calculate power (the probability of identifying truly disease-causing genes) under various inheritance models and experimental conditions, providing guidance for both experimental design and data analysis. Based on our model, we found that the exome sequencing approach is well-powered for mutation detection in recessive, but not dominant, Mendelian diseases with high locus heterogeneity. A disease gene responsible for as little as 5% of the disease population can be readily identified by sequencing just 200 unrelated patients. Based on these results, for identifying rare Mendelian disease genes, we propose that a viable approach is to combine, sequence, and analyze patients with the same disease together, leveraging the statistical framework presented in this work.
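To make the sample-size intuition concrete, the sketch below computes a deliberately simplified quantity: the binomial probability that at least `k` of `n` sequenced patients carry a detected mutation in a gene that explains a fraction of cases, with a per-patient detection sensitivity standing in for false negatives. This is not the authors' modeling framework, which also handles inheritance models and false positives; the "at least k carriers" discovery rule, the helper name and all parameter values are assumptions for illustration only.

```python
from math import comb


def prob_at_least_k_carriers(n_patients: int, gene_fraction: float, k: int,
                             sensitivity: float = 1.0) -> float:
    """P(X >= k) where X ~ Binomial(n_patients, gene_fraction * sensitivity).

    gene_fraction: share of patients whose disease is caused by this gene.
    sensitivity: assumed per-patient probability that the causal mutation is
    actually called (1 - false-negative rate); a simplifying assumption.
    """
    p = gene_fraction * sensitivity
    if not 0.0 <= p <= 1.0:
        raise ValueError("gene_fraction * sensitivity must lie in [0, 1]")
    # Probability of observing 0..k-1 detected carriers; the answer is its complement.
    below_k = sum(comb(n_patients, i) * p**i * (1 - p) ** (n_patients - i)
                  for i in range(k))
    return max(0.0, 1.0 - below_k)


# Hypothetical scenario: a gene explaining 5% of cases, 90% call sensitivity,
# and a discovery rule requiring at least five detected carriers.
for n in (50, 100, 200):
    print(n, round(prob_at_least_k_carriers(n, 0.05, k=5, sensitivity=0.9), 3))
```

Raising `k` (a stricter rule that would tolerate more false-positive calls) lowers this probability for a fixed `n`, which is one way to see why larger patient collections help for genes that each explain only a small share of cases.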
Affiliation(s)
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Rui Chen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
293
van Es MA, Schelhaas HJ, van Vught PWJ, Ticozzi N, Andersen PM, Groen EJN, Schulte C, Blauw HM, Koppers M, Diekstra FP, Fumoto K, LeClerc AL, Keagle P, Bloem BR, Scheffer H, van Nuenen BFL, van Blitterswijk M, van Rheenen W, Wills AM, Lowe PP, Hu GF, Yu W, Kishikawa H, Wu D, Folkerth RD, Mariani C, Goldwurm S, Pezzoli G, Van Damme P, Lemmens R, Dahlberg C, Birve A, Fernández-Santiago R, Waibel S, Klein C, Weber M, van der Kooi AJ, de Visser M, Verbaan D, van Hilten JJ, Heutink P, Hennekam EAM, Cuppen E, Berg D, Brown RH, Silani V, Gasser T, Ludolph AC, Robberecht W, Ophoff RA, Veldink JH, Pasterkamp RJ, de Bakker PIW, Landers JE, van de Warrenburg BP, van den Berg LH. Angiogenin variants in Parkinson disease and amyotrophic lateral sclerosis. Ann Neurol 2012; 70:964-73. [PMID: 22190368 DOI: 10.1002/ana.22611] [Citation(s) in RCA: 147] [Impact Index Per Article: 11.3] [Indexed: 12/13/2022]
Abstract
OBJECTIVE Several studies have suggested an increased frequency of variants in the gene encoding angiogenin (ANG) in patients with amyotrophic lateral sclerosis (ALS). Interestingly, a few ALS patients carrying ANG variants also showed signs of Parkinson disease (PD). Furthermore, relatives of ALS patients have an increased risk of developing PD, and the prevalence of concomitant motor neuron disease in PD is higher than expected based on chance occurrence. We therefore investigated whether ANG variants could predispose to both ALS and PD. METHODS We reviewed all previous studies on ANG in ALS and performed sequencing experiments on additional samples, which allowed us to analyze data from 6,471 ALS patients and 7,668 controls from 15 centers (13 from Europe and 2 from the USA). We sequenced DNA samples from 3,146 PD patients from 6 centers (5 from Europe and 1 from the USA). Statistical analysis was performed using the variable threshold test, and the Mantel-Haenszel procedure was used to estimate odds ratios. RESULTS Analysis of sequence data from 17,258 individuals demonstrated a significantly higher frequency of ANG variants in both ALS and PD patients compared to control subjects (p = 9.3 × 10(-6) for ALS and p = 4.3 × 10(-5) for PD). The odds ratio for any ANG variant in patients versus controls was 9.2 for ALS and 6.7 for PD. INTERPRETATION The data from this multicenter study demonstrate that there is a strong association between PD, ALS, and ANG variants. ANG is a genetic link between ALS and PD.
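The Mantel-Haenszel procedure mentioned in this abstract pools 2×2 tables from several centers (strata) as OR_MH = Σ(a_i d_i / n_i) / Σ(b_i c_i / n_i), where a_i and b_i are the carrier and non-carrier counts among cases in center i, c_i and d_i the same among controls, and n_i the center total. A minimal sketch follows; the `Stratum` type and all counts are hypothetical, and it estimates only the pooled odds ratio (the variable threshold test used for the reported p-values is not reproduced).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stratum:
    """One center's 2x2 table of variant carriers by disease status (hypothetical counts)."""
    case_carriers: int         # a
    case_noncarriers: int      # b
    control_carriers: int      # c
    control_noncarriers: int   # d

    @property
    def total(self) -> int:
        return (self.case_carriers + self.case_noncarriers
                + self.control_carriers + self.control_noncarriers)


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Pooled odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.total
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += s.case_carriers * s.control_noncarriers / n
        denominator += s.case_noncarriers * s.control_carriers / n
    if denominator == 0:
        raise ZeroDivisionError("no stratum has both case non-carriers and control carriers")
    return numerator / denominator


# Hypothetical counts for two centers; the second has no control carriers,
# so its own odds ratio is undefined, but it still contributes to the pooled estimate.
centers = [
    Stratum(case_carriers=6, case_noncarriers=494, control_carriers=1, control_noncarriers=599),
    Stratum(case_carriers=3, case_noncarriers=297, control_carriers=0, control_noncarriers=400),
]
print(round(mantel_haenszel_or(centers), 2))
```

Stratifying by center keeps differences in carrier frequency between populations from being mistaken for a disease effect, and the pooled form stays defined even when individual centers have zero carriers in one group, which is common for rare variants.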
Affiliation(s)
- Michael A van Es
- Department of Neurology, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, The Netherlands
294
Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet 2012; 44:243-6. [PMID: 22306651 PMCID: PMC3303124 DOI: 10.1038/ng.1074] [Citation(s) in RCA: 303] [Impact Index Per Article: 23.3] [Received: 09/12/2011] [Accepted: 12/12/2011] [Indexed: 12/15/2022]
Abstract
Well-powered genome-wide association studies, now made possible through advances in technology and large-scale collaborative projects, promise to characterize the contribution of rare variants to complex traits and disease. However, while population structure is a known confounder of association studies, it remains unknown whether methods developed to control stratification are equally effective for rare variants. Here, we demonstrate that rare variants can show a stratification that is systematically different from, and typically stronger than, that of common variants, and that this is not necessarily corrected by existing methods. We show that the same process leads to inflation for load-based tests and can obscure signals at truly associated variants. Furthermore, we show that populations can display spatial structure in rare variants even when Wright's fixation index F_ST is low, but that allele frequency-dependent metrics of allele sharing can reveal localized stratification. These results underscore the importance of collecting and integrating spatial information in the genetic analysis of complex traits.
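Wright's F_ST can be estimated from allele frequencies in two samples. The sketch below uses a Hudson-type estimator combined across variants as a ratio of sums; this is a swapped-in choice for illustration, since the abstract does not say which estimator the authors used. It assumes p1 and p2 are allele frequencies at the same variants in two populations and n1, n2 are the numbers of sampled chromosomes.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson-type F_ST for two populations, combined as a ratio of sums.

    p1, p2: allele frequencies at the same variants in populations 1 and 2.
    n1, n2: number of sampled chromosomes (haploid sample sizes), each > 1.
    """
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must cover the same variants")
    numerator = 0.0
    denominator = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for within-sample sampling noise
        numerator += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        # Expected heterozygosity between the two populations
        denominator += a * (1 - b) + b * (1 - a)
    return numerator / denominator


print(hudson_fst([0.9, 0.1], [0.1, 0.9], n1=1000, n2=1000))
```

A rare variant contributes only small terms to both sums, so a genome-wide F_ST computed this way is dominated by common variants; this is one way to see how F_ST can be low even when rare variants are spatially structured.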
Affiliation(s)
- Iain Mathieson
- Wellcome Trust Centre for Human Genetics, University of Oxford, UK.
295
Affiliation(s)
- David Evans
- From the Endokrinologie und Stoffwechsel, Medizinische Klinik III, Zentrum für Innere Medizin (D.E.), and Klinik und Poliklinik für Allgemeine und Interventionelle Kardiologie, Universitätsklinikum Hamburg-Eppendorf (P.D.), Martinistrasse 52, 20246 Hamburg, Germany
- Patrick Diemert
- From the Endokrinologie und Stoffwechsel, Medizinische Klinik III, Zentrum für Innere Medizin (D.E.), and Klinik und Poliklinik für Allgemeine und Interventionelle Kardiologie, Universitätsklinikum Hamburg-Eppendorf (P.D.), Martinistrasse 52, 20246 Hamburg, Germany
296
Asimit J, Day-Williams A, Zgaga L, Rudan I, Boraska V, Zeggini E. An evaluation of different meta-analysis approaches in the presence of allelic heterogeneity. Eur J Hum Genet 2012; 20:709-12. [PMID: 22293689 PMCID: PMC3355266 DOI: 10.1038/ejhg.2011.274] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/24/2023]
Abstract
Meta-analysis has proven to be a useful tool in genetic association studies. Allelic heterogeneity can arise from differences in ethnic background across the populations being meta-analyzed (for example, in the search for common-frequency variants through genome-wide association studies) and from the presence of multiple low-frequency and rare associated variants in the same functional unit of interest (for example, within a gene or a regulatory region). The latter challenge will become increasingly relevant in whole-genome and whole-exome sequencing studies investigating association with complex traits. Here, we evaluate the performance of different approaches to meta-analysis in the presence of allelic heterogeneity. We simulate allelic heterogeneity scenarios in three populations and examine the performance of current approaches to the analysis of these data. We show that current approaches can detect only a small fraction of common-frequency causal variants. We also find that for low-frequency variants with large effects (odds ratios 2–3), single-point tests have high power but also high false-positive rates. P-value-based meta-analysis of summary results from allele-matching locus-wide tests outperforms collapsing approaches. We conclude that current strategies for combining genetic association data in the presence of allelic heterogeneity are insufficiently powered.
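One common way to combine P-values from independent studies is Fisher's method, which compares X = −2 Σ ln p_i with a chi-square distribution on 2k degrees of freedom for k studies. The abstract does not specify which P-value combination the authors used, so this is an illustrative sketch only. It uses the closed-form chi-square upper tail for even degrees of freedom, P(X > x) = e^(−x/2) Σ_{i=0}^{k−1} (x/2)^i / i!, so it needs only the standard library.

```python
import math
from typing import Sequence


def fisher_combined_p(pvalues: Sequence[float]) -> float:
    """Fisher's method: combined P-value for k independent tests."""
    k = len(pvalues)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("P-values must lie in (0, 1]")
    half_x = -sum(math.log(p) for p in pvalues)  # X/2, where X = -2 * sum(ln p)
    # Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


print(fisher_combined_p([0.05, 0.05]))
```

For example, two independent studies that each give p = 0.05 combine to roughly 0.0175 under this method.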
Affiliation(s)
- Jennifer Asimit
- Department of Human Genetics, Wellcome Trust Sanger Institute, Hinxton, UK
297
Genomic research to identify novel pathways in the development of abdominal aortic aneurysm. Cardiol Res Pract 2012; 2012:852829. [PMID: 22400124 PMCID: PMC3286885 DOI: 10.1155/2012/852829] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/29/2011] [Accepted: 10/27/2011] [Indexed: 11/18/2022]
Abstract
Abdominal aortic aneurysm (AAA) is a common disease with a large heritable component. There is a need to improve our understanding of AAA pathogenesis in order to develop novel treatment paradigms. Genome-wide association studies (GWAS) have revolutionized research into the genetic variants that underpin the development of many complex diseases, including AAA. This article reviews the progress made to date in this regard, including mechanisms by which loci identified by GWAS may contribute to the development of AAA. It also highlights potential post-GWAS analytical strategies to further improve our understanding of the disease.
298
Surrogate genetics and metabolic profiling for characterization of human disease alleles. Genetics 2012; 190:1309-23. [PMID: 22267502 DOI: 10.1534/genetics.111.137471] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Abstract
Cystathionine-β-synthase (CBS) deficiency is a human genetic disease causing homocystinuria, thrombosis, mental retardation, and a suite of other devastating manifestations. Early detection coupled with dietary modification greatly reduces pathology, but the response to treatment differs with the allele of CBS. A better understanding of the relationship between allelic variants and protein function will improve both diagnosis and treatment. To this end, we tested the function of 84 CBS alleles previously sequenced from patients with homocystinuria by ortholog replacement in Saccharomyces cerevisiae. Within this clinically associated set, 15% of variant alleles were indistinguishable in function from the predominant CBS allele, suggesting that enzymatic activity was retained. An additional 37% of the alleles were partially functional or could be rescued by cofactor supplementation in the growth medium. This large class included alleles rescued by elevated levels of the cofactor vitamin B6, but also alleles rescued by elevated heme, a second CBS cofactor. LC-MS measurement of metabolite levels in CBS-substituted yeast grown with different B6 levels revealed changes in metabolism that propagated beyond the substrate and product of CBS. Production of the critical antioxidant glutathione through the CBS pathway was greatly decreased when CBS function was limited by genetic, cofactor, or substrate restriction, a metabolic consequence with implications for treatment.
299
Abstract
Genome-wide association studies have greatly improved our understanding of the genetic basis of disease risk. The fact that they tend not to identify more than a fraction of the specific causal loci has led to divergence of opinion over whether most of the variance is hidden as numerous rare variants of large effect or as common variants of very small effect. Here I review 20 arguments for and against each of these models of the genetic basis of complex traits and conclude that both classes of effect can be readily reconciled.
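The rare-versus-common debate turns on how much trait variance a single variant can explain. Under an additive model with Hardy-Weinberg proportions (assumptions not stated in the abstract), a biallelic variant with allele frequency p and per-allele effect β contributes 2p(1 − p)β² to the phenotypic variance. A minimal sketch, with the two example variants chosen hypothetically for illustration:

```python
def variance_explained(p: float, beta: float, phenotypic_variance: float = 1.0) -> float:
    """Fraction of phenotypic variance explained by one additive biallelic variant.

    Assumes Hardy-Weinberg proportions, so genotype variance is 2p(1-p);
    multiplying by beta^2 gives the additive genetic variance.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * p * (1.0 - p) * beta ** 2 / phenotypic_variance


rare_large = variance_explained(p=0.001, beta=1.0)    # rare allele, large effect
common_small = variance_explained(p=0.3, beta=0.069)  # common allele, small effect
print(rare_large, common_small)
```

With a standardized trait, both the hypothetical rare variant (p = 0.001, β = 1 SD) and the hypothetical common variant (p = 0.3, β = 0.069 SD) explain about 0.2% of the variance, which illustrates why the same heritability could, in principle, be carried by either class of variant.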
Affiliation(s)
- Greg Gibson
- School of Biology and Center for Integrative Genomics, 770 State Street, Georgia Institute of Technology, Atlanta, Georgia 30332, USA. greg.gibson@biology.gatech.edu
300
Pongpanich M, Neely ML, Tzeng JY. On the Aggregation of Multimarker Information for Marker-Set and Sequencing Data Analysis: Genotype Collapsing vs. Similarity Collapsing. Front Genet 2012; 2:110. [PMID: 22303404 PMCID: PMC3266618 DOI: 10.3389/fgene.2011.00110] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 09/15/2011] [Accepted: 12/25/2011] [Indexed: 12/12/2022]
Abstract
Methods that collapse information across genetic markers when searching for association signals are gaining momentum in the literature. Although originally developed to achieve a better balance between retaining information and controlling degrees of freedom in multimarker association analysis, these methods have recently been shown to be powerful tools for identifying rare variants that contribute to complex phenotypes. The information among markers can be collapsed at the genotype level, which focuses on the mean of genetic information, or at the similarity level, which focuses on the variance of genetic information. The aim of this work is to understand the strengths and weaknesses of these two collapsing strategies. Our results show that neither collapsing strategy outperforms the other across all simulated scenarios. Two factors dominate the performance of these strategies: the signal-to-noise ratio and the underlying genetic architecture of the causal variants. Genotype collapsing is more sensitive than similarity collapsing to contamination of the marker set by noise loci. In addition, genotype collapsing performs best when the genetic architecture of the causal variants is not complex (e.g., causal loci with similar effects and similar frequencies). Similarity collapsing is more robust as the complexity of the genetic architecture increases and outperforms genotype collapsing when the genetic architecture of the marker set becomes more sophisticated (e.g., causal loci with various effect sizes or frequencies and potential non-linear or interactive effects). Because the underlying genetic architecture is not known a priori, we also considered a two-stage analysis that combines the two top-performing methods from different collapsing strategies. We find that it is reasonably robust across all simulated scenarios.
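The mean-versus-variance distinction can be made concrete with the unnormalized numerators of two score-type statistics. Genotype collapsing sums each individual's genotypes into one burden score, so per-variant score contributions add before squaring; similarity collapsing with a linear kernel K = GGᵀ effectively squares each per-variant contribution first. This is a simplified sketch, assuming a hypothetical genotype matrix G (individuals × variants, coded 0/1/2) and phenotype y; it omits variance scaling, covariates, variant weights, and P-values, and it is not the specific set of methods compared in the paper.

```python
from typing import List, Sequence

Genotypes = Sequence[Sequence[int]]  # rows: individuals, columns: variants


def _residuals(y: Sequence[float]) -> List[float]:
    """Phenotype centered at its mean (null model with an intercept only)."""
    ybar = sum(y) / len(y)
    return [yi - ybar for yi in y]


def variant_scores(G: Genotypes, y: Sequence[float]) -> List[float]:
    """Per-variant score S_j = sum_i (y_i - ybar) * G_ij."""
    r = _residuals(y)
    n_variants = len(G[0])
    return [sum(r[i] * G[i][j] for i in range(len(r))) for j in range(n_variants)]


def burden_statistic(G: Genotypes, y: Sequence[float]) -> float:
    """Genotype collapsing: squared score of the burden B_i = sum_j G_ij, i.e. (sum_j S_j)^2."""
    return sum(variant_scores(G, y)) ** 2


def linear_kernel_statistic(G: Genotypes, y: Sequence[float]) -> float:
    """Similarity collapsing: Q = r^T K r with linear kernel K = G G^T (equals sum_j S_j^2)."""
    r = _residuals(y)
    n = len(r)
    K = [[sum(a * b for a, b in zip(G[i], G[k])) for k in range(n)] for i in range(n)]
    return sum(r[i] * K[i][k] * r[k] for i in range(n) for k in range(n))


y = [1, 1, 0, 0]                              # two cases, two controls
opposite = [[1, 0], [1, 0], [0, 1], [0, 1]]   # one risk, one protective variant
same = [[1, 1], [1, 1], [0, 0], [0, 0]]       # both variants raise risk
print(burden_statistic(opposite, y), linear_kernel_statistic(opposite, y))
print(burden_statistic(same, y), linear_kernel_statistic(same, y))
```

With one risk and one protective variant, the burden statistic cancels to 0 while the kernel statistic does not; with both variants acting in the same direction, the burden statistic is the larger of the two. This mirrors the abstract's finding that genotype collapsing works best when causal variants have similar effects.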
Affiliation(s)
- Monnat Pongpanich
- Bioinformatics Research Center, North Carolina State University Raleigh, NC, USA