51
Eriksson N, Tung JY, Kiefer AK, Hinds DA, Francke U, Mountain JL, Do CB. Novel associations for hypothyroidism include known autoimmune risk loci. PLoS One 2012; 7:e34442. [PMID: 22493691 PMCID: PMC3321023 DOI: 10.1371/journal.pone.0034442] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Received: 07/26/2011] [Accepted: 03/05/2012] [Indexed: 02/06/2023] Open
Abstract
Hypothyroidism is the most common thyroid disorder, affecting about 5% of the general population. Here we present the current largest genome-wide association study of hypothyroidism, in 3,736 cases and 35,546 controls. Hypothyroidism was assessed via web-based questionnaires. We identify five genome-wide significant associations, three of which are well known to be involved in a large spectrum of autoimmune diseases: rs6679677 near PTPN22, rs3184504 in SH2B3, and rs2517532 in the HLA class I region. We also report associations with rs4915077 near VAV3 and rs925489 near FOXE1. VAV3 is involved in immune function, and FOXE1 and PTPN22 have previously been associated with hypothyroidism. Although the HLA class I region and SH2B3 have previously been linked with a number of autoimmune diseases, this is the first report of their association with thyroid disease. The VAV3 association is also novel. We also show suggestive evidence of association for hypothyroidism with a SNP in the HLA class II region (independent of the other HLA association) as well as SNPs in CAPZB, PDE8B, and CTLA4. CAPZB and PDE8B have been linked to TSH levels and CTLA4 to a variety of autoimmune diseases. These results suggest heterogeneity in the genetic etiology of hypothyroidism, implicating genes involved in both autoimmune disorders and thyroid function. Using a genetic risk profile score based on the top association from each of the five genome-wide significant regions in our study, the relative risk between the highest and lowest deciles of genetic risk is 2.0.
Affiliation(s)
- Nicholas Eriksson
- 23andMe, Inc., Mountain View, California, United States of America
52
Uricchio LH, Chong JX, Ross KD, Ober C, Nicolae DL. Accurate imputation of rare and common variants in a founder population from a small number of sequenced individuals. Genet Epidemiol 2012; 36:312-9. [PMID: 22460724 DOI: 10.1002/gepi.21623] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 10/05/2011] [Revised: 01/04/2012] [Accepted: 01/09/2012] [Indexed: 11/08/2022]
Abstract
Advances in DNA sequencing technologies have greatly facilitated the discovery of rare genetic variants in the human genome, many of which may contribute to common disease risk. However, evaluating their individual or even collective effects on disease risk requires very large sample sizes, which involves study designs that are often prohibitively expensive. We present an alternative approach for determining genotypes in large numbers of individuals for all variants discovered in the sequence of relatively few individuals. Specifically, we developed a new imputation algorithm that utilizes whole-exome sequencing data from 25 members of the South Dakota Hutterite population, and genome-wide single nucleotide polymorphism (SNP) genotypes from >1,400 individuals from the same founder population. The algorithm relies on identity-by-descent sharing of phased haplotypes, a different strategy than the linkage disequilibrium methods found in most imputation algorithms. We imputed genotypes discovered in the sequence data to on average ∼77% of chromosomes among the 1,400 individuals. Median R² between imputed and directly genotyped data was >0.99. As expected, many variants that are vanishingly rare in European populations have risen to larger frequencies in the founder population and would be amenable to single-SNP analyses.
Affiliation(s)
- Lawrence H Uricchio
- Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA
53
Dissecting the genetic make-up of North-East Sardinia using a large set of haploid and autosomal markers. Eur J Hum Genet 2012; 20:956-64. [PMID: 22378280 DOI: 10.1038/ejhg.2012.22] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022] Open
Abstract
Sardinia has been used for genetic studies because of its historical isolation, genetic homogeneity and increased prevalence of certain rare diseases. Controversy remains concerning the genetic substructure and the extent of genetic homogeneity, which has implications for the design of genome-wide association studies (GWAS). We revisited this issue by examining the genetic make-up of a sample from North-East Sardinia using a dense set of autosomal, Y chromosome and mitochondrial markers to assess the potential of the sample for GWAS and fine mapping studies. We genotyped individuals for 500K single-nucleotide polymorphisms, Y chromosome markers and sequenced the mitochondrial hypervariable (HVI-HVII) regions. We identified major haplogroups and compared these with other populations. We estimated linkage disequilibrium (LD) and haplotype diversity across autosomal markers, and compared these with other populations. Our results show that within Sardinia there is no major population substructure and thus it can be considered a genetically homogenous population. We did not find substantial differences in the extent of LD in Sardinians compared with other populations. However, we showed that at least 9% of genomic regions in Sardinians differed in LD structure, which is helpful for identifying functional variants using fine mapping. We concluded that Sardinia is a powerful setting for genetic studies including GWAS and other mapping approaches.
54
A genome-wide association scan on the levels of markers of inflammation in Sardinians reveals associations that underpin its complex regulation. PLoS Genet 2012; 8:e1002480. [PMID: 22291609 PMCID: PMC3266885 DOI: 10.1371/journal.pgen.1002480] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.9] [Received: 05/25/2011] [Accepted: 11/30/2011] [Indexed: 11/18/2022] Open
Abstract
Identifying the genes that influence levels of pro-inflammatory molecules can help to elucidate the mechanisms underlying this process. We first conducted a two-stage genome-wide association scan (GWAS) for the key inflammatory biomarkers Interleukin-6 (IL-6), the general measure of inflammation erythrocyte sedimentation rate (ESR), monocyte chemotactic protein-1 (MCP-1), and high-sensitivity C-reactive protein (hsCRP) in a large cohort of individuals from the founder population of Sardinia. By analysing 731,213 autosomal or X chromosome SNPs and an additional ∼1.9 million imputed variants in 4,694 individuals, we identified several SNPs associated with the selected quantitative trait loci (QTLs) and replicated all the top signals in an independent sample of 1,392 individuals from the same population. Next, to increase power to detect and resolve associations, we further genotyped the whole cohort (6,145 individuals) for 293,875 variants included on the ImmunoChip and MetaboChip custom arrays. Overall, our combined approach led to the identification of 9 genome-wide significant novel independent signals (5 of which were identified only with the custom arrays) and provided confirmatory evidence for an additional 7. Novel signals include: for IL-6, in the ABO gene (rs657152, p = 2.13×10⁻²⁹); for ESR, at the HBB (rs4910472, p = 2.31×10⁻¹¹) and UNC119B/SPPL3 (rs11829037, p = 8.91×10⁻¹⁰) loci; for MCP-1, near its receptor CCR2 (rs17141006, p = 7.53×10⁻¹³) and in CADM3 (rs3026968, p = 7.63×10⁻¹³); for hsCRP, within the CRP gene (rs3093077, p = 5.73×10⁻²¹), near DARC (rs3845624, p = 1.43×10⁻¹⁰), UNC119B/SPPL3 (rs11829037, p = 1.50×10⁻¹⁴), and ICOSLG/AIRE (rs113459440, p = 1.54×10⁻⁸) loci. Confirmatory evidence was found for IL-6 in the IL-6R gene (rs4129267); for ESR at CR1 (rs12567990) and TMEM57 (rs10903129); for MCP-1 at DARC (rs12075); and for hsCRP at CRP (rs1205), HNF1A (rs225918), and APOC-I (rs4420638). Our results improve the current knowledge of genetic variants underlying inflammation and provide novel clues for the understanding of the molecular mechanisms regulating this complex process.
55
Abstract
Identity-by-descent (IBD) mapping tests whether cases share more segments of IBD around a putative causal variant than do controls. These segments of IBD can be accurately detected from genome-wide SNP data. We investigate the power of IBD mapping relative to that of SNP association testing for genome-wide case-control SNP data. Our focus is particularly on rare variants, as these tend to be more recent and hence more likely to have recent shared ancestry. We simulate data from both large and small populations and find that the relative performance of IBD mapping and SNP association testing depends on population demographic history and the strength of selection against causal variants. We also present an IBD mapping analysis of a type 1 diabetes data set. In those data we find that we can detect association only with the HLA region using IBD mapping. Overall, our results suggest that IBD mapping may have higher power than association analysis of SNP data when multiple rare causal variants are clustered within a gene. However, for outbred populations, very large sample sizes may be required for genome-wide significance unless the causal variants have strong effects.
56
Low-pass genome-wide sequencing and variant inference using identity-by-descent in an isolated human population. Genetics 2011; 190:679-89. [PMID: 22135348 DOI: 10.1534/genetics.111.134874] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 01/22/2023] Open
Abstract
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference.
57
Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS, Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L, Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM, Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA, Masys DR, Roden DM, de Andrade M. Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet 2011; 89:529-42. [PMID: 21981779 PMCID: PMC3188836 DOI: 10.1016/j.ajhg.2011.09.008] [Citation(s) in RCA: 193] [Impact Index Per Article: 14.8] [Received: 07/28/2011] [Revised: 09/15/2011] [Accepted: 09/15/2011] [Indexed: 12/20/2022] Open
Abstract
We repurposed existing genotypes in DNA biobanks across the Electronic Medical Records and Genomics network to perform a genome-wide association study for primary hypothyroidism, the most common thyroid disease. Electronic selection algorithms incorporating billing codes, laboratory values, text queries, and medication records identified 1317 cases and 5053 controls of European ancestry within five electronic medical records (EMRs); the algorithms' positive predictive values were 92.4% and 98.5% for cases and controls, respectively. Four single-nucleotide polymorphisms (SNPs) in linkage disequilibrium at 9q22 near FOXE1 were associated with hypothyroidism at genome-wide significance, the strongest being rs7850258 (odds ratio [OR] 0.74, p = 3.96 × 10⁻⁹). This association was replicated in a set of 263 cases and 1616 controls (OR = 0.60, p = 5.7 × 10⁻⁶). A phenome-wide association study (PheWAS) that was performed on this locus with 13,617 individuals and more than 200,000 patient-years of billing data identified associations with additional phenotypes: thyroiditis (OR = 0.58, p = 1.4 × 10⁻⁵), nodular (OR = 0.76, p = 3.1 × 10⁻⁵) and multinodular (OR = 0.69, p = 3.9 × 10⁻⁵) goiters, and thyrotoxicosis (OR = 0.76, p = 1.5 × 10⁻³), but not Graves disease (OR = 1.03, p = 0.82). Thyroid cancer, previously associated with this locus, was not significantly associated in the PheWAS (OR = 1.29, p = 0.09). The strongest association in the PheWAS was hypothyroidism (OR = 0.76, p = 2.7 × 10⁻¹³), which had an odds ratio that was nearly identical to that of the curated case-control population in the primary analysis, providing further validation of the PheWAS method. Our findings indicate that EMR-linked genomic data could allow discovery of genes associated with many diseases without additional genotyping cost.
Affiliation(s)
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN 37232, USA.
58
Gusev A, Kenny EE, Lowe JK, Salit J, Saxena R, Kathiresan S, Altshuler DM, Friedman JM, Breslow JL, Pe'er I. DASH: a method for identical-by-descent haplotype mapping uncovers association with recent variation. Am J Hum Genet 2011; 88:706-717. [PMID: 21620352 DOI: 10.1016/j.ajhg.2011.04.023] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 02/21/2011] [Revised: 04/13/2011] [Accepted: 04/26/2011] [Indexed: 02/01/2023] Open
Abstract
Rare variants affecting phenotype pose a unique challenge for human genetics. Although genome-wide association studies have successfully detected many common causal variants, they are underpowered in identifying disease variants that are too rare or population-specific to be imputed from a general reference panel and thus are poorly represented on commercial SNP arrays. We set out to overcome these challenges and detect association between disease and rare alleles using SNP arrays by relying on long stretches of genomic sharing that are identical by descent. We have developed an algorithm, DASH, which builds upon pairwise identical-by-descent shared segments to infer clusters of individuals likely to be sharing a single haplotype. DASH constructs a graph with nodes representing individuals and links on the basis of such segments spanning a locus and uses an iterative minimum cut algorithm to identify densely connected components. We have applied DASH to simulated data and diverse GWAS data sets by constructing haplotype clusters and testing them for association. In simulations we show this approach to be significantly more powerful than single-marker testing in an isolated population that is from Kosrae, Federated States of Micronesia and has abundant IBD, and we provide orthogonal information for rare, recent variants in the outbred Wellcome Trust Case-Control Consortium (WTCCC) data. In both cohorts, we identified a number of haplotype associations, five such loci in the WTCCC data and ten in the isolated, that were conditionally significant beyond any individual nearby markers. We have replicated one of these loci in an independent European cohort and identified putative structural changes in low-pass whole-genome sequence of the cluster carriers.
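The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the DASH implementation: it assumes pairwise IBD segments arrive as hypothetical `(ind_a, ind_b, start, end)` tuples, and it groups individuals by plain connected components instead of the iterative minimum-cut step the authors describe. The function name and the example coordinates are invented for illustration.

```python
from collections import defaultdict


def ibd_clusters_at_locus(segments, locus):
    """Group individuals linked by pairwise IBD segments that span `locus`.

    segments: iterable of (ind_a, ind_b, start, end) tuples (hypothetical format).
    Returns a list of sets, one per connected component of the sharing graph.
    """
    # Nodes are individuals; an edge joins two individuals whose shared
    # segment covers the locus of interest.
    graph = defaultdict(set)
    for ind_a, ind_b, start, end in segments:
        if start <= locus <= end:
            graph[ind_a].add(ind_b)
            graph[ind_b].add(ind_a)

    # Depth-first search to collect connected components.
    # (Simplification: DASH refines such components with minimum cuts.)
    seen, clusters = set(), []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Invented example: four pairwise segments on one chromosome.
example_segments = [
    ("A", "B", 100, 500),
    ("B", "C", 300, 900),
    ("D", "E", 350, 600),
    ("A", "F", 700, 800),  # does not span the locus below
]
clusters = ibd_clusters_at_locus(example_segments, locus=400)
```

Each resulting cluster could then be coded as a presence/absence variable per individual and tested for association, which is the role haplotype clusters play in the abstract.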
Affiliation(s)
- Alexander Gusev
- Department of Computer Science, Columbia University, New York, NY 10027, USA
- Eimear E Kenny
- Department of Computer Science, Columbia University, New York, NY 10027, USA; Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Jennifer K Lowe
- Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Jaqueline Salit
- Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Richa Saxena
- Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Sekar Kathiresan
- Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Cardiovascular Disease Prevention Center, Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
- David M Altshuler
- Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Center for Human Genetic Research and Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Jeffrey M Friedman
- Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Jan L Breslow
- Medical Sciences and Human Genetics, Rockefeller University, New York, NY 10065, USA
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, NY 10027, USA.
59
Kraja AT, Hunt SC, Rao DC, Dávila-Román VG, Arnett DK, Province MA. Genetics of hypertension and cardiovascular disease and their interconnected pathways: lessons from large studies. Curr Hypertens Rep 2011; 13:46-54. [PMID: 21128019 DOI: 10.1007/s11906-010-0174-7] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 01/11/2023]
Abstract
Blood pressure (BP), hypertension (HT) and cardiovascular disease (CVD) are common complex phenotypes, which are affected by multiple genetic and environmental factors. This article describes recent genome-wide association studies (GWAS) that have reported causative variants for BP/HT and CVD/heart traits and analyzes the overlapping associated gene polymorphisms. It also examines potential replication of findings from the HyperGEN data on African Americans and whites. Several genes involved in BP/HT regulation also appear to be involved in CVD. A better picture is emerging, with overlapping hot-spot regions and with interconnected pathways between BP/HT and CVD. A systemic approach to full understanding of BP/HT and CVD development and their progression to disease may lead to the identification of gene targets and pathways for the development of novel therapeutic interventions.
Affiliation(s)
- Aldi T Kraja
- Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Avenue, St. Louis, MO 63108, USA.
60
A genome-wide analysis of population structure in the Finnish Saami with implications for genetic association studies. Eur J Hum Genet 2010; 19:347-52. [PMID: 21150888 DOI: 10.1038/ejhg.2010.179] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023] Open
Abstract
The understanding of patterns of genetic variation within and among human populations is a prerequisite for successful genetic association mapping studies of complex diseases and traits. Some populations are more favorable for association mapping studies than others. The Saami from northern Scandinavia and the Kola Peninsula represent a population isolate that, among European populations, has been less extensively sampled, despite some early interest for association mapping studies. In this paper, we report the results of a first genome-wide SNP-based study of genetic population structure in the Finnish Saami. Using data from the HapMap and the human genome diversity project (HGDP-CEPH) and recently developed statistical methods, we studied individual genetic ancestry. We quantified genetic differentiation between the Saami population and the HGDP-CEPH populations by calculating pair-wise F_ST statistics and by characterizing identity-by-state sharing for pair-wise population comparisons. This study affirms an east Asian contribution to the predominantly European-derived Saami gene pool. Using model-based individual ancestry analysis, the median estimated percentage of the genome with east Asian ancestry was 6% (first and third quartiles: 5 and 8%, respectively). We found that genetic similarity between population pairs roughly correlated with geographic distance. Among the European HGDP-CEPH populations, F_ST was smallest for the comparison with the Russians (F_ST = 0.0098), and estimates for the other population comparisons ranged from 0.0129 to 0.0263. Our analysis also revealed fine-scale substructure within the Finnish Saami and warns against the confounding effects of both hidden population structure and undocumented relatedness in genetic association studies of isolated populations.
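For readers unfamiliar with the pair-wise fixation index used in this abstract, a toy calculation may help. The sketch below uses Wright's heterozygosity form, (H_T − H_S) / H_T, for one biallelic locus and two equally weighted populations. The abstract does not say which estimator the study used, so this choice is an assumption, and the allele frequencies are invented.

```python
def fst_two_populations(p1, p2):
    """Wright's fixation index for one biallelic locus and two populations.

    p1, p2: frequency of the same allele in each population (0..1).
    Assumes equal population weights; real studies typically use
    sample-size-corrected estimators averaged over many SNPs.
    """
    # Mean expected heterozygosity within the two subpopulations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, no differentiation to measure
    return (h_t - h_s) / h_t


# Invented frequencies: a modest difference gives a value near 0.01,
# the same order of magnitude as the estimates quoted in the abstract.
modest = fst_two_populations(0.4, 0.5)
identical = fst_two_populations(0.5, 0.5)
fixed = fst_two_populations(0.0, 1.0)
```

Values close to 0 indicate nearly identical allele frequencies, while a value of 1 means the two populations are fixed for different alleles.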
61
Kenny EE, Kim M, Gusev A, Lowe JK, Salit J, Smith JG, Kovvali S, Kang HM, Newton-Cheh C, Daly MJ, Stoffel M, Altshuler DM, Friedman JM, Eskin E, Breslow JL, Pe'er I. Increased power of mixed models facilitates association mapping of 10 loci for metabolic traits in an isolated population. Hum Mol Genet 2010; 20:827-39. [PMID: 21118897 DOI: 10.1093/hmg/ddq510] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 12/31/2022] Open
Abstract
The potential benefits of using population isolates in genetic mapping, such as reduced genetic, phenotypic and environmental heterogeneity, are offset by the challenges posed by the large amounts of direct and cryptic relatedness in these populations confounding basic assumptions of independence. We have evaluated four representative specialized methods for association testing in the presence of relatedness; (i) within-family (ii) within- and between-family and (iii) mixed-models methods, using simulated traits for 2906 subjects with known genome-wide genotype data from an extremely isolated population, the Island of Kosrae, Federated States of Micronesia. We report that mixed models optimally extract association information from such samples, demonstrating 88% power to rank the true variant as among the top 10 genome-wide with 56% achieving genome-wide significance, a >80% improvement over the other methods, and demonstrate that population isolates have similar power to non-isolate populations for observing variants of known effects. We then used the mixed-model method to reanalyze data for 17 published phenotypes relating to metabolic traits and electrocardiographic measures, along with another 8 previously unreported. We replicate nine genome-wide significant associations with known loci of plasma cholesterol, high-density lipoprotein, low-density lipoprotein, triglycerides, thyroid stimulating hormone, homocysteine, C-reactive protein and uric acid, with only one detected in the previous analysis of the same traits. Further, we leveraged shared identity-by-descent genetic segments in the region of the uric acid locus to fine-map the signal, refining the known locus by a factor of 4. Finally, we report a novel association for height (rs17629022, P < 2.1 × 10⁻⁸).
Affiliation(s)
- Eimear E Kenny
- Department of Computer Science, Columbia University, 505 Computer Science Building, 1214 Amsterdam Ave.: Mailcode 0401, New York, NY 10027-7003, USA
62
Oh SH, Cho SA, Park TS. Joint Identification of Multiple Genetic Variants of Obesity in a Korean Genome-wide Association Study. Genomics Inform 2010. [DOI: 10.5808/gi.2010.8.3.142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022] Open
63
Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. Genome-wide association studies in diverse populations. Nat Rev Genet 2010; 11:356-66. [PMID: 20395969 PMCID: PMC3079573 DOI: 10.1038/nrg2760] [Citation(s) in RCA: 414] [Impact Index Per Article: 29.6] [Indexed: 12/14/2022]
Abstract
Genome-wide association (GWA) studies have identified a large number of SNPs associated with disease phenotypes. As most GWA studies have been performed in populations of European descent, this Review examines the issues involved in extending the consideration of GWA studies to diverse worldwide populations. Although challenges exist with issues such as imputation, admixture and replication, investigation of a greater diversity of populations could make substantial contributions to the goal of mapping the genetic determinants of complex diseases for the human population as a whole.
Affiliation(s)
- Noah A Rosenberg
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan 48109, USA.
64
Variance component model to account for sample structure in genome-wide association studies. Nat Genet 2010; 42:348-54. [PMID: 20208533 DOI: 10.1038/ng.548] [Citation(s) in RCA: 1815] [Impact Index Per Article: 129.6] [Received: 07/23/2009] [Accepted: 02/09/2010] [Indexed: 02/07/2023]
Abstract
Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
65
Bonnen PE, Lowe JK, Altshuler DM, Breslow JL, Stoffel M, Friedman JM, Pe'er I. European admixture on the Micronesian island of Kosrae: lessons from complete genetic information. Eur J Hum Genet 2010; 18:309-16. [PMID: 19844264 PMCID: PMC2987223 DOI: 10.1038/ejhg.2009.180] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 05/22/2009] [Revised: 08/11/2009] [Accepted: 09/11/2009] [Indexed: 01/16/2023] Open
Abstract
The architecture of natural variation present in a contemporary population is a result of multiple population genetic forces, including population bottleneck and expansion, selection, drift, and admixture. We seek to untangle the contribution of admixture to genetic diversity on the Micronesian island of Kosrae. Toward this goal, we used a complete genetic approach by combining a dense genome-wide map of 100,000 single-nucleotide polymorphisms (SNPs) with data from uniparental markers from the mitochondrial genome and the nonrecombining portion of the Y chromosome. These markers were typed in approximately 3200 individuals from Kosrae, representing 80% of the adult population of the island. We developed novel software that uses SNP data to delineate ancestry for individual segments of the genome. Through this analysis, we determined that 39% of Kosraens have some European ancestry. However, the vast majority of admixed individuals (77%) have European alleles spanning less than 10% of their genomes. Data from uniparental markers show most of this admixture to be male, introduced in the late nineteenth century. Furthermore, pedigree analysis shows that the majority of European admixture on Kosrae is because of the contribution of one individual. This approach shows the benefit of combining information from autosomal and uniparental polymorphisms and provides new methodology for determining ancestry in a population.
Affiliation(s)
- Penelope E Bonnen
- Department of Human and Molecular Genetics, Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA.
66
Bittles AH, Black ML. Evolution in health and medicine Sackler colloquium: Consanguinity, human evolution, and complex diseases. Proc Natl Acad Sci U S A 2010; 107 Suppl 1:1779-86. [PMID: 19805052 PMCID: PMC2868287 DOI: 10.1073/pnas.0906079106] [Citation(s) in RCA: 318] [Impact Index Per Article: 22.7] [Indexed: 12/23/2022] Open
Abstract
There is little information on inbreeding during the critical early years of human existence. However, given the small founding group sizes and restricted mate choices it seems inevitable that intrafamilial reproduction occurred and the resultant levels of inbreeding would have been substantial. Currently, couples related as second cousins or closer (F ≥ 0.0156) and their progeny account for an estimated 10.4% of the global population. The highest rates of consanguineous marriage occur in north and sub-Saharan Africa, the Middle East, and west, central, and south Asia. In these regions even couples who regard themselves as unrelated may exhibit high levels of homozygosity, because marriage within clan, tribe, caste, or biraderi boundaries has been a long-established tradition. Mortality in first-cousin progeny is approximately 3.5% higher than in nonconsanguineous offspring, although demographic, social, and economic factors can significantly influence the outcome. Improving socioeconomic conditions and better access to health care will impact the effects of consanguinity, with a shift from infant and childhood mortality to extended morbidity. At the same time, a range of primarily social factors, including urbanization, improved female education, and smaller family sizes indicate that the global prevalence of consanguineous unions will decline. This shift in marriage patterns will initially result in decreased homozygosity, accompanied by a reduction in the expression of recessive single-gene disorders. Although the roles of common and rare gene variants in the etiology of complex disease remain contentious, it would be expected that declining consanguinity would also be reflected in reduced prevalence of complex diseases, especially in population isolates.
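The threshold of 0.0156 quoted for second cousins follows from the standard path-counting formula for the inbreeding coefficient: each shared ancestor contributes (1/2)^(n1 + n2 + 1), where n1 and n2 are the generations from each parent up to that ancestor. The short Python sketch below works this out; it assumes the shared ancestors are not themselves inbred, and the function name is hypothetical.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Inbreeding coefficient F of a child by path counting.

    paths: list of (n1, n2) pairs, one per shared ancestor of the parents,
    giving the generations from each parent up to that ancestor.
    Assumes the shared ancestors are themselves not inbred.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# First cousins share two grandparents, each two generations up on both sides.
first_cousin_offspring = inbreeding_coefficient([(2, 2), (2, 2)])

# Second cousins share two great-grandparents, three generations up on both sides.
second_cousin_offspring = inbreeding_coefficient([(3, 3), (3, 3)])
```

Here `second_cousin_offspring` evaluates to 1/64, which is 0.015625 and rounds to the 0.0156 cited in the abstract; `first_cousin_offspring` evaluates to 1/16.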
Affiliation(s)
- A H Bittles
- Centre for Comparative Genomics, Murdoch University, South Street, Perth WA 6150, Australia.
|
67
|
Genetic variation in SCN10A influences cardiac conduction. Nat Genet 2010; 42:149-52. [DOI: 10.1038/ng.516] [Citation(s) in RCA: 216] [Impact Index Per Article: 15.4] [Received: 09/30/2009] [Accepted: 11/20/2009] [Indexed: 12/17/2022]
|
68
|
Systematic haplotype analysis resolves a complex plasma plant sterol locus on the Micronesian Island of Kosrae. Proc Natl Acad Sci U S A 2009; 106:13886-91. [PMID: 19667188 DOI: 10.1073/pnas.0907336106] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/18/2022] Open
Abstract
Pinpointing culprit causal variants along signal peaks of genome-wide association studies (GWAS) is challenging. To overcome confounding effects of multiple independent variants at such a locus and narrow the interval for causal allele capture, we developed an approach that maps local shared haplotypes harboring a putative causal variant. We demonstrate our method in an extreme isolate founder population, the Pacific island of Kosrae. We analyzed plasma plant sterol (PPS) levels, a surrogate measure of cholesterol absorption from the intestine, where previous studies have implicated 2p21 mutations in the ATP binding cassette subfamily G members 5 or 8 (ABCG5 or ABCG8) genes. We have previously reported that 11.1% of the islanders are carriers of a frameshift ABCG8 mutation increasing PPS levels in carriers by 50%. GWAS adjusted for this mutation revealed genome-wide significant signals along 11 Mb around it. To fine-map this signal, we detected pairwise identity-by-descent haplotypes using our tool GERMLINE and implemented a clustering algorithm to identify haplotypes shared across multiple samples with their unique shared boundaries. A single 526-kb haplotype mapped strongly to PPS levels, dramatically refining the mapped interval. This haplotype spans the ABCG5/ABCG8 genes, is carried by 1.8% of the islanders, and results in a striking 100% increase of PPS in carriers. Resequencing of ABCG5 in these carriers found a D450H missense mutation along the associated haplotype. These findings exemplify the power of haplotype analysis for mapping mutations in isolated populations and specifically for dissecting effects of multiple variants of the same locus.
|
69
|
Tomaszewski M, Charchar FJ, Barnes T, Gawron-Kiszka M, Sedkowska A, Podolecka E, Kowalczyk J, Rathbone W, Kalarus Z, Grzeszczak W, Goodall AH, Samani NJ, Zukowska-Szczechowska E. A common variant in low-density lipoprotein receptor-related protein 6 gene (LRP6) is associated with LDL-cholesterol. Arterioscler Thromb Vasc Biol 2009; 29:1316-21. [PMID: 19667113 DOI: 10.1161/atvbaha.109.185355] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 01/24/2023]
Abstract
OBJECTIVE: A rare mutation in low-density lipoprotein receptor-related protein 6 gene (LRP6) was identified as the primary molecular defect underlying monogenic form of coronary artery disease. We hypothesized that common variants in LRP6 could predispose subjects to elevated LDL-cholesterol (LDL-C). METHODS AND RESULTS: Twelve common (minor allele frequency ≥ 0.1) single nucleotide polymorphisms in LRP6 were genotyped in 703 individuals from 213 Polish pedigrees (Silesian Cardiovascular Study families). The family-based analysis revealed that the minor allele of rs10845493 clustered with elevated LDL-C in offspring more frequently than expected by chance (P=0.0053). The quantitative analysis restricted to subjects free of lipid-lowering treatment confirmed the association between rs10845493 and age-, sex-, and BMI-adjusted circulating levels of LDL-C in families as well as 2 additional populations: 218 unrelated subjects from Silesian Cardiovascular Study replication panel and 1138 individuals from Young Men Cardiovascular Association cohort (P=0.0268, P=0.0476, and P=0.0472, respectively). In the inverse variance weighted meta-analysis of the 3 populations each extra minor allele copy of rs10845493 was associated with 0.14 mmol/L increase in age-, sex-, and BMI-adjusted LDL-C (SE=0.05, P=0.0038). CONCLUSIONS: Common polymorphism in the gene underlying monogenic form of coronary artery disease impacts on risk of LDL-C elevation.
Affiliation(s)
- Maciej Tomaszewski
- Department of Cardiovascular Sciences, University of Leicester, Clinical Sciences Wing, Glenfield Hospital, Leicester, LE3 9QP, UK.
|
70
|
Abstract
The last few years have seen major advances in common non-syndromic obesity research, much of it the result of genetic studies. This Review outlines the competing hypotheses about the mechanisms underlying the genetic and physiological basis of obesity, and then examines the recent explosion of genetic association studies that have yielded insights into obesity, both at the candidate gene level and the genome-wide level. With obesity genetics now entering the post-genome-wide association scan era, the obvious question is how to improve the results obtained so far using single nucleotide polymorphism markers and how to move successfully into the other areas of genomic variation that may be associated with common obesity.
|
71
|
Smith JG, Lowe JK, Kovvali S, Maller JB, Salit J, Daly MJ, Stoffel M, Altshuler DM, Friedman JM, Breslow JL, Newton-Cheh C. Genome-wide association study of electrocardiographic conduction measures in an isolated founder population: Kosrae. Heart Rhythm 2009; 6:634-41. [PMID: 19389651 DOI: 10.1016/j.hrthm.2009.02.022] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 01/12/2009] [Accepted: 02/11/2009] [Indexed: 12/19/2022]
Abstract
BACKGROUND: Cardiac conduction, as assessed by electrocardiographic PR interval and QRS duration, is an important electrophysiological trait and a determinant of arrhythmia risk. OBJECTIVE: We sought to identify common genetic determinants of these measures. METHODS: We examined 1604 individuals from the island of Kosrae, Federated States of Micronesia, an isolated founder population. We adjusted for covariates and estimated the heritability of quantitative electrocardiographic QRS duration and PR interval and, secondarily, its subcomponents, P-wave duration and PR segment. Finally, we performed a genome-wide association study (GWAS) in a subset of 1262 individuals genotyped using the Affymetrix GeneChip Human Mapping 500K microarray. RESULTS: The heritability of PR interval was 34% (standard error [SE] 5%, P = 4 × 10⁻¹⁸); of PR segment, 31% (SE 6%, P = 3.2 × 10⁻¹³); and of P-wave duration, 17% (SE 5%, P = 5.8 × 10⁻⁶), but the heritability of QRS duration was only 3% (SE 4%, P = .20). Hence, GWAS was performed only for the PR interval and its subcomponents. A total of 338,049 single nucleotide polymorphisms (SNPs) passed quality filters. For the PR interval, the most significantly associated SNPs were located in and downstream of the alpha-subunit of the cardiac voltage-gated sodium channel gene SCN5A, with a 4.8 ms (SE 1.0) or 0.23 standard deviation increase in adjusted PR interval for each minor allele copy of rs7638909 (P = 1.6 × 10⁻⁶, minor allele frequency 0.40). These SNPs were also associated with P-wave duration (P = 1.5 × 10⁻⁴) and PR segment (P = .01) but not with QRS duration (P ≥ .22). CONCLUSIONS: The PR interval and its subcomponents showed substantial heritability in a South Pacific islander population and were associated with common genetic variation in SCN5A.
Affiliation(s)
- J Gustav Smith
- Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
72
|
Abstract
Genome-wide association studies have opened a new era in the study of the genetic basis of common, multifactorial diseases and traits. Before the introduction of this approach only a handful of common genetic variants showed consistent association for any phenotype. Using genome-wide association, scores of novel and unsuspected loci have been discovered and later replicated for many complex traits. The principle is to genotype a dense set of common genetic variants across the genomes of individuals with phenotypic differences and examine whether genotype is associated with phenotype. Because the last common human ancestor was relatively recent and recombination events are concentrated in focal hotspots, most common variation in the human genome can be surveyed using a few hundred thousand variants acting as proxies for ungenotyped variants. Here, we describe the different steps of genome-wide association studies and use a recent study as example.
Affiliation(s)
- J Gustav Smith
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
|
73
|
Gusev A, Lowe JK, Stoffel M, Daly MJ, Altshuler D, Breslow JL, Friedman JM, Pe'er I. Whole population, genome-wide mapping of hidden relatedness. Genome Res 2008; 19:318-26. [PMID: 18971310 DOI: 10.1101/gr.081398.108] [Citation(s) in RCA: 312] [Impact Index Per Article: 19.5] [Indexed: 01/20/2023]
Abstract
We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical segmental sharing that is indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap as well as in a densely typed island population of 3000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and also facilitates analysis of larger scale studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identification and resolution of phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. This finding is supported by concordance of detected deletions with other evidence from independent databases and statistical analyses of fluorescence intensity not used by GERMLINE.
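The seed-and-extend idea described in this abstract can be illustrated with a minimal Python sketch. It is not the GERMLINE implementation: the function names (`find_seeds`, `extend`, `shared_segments`), the fixed window size, and the toy 0/1 haplotype strings are all hypothetical, and the extension step uses a simple greedy mismatch budget in place of the dynamic programming the authors describe.

```python
from collections import defaultdict
from itertools import combinations


def find_seeds(haplotypes, window):
    """Hash each fixed-size window; samples in the same bucket share an exact match."""
    seeds = defaultdict(list)
    n_sites = len(next(iter(haplotypes.values())))
    for w in range(n_sites // window):
        start = w * window
        buckets = defaultdict(list)  # the per-window "dictionary of haplotypes"
        for name, hap in haplotypes.items():
            buckets[hap[start:start + window]].append(name)
        for names in buckets.values():
            for a, b in combinations(sorted(names), 2):
                seeds[(a, b)].append(w)
    return seeds


def extend(hap_a, hap_b, start, end, max_mismatch):
    """Greedily grow [start, end) left then right, spending a shared mismatch budget."""
    budget = max_mismatch
    left = start
    while left > 0:
        if hap_a[left - 1] != hap_b[left - 1]:
            if budget == 0:
                break
            budget -= 1
        left -= 1
    right = end
    while right < len(hap_a):
        if hap_a[right] != hap_b[right]:
            if budget == 0:
                break
            budget -= 1
        right += 1
    return left, right


def shared_segments(haplotypes, window=4, max_mismatch=1, min_len=8):
    """Return sorted (sample_a, sample_b, start, end) segments of at least min_len sites."""
    segments = set()
    for (a, b), windows in find_seeds(haplotypes, window).items():
        for w in windows:
            left, right = extend(haplotypes[a], haplotypes[b],
                                 w * window, (w + 1) * window, max_mismatch)
            if right - left >= min_len:
                segments.add((a, b, left, right))
    return sorted(segments)


# Toy data (hypothetical): s1 and s2 differ only at the last site.
haps = {"s1": "0101101100", "s2": "0101101101", "s3": "1010010011"}
print(shared_segments(haps, window=4, max_mismatch=0, min_len=8))
```

Because every sample is hashed once per window, seed discovery costs time proportional to the number of samples per window, apart from the pair enumeration inside each bucket. This loosely mirrors the linear scaling the abstract reports.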
Affiliation(s)
- Alexander Gusev
- Department of Computer Science, Columbia University, New York, New York 10027, USA.
|