1
A comprehensive evaluation of polygenic score and genotype imputation performances of human SNP arrays in diverse populations. Sci Rep 2022; 12:17556. [PMID: 36266455 PMCID: PMC9585077 DOI: 10.1038/s41598-022-22215-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/04/2022] [Accepted: 10/11/2022] [Indexed: 01/13/2023]
Abstract
Despite the widespread use of next-generation sequencing technologies, microarray-based genotyping combined with the imputation of untyped variants remains a cost-effective means to interrogate genetic variation across the human genome. This technology is widely used in genome-wide association studies (GWAS) at biobank scale and, more recently, in polygenic score (PGS) analysis to predict and stratify disease risk. Over the last decade, human genotyping arrays have grown tremendously in both number and content, making a comprehensive evaluation of their performance increasingly important. Here, we performed a comprehensive performance assessment of 23 available human genotyping arrays in 6 ancestry groups using diverse public and in-house datasets. The analyses focus on estimating the performance of derived imputation (in terms of accuracy and coverage) and PGS (in terms of concordance with PGS estimated from whole-genome sequencing data) for three different traits and diseases. We found that arrays with a higher number of SNPs are not necessarily the ones with higher imputation performance, whereas arrays that are well optimized for the targeted population can provide very good imputation performance. In addition, PGS estimated from imputed SNP array data is highly correlated with PGS estimated from whole-genome sequencing data in most cases. When optimal arrays are used, the correlations of PGS between the two types of data are higher than 0.97, but, interestingly, high-density arrays can result in lower PGS performance. Our results highlight the importance of selecting a suitable genotyping array for PGS applications. Finally, we developed a web tool that provides interactive analyses of tag SNP content and imputation performance based on population and genomic regions of interest. This study can serve as a practical guide for researchers designing genotyping array-based studies.
The tool is available at: https://genome.vinbigdata.org/tools/saa/ .
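The PGS concordance measure described in this abstract can be illustrated with a minimal sketch. It assumes a PGS is a weighted sum of allele dosages and that concordance is a Pearson correlation between array-derived and WGS-derived scores; the function names, weights, and genotype values below are hypothetical toy data, not the authors' pipeline or results.

```python
import math
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0..2) using per-SNP effect weights."""
    return sum(d * w for d, w in zip(dosages, weights))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical effect weights for three SNPs.
weights = [0.2, -0.1, 0.5]

# Hypothetical WGS genotypes (hard calls) for four individuals.
wgs_genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]

# Hypothetical imputed dosages for the same individuals from an array.
array_dosages = [[0.1, 1.0, 1.9], [1.0, 0.9, 0.0], [1.8, 0.0, 1.1], [0.0, 2.0, 2.0]]

pgs_wgs = [polygenic_score(g, weights) for g in wgs_genotypes]
pgs_array = [polygenic_score(g, weights) for g in array_dosages]

concordance = pearson(pgs_wgs, pgs_array)
print(f"PGS concordance (Pearson r): {concordance:.4f}")
```

In practice the scores would be computed over many more variants and individuals, and per ancestry group, before comparing arrays.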
2
Al-Maitah M. Analyzing genetic diseases using multimedia processing techniques associative decision tree-based learning and Hopfield dynamic neural networks from medical images. Neural Comput Appl 2020. [DOI: 10.1007/s00521-018-04004-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
3
Moon S, Kim YJ, Han S, Hwang MY, Shin DM, Park MY, Lu Y, Yoon K, Jang HM, Kim YK, Park TJ, Song DS, Park JK, Lee JE, Kim BJ. The Korea Biobank Array: Design and Identification of Coding Variants Associated with Blood Biochemical Traits. Sci Rep 2019; 9:1382. [PMID: 30718733 PMCID: PMC6361960 DOI: 10.1038/s41598-018-37832-9] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 07/26/2018] [Accepted: 12/11/2018] [Indexed: 02/05/2023]
Abstract
We introduce the design and implementation of a new array, the Korea Biobank Array (referred to as KoreanChip), optimized for the Korean population, and demonstrate findings from a GWAS of blood biochemical traits. KoreanChip comprises >833,000 markers, including >247,000 rare-frequency or functional variants estimated from sequencing data of >2,500 Koreans. Of the 833 K markers, 208 K functional markers were directly genotyped. Particularly, >89 K markers were presented in East Asians. KoreanChip achieved higher imputation performance owing to its excellent genomic coverage of 95.38% for common and 73.65% for low-frequency variants. From a genome-wide association study (GWAS) of 6,949 individuals, 28 associations were successfully recapitulated. Moreover, 9 missense variants were newly identified, among which we identified new associations between a common population-specific missense variant, rs671 (p.Glu457Lys) of ALDH2, and two traits: aspartate aminotransferase (P = 5.20 × 10^-13) and alanine aminotransferase (P = 4.98 × 10^-8). Furthermore, two novel missense variants of GPT that are rare in East Asians and extremely rare in other populations were associated with alanine aminotransferase (rs200088103; p.Arg133Trp, P = 2.02 × 10^-9 and rs748547625; p.Arg143Cys, P = 1.41 × 10^-6). These variants were successfully replicated in 6,000 individuals (P = 5.30 × 10^-8 and P = 1.24 × 10^-6). The GWAS results suggest the promising utility of KoreanChip, with its substantial number of damaging variants, for identifying new population-specific disease-associated rare/functional variants.
Affiliation(s)
- Sanghoon Moon
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Young Jin Kim
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Sohee Han
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Mi Yeong Hwang
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Dong Mun Shin
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Kyungheon Yoon
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Hye-Mi Jang
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Yun Kyoung Kim
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Tae-Joon Park
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Dae Sub Song
  - Division of Epidemiology and Health Index, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Jae Kyung Park
  - Division of Epidemiology and Health Index, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
- Jong-Eun Lee
  - DNA link, Incorporated, Seoul, 03759, Republic of Korea
- Bong-Jo Kim
  - Division of Genome Research, Center for Genome Science, National Institute of Health, Chungcheongbuk-do, 28159, Republic of Korea
4
Davidson BA, Hassan S, Garcia EJ, Tayebi N, Sidransky E. Exploring genetic modifiers of Gaucher disease: The next horizon. Hum Mutat 2018; 39:1739-1751. [PMID: 30098107 PMCID: PMC6240360 DOI: 10.1002/humu.23611] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 05/30/2018] [Revised: 08/01/2018] [Accepted: 08/03/2018] [Indexed: 12/26/2022]
Abstract
Gaucher disease is an autosomal recessive lysosomal storage disorder resulting from mutations in the gene GBA1 that lead to a deficiency in the enzyme glucocerebrosidase. Accumulation of the enzyme's substrates, glucosylceramide and glucosylsphingosine, results in symptoms ranging from skeletal and visceral involvement to neurological manifestations. Nonetheless, there is significant variability in clinical presentations amongst patients, with limited correlation between genotype and phenotype. Contributing to this clinical variation are genetic modifiers that influence the phenotypic outcome of the disorder. In this review, we explore the role of genetic modifiers in Mendelian disorders and describe methods to facilitate their discovery. In addition, we provide examples of candidate modifiers of Gaucher disease, explore their relevance in the development of potential therapeutics, and discuss the impact of GBA1 and modifying mutations on other more common diseases like Parkinson disease. Identifying these important modulators of Gaucher phenotype may ultimately unravel the complex relationship between genotype and phenotype and lead to improved counseling and treatments.
Affiliation(s)
- Brad A. Davidson
  - Section on Molecular Neurogenetics, Medical Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Shahzeb Hassan
  - Section on Molecular Neurogenetics, Medical Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Eric Joshua Garcia
  - Section on Molecular Neurogenetics, Medical Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Nahid Tayebi
  - Section on Molecular Neurogenetics, Medical Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
- Ellen Sidransky
  - Section on Molecular Neurogenetics, Medical Genetics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
5
Abstract
Sample automation and management is increasingly important as the number and size of population-scale and high-throughput projects grow. This is particularly the case in large-scale population studies where sample size is far outpacing the commonly used 96-well plate format. To facilitate management and transfer of samples in this format, we present Samasy, a web-based application for the construction of a sample database, intuitive display of sample and batch information, and facilitation of automated sample transfer or subset. Samasy is designed with ease-of-use in mind, can be quickly set up, and runs in any web browser.
6
Wojcik GL, Fuchsberger C, Taliun D, Welch R, Martin AR, Shringarpure S, Carlson CS, Abecasis G, Kang HM, Boehnke M, Bustamante CD, Gignoux CR, Kenny EE. Imputation-Aware Tag SNP Selection To Improve Power for Large-Scale, Multi-ethnic Association Studies. G3 (Bethesda) 2018; 8:3255-3267. [PMID: 30131328 PMCID: PMC6169386 DOI: 10.1534/g3.118.200502] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/14/2018] [Accepted: 08/03/2018] [Indexed: 01/26/2023]
Abstract
The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r^2 at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
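The "mean imputed r^2 at untyped sites" metric mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes r^2 is the squared Pearson correlation between true genotypes and imputed dosages at each site, averaged over sites where it is defined; the function names and toy data are hypothetical, and the leave-one-out imputation step itself is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def dosage_r2(true_genotypes: Sequence[float],
              imputed_dosages: Sequence[float]) -> Optional[float]:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Returns None when either vector has zero variance (for example, a site
    that is monomorphic in the validation samples), where r^2 is undefined.
    """
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return None
    return (cov * cov) / (var_t * var_d)


def mean_imputed_r2(sites: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average r^2 over all sites where it is defined; NaN if none are."""
    values = []
    for true_genotypes, imputed_dosages in sites:
        r2 = dosage_r2(true_genotypes, imputed_dosages)
        if r2 is not None:
            values.append(r2)
    return sum(values) / len(values) if values else float("nan")


# Hypothetical untyped sites: (true genotypes, imputed dosages) per site.
toy_sites = [
    ([0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0]),   # perfectly imputed
    ([0, 0, 1, 2], [0.2, 0.1, 0.8, 1.7]),   # well imputed
    ([0, 0, 0, 0], [0.1, 0.0, 0.2, 0.0]),   # monomorphic, skipped
]
print(f"mean imputed r^2: {mean_imputed_r2(toy_sites):.3f}")
```

Candidate tag SNP sets could then be ranked by this average, computed separately within each population and allele-frequency bin.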
Affiliation(s)
- Genevieve L Wojcik
  - Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christian Fuchsberger
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
  - Center for Biomedicine, European Academy of Bolzano/Bozen (EURAC), affiliated with the University of Lübeck, Bolzano, Bozen, 39100, Italy
- Daniel Taliun
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Ryan Welch
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Alicia R Martin
  - Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Suyash Shringarpure
  - Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christopher S Carlson
  - Fred Hutchinson Cancer Center, University of Washington, 1100 Fairview Ave. N., Seattle, WA 98109
- Goncalo Abecasis
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Hyun Min Kang
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Michael Boehnke
  - Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109
- Carlos D Bustamante
  - Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
  - Department of Biomedical Data Science, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Christopher R Gignoux
  - Department of Genetics, Stanford University School of Medicine, 365 Lasuen Street, Littlefield Center MC2069, Stanford, CA 94305
- Eimear E Kenny
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
  - The Icahn Institute of Multiscale Biology and Genomics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
  - The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029
7
Jorgenson E, Thai KK, Hoffmann TJ, Sakoda LC, Kvale MN, Banda Y, Schaefer C, Risch N, Mertens J, Weisner C, Choquet H. Genetic contributors to variation in alcohol consumption vary by race/ethnicity in a large multi-ethnic genome-wide association study. Mol Psychiatry 2017; 22:1359-1367. [PMID: 28485404 PMCID: PMC5568932 DOI: 10.1038/mp.2017.101] [Citation(s) in RCA: 95] [Impact Index Per Article: 13.6] [Received: 11/10/2016] [Revised: 03/03/2017] [Accepted: 03/27/2017] [Indexed: 01/08/2023]
Abstract
Alcohol consumption is a complex trait determined by both genetic and environmental factors, and is correlated with the risk of alcohol use disorders. Although a small number of genetic loci have been reported to be associated with variation in alcohol consumption, genetic factors are estimated to explain about half of the variance in alcohol consumption, suggesting that additional loci remain to be discovered. We conducted a genome-wide association study (GWAS) of alcohol consumption in the large Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort, in four race/ethnicity groups: non-Hispanic whites, Hispanic/Latinos, East Asians and African Americans. We examined two statistically independent phenotypes reflecting subjects' alcohol consumption during the past year, based on self-reported information: any alcohol intake (drinker/non-drinker status) and the regular quantity of drinks consumed per week (drinks/week) among drinkers. We assessed these two alcohol consumption phenotypes in each race/ethnicity group, and in a combined trans-ethnic meta-analysis comprising a total of 86 627 individuals. We observed the strongest association between the previously reported single nucleotide polymorphism (SNP) rs671 in ALDH2 and alcohol drinker status (odds ratio (OR)=0.40, P=2.28 × 10^-72) in East Asians, and also an effect on drinks/week (beta=-0.17, P=5.42 × 10^-4) in the same group. We also observed a genome-wide significant association in non-Hispanic whites between the previously reported SNP rs1229984 in ADH1B and both alcohol consumption phenotypes (OR=0.79, P=2.47 × 10^-20 for drinker status and beta=-0.19, P=1.91 × 10^-35 for drinks/week), which replicated in Hispanic/Latinos (OR=0.72, P=4.35 × 10^-7 and beta=-0.21, P=2.58 × 10^-6, respectively). Although prior studies reported effects of ADH1B and ALDH2 on lifetime measures, such as risk of alcohol dependence, our study adds further evidence of the effect of the same genes on a cross-sectional measure of average drinking. Our trans-ethnic meta-analysis confirmed recent findings implicating the KLB and GCKR loci in alcohol consumption, with strongest associations observed for rs7686419 (beta=-0.04, P=3.41 × 10^-10 for drinks/week and OR=0.96, P=4.08 × 10^-5 for drinker status), and rs4665985 (beta=0.04, P=2.26 × 10^-8 for drinks/week and OR=1.04, P=5 × 10^-4 for drinker status), respectively. Finally, we also obtained confirmatory results extending previous findings implicating AUTS2, SGOL1 and SERPINC1 genes in alcohol consumption traits in non-Hispanic whites.
Affiliation(s)
- Eric Jorgenson
  - Kaiser Permanente Division of Research, Oakland, CA, USA
- Khanh K. Thai
  - Kaiser Permanente Division of Research, Oakland, CA, USA
- Thomas J. Hoffmann
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Lori C. Sakoda
  - Kaiser Permanente Division of Research, Oakland, CA, USA
- Mark N. Kvale
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Yambazi Banda
  - Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Neil Risch
  - Kaiser Permanente Division of Research, Oakland, CA, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Constance Weisner
  - Kaiser Permanente Division of Research, Oakland, CA, USA; Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Hélène Choquet
  - Kaiser Permanente Division of Research, Oakland, CA, USA
8
Hoffmann TJ, Witte JS. Strategies for Imputing and Analyzing Rare Variants in Association Studies. Trends Genet 2016; 31:556-563. [PMID: 26450338 DOI: 10.1016/j.tig.2015.07.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 05/09/2015] [Revised: 07/28/2015] [Accepted: 07/31/2015] [Indexed: 01/22/2023]
Abstract
Rare genetic variants may be responsible for a significant amount of the uncharacterized genetic risk underlying many diseases. An efficient approach to characterizing the disease burden of rare variants may be to impute them into existing large datasets. It is well known that the ability to impute a rare variant is dependent both on the array choice and number of individuals in the reference panel carrying that variant, although it is still unclear exactly how well imputation will work for rare variants. Here, we review the additional challenges that arise when imputing rare variants, looking at studies that have been able to impute rare variants, methods behind merging reference panels, approaches for imputing rare variants, and methods for analyzing rare variants.
Affiliation(s)
- Thomas J Hoffmann
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA
- John S Witte
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA; Department of Urology, University of California San Francisco, San Francisco, CA 94158, USA; UCSF Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA 94158, USA
9
Common coding variants in the HLA-DQB1 region confer susceptibility to age-related macular degeneration. Eur J Hum Genet 2016; 24:1049-55. [PMID: 26733291 DOI: 10.1038/ejhg.2015.247] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 03/25/2015] [Revised: 09/21/2015] [Accepted: 10/15/2015] [Indexed: 11/08/2022]
Abstract
Age-related macular degeneration (AMD) risk variants in the complement system point to the important role of immune response and inflammation in the pathogenesis of AMD. Although the human leukocyte antigen (HLA) region has a central role in regulating immune response, previous studies of genetic variation in HLA genes and AMD have been limited by sample size or incomplete coverage of the HLA region by first-generation genotyping arrays and imputation panels. Here, we conducted a large-scale HLA fine-mapping study with 4841 AMD cases and 23 790 controls of non-Hispanic white ancestry from the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging cohort. Genotyping was conducted using custom Affymetrix Axiom arrays, with dense coverage of the HLA region. Classic HLA polymorphisms were imputed using SNP2HLA, which utilizes a large reference panel to provide improved imputation accuracy of variants in this region. We examined a total of 6937 SNPs and 172 classical HLA alleles, conditioning on established AMD risk variants, which revealed novel associations with two non-synonymous SNPs in perfect linkage disequilibrium, rs9274390 and rs41563814 (odds ratio (OR)=1.21; P=1.4 × 10^-11), corresponding to amino-acid changes at positions 66 and 67 in HLA-DQB1, respectively, and the DQB1*02 classical HLA allele (OR=1.22; P=3.9 × 10^-10) with the risk of AMD. We confirmed these association signals, again conditioning on established risk variants, in the MMAP data set of subjects with advanced AMD (rs9274390/rs41563814: OR=1.28; P=1.30 × 10^-3, DQB1*02: OR=1.32; P=9.00 × 10^-4). These findings support a role of HLA class II alleles in the risk of AMD.
10
Common polygenic variation in coeliac disease and confirmation of ZNF335 and NFIA as disease susceptibility loci. Eur J Hum Genet 2015; 24:291-7. [PMID: 25920553 PMCID: PMC4717209 DOI: 10.1038/ejhg.2015.87] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 10/17/2014] [Revised: 03/05/2015] [Accepted: 03/10/2015] [Indexed: 12/22/2022]
Abstract
Coeliac disease (CD) is a chronic immune-mediated disease triggered by the ingestion of gluten. It has an estimated prevalence of approximately 1% in European populations. Specific HLA-DQA1 and HLA-DQB1 alleles are established coeliac susceptibility genes and are required for the presentation of gliadin to the immune system, resulting in damage to the intestinal mucosa. In the largest association analysis of CD to date, 39 non-HLA risk loci were identified, 13 of which were new, in a sample of 12 014 individuals with CD and 12 228 controls using the Immunochip genotyping platform. Including the HLA, this brings the total number of known CD loci to 40. We have replicated this study in an independent Irish CD case–control population of 425 CD cases and 453 controls using the Immunochip platform. Using a binomial sign test, we show that the direction of the effects of previously described risk alleles was highly correlated with that observed in the Irish population (P=2.2 × 10^-16). Using the Polygenic Risk Score (PRS) approach, we estimated that up to 35% of the genetic variance could be explained by loci present on the Immunochip (P=9 × 10^-75). When this is limited to non-HLA loci, we explain a maximum of 4.5% of the genetic variance (P=3.6 × 10^-18). Finally, we performed a meta-analysis of our data with the previous reports, identifying two further loci harbouring the ZNF335 and NFIA genes which now exceed genome-wide significance, taking the total number of CD susceptibility loci to 42.
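The binomial sign test mentioned in this abstract can be sketched in a few lines. The sketch assumes a one-sided test of whether more risk alleles share their direction of effect between two studies than expected by chance (probability 0.5 per allele); the function name and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def sign_test_p_value(n_concordant: int, n_total: int) -> float:
    """One-sided binomial sign test.

    Returns P(X >= n_concordant) for X ~ Binomial(n_total, 0.5), i.e. the
    probability of seeing at least this many alleles with the same direction
    of effect if directions agreed only by chance.
    """
    if not 0 <= n_concordant <= n_total:
        raise ValueError("n_concordant must be between 0 and n_total")
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical example: 36 of 39 risk alleles share the direction of effect.
print(f"sign test P = {sign_test_p_value(36, 39):.3e}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum, which matters when the number of tested alleles is large.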
11
Medici M, Visser WE, Visser TJ, Peeters RP. Genetic determination of the hypothalamic-pituitary-thyroid axis: where do we stand? Endocr Rev 2015; 36:214-44. [PMID: 25751422 DOI: 10.1210/er.2014-1081] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Indexed: 12/27/2022]
Abstract
For a long time it has been known that both hypo- and hyperthyroidism are associated with an increased risk of morbidity and mortality. In recent years, it has also become clear that minor variations in thyroid function, including subclinical dysfunction and variation in thyroid function within the reference range, can have important effects on clinical endpoints, such as bone mineral density, depression, metabolic syndrome, and cardiovascular mortality. Serum thyroid parameters show substantial interindividual variability, whereas the intraindividual variability lies within a narrow range. This suggests that every individual has a unique hypothalamus-pituitary-thyroid axis setpoint that is mainly determined by genetic factors, and this heritability has been estimated to be 40-60%. Various mutations in thyroid hormone pathway genes have been identified in persons with thyroid dysfunction or altered thyroid function tests. Because these causes are rare, many candidate gene and linkage studies have been performed over the years to identify more common variants (polymorphisms) associated with thyroid (dys)function, but only a limited number of consistent associations have been found. However, in the past 5 years, advances in genetic research have led to the identification of a large number of new candidate genes. In this review, we provide an overview of the current knowledge about the polygenic basis of thyroid (dys)function. This includes new candidate genes identified by genome-wide approaches, what insights these genes provide into the genetic basis of thyroid (dys)function, and which new techniques will help to further decipher the genetic basis of thyroid (dys)function in the near future.
Affiliation(s)
- Marco Medici
- Rotterdam Thyroid Center, Department of Internal Medicine, Erasmus Medical Center, 3015 GE Rotterdam, The Netherlands
12
Pistis G, Porcu E, Vrieze SI, Sidore C, Steri M, Danjou F, Busonero F, Mulas A, Zoledziewska M, Maschio A, Brennan C, Lai S, Miller MB, Marcelli M, Urru MF, Pitzalis M, Lyons RH, Kang HM, Jones CM, Angius A, Iacono WG, Schlessinger D, McGue M, Cucca F, Abecasis GR, Sanna S. Rare variant genotype imputation with thousands of study-specific whole-genome sequences: implications for cost-effective study designs. Eur J Hum Genet 2014; 23:975-83. [PMID: 25293720 DOI: 10.1038/ejhg.2014.216] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 06/03/2014] [Revised: 08/14/2014] [Accepted: 09/09/2014] [Indexed: 12/25/2022]
Abstract
The utility of genotype imputation in genome-wide association studies is increasing as progressively larger reference panels are improved and expanded through whole-genome sequencing. Developing general guidelines for optimally cost-effective imputation, however, requires evaluation of performance issues that include the relative utility of study-specific compared with general/multipopulation reference panels; genotyping with various array scaffolds; effects of different ethnic backgrounds; and assessment of ranges of allele frequencies. Here we compared the effectiveness of study-specific reference panels to the commonly used 1000 Genomes Project (1000G) reference panels in the isolated Sardinian population and in cohorts of European ancestry including samples from Minnesota (USA). We also examined different combinations of genome-wide and custom arrays for baseline genotypes. In Sardinians, the study-specific reference panel provided better coverage and genotype imputation accuracy than the 1000G panels and other large European panels. In fact, even gene-centered custom arrays (interrogating ~200 000 variants) provided highly informative content across the entire genome. Gain in accuracy was also observed for Minnesotans using the study-specific reference panel, although the increase was smaller than in Sardinians, especially for rare variants. Notably, a combined panel including both study-specific and 1000G reference panels improved imputation accuracy only in the Minnesota sample, and only at rare sites. Finally, we found that when imputation is performed with a study-specific reference panel, cutoffs different from the standard thresholds of MACH-Rsq and IMPUTE-INFO metrics should be used to efficiently filter badly imputed rare variants. This study thus provides general guidelines for researchers planning large-scale genetic studies.
Affiliation(s)
- Giorgio Pistis
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Eleonora Porcu
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Scott I Vrieze
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Carlo Sidore
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA [3] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Maristella Steri
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Fabrice Danjou
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Fabio Busonero
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Antonella Mulas
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Andrea Maschio
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Christine Brennan
- University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA
- Sandra Lai
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
- Michael B Miller
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Marco Marcelli
- CRS4, Parco tecnologico della Sardegna, Pula, Cagliari, Italy
- Robert H Lyons
- University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA
- Hyun M Kang
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Chris M Jones
- CRS4, Parco tecnologico della Sardegna, Pula, Cagliari, Italy
- Andrea Angius
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] University of Michigan Sequencing Core, University of Michigan Medical School, Ann Arbor, MI, USA
- William G Iacono
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Matt McGue
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
- Francesco Cucca
- [1] Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy [2] Dipartimento di Scienze Biomediche, Università di Sassari, Sassari, Italy
- Gonçalo R Abecasis
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Serena Sanna
- Istituto di Ricerca Genetica e Biomedica (IRGB), CNR, Monserrato, Italy
|
13
|
Leslie R, O'Donnell CJ, Johnson AD. GRASP: analysis of genotype-phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 2014; 30:i185-94. [PMID: 24931982 DOI: 10.1093/bioinformatics/btu273] [Citation(s) in RCA: 179] [Impact Index Per Article: 17.9] [Indexed: 12/16/2022] Open
Abstract
SUMMARY: We created a deeply extracted and annotated database of genome-wide association study (GWAS) results. GRASP v1.0 contains >6.2 million SNP-phenotype associations from 1390 GWAS. We re-annotated GWAS results with 16 annotation sources, including some rarely compared to GWAS results (e.g. RNA editing sites, lincRNAs, PTMs).
MOTIVATION: To create a high-quality resource to facilitate further use and interpretation of human GWAS results in order to address important scientific questions.
RESULTS: GWAS have grown exponentially, with increases in sample sizes and markers tested, and a continuing bias toward European ancestry samples. GRASP contains >100 000 phenotypes, roughly: eQTLs (71.5%), metabolite QTLs (21.2%), methylation QTLs (4.4%) and diseases, biomarkers and other traits (2.8%). cis-eQTLs, meQTLs, mQTLs and MHC region SNPs are highly enriched among significant results. After removing these categories, GRASP still contains a greater proportion of studies and results than comparable GWAS catalogs. Cardiovascular disease and related risk factors predominate among the remaining GWAS results, followed by immunological, neurological and cancer traits. Significant results in GWAS display a highly gene-centric tendency. Sex chromosome X (OR = 0.18 [0.16-0.20]) and Y (OR = 0.003 [0.001-0.01]) genes are depleted for GWAS results. Gene length is correlated with GWAS results at nominal significance (P ≤ 0.05) levels. We show that this gene-length correlation decays at increasingly stringent P-value thresholds. Potential pleiotropic genes and SNPs enriched for multi-phenotype association in GWAS are identified. However, we note possible population stratification at some of these loci. Finally, via re-annotation we identify compelling functional hypotheses at GWAS loci, in some cases unrealized in studies to date.
CONCLUSION: Pooling summary-level GWAS results and re-annotating with bioinformatics predictions and molecular features provides a good platform for new insights.
AVAILABILITY: The GRASP database is available at http://apps.nhlbi.nih.gov/grasp.
Affiliation(s)
- Richard Leslie
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
- Christopher J O'Donnell
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
- Andrew D Johnson
- Cardiovascular Epidemiology and Human Genomics Branch, National Heart, Lung and Blood Institute, The Framingham Heart Study, Framingham, MA 01702, University of Massachusetts Medical School, Worcester, MA 01655 and Division of Cardiology, Massachusetts General Hospital, Boston, MA 02114, USA
|
14
|
Polygenic inheritance of paclitaxel-induced sensory peripheral neuropathy driven by axon outgrowth gene sets in CALGB 40101 (Alliance). The Pharmacogenomics Journal 2014; 14:336-42. [PMID: 24513692 PMCID: PMC4111770 DOI: 10.1038/tpj.2014.2] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 08/14/2013] [Revised: 12/28/2013] [Accepted: 01/06/2014] [Indexed: 01/15/2023]
Abstract
Peripheral neuropathy is a common dose-limiting toxicity for patients treated with paclitaxel. For most individuals there are no known risk factors that predispose patients to the adverse event, and the pathogenesis of paclitaxel-induced peripheral neuropathy is unknown. Determining whether there is a heritable component to paclitaxel-induced peripheral neuropathy would be valuable in guiding clinical decisions and may provide insight into treatment of and mechanisms for the toxicity. Using genotype and patient information from the paclitaxel arm of CALGB 40101 (Alliance), a phase III clinical trial evaluating adjuvant therapies for breast cancer in women, we estimated the variance in maximum grade and dose at first instance of sensory peripheral neuropathy. Our results suggest that paclitaxel-induced neuropathy has a heritable component, driven in part by genes involved in axon outgrowth. Disruption of axon outgrowth may be one of the mechanisms by which paclitaxel treatment results in sensory peripheral neuropathy in susceptible patients.
|
15
|
Imputation-based genomic coverage assessments of current human genotyping arrays. G3: Genes, Genomes, Genetics 2013; 3:1795-807. [PMID: 23979933 PMCID: PMC3789804 DOI: 10.1534/g3.113.007161] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Indexed: 11/29/2022]
Abstract
Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. “Genomic coverage” is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low-density arrays in all ancestries and 50–80% by high-density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compared these array-based power estimates with those of a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies.
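As a rough illustration of the coverage measure described in this abstract, the sketch below counts a variant as covered when it is typed on the array or when the squared correlation between its imputed and observed allelic dosages reaches a cutoff. The 0.8 cutoff, the function names, and the toy dosages are assumptions made for illustration; the abstract does not state the threshold or the implementation the authors used.

```python
def dosage_r2(observed, imputed):
    """Squared Pearson correlation between observed and imputed allelic dosages."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_i = sum(imputed) / n
    cov = sum((o - mean_o) * (i - mean_i) for o, i in zip(observed, imputed))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((i - mean_i) ** 2 for i in imputed)
    if var_o == 0 or var_i == 0:
        return 0.0  # no variation in one set: treat the variant as not captured
    return cov * cov / (var_o * var_i)


def genomic_coverage(variants, r2_threshold=0.8):
    """Fraction of variants that are typed or imputed with r2 >= r2_threshold.

    r2_threshold=0.8 is an assumed cutoff, not a value taken from the abstract.
    """
    covered = sum(
        1
        for v in variants
        if v["typed"] or dosage_r2(v["observed"], v["imputed"]) >= r2_threshold
    )
    return covered / len(variants)


# Hypothetical toy data: one typed variant, one well-imputed, one poorly imputed.
toy_variants = [
    {"typed": True, "observed": [0, 1, 2, 1], "imputed": [0, 1, 2, 1]},
    {"typed": False, "observed": [0, 1, 2, 1], "imputed": [0.1, 0.9, 1.8, 1.2]},
    {"typed": False, "observed": [0, 1, 2, 0], "imputed": [1.0, 0.2, 0.9, 1.1]},
]
coverage = genomic_coverage(toy_variants)
```

In a real assessment the same calculation would be repeated per ancestry group and per minor allele frequency bin, which is how the abstract reports its results.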
|
16
|
Gasparini CF, Sutherland HG, Griffiths LR. Studies on the pathophysiology and genetic basis of migraine. Curr Genomics 2013; 14:300-15. [PMID: 24403849 PMCID: PMC3763681 DOI: 10.2174/13892029113149990007] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Received: 02/12/2013] [Revised: 07/09/2013] [Accepted: 07/09/2013] [Indexed: 01/01/2023] Open
Abstract
Migraine is a neurological disorder that affects the central nervous system, causing painful attacks of headache. A genetic vulnerability and exposure to environmental triggers can influence the migraine phenotype. Migraine interferes with many facets of people's daily lives, including employment commitments and their ability to look after their families, resulting in a reduced quality of life. Identification of the biological processes that underlie this relatively common affliction has been difficult because migraine does not have any clearly identifiable pathology or structural lesion detectable by current medical technology. Theories to explain the symptoms of migraine have focused on the physiological mechanisms involved in the various phases of headache and include the vascular and neurogenic theories. In relation to migraine pathophysiology, the trigeminovascular system and cortical spreading depression have also been implicated, with supporting evidence from imaging studies and animal models. The objective of current research is to better understand the pathways and mechanisms involved in causing pain and headache in order to target interventions. The genetic component of migraine has been teased apart using linkage studies and both candidate gene and genome-wide association studies, in family and case-control cohorts. Genomic regions that increase individual risk of migraine have been identified in neurological, vascular and hormonal pathways. This review discusses knowledge of the pathophysiology and genetic basis of migraine with the latest scientific evidence from genetic studies.
Affiliation(s)
- Lyn R Griffiths
- Genomics Research Centre, Griffith Health Institute, Griffith University, Gold Coast Campus, Building G05, GRIFFITH UNIVERSITY QLD 4222, Australia
|
17
|
A systematic review of cancer GWAS and candidate gene meta-analyses reveals limited overlap but similar effect sizes. Eur J Hum Genet 2013; 22:402-8. [PMID: 23881057 DOI: 10.1038/ejhg.2013.161] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 11/14/2012] [Revised: 04/05/2013] [Accepted: 06/19/2013] [Indexed: 01/03/2023] Open
Abstract
Candidate gene and genome-wide association studies (GWAS) represent two complementary approaches to uncovering genetic contributions to common diseases. We systematically reviewed the contributions of these approaches to our knowledge of genetic associations with cancer risk by analyzing the data in the Cancer Genome-wide Association and Meta Analyses database (Cancer GAMAdb). The database catalogs studies published since January 1, 2000, by study and cancer type. In all, we found that meta-analyses and pooled analyses of candidate genes reported 349 statistically significant associations and GWAS reported 269, for a total of 577 unique associations. Only 41 (7.1%) associations were reported in both candidate gene meta-analyses and GWAS, usually with similar effect sizes. When considering only noteworthy associations (defined as those with false-positive report probabilities ≤ 0.2) and accounting for indirect overlap, we found 202 associations, with 27 of those appearing in both meta-analyses and GWAS. Our findings suggest that meta-analyses of well-conducted candidate gene studies may continue to add to our understanding of the genetic associations in the post-GWAS era.
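The counts in this abstract can be reconciled with a short inclusion-exclusion check. The sketch assumes that the 41 shared associations are included once in each of the two per-approach totals; the variable names are illustrative and not taken from the paper.

```python
# Counts quoted in the abstract.
candidate_gene_associations = 349  # from candidate gene meta-analyses and pooled analyses
gwas_associations = 269            # from GWAS
reported_in_both = 41              # found by both approaches

# Inclusion-exclusion: associations found by both approaches are counted only once.
unique_associations = (
    candidate_gene_associations + gwas_associations - reported_in_both
)

# Share of unique associations reported by both approaches, as a percentage.
overlap_percent = round(100 * reported_in_both / unique_associations, 1)
```

Under that assumption the check reproduces the reported 577 unique associations and the 7.1% overlap.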
|