1
Herzig AF, Velo-Suárez L, Dina C, Redon R, Deleuze JF, Génin E. How local reference panels improve imputation in French populations. Sci Rep 2024; 14:370. [PMID: 38172507 PMCID: PMC10764714 DOI: 10.1038/s41598-023-49931-3] [Received: 07/28/2023] [Accepted: 12/13/2023] [Indexed: 01/05/2024]
Abstract
Imputation servers offer the exclusive possibility to harness the largest public reference panels, which have been shown to deliver very high precision in the imputation of European genomes. Many studies have nonetheless stressed the importance of 'study specific panels' (SSPs) as an alternative and have shown the benefits of combining public reference panels with SSPs. But such combined approaches are not attainable when using external imputation servers. To investigate how to confront this challenge, we imputed 550 French individuals using either the University of Michigan imputation server with the Haplotype Reference Consortium (HRC) panel or an in-house SSP of 850 whole-genome sequenced French individuals. With approximate geo-localization of both our target and SSP individuals, we are able to pinpoint different scenarios where SSP-based imputation would be preferred over server-based imputation or vice-versa. This is achieved by showing, to a high degree of resolution, the importance of the proximity of the reference panel to target individuals, with a focus on the clear added value of SSPs for estimating haplotype phase and for the imputation of rare variants (minor allele frequency below 0.01). Such benefits were most evident for individuals from the same geographical regions in France as the SSP individuals. Overall, only 42.3% of all 125,442 variants evaluated were better imputed with an SSP from France than with an external reference panel; however, this rises to 58.1% for individuals from geographic regions well covered by the SSP. By investigating haplotype sharing and population fine-structure in France, we show the importance of including SSP haplotypes for imputation but also that they should ideally be combined with large public panels.
In the absence of the unattainable results from a combined panel of the HRC and our French SSP, we put forward a pragmatic solution where server-based and SSP-based imputation outcomes can be combined based on comparing posterior genotype probabilities. We show that such an approach can give a level of imputation accuracy in excess of what could be achieved with either strategy alone. The results presented provide detailed insights into the accuracy of imputation that should be expected from different strategies for European populations.
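The abstract does not spell out the exact combination rule. A minimal sketch, assuming a per-genotype rule that keeps whichever imputation (server-based or SSP-based) gives the higher maximum posterior genotype probability, with probabilities ordered (P(0), P(1), P(2)) as in a VCF `GP` field. The functions `best_call` and `combine_calls` and the tie-breaking choice are hypothetical.

```python
from typing import List, Sequence, Tuple

# Posterior probabilities of carrying 0, 1 or 2 alternate alleles,
# in the same order as a VCF GP field.
GenotypeProbs = Tuple[float, float, float]


def best_call(gp: GenotypeProbs) -> Tuple[int, float]:
    """Return the most probable genotype and its posterior probability."""
    genotype = max(range(3), key=lambda g: gp[g])
    return genotype, gp[genotype]


def combine_calls(
    server: Sequence[GenotypeProbs], ssp: Sequence[GenotypeProbs]
) -> List[Tuple[int, str]]:
    """Per individual, keep the call from the more confident source.

    Hypothetical rule: ties are resolved in favour of the server-based call.
    """
    if len(server) != len(ssp):
        raise ValueError("both imputations must cover the same individuals")
    combined = []
    for gp_server, gp_ssp in zip(server, ssp):
        g_server, p_server = best_call(gp_server)
        g_ssp, p_ssp = best_call(gp_ssp)
        if p_ssp > p_server:
            combined.append((g_ssp, "SSP"))
        else:
            combined.append((g_server, "server"))
    return combined


server_gp = [(0.90, 0.10, 0.00), (0.40, 0.50, 0.10)]
ssp_gp = [(0.80, 0.20, 0.00), (0.05, 0.90, 0.05)]
print(combine_calls(server_gp, ssp_gp))  # [(0, 'server'), (1, 'SSP')]
```

A per-variant choice (for example, comparing average confidence across all individuals at a site) would be an equally plausible reading of the abstract.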
Affiliation(s)
- Lourdes Velo-Suárez
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
- Christian Dina
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Richard Redon
- Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du thorax, Nantes, France
- Jean-François Deleuze
- Centre National de Recherche en Génomique Humaine (CNRGH), Université Paris-Saclay, CEA, Evry, France
- Fondation Jean Dausset - Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
- Emmanuelle Génin
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- CHRU Brest, Brest, France
2
Dekeyser T, Génin E, Herzig AF. Opening the Black Box of Imputation Software to Study the Impact of Reference Panel Composition on Performance. Genes (Basel) 2023; 14:410. [PMID: 36833337 PMCID: PMC9956390 DOI: 10.3390/genes14020410] [Received: 12/23/2022] [Revised: 01/22/2023] [Accepted: 01/30/2023] [Indexed: 02/09/2023]
Abstract
Genotype imputation is widely used to enrich genetic datasets. The operation relies on panels of known reference haplotypes, typically with whole-genome sequencing data. How to choose a reference panel has been widely studied and it is essential to have a panel that is well matched to the individuals who require missing genotype imputation. However, it is broadly accepted that such an imputation panel will have an enhanced performance with the inclusion of diversity (haplotypes from many different populations). We investigate this observation by examining, in fine detail, exactly which reference haplotypes are contributing at different regions of the genome. This is achieved using a novel method of inserting synthetic genetic variation into the reference panel in order to track the performance of leading imputation algorithms. We show that while diversity may globally improve imputation accuracy, there can be occasions where incorrect genotypes are imputed following the inclusion of more diverse haplotypes in the reference panel. We, however, demonstrate a technique for retaining and benefitting from the diversity in the reference panel whilst avoiding the occasional adverse effects on imputation accuracy. What is more, our results more clearly elucidate the role of diversity in a reference panel than has been shown in previous studies.
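The abstract does not describe how the synthetic variants are designed. One way to picture the idea, as a sketch only: append a synthetic marker that is 1 on every reference haplotype from a chosen population and 0 elsewhere, then read its imputed dosage as the share of imputation weight drawn from that population. This assumes the imputed dosage is a weighted average of reference alleles, as in HMM-based (Li and Stephens-type) imputation; the copying weights are made up, and `add_tag_variant` and `imputed_dosage` are hypothetical helpers.

```python
from typing import List, Sequence


def add_tag_variant(
    panel: List[List[int]], labels: Sequence[str], population: str
) -> List[List[int]]:
    """Append a synthetic marker: allele 1 on haplotypes from `population`, else 0."""
    return [hap + [1 if lab == population else 0] for hap, lab in zip(panel, labels)]


def imputed_dosage(
    weights: Sequence[float], panel: List[List[int]], site: int
) -> float:
    """Dosage for one target haplotype at `site`: the weighted mean of the
    reference alleles, using (assumed) per-haplotype copying probabilities."""
    total = sum(weights)
    return sum(w * hap[site] for w, hap in zip(weights, panel)) / total


# Toy reference panel: 4 haplotypes x 2 real sites, with population labels.
panel = [[0, 1], [0, 1], [1, 0], [1, 1]]
labels = ["FR", "FR", "IT", "FI"]
tagged = add_tag_variant(panel, labels, "FR")
copying_weights = [0.5, 0.2, 0.2, 0.1]  # made-up posterior weights
share_fr = imputed_dosage(copying_weights, tagged, site=2)
print(round(share_fr, 3))  # 0.7
```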
Affiliation(s)
- Thibault Dekeyser
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, F-29200 Brest, France
- Emmanuelle Génin
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
- CHRU Brest, F-29200 Brest, France
- Anthony F. Herzig
- Inserm, Université de Brest, EFS, UMR 1078, GGB, F-29200 Brest, France
3
Kentistou KA, Luan J, Wittemans LBL, Hambly C, Klaric L, Kutalik Z, Speakman JR, Wareham NJ, Kendall TJ, Langenberg C, Wilson JF, Joshi PK, Morton NM. Large scale phenotype imputation and in vivo functional validation implicate ADAMTS14 as an adiposity gene. Nat Commun 2023; 14:307. [PMID: 36658113 PMCID: PMC9852585 DOI: 10.1038/s41467-022-35563-0] [Received: 12/17/2020] [Accepted: 12/09/2022] [Indexed: 01/20/2023]
Abstract
Obesity remains an unmet global health burden. Detrimental anatomical distribution of body fat is a major driver of obesity-mediated mortality risk and is demonstrably heritable. However, our understanding of the full genetic contribution to human adiposity is incomplete, as few studies measure adiposity directly. To address this, we impute whole-body imaging adiposity phenotypes in UK Biobank from the 4,366 directly measured participants onto the rest of the cohort, greatly increasing our discovery power. Using these imputed phenotypes in 392,535 participants yielded hundreds of genome-wide significant associations, six of which replicate in independent cohorts. The leading causal gene candidate, ADAMTS14, is further investigated in a mouse knockout model. Concordant with the human association data, the Adamts14-/- mice exhibit reduced adiposity and weight-gain under obesogenic conditions, alongside an improved metabolic rate and health. Thus, we show that phenotypic imputation at scale offers deeper biological insights into the genetics of human adiposity that could lead to therapeutic targets.
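The abstract does not say which model was used to impute the imaging phenotypes. As a generic illustration only, the sketch below fits ordinary least squares on the directly measured participants, with a single proxy trait as predictor, and predicts the phenotype for everyone else. The function names, the one-predictor model and the numbers are hypothetical.

```python
from statistics import fmean
from typing import Dict, Optional, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def impute_phenotype(
    proxy: Dict[str, float], measured: Dict[str, Optional[float]]
) -> Dict[str, float]:
    """Keep direct measurements; predict the rest from the proxy trait."""
    train = [pid for pid, value in measured.items() if value is not None]
    a, b = fit_line([proxy[p] for p in train], [measured[p] for p in train])
    return {
        pid: value if value is not None else a + b * proxy[pid]
        for pid, value in measured.items()
    }


proxy_trait = {"p1": 20.0, "p2": 30.0, "p3": 25.0}
imaging_fat = {"p1": 10.0, "p2": 20.0, "p3": None}  # p3 was not scanned
print(impute_phenotype(proxy_trait, imaging_fat))
# {'p1': 10.0, 'p2': 20.0, 'p3': 15.0}
```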
Affiliation(s)
- Katherine A Kentistou
- Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, EH16 4TJ, UK
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, UK
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK
- Jian'an Luan
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK
- Laura B L Wittemans
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK
- Catherine Hambly
- Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, AB24 2TZ, UK
- Lucija Klaric
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Zoltán Kutalik
- Centre for Primary Care and Public Health, University of Lausanne, Lausanne, 1010, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- John R Speakman
- Institute of Biological and Environmental Sciences, University of Aberdeen, Aberdeen, AB24 2TZ, UK
- Centre for Energy Metabolism and Reproduction, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Metabolic Health, CAS Centre of Excellence in Animal Evolution and Genetics, Kunming, China
- Nicholas J Wareham
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK
- Timothy J Kendall
- Centre for Inflammation Research, University of Edinburgh, Edinburgh, EH16 4TJ, UK
- Claudia Langenberg
- MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, CB2 0QQ, UK
- Computational Medicine, Berlin Institute of Health (BIH) Charité University Medicine, Berlin, Germany
- James F Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, UK
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU, UK
- Peter K Joshi
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Edinburgh, EH8 9AG, UK
- Nicholas M Morton
- Centre for Cardiovascular Science, Queen's Medical Research Institute, University of Edinburgh, Edinburgh, EH16 4TJ, UK
4
Nutile T, Ruggiero D, Herzig AF, Tirozzi A, Nappo S, Sorice R, Marangio F, Bellenguez C, Leutenegger AL, Ciullo M. Whole-Exome Sequencing in the Isolated Populations of Cilento from South Italy. Sci Rep 2019; 9:4059. [PMID: 30858532 PMCID: PMC6411969 DOI: 10.1038/s41598-019-41022-6] [Received: 08/03/2018] [Accepted: 02/22/2019] [Indexed: 12/21/2022]
Abstract
The present study describes the genetic architecture of the isolated populations of Cilento, through the analysis of exome sequence data of 245 representative individuals of these populations. By annotating the exome variants and cataloguing them according to their frequency and functional effects, we identified 347,684 variants, 67.4% of which are rare and low frequency variants, and 1% of them (corresponding to 319 variants per person) are classified as high functional impact variants; also, 39,946 (11.5% of the total) are novel variants, for which we determined a significant enrichment for deleterious effects. By comparing the allele frequencies in Cilento with those from the Tuscan population from the 1000 Genomes Project Phase 3, we highlighted an increase in allele frequency in Cilento especially for variants which map to genes involved in extracellular matrix formation and organization. Furthermore, among the variants showing increased frequency we identified several known rare disease-causing variants. By different population genetics analyses, we corroborated the status of the Cilento populations as genetic isolates. Finally, we showed that exome data of Cilento represents a useful local reference panel capable of improving the accuracy of genetic imputation, thus adding power to genetic studies of human traits in these populations.
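The abstract groups variants as rare or low frequency without stating cut-offs. A sketch assuming the common convention of MAF below 1% for rare and 1-5% for low-frequency variants (an assumption, not taken from the paper), with allele counts from 245 diploid individuals, i.e. 490 alleles. The helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def minor_allele_frequency(alt_count: int, n_alleles: int) -> float:
    """MAF from the alternate-allele count across all sampled chromosomes."""
    af = alt_count / n_alleles
    return min(af, 1.0 - af)


def frequency_class(maf: float) -> str:
    """Assumed thresholds: rare < 1%, low-frequency 1-5%, common >= 5%."""
    if maf < 0.01:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def catalogue(alt_counts: Iterable[int], n_alleles: int) -> Counter:
    """Count variants in each frequency class."""
    return Counter(
        frequency_class(minor_allele_frequency(c, n_alleles)) for c in alt_counts
    )


N_ALLELES = 2 * 245  # 245 sequenced individuals, two alleles each
print(catalogue([3, 10, 400, 1], N_ALLELES))
```

Folding the frequency to the minor allele (`min(af, 1 - af)`) means a variant where the alternate allele is at 98% is classed by its 2% minor allele.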
Affiliation(s)
- T Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- D Ruggiero
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- IRCCS Neuromed, Pozzilli, Isernia, Italy
- A F Herzig
- Inserm, UMR 946, Genetic variation and Human diseases, F-75010, Paris, France
- Université Paris-Diderot, Sorbonne Paris Cité, UMR946, F-75010, Paris, France
- A Tirozzi
- IRCCS Neuromed, Pozzilli, Isernia, Italy
- S Nappo
- AORN Santobono-Pausilipon Hospital, Naples, Italy
- R Sorice
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- F Marangio
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- C Bellenguez
- Inserm, U1167, RID-AGE-Risk factors and molecular determinants of aging-related diseases, F-59000, Lille, France
- Institut Pasteur de Lille, F-59000, Lille, France
- Univ. Lille, U1167-Excellence Laboratory LabEx DISTALZ, F-59000, Lille, France
- A L Leutenegger
- Inserm, UMR 946, Genetic variation and Human diseases, F-75010, Paris, France
- Université Paris-Diderot, Sorbonne Paris Cité, UMR946, F-75010, Paris, France
- M Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- IRCCS Neuromed, Pozzilli, Isernia, Italy
5
Revisit Population-based and Family-based Genotype Imputation. Sci Rep 2019; 9:1800. [PMID: 30755687 PMCID: PMC6372660 DOI: 10.1038/s41598-018-38469-4] [Received: 08/29/2018] [Accepted: 12/27/2018] [Indexed: 11/12/2022]
Abstract
Genome-Wide Association (GWA) with population-based imputation (PBI) has been successful in identifying common variants associated with complex diseases; however, much heritability remains to be explained and low frequency variants (LFV) may contribute. To identify LFV, a study of unrelated individuals may no longer be as efficient as a family study, where variants that are rare in the population can be frequent in families. Family-based imputation (FBI) provides an opportunity to evaluate LFV. To compare the performance of PBI and FBI, we conducted extensive simulations, generating genotypes for families using SeqSIMLA from various reference panels. We masked genotype information for variants unavailable in the Framingham 550K GWA genotype data in less informative subjects selected by GIGI-Pick. We implemented IMPUTE2 with duoHMM in SHAPEIT (Impute2_duoHMM) for PBI, MERLIN and GIGI for FBI, and PedBLIMP for a hybrid approach. In general, FBI in both MERLIN and GIGI outperformed other approaches, with imputation accuracy greater than 0.99 for the squared correlation and imputation quality scores (IQS), especially for LFV, although imputation accuracy from MERLIN depends on pedigree splitting for larger families. PBI performed worst, except for common variants, which were imputed well when a closely ancestry-matched reference was used. In summary, linkage disequilibrium (LD) information from large available genotype resources provides good imputation for common variants with well-selected reference panels without requiring densely sequenced data in family members, while imputation of LFV with FBI benefits more from information on inheritance patterns within families, yielding better imputation.
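The imputation quality score (IQS) compares best-guess imputed genotypes with true genotypes and, like Cohen's kappa, corrects the observed concordance P_o for the agreement expected by chance P_c: IQS = (P_o - P_c) / (1 - P_c). A minimal sketch, assuming genotypes coded 0/1/2 and hard (best-guess) calls rather than dosages; the function name `iqs` is illustrative.

```python
from typing import Sequence


def iqs(true_genotypes: Sequence[int], imputed_genotypes: Sequence[int]) -> float:
    """Imputation quality score from hard calls coded 0, 1 or 2."""
    if len(true_genotypes) != len(imputed_genotypes) or not true_genotypes:
        raise ValueError("need two non-empty genotype vectors of equal length")
    n = len(true_genotypes)
    # 3 x 3 contingency table: rows = true genotype, columns = imputed genotype.
    table = [[0] * 3 for _ in range(3)]
    for t, g in zip(true_genotypes, imputed_genotypes):
        table[t][g] += 1
    p_observed = sum(table[i][i] for i in range(3)) / n
    row_totals = [sum(table[i]) for i in range(3)]
    col_totals = [sum(table[r][i] for r in range(3)) for i in range(3)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(3)) / n**2
    if p_chance == 1.0:
        return float("nan")  # same single genotype everywhere: IQS undefined
    return (p_observed - p_chance) / (1.0 - p_chance)


print(iqs([0, 1, 2, 0], [0, 1, 2, 0]))  # 1.0
print(iqs([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.0
```

The second call shows why the chance correction matters for low-frequency variants: calling everyone homozygous for the common allele gives 50% concordance but an IQS of 0.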
6
Herzig AF, Nutile T, Babron MC, Ciullo M, Bellenguez C, Leutenegger AL. Strategies for phasing and imputation in a population isolate. Genet Epidemiol 2018; 42:201-213. [PMID: 29319195 DOI: 10.1002/gepi.22109] [Received: 05/26/2017] [Revised: 11/16/2017] [Accepted: 11/16/2017] [Indexed: 11/05/2022]
Abstract
In the search for genetic associations with complex traits, population isolates offer the advantage of reduced genetic and environmental heterogeneity. In addition, cost-efficient next-generation association approaches have been proposed in these populations, where only a subsample of representative individuals is sequenced and genotypes are then imputed into the rest of the population. Gene mapping in such populations thus requires high-quality genetic imputation and preliminary phasing. To identify an effective study design, we compare by simulation a range of phasing and imputation software and strategies. We simulated 1,115,604 variants on chromosome 10 for 477 members of the large complex pedigree of Campora, a village within the established isolate of Cilento in southern Italy. We assessed the phasing performance of the identity-by-descent-based software ALPHAPHASE and SLRP, the LD-based software SHAPEIT2, SHAPEIT3, and BEAGLE, and the new software EAGLE, which combines both methodologies. For imputation we compared IMPUTE2, IMPUTE4, MINIMAC3, BEAGLE, and the new software PBWT. Genotyping errors and missing genotypes were simulated to observe their effects on the performance of each software. Highly accurate phased data were achieved by all software, with SHAPEIT2, SHAPEIT3, and EAGLE2 providing the most accurate results. MINIMAC3, IMPUTE4, and IMPUTE2 all performed strongly as imputation software, and our study highlights the considerable gain in imputation accuracy provided by a genome-sequenced reference panel specific to the population isolate.
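The abstract does not name the phasing accuracy metric. A widely used one is the switch error rate: walking along the heterozygous sites, count how often the inferred haplotype flips from matching one true haplotype to matching the other. A sketch assuming a single diploid individual, biallelic sites with genotypes coded 0/1/2, and that the second haplotype is the complement of the first at heterozygous sites; the function name is hypothetical.

```python
from typing import Sequence


def switch_error_rate(
    true_hap: Sequence[int], inferred_hap: Sequence[int], genotypes: Sequence[int]
) -> float:
    """Fraction of consecutive heterozygous-site pairs with a phase switch.

    `true_hap` and `inferred_hap` each hold one haplotype (alleles 0/1);
    `genotypes` holds the unphased genotype (0/1/2) at the same sites.
    """
    # At each heterozygous site, does the inferred haplotype match the true one?
    orientation = [
        t == i for t, i, g in zip(true_hap, inferred_hap, genotypes) if g == 1
    ]
    if len(orientation) < 2:
        raise ValueError("need at least two heterozygous sites")
    switches = sum(a != b for a, b in zip(orientation, orientation[1:]))
    return switches / (len(orientation) - 1)


genotypes = [1, 1, 0, 1, 1]
true_hap = [0, 1, 0, 0, 1]
inferred = [0, 1, 0, 1, 0]  # phase flips between the 2nd and 3rd het site
print(switch_error_rate(true_hap, inferred, genotypes))
```

Because only changes in orientation are counted, an inferred haplotype that is the exact complement of the true one everywhere scores zero switches, which matches the fact that haplotype labels within an individual are arbitrary.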
Affiliation(s)
- Anthony Francis Herzig
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
- Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Teresa Nutile
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- Marie-Claude Babron
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
- Inserm, U946, Genetic Variation and Human Diseases, Paris, France
- Marina Ciullo
- Institute of Genetics and Biophysics A. Buzzati-Traverso-CNR, Naples, Italy
- IRCCS Neuromed, Pozzilli, Isernia, Italy
- Céline Bellenguez
- Inserm, U1167, RID-AGE-Risk Factors and Molecular Determinants of Aging-Related Diseases, Lille, France
- Institut Pasteur de Lille, Lille, France
- Université de Lille, U1167-Excellence Laboratory LabEx DISTALZ, Lille, France
- Anne-Louise Leutenegger
- Université Paris-Diderot, Sorbonne Paris Cité, U946, Paris, France
- Inserm, U946, Genetic Variation and Human Diseases, Paris, France
7
Nagy R, Boutin TS, Marten J, Huffman JE, Kerr SM, Campbell A, Evenden L, Gibson J, Amador C, Howard DM, Navarro P, Morris A, Deary IJ, Hocking LJ, Padmanabhan S, Smith BH, Joshi P, Wilson JF, Hastie ND, Wright AF, McIntosh AM, Porteous DJ, Haley CS, Vitart V, Hayward C. Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants. Genome Med 2017; 9:23. [PMID: 28270201 PMCID: PMC5339960 DOI: 10.1186/s13073-017-0414-4] [Received: 08/17/2016] [Accepted: 02/09/2017] [Indexed: 01/31/2023]
Abstract
Background: The Generation Scotland: Scottish Family Health Study (GS:SFHS) is a family-based population cohort with DNA, biological samples, socio-demographic, psychological and clinical data from approximately 24,000 adult volunteers across Scotland. Although data collection was cross-sectional, GS:SFHS became a prospective cohort owing to the ability to link to routine Electronic Health Record (EHR) data. Over 20,000 participants were selected for genotyping using a large genome-wide array. Methods: GS:SFHS was analysed using genome-wide association studies (GWAS) to test the effects of a large spectrum of variants, imputed using the Haplotype Research Consortium (HRC) dataset, on medically relevant traits measured directly or obtained from EHRs. The HRC dataset is the largest available haplotype reference panel for imputation of variants in populations of European ancestry and allows investigation of variants with low minor allele frequencies within the entire GS:SFHS genotyped cohort. Results: Genome-wide associations were run on 20,032 individuals using both genotyped and HRC imputed data. We present results for a range of well-studied quantitative traits obtained from clinic visits and for serum urate measures obtained from data linkage to EHRs collected by the Scottish National Health Service. Results replicated known associations and additionally revealed novel findings, mainly with rare variants, validating the use of the HRC imputation panel. For example, we identified two new associations with fasting glucose at variants near Y_RNA and WDR4 and four new associations with heart rate at SNPs within CSMD1 and ASPH, upstream of HTR1F and between PROKR2 and GPCPD1. All were driven by rare variants (minor allele frequencies in the range of 0.08–1%). Proof of principle for use of EHRs was verification of the highly significant association of urate levels with the well-established urate transporter SLC2A9.
Conclusions: GS:SFHS provides genetic data on over 20,000 participants alongside a range of phenotypes as well as linkage to National Health Service laboratory and clinical records. We have shown that the combination of deeper genotype imputation and extended phenotype availability makes GS:SFHS an attractive resource to carry out association studies to gain insight into the genetic architecture of complex traits.
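Association tests on imputed data commonly use the expected allele count (dosage, e.g. a VCF `DS` field) rather than a hard call, so genotype uncertainty carries into the test. A minimal sketch of a per-variant linear-regression test, assuming no covariates and unrelated individuals; a family-based cohort such as GS:SFHS would generally need a method that accounts for relatedness. The function name and toy data are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence, Tuple


def dosage_association(
    dosage: Sequence[float], trait: Sequence[float]
) -> Tuple[float, float, float]:
    """Regress trait on imputed dosage; return (beta, standard error, t statistic)."""
    n = len(dosage)
    if n != len(trait) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = fmean(dosage), fmean(trait)
    sxx = sum((x - mx) ** 2 for x in dosage)  # zero for a monomorphic variant
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, trait))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosage, trait))
    se = sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se


dosage = [0.0, 0.1, 0.9, 1.0, 1.8, 2.0]  # imputed expected allele counts
trait = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
beta, se, t = dosage_association(dosage, trait)
print(f"beta={beta:.3f} se={se:.3f} t={t:.2f}")
```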
Affiliation(s)
- Reka Nagy
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Thibaud S Boutin
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Jonathan Marten
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Jennifer E Huffman
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Shona M Kerr
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Archie Campbell
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, UK
- Louise Evenden
- Edinburgh Clinical Research Facility, University of Edinburgh, Edinburgh, UK
- Jude Gibson
- Edinburgh Clinical Research Facility, University of Edinburgh, Edinburgh, UK
- Carmen Amador
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- David M Howard
- Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, UK
- Pau Navarro
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Andrew Morris
- Farr Institute of Health Informatics Research, Edinburgh, UK
- Ian J Deary
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, Edinburgh, UK
- Lynne J Hocking
- Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
- Sandosh Padmanabhan
- Division of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK
- Blair H Smith
- Medical Research Institute, University of Dundee, Dundee, UK
- Peter Joshi
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, EH8 9AG, UK
- James F Wilson
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, EH8 9AG, UK
- Nicholas D Hastie
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Alan F Wright
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Andrew M McIntosh
- Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh, UK
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, Edinburgh, UK
- David J Porteous
- Centre for Genomic and Experimental Medicine, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Edinburgh, UK
- Centre for Cognitive Ageing and Cognitive Epidemiology, Department of Psychology, University of Edinburgh, Edinburgh, UK
- Chris S Haley
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Veronique Vitart
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
- Caroline Hayward
- MRC Human Genetics Unit, University of Edinburgh, Institute of Genetics and Molecular Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
8
Kim YJ, Lee J, Kim BJ, Park T. PreCimp: Pre-collapsing imputation approach increases imputation accuracy of rare variants in terms of collapsed variables. Genet Epidemiol 2016; 41:41-50. [PMID: 27859580 DOI: 10.1002/gepi.22020] [Received: 07/29/2015] [Revised: 08/17/2016] [Accepted: 09/21/2016] [Indexed: 12/22/2022]
Abstract
Imputation is widely used for obtaining information about rare variants. However, imputed rare variants tend to have low accuracy, and inaccurately imputed rare variants may distort the results of region-based association tests. Therefore, we developed a pre-collapsing imputation method (PreCimp) to improve the accuracy of imputation by using collapsed variables. Briefly, collapsed variables are generated using rare variants in the reference panel, and a new reference panel is constructed by inserting pre-collapsed variables into the original reference panel. Subsequent imputation analysis provides the imputed genotypes of the collapsed variables. We demonstrated the performance of PreCimp on 5,349 genotyped samples using a Korean population-specific reference panel including 848 samples with exome sequencing, Affymetrix 5.0, and exome chip data. PreCimp outperformed a traditional post-collapsing method that collapses imputed variants after single rare variant imputation analysis. Compared with the post-collapsing method, PreCimp increased imputation accuracy, in relative terms, by about 3.4–6.3% when dosage r² was between 0.6 and 0.8, 10.9–16.1% when dosage r² was between 0.4 and 0.6, and 21.4–129.4% when dosage r² was below 0.4.
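A minimal sketch of the pre-collapsing step as the abstract describes it: each reference haplotype gets an extra biallelic marker set to 1 if it carries a minor allele at any rare variant in the region, so the imputation software imputes the collapsed variable directly. For contrast, a post-collapsing rule combines imputed per-variant carrier probabilities after imputation; that rule (which assumes independence between variants) is a hypothetical stand-in, not necessarily the one used in the paper. Alleles are assumed to be coded 1 for the minor allele, and all function names are hypothetical.

```python
from typing import List, Sequence


def collapse_haplotype(hap: Sequence[int], rare_sites: Sequence[int]) -> int:
    """1 if the haplotype carries the minor allele at any rare site, else 0."""
    return int(any(hap[s] == 1 for s in rare_sites))


def add_collapsed_variable(
    panel: List[List[int]], rare_sites: Sequence[int]
) -> List[List[int]]:
    """Pre-collapsing: append the collapsed indicator to every reference
    haplotype so it is imputed like any other biallelic marker."""
    return [list(hap) + [collapse_haplotype(hap, rare_sites)] for hap in panel]


def post_collapse(carrier_probs: Sequence[float]) -> float:
    """Hypothetical post-collapsing rule for one target haplotype: probability
    of at least one rare allele, treating the imputed variants as independent."""
    p_none = 1.0
    for p in carrier_probs:
        p_none *= 1.0 - p
    return 1.0 - p_none


reference = [[0, 1, 0], [0, 0, 0], [1, 0, 1]]  # sites 0 and 2 are rare
print(add_collapsed_variable(reference, rare_sites=[0, 2]))
# [[0, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
print(post_collapse([0.5, 0.5]))  # 0.75
```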
Affiliation(s)
- Young Jin Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
- Department of Statistics, Seoul National University, Seoul, Korea
9
Kim YJ, Lee J, Kim BJ, Park T. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data. BMC Genomics 2015; 16:1109. [PMID: 26715385 PMCID: PMC4696174 DOI: 10.1186/s12864-015-2192-y] [Received: 03/18/2015] [Accepted: 11/03/2015] [Indexed: 02/07/2023]
Abstract
Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next-generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality by up to 11%, and genomic coverage for rare variants (MAF < 1%) by up to 117.7%, compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels.
The best performing approach was the combination of the study-specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.
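A sketch of building the combined genotype panel: take the union of variants from both arrays for the same samples and, where a variant is on both, measure call concordance before keeping one call. It assumes variant IDs, strand and allele coding are already harmonised between the two arrays, which real data usually require as a separate step; giving precedence to the SNP-chip call on overlaps is an arbitrary, hypothetical choice, as is the `merge_panels` name.

```python
from typing import Dict, Tuple

# variant ID -> {sample ID -> alternate-allele count (0, 1 or 2)}
Genotypes = Dict[str, Dict[str, int]]


def merge_panels(
    snp_chip: Genotypes, exome_chip: Genotypes
) -> Tuple[Genotypes, Dict[str, float]]:
    """Union of both arrays plus per-variant concordance on overlapping calls."""
    merged: Genotypes = {}
    concordance: Dict[str, float] = {}
    for vid in sorted(snp_chip.keys() | exome_chip.keys()):
        a = snp_chip.get(vid, {})
        b = exome_chip.get(vid, {})
        shared = a.keys() & b.keys()
        if shared:
            concordance[vid] = sum(a[s] == b[s] for s in shared) / len(shared)
        # Exome-chip calls first, so SNP-chip calls win on overlaps.
        merged[vid] = {**b, **a}
    return merged, concordance


snp = {"rs1": {"s1": 0, "s2": 1}, "rs2": {"s1": 2, "s2": 1}}
exome = {"rs2": {"s1": 2, "s2": 0}, "ex1": {"s1": 1, "s2": 0}}
merged, conc = merge_panels(snp, exome)
print(sorted(merged), conc)  # ['ex1', 'rs1', 'rs2'] {'rs2': 0.5}
```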
Collapse
Affiliation(s)
- Young Jin Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea. .,Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
| | - Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
| | - Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
| | | | - Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea. .,Department of Statistics, Seoul National University, San 56-1, Shilim-dong, Kwanak-gu, Seoul, 151-742, South Korea.
10
Hoffmann TJ, Sakoda LC, Shen L, Jorgenson E, Habel LA, Liu J, Kvale MN, Asgari MM, Banda Y, Corley D, Kushi LH, Quesenberry CP, Schaefer C, Van Den Eeden SK, Risch N, Witte JS. Imputation of the rare HOXB13 G84E mutation and cancer risk in a large population-based cohort. PLoS Genet 2015; 11:e1004930. [PMID: 25629170 PMCID: PMC4309593 DOI: 10.1371/journal.pgen.1004930] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 06/02/2014] [Accepted: 12/01/2014] [Indexed: 11/25/2022]
Abstract
An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into large well-phenotyped cohorts with existing genome-wide genotype data using large sequenced reference panels. The success of this approach hinges on the accuracy of rare variant imputation, which remains controversial. For example, a recent study suggested that one cannot adequately impute the HOXB13 G84E mutation associated with prostate cancer risk (carrier frequency of 0.0034 in European ancestry participants in the 1000 Genomes Project). We show that by utilizing the 1000 Genomes Project data plus an enriched reference panel of mutation carriers we were able to accurately impute the G84E mutation into a large cohort of 83,285 non-Hispanic White participants from the Kaiser Permanente Research Program on Genes, Environment and Health Genetic Epidemiology Research on Adult Health and Aging cohort. Imputation authenticity was confirmed via a novel classification and regression tree method, and then empirically validated by analyzing a subset of these subjects plus an additional 1,789 men from Kaiser specifically genotyped for the G84E mutation (r² = 0.57, 95% CI = 0.37−0.77). We then show the value of this approach by using the imputed data to investigate the impact of the G84E mutation on age-specific prostate cancer risk and on risk of fourteen other cancers in the cohort. The age-specific risk of prostate cancer among G84E mutation carriers was higher than among non-carriers. Risk estimates from Kaplan-Meier curves were 36.7% versus 13.6% by age 72, and 64.2% versus 24.2% by age 80, for G84E mutation carriers and non-carriers, respectively (p = 3.4×10⁻¹²). The G84E mutation was also associated with an increase in risk for the fourteen other most common cancers considered collectively (p = 5.8×10⁻⁴) and more so in cases diagnosed with multiple cancer types, both those including and not including prostate cancer, strongly suggesting pleiotropic effects.
An efficient approach to characterizing the disease burden of rare genetic variants is to impute them into existing well-phenotyped cohorts with genome-wide data by using large sequenced reference panels; however, the efficacy of this approach remains controversial. A recent study suggested that it is not possible to impute the rare HOXB13 G84E variant using neighboring SNP markers. We show that by using an enriched reference sequenced sample of 22 mutation carriers, we were able to impute this mutation into a large cohort of 83,285 non-Hispanic White individuals from the Kaiser Permanente Research Program on Genes, Environment, and Health Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The imputation was confirmed via a novel classification and regression tree method, and then empirically validated by direct mutation genotyping of a subset of 1,673 of these individuals in addition to 1,789 other men from Kaiser. Using the same GERA cohort, we then confirmed that the G84E mutation is associated with increased risk of prostate cancer, and estimated the age-specific risk for carriers of the mutation. Finally, we obtained evidence that the mutation is associated with additional types of cancer in the GERA cohort.
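The validation r² quoted in the abstract compares imputed values with direct genotyping of the same individuals. This is a minimal sketch under one assumption: r² is taken as the squared Pearson correlation between imputed allele dosages and genotyped allele counts at a single variant. The abstract does not state the exact estimator or how the confidence interval was obtained. The function name and toy data are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def dosage_r2(dosages: Sequence[float], genotypes: Sequence[int]) -> float:
    """Squared Pearson correlation between imputed dosages (expected
    alternate-allele counts in [0, 2]) and directly genotyped allele counts
    for the same individuals at one variant."""
    n = len(dosages)
    if n != len(genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_d = sum(dosages) / n
    mean_g = sum(genotypes) / n
    cov = sum((d - mean_d) * (g - mean_g) for d, g in zip(dosages, genotypes))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    var_g = sum((g - mean_g) ** 2 for g in genotypes)
    if var_d == 0 or var_g == 0:
        # Undefined if either vector is constant (e.g., no genotyped carriers).
        return math.nan
    return (cov * cov) / (var_d * var_g)


if __name__ == "__main__":
    # Toy data, not from the study: 2 genotyped carriers among 8 individuals.
    imputed = [0.02, 0.91, 0.05, 0.00, 0.40, 0.01, 1.02, 0.03]
    observed = [0, 1, 0, 0, 0, 0, 1, 0]
    print(round(dosage_r2(imputed, observed), 3))
```

For a rare mutation, almost every genotyped value is 0. The statistic is therefore determined by how well the few carriers (and any non-carriers given high dosages) are imputed.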
Affiliation(s)
- Thomas J. Hoffmann
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Lori C. Sakoda
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Ling Shen
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Eric Jorgenson
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Laurel A. Habel
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Jinghua Liu
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Mark N. Kvale
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Maryam M. Asgari
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Yambazi Banda
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Douglas Corley
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Lawrence H. Kushi
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Charles P. Quesenberry
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Catherine Schaefer
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Stephen K. Van Den Eeden
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- Department of Urology, University of California San Francisco, San Francisco, California, United States of America
- Neil Risch
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Division of Research, Kaiser Permanente, Northern California, Oakland, California, United States of America
- John S. Witte
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics, University of California San Francisco, San Francisco, California, United States of America
- Department of Urology, University of California San Francisco, San Francisco, California, United States of America