1. Clouard C, Ausmees K, Nettelblad C. A joint use of pooling and imputation for genotyping SNPs. BMC Bioinformatics 2022; 23:421. [PMID: 36229780 PMCID: PMC9563787 DOI: 10.1186/s12859-022-04974-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 09/29/2022] [Indexed: 11/13/2022]
Abstract
BACKGROUND Despite continuing technological advances, the cost of large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of the greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, with applications such as wildlife monitoring and plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that there are fewer pools than individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance on pooled data of the Beagle algorithm and of a local likelihood-aware phasing algorithm, closely modeled on MaCH, that we implemented. RESULTS We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling impacts the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, outperforming the Beagle imputation model, which reaches 80% concordance.
We observe that the pooling design gives higher concordance for the rare variants than traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. CONCLUSIONS We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation on human data. These results could find potential applications in any context where the genotyping costs form a limiting factor on the study size, such as in marker-assisted selection in plant breeding.
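As a rough illustration of why pooling leaves some genotypes undetermined, and therefore in need of estimation and imputation, the sketch below decodes a hypothetical square row/column pooling design using simple deterministic rules. The grid layout, the pool readout labels `RR`/`AA`/`RA` and the helper names are assumptions made for illustration only; this is not the statistical estimation algorithm of Clouard et al., which infers genotype probabilities rather than leaving ambiguous samples missing.

```python
from typing import List, Optional

# Hypothetical pool readouts for one biallelic SNP:
#   "RR" = only the reference allele detected in the pool
#   "AA" = only the alternate allele detected
#   "RA" = both alleles detected
def pool_readout(genotypes: List[int]) -> str:
    """Summarise a pool from its members' alternate-allele counts (0, 1 or 2)."""
    if all(g == 0 for g in genotypes):
        return "RR"
    if all(g == 2 for g in genotypes):
        return "AA"
    return "RA"


def decode_grid(grid: List[List[int]]) -> List[List[Optional[int]]]:
    """Decode a row/column pooling design; ambiguous samples become None.

    Each sample belongs to exactly one row pool and one column pool, so an
    n x m grid of samples needs only n + m pools.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    row_pools = [pool_readout(row) for row in grid]
    col_pools = [pool_readout([grid[r][c] for r in range(n_rows)]) for c in range(n_cols)]

    decoded: List[List[Optional[int]]] = []
    for r in range(n_rows):
        decoded_row: List[Optional[int]] = []
        for c in range(n_cols):
            readouts = (row_pools[r], col_pools[c])
            if "RR" in readouts:      # a pure-reference pool forces genotype 0/0
                decoded_row.append(0)
            elif "AA" in readouts:    # a pure-alternate pool forces genotype 1/1
                decoded_row.append(2)
            else:                     # both pools mixed: genotype not identifiable
                decoded_row.append(None)
        decoded.append(decoded_row)
    return decoded


# 16 samples in a 4 x 4 grid (8 pools) with two heterozygous carriers.
grid = [[0] * 4 for _ in range(4)]
grid[0][0] = 1
grid[1][1] = 1
for row in decode_grid(grid):
    print(row)
```

Here the first two decoded rows are `[None, None, 0, 0]` and the last two are all zeros: four samples are ambiguous, two of which are in fact homozygous reference. Resolving such cases is where genotype probabilities and downstream imputation come in.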
Affiliation(s)
- Camille Clouard
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
- Kristiina Ausmees
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
- Carl Nettelblad
- Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, hus 10, 75237 Uppsala, Sweden
2. Vergara C, Parker MM, Franco L, Cho MH, Valencia-Duarte AV, Beaty TH, Duggal P. Genotype imputation performance of three reference panels using African ancestry individuals. Hum Genet 2018; 137:281-292. [PMID: 29637265 PMCID: PMC6209094 DOI: 10.1007/s00439-018-1881-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/10/2018] [Accepted: 03/31/2018] [Indexed: 12/22/2022]
Abstract
Genotype imputation estimates unobserved genotypes from genome-wide markers, to increase genome coverage and power for genome-wide association studies. Imputation has been successful for European ancestry populations, for which very large reference panels are available. Smaller numbers of individuals of African descent are available in 1000 Genomes (1000G), the Consortium on Asthma among African ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We compared the performance of these reference panels when imputing variation in 3747 African Americans (AA) from two cohorts (HCV and COPDGene) genotyped using Illumina Omni microarrays. The haplotypes of 2504 (1000G), 883 (CAAPA) and 32,470 individuals (HRC) were used as reference. We compared the number of variants, imputation quality, imputation accuracy and coverage between panels. In both cohorts, 1000G imputed 1.5-1.6× more variants than CAAPA and 1.2× more than HRC. Similar findings were observed for variants with imputation R2 > 0.5 and for rare, low-frequency, and common variants. When merging imputed variants of the three panels, the total number was 62-63 M, with 20 M overlapping variants imputed by all three panels and 5-15 M variants imputed exclusively by one of them. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. 1000G, HRC and CAAPA provided high performance and accuracy for imputation of African American individuals, increasing the number of variants available for subsequent analyses. These panels are complementary and would benefit from the development of an integrated African reference panel.
Affiliation(s)
- Margaret M Parker
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Liliana Franco
- National School of Public Health, Universidad de Antioquia, Medellín, Colombia
- School of Medicine, Universidad Pontificia Bolivariana, Medellín, Colombia
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Terri H Beaty
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
- Priya Duggal
- Johns Hopkins University, Bloomberg School of Public Health, Baltimore, MD, USA
3. Lert-Itthiporn W, Suktitipat B, Grove H, Sakuntabhai A, Malasit P, Tangthawornchaikul N, Matsuda F, Suriyaphol P. Validation of genotype imputation in Southeast Asian populations and the effect of single nucleotide polymorphism annotation on imputation outcome. BMC Medical Genetics 2018; 19:23. [PMID: 29439659 PMCID: PMC5812212 DOI: 10.1186/s12881-018-0534-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/07/2016] [Accepted: 01/24/2018] [Indexed: 11/24/2022]
Abstract
Background Imputation involves the inference of untyped single nucleotide polymorphisms (SNPs) in genome-wide association studies. The haplotypic reference of choice for imputation in Southeast Asian populations is unclear. Moreover, the influence of SNP annotation on imputation results has not been examined. Methods This study was divided into two parts. In the first part, we applied imputation to genotyped SNPs from Southeast Asian populations in the Pan-Asian SNP database. Five percent of the total SNPs were removed, the remaining SNPs were used as input for imputation with IMPUTE2, and the imputed genotypes were verified against the removed SNPs. We compared imputation references consisting of Chinese and Japanese haplotypes from HapMap phase II (HMII) and the complete set of haplotypes from the 1000 Genomes Project (1000G). The second part assessed imputation accuracy and yield in a Thai patient dataset. Half of the autosomal SNPs were removed to create Set 1. A second dataset, Set 2, was then created by removing the other half of the SNPs instead. Both Set 1 and Set 2 were imputed with HMII to create a complete imputed SNP dataset, which was used to validate association testing, SNP annotation and imputation outcome. Results The accuracy was highest for all populations when using the HMII reference, but at the cost of a lower yield. Thai genotypes showed the highest accuracy compared with the other populations with both the HMII and 1000G panels, although accuracy and yield varied across chromosomes. Imputation was tested in a clinical dataset to compare accuracy in gene-related regions, and coding regions were found to have higher accuracy and yield. Conclusions This work provides the first evidence on imputation reference selection for Southeast Asian studies and highlights the effects of SNP location relative to genes on imputation outcome. Researchers will need to consider the trade-off between accuracy and yield in future imputation studies.
Electronic supplementary material: The online version of this article (10.1186/s12881-018-0534-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Worachart Lert-Itthiporn
- Molecular Medicine Graduate Program, Faculty of Science, Mahidol University, Bangkok, Thailand; Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Bhoom Suktitipat
- Integrative Computational BioScience Center, Department of Biochemistry, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand; Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Harald Grove
- Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand; Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Anavaj Sakuntabhai
- Unité de Génétique Fonctionnelle des Maladies Infectieuses, Department Genome and Genetics, Institut Pasteur, Paris, France; Centre National de la Recherche Scientifique, URA3012, Paris, France; Systems Biology of Diseases Research Unit, Faculty of Science, Mahidol University, Bangkok, Thailand
- Prida Malasit
- Medical Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Bangkok, Thailand; Division of Dengue Hemorrhagic Fever Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Nattaya Tangthawornchaikul
- Medical Biotechnology Research Unit, National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Bangkok, Thailand; Division of Dengue Hemorrhagic Fever Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Fumihiko Matsuda
- Center for Genomic Medicine, Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Prapat Suriyaphol
- Division of Bioinformatics and Data Management for Research, Department of Research and Development, Faculty of Medicine, Siriraj Hospital, Mahidol University, Bangkok, Thailand; Center of Excellence in Bioinformatics and Clinical Data Management, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
4. When Does Choice of Accuracy Measure Alter Imputation Accuracy Assessments? PLoS One 2015; 10:e0137601. [PMID: 26458263 PMCID: PMC4601794 DOI: 10.1371/journal.pone.0137601] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 03/16/2015] [Accepted: 08/19/2015] [Indexed: 11/20/2022]
Abstract
Imputation, the process of inferring genotypes for untyped variants, is used to identify and refine genetic association findings. Inaccuracies in imputed data can distort the observed association between variants and a disease. Many statistics are used to assess accuracy; some compare imputed to genotyped data and others are calculated without reference to true genotypes. Prior work has shown that the Imputation Quality Score (IQS), which is based on Cohen’s kappa statistic and compares imputed genotype probabilities to true genotypes, appropriately adjusts for chance agreement; however, it is not commonly used. To identify differences in accuracy assessment, we compared IQS with concordance rate, squared correlation, and accuracy measures built into imputation programs. Genotypes from the 1000 Genomes reference populations (AFR N = 246 and EUR N = 379) were masked to match the typed single nucleotide polymorphism (SNP) coverage of several SNP arrays and were imputed with BEAGLE 3.3.2 and IMPUTE2 in regions associated with smoking behaviors. Additional masking and imputation were conducted for sequenced subjects from the Collaborative Genetic Study of Nicotine Dependence and the Genetic Study of Nicotine Dependence in African Americans (N = 1,481 African Americans and N = 1,480 European Americans). Our results offer further evidence that concordance rate inflates accuracy estimates, particularly for rare and low frequency variants. For common variants, squared correlation, BEAGLE R2, IMPUTE2 INFO, and IQS produce similar assessments of imputation accuracy. However, for rare and low frequency variants, the other statistics tend to be more liberal than IQS in their assessment of accuracy. IQS is important to consider when evaluating imputation accuracy, particularly for rare and low frequency variants.
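To make the contrast between concordance rate and IQS concrete, the sketch below computes both for a single variant. It follows the abstract's description of IQS as Cohen's kappa applied to imputed genotype probabilities versus true genotypes; the function names, the input layout (true genotypes coded 0/1/2, imputed probabilities as `(P(0), P(1), P(2))` triples) and the toy data are assumptions for illustration, not code from the study.

```python
from typing import Sequence, Tuple

Probs = Tuple[float, float, float]  # imputed P(genotype = 0), P(1), P(2)


def concordance(true_gt: Sequence[int], probs: Sequence[Probs]) -> float:
    """Fraction of samples whose best-guess imputed genotype equals the true one."""
    best_guess = [max(range(3), key=lambda g: p[g]) for p in probs]
    return sum(b == t for b, t in zip(best_guess, true_gt)) / len(true_gt)


def iqs(true_gt: Sequence[int], probs: Sequence[Probs]) -> float:
    """Cohen's-kappa-style agreement between true genotypes and imputed probabilities."""
    n = len(true_gt)
    # 3 x 3 table: rows = true genotype, columns = imputed genotype,
    # with each sample spreading its probability mass over the columns.
    table = [[0.0] * 3 for _ in range(3)]
    for t, p in zip(true_gt, probs):
        for g in range(3):
            table[t][g] += p[g]
    p_observed = sum(table[g][g] for g in range(3)) / n
    row_totals = [sum(table[g]) for g in range(3)]
    col_totals = [sum(table[t][g] for t in range(3)) for g in range(3)]
    p_chance = sum(row_totals[g] * col_totals[g] for g in range(3)) / n ** 2
    if p_chance == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is perfect
    return (p_observed - p_chance) / (1.0 - p_chance)


# Toy rare variant: 98 homozygous-reference samples and 2 heterozygotes,
# all imputed as homozygous reference with certainty.
true_gt = [0] * 98 + [1] * 2
probs = [(1.0, 0.0, 0.0)] * 100
print(concordance(true_gt, probs))  # 0.98
print(iqs(true_gt, probs))          # 0.0
```

Calling every sample homozygous for the major allele yields 98% concordance but an IQS of 0, because that level of agreement is exactly what chance predicts at this allele frequency. This is the inflation for rare variants that the abstract describes.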
5. Advances in Human Biology: Combining Genetics and Molecular Biophysics to Pave the Way for Personalized Diagnostics and Medicine. ACTA ACUST UNITED AC 2014. [DOI: 10.1155/2014/471836] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 12/17/2022]
Abstract
Advances in several biology-oriented initiatives such as genome sequencing and structural genomics, along with the progress made through traditional biological and biochemical research, have opened up a unique opportunity to better understand the molecular effects of human diseases. Human DNA can vary significantly from person to person and determines an individual’s physical characteristics and susceptibility to diseases. Armed with an individual’s DNA sequence, researchers and physicians can use various databases to check for defects known to be associated with certain diseases. However, for unclassified DNA mutations, or to reveal the molecular mechanism behind their effects, the mutations have to be mapped onto the corresponding networks and macromolecular structures and then analyzed to reveal their effect on the wild-type properties of the biological processes involved. Predicting the effect of DNA mutations on an individual’s health is typically referred to as personalized or companion diagnostics. Furthermore, once the molecular mechanism of the mutations is revealed, the patient should be given the drugs most appropriate for their individual genome, an approach referred to as pharmacogenomics. Altogether, the shift in medicine towards more genomics-oriented practices is the foundation of personalized medicine. The progress made in these rapidly developing fields is outlined.
6. Alsmadi O, John SE, Thareja G, Hebbar P, Antony D, Behbehani K, Thanaraj TA. Genome at juncture of early human migration: a systematic analysis of two whole genomes and thirteen exomes from Kuwaiti population subgroup of inferred Saudi Arabian tribe ancestry. PLoS One 2014; 9:e99069. [PMID: 24896259 PMCID: PMC4045902 DOI: 10.1371/journal.pone.0099069] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/13/2013] [Accepted: 05/10/2014] [Indexed: 01/19/2023]
Abstract
The population of the State of Kuwait is composed of three genetic subgroups of inferred Persian, Saudi Arabian tribe and Bedouin ancestry. The Saudi Arabian tribe subgroup traces its origin to the Najd region of Saudi Arabia. By sequencing two whole genomes and thirteen exomes from this subgroup at high coverage (>40X), we identify 4,950,724 Single Nucleotide Polymorphisms (SNPs), 515,802 indels and 39,762 structural variations. Of the identified variants, 10,098 (8.3%) exomic SNPs, 139,923 (2.9%) non-exomic SNPs, 5,256 (54.3%) exomic indels, and 374,959 (74.08%) non-exomic indels are 'novel'. Up to 8,070 (79.9%) of the reported novel biallelic exomic SNPs occur at low frequency (minor allele frequency <5%). We observe 5,462 known and 1,004 novel potentially deleterious nonsynonymous SNPs. Allele frequencies of common SNPs from the 15 exomes are significantly correlated with those from genotype data of a larger cohort of 48 individuals (Pearson correlation coefficient, 0.91; p < 2.2×10⁻¹⁶). A set of 2,485 SNPs shows significantly different allele frequencies when compared to populations from other continents. Two notable variants with risk alleles at high frequencies in this subgroup are a nonsynonymous deleterious SNP (rs2108622 [19:g.15990431C>T] from the CYP4F2 gene [MIM:*604426]) associated with the warfarin dosage levels [MIM:#122700] required to elicit a normal anticoagulant response, and a 3' UTR SNP (rs6151429 [22:g.51063477T>C] from the ARSA gene [MIM:*607574]) associated with Metachromatic Leukodystrophy [MIM:#250100]. The Hemoglobin Riyadh variant (first identified in a Saudi Arabian woman) is observed in the exome data. The mitochondrial haplogroup profiles of the 15 individuals are consistent with the haplogroup diversity seen in Saudi Arabian natives, who are believed to have received substantial gene flow from Africa and eastern provenance.
We present the first genome resource needed for designing future genetic studies in the Saudi Arabian tribe subgroup. The full-length genome sequences and the identified variants are available at ftp://dgr.dasmaninstitute.org and http://dgr.dasmaninstitute.org/DGR/gb.html.
Affiliation(s)
- Osama Alsmadi
- Dasman Diabetes Institute, Dasman, Kuwait
7.
8.
Abstract
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
9. Song C, Chen GK, Millikan RC, Ambrosone CB, John EM, Bernstein L, Zheng W, Hu JJ, Ziegler RG, Nyante S, Bandera EV, Ingles SA, Press MF, Deming SL, Rodriguez-Gil JL, Chanock SJ, Wan P, Sheng X, Pooler LC, Van Den Berg DJ, Le Marchand L, Kolonel LN, Henderson BE, Haiman CA, Stram DO. A genome-wide scan for breast cancer risk haplotypes among African American women. PLoS One 2013; 8:e57298. [PMID: 23468962 PMCID: PMC3585353 DOI: 10.1371/journal.pone.0057298] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/31/2012] [Accepted: 01/23/2013] [Indexed: 12/03/2022]
Abstract
Genome-wide association studies (GWAS) simultaneously investigating hundreds of thousands of single nucleotide polymorphisms (SNPs) have become a powerful tool in the search for new disease susceptibility loci. Haplotypes are sometimes thought to be superior to single SNPs and are promising in genetic association analyses. The application of genome-wide haplotype analysis, however, is hindered by the complexity of haplotypes themselves and by the computational demands. We systematically analyzed the haplotype effects for breast cancer risk among 5,761 African American women (3,016 cases and 2,745 controls) using a sliding-window approach on the genome-wide scale. Three regions on chromosomes 1, 4 and 18 exhibited moderate haplotype effects. Furthermore, among 21 breast cancer susceptibility loci previously established in European populations, 10p15 and 14q24 are likely to harbor novel haplotype effects. We also proposed a heuristic for determining the significance level and the effective number of independent tests by permutation analysis of chromosome 22 data. It suggests that the effective number was approximately half of the total (7,794 out of 15,645), so half the number of tests could serve as a quick reference for evaluating genome-wide significance if a similar sliding-window haplotype analysis is adopted in similar populations at similar genotype density.
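Assuming the effective number of tests M_eff is used in a Bonferroni-type correction (a common use of such an estimate, not a detail stated in the abstract), halving the number of tests roughly doubles the per-test significance threshold. With a family-wise level of α = 0.05 and the chromosome 22 counts quoted in the abstract:

```latex
\alpha_{\text{per-test}} = \frac{\alpha}{M}, \qquad
\frac{0.05}{15{,}645} \approx 3.2 \times 10^{-6}, \qquad
\frac{0.05}{M_{\text{eff}}} = \frac{0.05}{7{,}794} \approx 6.4 \times 10^{-6}, \qquad
\frac{7{,}794}{15{,}645} \approx 0.50
```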
Affiliation(s)
- Chi Song
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Gary K. Chen
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Robert C. Millikan
- Department of Epidemiology, Gillings School of Global Public Health, and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Christine B. Ambrosone
- Department of Cancer Prevention and Control, Roswell Park Cancer Institute, Buffalo, New York, United States of America
- Esther M. John
- Cancer Prevention Institute of California, Fremont, California, United States of America
- Stanford University School of Medicine and Stanford Cancer Institute, Stanford, California, United States of America
- Leslie Bernstein
- Division of Cancer Etiology, Department of Population Science, Beckman Research Institute, City of Hope, Duarte, California, United States of America
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Jennifer J. Hu
- Sylvester Comprehensive Cancer Center and Department of Epidemiology and Public Health, University of Miami Miller School of Medicine, Miami, Florida, United States of America
- Regina G. Ziegler
- Epidemiology and Biostatistics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Sarah Nyante
- Department of Epidemiology, Gillings School of Global Public Health, and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Elisa V. Bandera
- The Cancer Institute of New Jersey, New Brunswick, New Jersey, United States of America
- Sue A. Ingles
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Michael F. Press
- Department of Pathology, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Sandra L. Deming
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, and Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Jorge L. Rodriguez-Gil
- Sylvester Comprehensive Cancer Center and Department of Epidemiology and Public Health, University of Miami Miller School of Medicine, Miami, Florida, United States of America
- Stephen J. Chanock
- Epidemiology and Biostatistics Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, United States of America
- Peggy Wan
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Xin Sheng
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Loreall C. Pooler
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- David J. Van Den Berg
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Epigenome Center, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Laurence N. Kolonel
- Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America
- Brian E. Henderson
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Chris A. Haiman
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Daniel O. Stram
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
10. Hancock DB, Levy JL, Gaddis NC, Bierut LJ, Saccone NL, Page GP, Johnson EO. Assessment of genotype imputation performance using 1000 Genomes in African American studies. PLoS One 2012; 7:e50610. [PMID: 23226329 PMCID: PMC3511547 DOI: 10.1371/journal.pone.0050610] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 07/25/2012] [Accepted: 10/26/2012] [Indexed: 11/19/2022]
Abstract
Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina's HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%-93%), but IMPUTE2 had the highest IQS (81%-83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels.
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
Affiliation(s)
- Dana B Hancock
- Behavioral Health Epidemiology Program, Research Triangle Institute International, Research Triangle Park, North Carolina, United States of America.
11. Anasagasti A, Irigoyen C, Barandika O, López de Munain A, Ruiz-Ederra J. Current mutation discovery approaches in Retinitis Pigmentosa. Vision Res 2012; 75:117-29. [PMID: 23022136 DOI: 10.1016/j.visres.2012.09.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 07/03/2012] [Revised: 09/08/2012] [Accepted: 09/13/2012] [Indexed: 12/22/2022]
Abstract
With a worldwide prevalence of about 1 in 3500-5000 individuals, Retinitis Pigmentosa (RP) is the most common form of hereditary retinal degeneration. It is an extremely heterogeneous group of genetically determined retinal diseases leading to progressive loss of vision due to impairment of rod and cone photoreceptors. RP can be inherited as an autosomal-recessive, autosomal-dominant, or X-linked trait. Non-Mendelian inheritance patterns such as digenic, maternal (mitochondrial) or compound heterozygosity have also been reported. To date, more than 65 genes have been implicated in syndromic and non-syndromic forms of RP, which account for only about 60% of all RP cases. Due to this high heterogeneity and diversity of inheritance patterns, the molecular diagnosis of syndromic and non-syndromic RP is very challenging, and the genetic cause of the remaining 40% of RP cases worldwide is unknown. However, new sequencing methodologies, boosted by the human genome project, have driven an exponential drop in sequencing costs, making it feasible to include molecular testing for RP patients in routine clinical practice within the coming years. Here, we summarize the most widely used state-of-the-art technologies currently applied to the molecular diagnosis of RP, and address their strengths and weaknesses for the molecular diagnosis of such a complex genetic disease.
Affiliation(s)
- Ander Anasagasti
- Division of Neurosciences, Instituto Biodonostia, San Sebastián, Gipuzkoa, Spain