1
Ko S, Sobel EM, Zhou H, Lange K. Estimation of genetic admixture proportions via haplotypes. Comput Struct Biotechnol J 2024; 23:4384-4395. [PMID: 39737076 PMCID: PMC11683265 DOI: 10.1016/j.csbj.2024.11.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Revised: 11/26/2024] [Accepted: 11/26/2024] [Indexed: 01/01/2025]
Abstract
Estimation of ancestral admixture is essential for creating personal genealogies, studying human history, and conducting genome-wide association studies (GWAS). The following three primary methods exist for estimating admixture coefficients. The frequentist approach directly maximizes the binomial loglikelihood. The Bayesian approach adds a reasonable prior and samples the posterior distribution. Finally, the nonparametric approach decomposes the genotype matrix algebraically. Each approach scales successfully to datasets with a million individuals and a million single nucleotide polymorphisms (SNPs). Despite their variety, all current approaches assume independence between SNPs. To achieve independence requires performing LD (linkage disequilibrium) filtering before analysis. Unfortunately, this tactic loses valuable information and usually retains many SNPs still in LD. The present paper explores the option of explicitly incorporating haplotypes in ancestry estimation. Our program, HaploADMIXTURE, operates on adjacent SNP pairs and jointly estimates their haplotype frequencies along with admixture coefficients. This more complex strategy takes advantage of the rich information available in haplotypes and ultimately yields better admixture estimates and better clustering of real populations in curated datasets.
Affiliation(s)
- Seyoon Ko
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eric M. Sobel
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Hua Zhou
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kenneth Lange
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
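The abstract of entry 1 mentions that the frequentist approach maximizes a binomial loglikelihood. As a minimal sketch of what that objective looks like under the standard independent-SNP admixture model (not the haplotype-pair model of HaploADMIXTURE, whose details are not given here), the function below evaluates the loglikelihood for given parameters. The function name and argument layout are illustrative assumptions, and the constant binomial coefficient is dropped.

```python
import math


def admixture_loglik(genotypes, q, p):
    """Binomial loglikelihood under a simple admixture model (hypothetical helper).

    genotypes[i][j]: count (0, 1, or 2) of the counted allele for individual i at SNP j
    q[i][k]: admixture proportion of individual i from population k (rows sum to 1)
    p[k][j]: frequency of the counted allele in population k at SNP j

    Assumes every mixed frequency lies strictly between 0 and 1.
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # Allele frequency for individual i at SNP j is a q-weighted mix.
            f = sum(q[i][k] * p[k][j] for k in range(len(p)))
            total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total
```

An optimizer would maximize this quantity over `q` and `p` subject to the simplex and box constraints; the sketch only evaluates it.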
2
Fu H, Shi G. Local Ancestry Inference Based on Population-Specific Single-Nucleotide Polymorphisms-A Study of Admixed Populations in the 1000 Genomes Project. Genes (Basel) 2024; 15:1099. [PMID: 39202458 PMCID: PMC11353365 DOI: 10.3390/genes15081099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 08/09/2024] [Accepted: 08/19/2024] [Indexed: 09/03/2024]
Abstract
Human populations have interacted throughout history, and a considerable portion of modern human populations show evidence of admixture. Local ancestry inference (LAI) is focused on detecting the genetic ancestry of chromosomal segments in admixed individuals and has wide applications. In this work, we proposed a new LAI method based on population-specific single-nucleotide polymorphisms (SNPs) and applied it in the analysis of admixed populations in the 1000 Genomes Project (1KGP). Based on population-specific SNPs in a sliding window, we computed local ancestry information vectors, which are moment estimators of local ancestral proportions, for two haplotypes of an admixed individual and inferred the local ancestral origins. Then we used African (AFR), East Asian (EAS), European (EUR) and South Asian (SAS) populations from the 1KGP and indigenous American (AMR) populations from the Human Genome Diversity Project (HGDP) as reference populations and conducted the proposed LAI analysis on African American populations and American populations in the 1KGP. The results were compared with those obtained by RFMix, G-Nomix and FLARE. We demonstrated that the existence of alleles in a chromosomal region that are specific to a particular reference population and the absence of alleles specific to the other reference populations provide reasonable evidence for determining the ancestral origin of the region. Contemporary AFR, AMR and EUR populations approximate ancestral populations of the admixed populations well, and the results from RFMix, G-Nomix and FLARE largely agree with those from the Ancestral Spectrum Analyzer (ASA), in which the proposed method was implemented. When admixtures are ancient and contemporary reference populations do not satisfactorily approximate ancestral populations, the performances of RFMix, G-Nomix and FLARE deteriorate with increased error rates and fragmented chromosomal segments. In contrast, our method provides fair results.
Affiliation(s)
- Gang Shi
  - School of Telecommunications Engineering, Xidian University, 2 South Taibai Road, Xi’an 710071, China
3
Ko S, Chu BB, Peterson D, Okenwa C, Papp JC, Alexander DH, Sobel EM, Zhou H, Lange KL. Unsupervised discovery of ancestry-informative markers and genetic admixture proportions in biobank-scale datasets. Am J Hum Genet 2023; 110:314-325. [PMID: 36610401 PMCID: PMC9943729 DOI: 10.1016/j.ajhg.2022.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 12/12/2022] [Indexed: 01/09/2023]
Abstract
Admixture estimation plays a crucial role in ancestry inference and genome-wide association studies (GWASs). Computer programs such as ADMIXTURE and STRUCTURE are commonly employed to estimate the admixture proportions of sample individuals. However, these programs can be overwhelmed by the computational burdens imposed by the 10^5 to 10^6 samples and millions of markers commonly found in modern biobanks. An attractive strategy is to run these programs on a set of ancestry-informative SNP markers (AIMs) that exhibit substantially different frequencies across populations. Unfortunately, existing methods for identifying AIMs require knowing ancestry labels for a subset of the sample. This supervised learning approach creates a chicken and the egg scenario. In this paper, we present an unsupervised, scalable framework that seamlessly carries out AIM selection and likelihood-based estimation of admixture proportions. Our simulated and real data examples show that this approach is scalable to modern biobank datasets. OpenADMIXTURE, our Julia implementation of the method, is open source and available for free.
Affiliation(s)
- Seyoon Ko
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Benjamin B. Chu
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Daniel Peterson
  - Department of Mathematics, Brigham Young University, Provo, UT 84602, USA
- Chidera Okenwa
  - Department of Mathematics, University of California, Berkeley, Berkeley, CA 94720, USA
- Jeanette C. Papp
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Eric M. Sobel
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Corresponding author
- Hua Zhou
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kenneth L. Lange
  - Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
  - Department of Statistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
4
Chu BB, Sobel EM, Wasiolek R, Ko S, Sinsheimer JS, Zhou H, Lange K. A fast Data-Driven method for genotype imputation, phasing, and local ancestry inference: MendelImpute.jl. Bioinformatics 2021; 37:4756-4763. [PMID: 34289008 PMCID: PMC8665755 DOI: 10.1093/bioinformatics/btab489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2021] [Revised: 05/18/2021] [Accepted: 07/19/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION Current methods for genotype imputation and phasing exploit the volume of data in haplotype reference panels and rely on hidden Markov models. Existing programs all have essentially the same imputation accuracy, are computationally intensive, and generally require pre-phasing the typed markers. RESULTS We introduce a novel data-mining method for genotype imputation and phasing that substitutes highly efficient linear algebra routines for hidden Markov model calculations. This strategy, embodied in our Julia program MendelImpute.jl, avoids explicit assumptions about recombination and population structure while delivering similar prediction accuracy, better memory usage, and an order of magnitude or better run-times compared to the fastest competing method. MendelImpute operates on both dosage data and unphased genotype data and simultaneously imputes missing genotypes and phase at both the typed and untyped SNPs. Finally, MendelImpute naturally extends to global and local ancestry estimation and lends itself to new strategies for data compression and hence faster data transport and sharing. AVAILABILITY Software, documentation, and scripts to reproduce our results are available from https://github.com/OpenMendel/MendelImpute.jl. SUPPLEMENTARY INFORMATION Supplementary data are available from Bioinformatics online.
Affiliation(s)
- Benjamin B Chu
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Eric M Sobel
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Rory Wasiolek
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
- Seyoon Ko
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Janet S Sinsheimer
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Hua Zhou
  - Department of Biostatistics, Fielding School of Public Health at UCLA, Los Angeles, USA
- Kenneth Lange
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, USA
5
Arriaga-MacKenzie IS, Matesi G, Chen S, Ronco A, Marker KM, Hall JR, Scherenberg R, Khajeh-Sharafabadi M, Wu Y, Gignoux CR, Null M, Hendricks AE. Summix: A method for detecting and adjusting for population structure in genetic summary data. Am J Hum Genet 2021; 108:1270-1282. [PMID: 34157305 PMCID: PMC8322937 DOI: 10.1016/j.ajhg.2021.05.016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/05/2021] [Accepted: 05/26/2021] [Indexed: 12/11/2022]
Abstract
Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.
Affiliation(s)
- Gregory Matesi
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Samuel Chen
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Alexandria Ronco
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Katie M Marker
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Jordan R Hall
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Ryan Scherenberg
  - Business School, University of Colorado Denver, Denver, CO 80204, USA
- Yinfei Wu
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
- Christopher R Gignoux
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
- Megan Null
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
  - Mathematics and Physical Sciences, The College of Idaho, Caldwell, ID 83605, USA
- Audrey E Hendricks
  - Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO 80204, USA
  - Human Medical Genetics and Genomics Program, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
  - Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO 80045, USA
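The Summix abstract in entry 5 describes estimating ancestry proportions from summary allele frequencies against reference panels. The abstract does not state the estimator, so the following is only a simplified sketch under stated assumptions: two reference groups, observed frequencies modeled as a mixture of the two, and an ordinary least-squares fit clipped to the interval [0, 1]. The function name is hypothetical and this is not claimed to be Summix's actual algorithm.

```python
def two_way_mixture_proportion(observed, ref_a, ref_b):
    """Least-squares estimate of the share of reference group A (hypothetical helper).

    Models each observed allele frequency as
        observed[j] ~ pi * ref_a[j] + (1 - pi) * ref_b[j]
    and returns the pi in [0, 1] that minimizes the sum of squared residuals.
    """
    # Closed form for the unconstrained minimizer of a one-dimensional quadratic.
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0.0:
        raise ValueError("reference allele frequencies are identical")
    # Clipping gives the constrained optimum because the objective is convex in pi.
    return min(1.0, max(0.0, num / den))
```

For example, if `observed` is built as exactly 84% of `ref_a` plus 16% of `ref_b` at every SNP, the function recovers a proportion of about 0.84. With more than two reference groups, a constrained optimizer over the probability simplex would be needed instead of this closed form.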
6
Genetic Ancestry Inference and Its Application for the Genetic Mapping of Human Diseases. Int J Mol Sci 2021; 22:6962. [PMID: 34203440 PMCID: PMC8269095 DOI: 10.3390/ijms22136962] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 12/21/2022]
Abstract
Admixed populations arise when two or more ancestral populations interbreed. As a result of this admixture, the genome of admixed populations is defined by tracts of variable size inherited from these parental groups and has particular genetic features that provide valuable information about their demographic history. Diverse methods can be used to derive the ancestry apportionment of admixed individuals, and such inferences can be leveraged for the discovery of genetic loci associated with diseases and traits, therefore having important biomedical implications. In this review article, we summarize the most common methods of global and local genetic ancestry estimation and discuss the use of admixture mapping studies in human diseases.
7
Gebrehiwot NZ, Aliloo H, Strucken EM, Marshall K, Al Kalaldeh M, Missohou A, Gibson JP. Inference of Ancestries and Heterozygosity Proportion and Genotype Imputation in West African Cattle Populations. Front Genet 2021; 12:584355. [PMID: 33841491 PMCID: PMC8025404 DOI: 10.3389/fgene.2021.584355] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/17/2020] [Accepted: 02/22/2021] [Indexed: 11/24/2022]
Abstract
Several studies have evaluated computational methods that infer the haplotypes from population genotype data in European cattle populations. However, little is known about how well they perform in African indigenous and crossbred populations. This study investigates: (1) global and local ancestry inference; (2) heterozygosity proportion estimation; and (3) genotype imputation in West African indigenous and crossbred cattle populations. Principal component analysis (PCA), ADMIXTURE, and LAMP-LD were used to analyse a medium-density single nucleotide polymorphism (SNP) dataset from Senegalese crossbred cattle. Reference SNP data of East and West African indigenous and crossbred cattle populations were used to investigate the accuracy of imputation from low to medium-density and from medium to high-density SNP datasets using Minimac v3. The first two principal components differentiated Bos indicus from European Bos taurus and African Bos taurus from other breeds. Irrespective of assuming two or three ancestral breeds for the Senegalese crossbreds, breed proportion estimates from ADMIXTURE and LAMP-LD showed a high correlation (r ≥ 0.981). The observed ancestral origin heterozygosity proportion in putative F1 crosses was close to the expected value of 1.0, and clearly differentiated F1 from all other crosses. The imputation accuracies (estimated as correlation) between imputed and the real data in crossbred animals ranged from 0.142 to 0.717 when imputing from low to medium-density, and from 0.478 to 0.899 for imputation from medium to high-density. The imputation accuracy was generally higher when the reference data came from the same geographical region as the target population, and when crossbred reference data was used to impute crossbred genotypes. The lowest imputation accuracies were observed for indigenous breed genotypes. 
This study shows that ancestral origin heterozygosity can be estimated with high accuracy and will be far superior to the use of observed individual heterozygosity for estimating heterosis in African crossbred populations. It was not possible to achieve high imputation accuracy in West African crossbred or indigenous populations based on reference data sets from East Africa, and population-specific genotyping with high-density SNP assays is required to improve imputation.
Affiliation(s)
- Netsanet Z Gebrehiwot
  - Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Hassan Aliloo
  - Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Eva M Strucken
  - Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Karen Marshall
  - International Livestock Research Institute and Centre for Tropical Livestock Genetics and Health, Nairobi, Kenya
- Mohammad Al Kalaldeh
  - Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
- Ayao Missohou
  - L'École Inter-États des Sciences et Médecine Vétérinaires de Dakar (EISMV), Dakar, Senegal
- John P Gibson
  - Centre for Genetic Analysis and Applications, School of Environmental and Rural Science, University of New England, Armidale, NSW, Australia
8
Uren C, Hoal EG, Möller M. Putting RFMix and ADMIXTURE to the test in a complex admixed population. BMC Genet 2020; 21:40. [PMID: 32264823 PMCID: PMC7140372 DOI: 10.1186/s12863-020-00845-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 09/12/2019] [Accepted: 03/24/2020] [Indexed: 12/02/2022]
Abstract
Background Global and local ancestry inference in admixed human populations can be performed using computational tools implementing distinct algorithms. The development and resulting accuracy of these tools has been tested largely on populations with relatively straightforward admixture histories but little is known about how well they perform in more complex admixture scenarios. Results Using simulations, we show that RFMix outperforms ADMIXTURE in determining global ancestry proportions even in a complex 5-way admixed population, in addition to assigning local ancestry with an accuracy of 89%. The ability of RFMix to determine global and local ancestry to a high degree of accuracy, particularly in admixed populations provides the opportunity for more accurate association analyses. Conclusion This study highlights the utility of the extension of computational tools to become more compatible to genetically structured populations, as well as the need to expand the sampling of diverse world-wide populations. This is particularly noteworthy as modern-day societies are becoming increasingly genetically complex and some genetic tools and commonly used ancestral populations are less appropriate. Based on these caveats and the results presented here, we suggest that RFMix be used for both global and local ancestry estimation in world-wide complex admixture scenarios particularly when including these estimates in association studies.
Affiliation(s)
- Caitlin Uren
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Room 4036, 4th Floor Education Building, Francie van Zijl Drive, Cape Town, 8000, South Africa
- Eileen G Hoal
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Room 4036, 4th Floor Education Building, Francie van Zijl Drive, Cape Town, 8000, South Africa
- Marlo Möller
  - DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Room 4036, 4th Floor Education Building, Francie van Zijl Drive, Cape Town, 8000, South Africa
9
Fitak RR, Rinkevich SE, Culver M. Genome-Wide Analysis of SNPs Is Consistent with No Domestic Dog Ancestry in the Endangered Mexican Wolf (Canis lupus baileyi). J Hered 2019; 109:372-383. [PMID: 29757430 DOI: 10.1093/jhered/esy009] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/20/2017] [Accepted: 02/28/2018] [Indexed: 11/13/2022]
Abstract
The Mexican gray wolf (Canis lupus baileyi) was historically distributed throughout the southwestern United States and northern Mexico. Extensive predator removal campaigns during the early 20th century, however, resulted in its eventual extirpation by the mid 1980s. At this time, the Mexican wolf existed only in 3 separate captive lineages (McBride, Ghost Ranch, and Aragón) descended from 3, 2, and 2 founders, respectively. These lineages were merged in 1995 to increase the available genetic variation, and Mexican wolves were reintroduced into Arizona and New Mexico in 1998. Despite the ongoing management of the Mexican wolf population, it has been suggested that a proportion of the Mexican wolf ancestry may be recently derived from hybridization with domestic dogs. In this study, we genotyped 87 Mexican wolves, including individuals from all 3 captive lineages and cross-lineage wolves, for more than 172000 single nucleotide polymorphisms. We identified levels of genetic variation consistent with the pedigree record and effects of genetic rescue. To identify the potential to detect hybridization with domestic dogs, we compared our Mexican wolf genotypes with those from studies of domestic dogs and other gray wolves. The proportion of Mexican wolf ancestry assigned to domestic dogs was only between 0.06% (SD 0.23%) and 7.8% (SD 1.0%) for global and local ancestry estimates, respectively; and was consistent with simulated levels of incomplete lineage sorting. Overall, our results suggested that Mexican wolves lack biologically significant ancestry with dogs and have useful implications for the conservation and management of this endangered wolf subspecies.
Affiliation(s)
- Melanie Culver
  - US Geological Survey Arizona Cooperative Fish and Wildlife Research Unit, School of Natural Resources and the Environment, University of Arizona, Tucson, AZ
10
Qin H, Zhao J, Zhu X. Identifying Rare Variant Associations in Admixed Populations. Sci Rep 2019; 9:5458. [PMID: 30931973 PMCID: PMC6443736 DOI: 10.1038/s41598-019-41845-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/15/2018] [Accepted: 03/12/2019] [Indexed: 12/27/2022]
Abstract
An admixed population and its ancestral populations bear different burdens of a complex disease. The ancestral populations may have different haplotypes of deleterious alleles and thus ancestry-gene interaction can influence disease risk in the admixed population. Among admixed individuals, deleterious haplotypes and their ancestries are dependent and can provide non-redundant association information. Herein we propose a local ancestry boosted sum test (LABST) for identifying chromosomal blocks that harbor rare variants but have no ancestry switches. For such a stable ancestral block, our LABST exploits ancestry-gene interaction and the number of rare alleles therein. Under the null of no genetic association, the test statistic asymptotically follows a chi-square distribution with one degree of freedom (1-df). Our LABST properly controlled type I error rates under extensive simulations, suggesting that the asymptotic approximation was accurate for the null distribution of the test statistic. In terms of power for identifying rare variant associations, our LABST uniformly outperformed several famed methods under four important modes of disease genetics over a large range of relative risks. In conclusion, exploiting ancestry-gene interaction can boost statistical power for rare variant association mapping in admixed populations.
Affiliation(s)
- Huaizhen Qin
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
  - Department of Global Biostatistics and Data Science, Tulane University School of Public Health and Tropical Medicine, 1440 Canal Street, New Orleans, LA, 70112, USA
- Jinying Zhao
  - Department of Epidemiology, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, FL, 32611, USA
- Xiaofeng Zhu
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, 10900 Euclid Avenue, Cleveland, Ohio, 44106, USA
11
Li R, Qin Z, Tang J, Han P, Xing Q, Wang F, Si S, Wu X, Tang M, Wang W, Zhang W. Association between 8q24 Gene Polymorphisms and the Risk of Prostate Cancer: A Systematic Review and Meta-Analysis. J Cancer 2017; 8:3198-3211. [PMID: 29158792 PMCID: PMC5665036 DOI: 10.7150/jca.20456] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 04/07/2017] [Accepted: 08/07/2017] [Indexed: 12/22/2022]
Abstract
Though numerous studies have been conducted to investigate the associations between five 8q24 polymorphisms (rs6983267 T>G, rs1447295 C>A, rs16901979 C>A, rs6983561 A>C and rs10090154 C>T) and prostate cancer (PCa) risk, the available results remained contradictory. Therefore, we performed a comprehensive meta-analysis to derive a precise estimation of such associations. We searched electronic databases PubMed, EMBASE, Web of Science and Wan Fang for the relevant available studies up to February 1st, 2017, and 39 articles were ultimately adopted in this meta-analysis. All data were extracted independently by two investigators and recorded in a unified form. The strength of association between 8q24 polymorphisms and PCa susceptibility was evaluated by the pooled odds ratios (ORs) with 95% confidence intervals (CIs). Subgroup analysis was conducted based on ethnicity, source of controls and genotypic method. Overall, a total of 39 articles containing 80 studies were adopted in this meta-analysis. The results of this meta-analysis indicated that five 8q24 polymorphisms above were all related to PCa susceptibility. Besides, in the subgroup analysis by ethnicity, all selected 8q24 polymorphisms were significantly associated with PCa risk in Asian population. In addition, stratification analysis by source of controls showed that significant results were mostly concentrated in the studies' controls from general population. Moreover, when stratified by genotypic method, significant increased PCa risks were found by TaqMan method. Therefore, this meta-analysis demonstrated that 8q24 polymorphisms (rs6983267 T>G, rs1447295 C>A, rs16901979 C>A, rs6983561 A>C and rs10090154 C>T) were associated with the susceptibility to PCa, which held the potential biomarkers for PCa risk.
Affiliation(s)
- Ran Li
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Zhiqiang Qin
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Jingyuan Tang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Peng Han
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Qianwei Xing
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
  - Department of Urology, Affiliated Hospital of Nantong University, Nantong, 226001, China
- Feng Wang
  - Department of Radiation Oncology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Shuhui Si
  - Research Division of Clinical Pharmacology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Xiaolu Wu
  - Department of Pediatrics, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Min Tang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Wei Wang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
- Wei Zhang
  - Department of Urology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, 210029, China
12
|
|
13
|
Massey SE. Strong Amerindian Mitonuclear Discordance in Puerto Rican Genomes Suggests Amerindian Mitochondrial Benefit. Ann Hum Genet 2017; 81:59-77. [DOI: 10.1111/ahg.12185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/25/2016] [Accepted: 01/06/2017] [Indexed: 12/24/2022]
Affiliation(s)
- Steven E. Massey
- Biology Department; University of Puerto Rico - Rio Piedras; PO Box 23360 San Juan Puerto Rico 00931
|
14
|
Kirkpatrick BE, Rashkin MD. Ancestry Testing and the Practice of Genetic Counseling. J Genet Couns 2016; 26:6-20. [DOI: 10.1007/s10897-016-0014-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/11/2016] [Accepted: 08/30/2016] [Indexed: 12/20/2022]
|
15
|
Novembre J, Peter BM. Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 2016; 41:98-105. [PMID: 27662060 DOI: 10.1016/j.gde.2016.08.007] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/06/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 01/17/2023]
Abstract
Empowered by modern genotyping and large samples, population structure can be accurately described and quantified even when it only explains a fraction of a percent of total genetic variance. This is especially relevant and interesting for humans, where fine-scale population structure can both confound disease-mapping studies and reveal the history of migration and divergence that shaped our species' diversity. Here we review notable recent advances in the detection, use, and understanding of population structure. Our work addresses multiple areas where substantial progress is being made: improved statistics and models for better capturing differentiation, admixture, and the spatial distribution of variation; computational speed-ups that allow methods to scale to modern data; and advances in haplotypic modeling that have wide ranging consequences for the analysis of population structure. We conclude by outlining four important open challenges: the limitations of discrete population models, uncertainty in individual origins, the incorporation of both fine-scale structure and ancient DNA in parametric models, and the development of efficient computational tools, particularly for haplotype-based methods.
Affiliation(s)
- John Novembre
- Department of Human Genetics, University of Chicago, IL 60636, United States; Department of Ecology and Evolutionary Biology, University of Chicago, IL 60636, United States
- Benjamin M Peter
- Department of Human Genetics, University of Chicago, IL 60636, United States
|
16
|
Aschard H, Gusev A, Brown R, Pasaniuc B. Leveraging local ancestry to detect gene-gene interactions in genome-wide data. BMC Genet 2015; 16:124. [PMID: 26498930 PMCID: PMC4619349 DOI: 10.1186/s12863-015-0283-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/22/2015] [Accepted: 10/19/2015] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Although genome-wide association studies have successfully identified thousands of variants associated with complex traits, these variants explain only a small fraction of the heritability of each trait. Gene-gene interactions have been proposed as a source that could explain a significant percentage of the missing heritability. However, detecting gene-gene interactions has proven to be very difficult due to computational and statistical challenges. The vast number of possible interactions that can be tested induces very stringent multiple-hypothesis corrections that limit the power of detection. These issues have mostly been highlighted for the identification of pairwise effects and are even more challenging when addressing higher-order interaction effects. In this work we explore the use of local ancestry in recently admixed individuals to find signals of gene-gene interaction in human traits and diseases. RESULTS We introduce statistical methods that leverage the correlation between local ancestry and the hidden, unknown causal variants to find distant gene-gene interactions. We show that the power of this test increases with the number of causal variants per locus and the degree of differentiation of these variants between the ancestral populations. Overall, our simulations confirm that local ancestry can be used to detect gene-gene interactions, solving the computational bottleneck. When compared to a single nucleotide polymorphism (SNP)-based interaction screening of the same sample size, the power of our test was lower in all settings we considered. However, accounting for the dramatic increase in sample size that can be achieved when genotyping only a set of ancestry informative markers instead of the whole genome, we observe a substantial gain in power in several scenarios. CONCLUSION Local ancestry-based interaction tests offer a new path to the detection of gene-gene interaction effects. They would be particularly useful in scenarios where multiple differentiated variants at the interacting loci act in a synergistic manner.
Affiliation(s)
- Hugues Aschard
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA.
- Alexander Gusev
- Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA.
- Robert Brown
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA.
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, University of California Los Angeles, Los Angeles, CA, USA.
- Department of Human Genetics, University of California Los Angeles, Los Angeles, CA, USA.
|
17
|
Kozlov K, Chebotarev D, Hassan M, Triska M, Triska P, Flegontov P, Tatarinova TV. Differential Evolution approach to detect recent admixture. BMC Genomics 2015; 16 Suppl 8:S9. [PMID: 26111206 PMCID: PMC4480842 DOI: 10.1186/1471-2164-16-s8-s9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 12/17/2022] Open
Abstract
The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix can incorporate individual's knowledge of ancestors (e.g. having some ancestors from Turkey or a Scottish grandmother). reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/.
|
18
|
Reconstructing Past Admixture Processes from Local Genomic Ancestry Using Wavelet Transformation. Genetics 2015; 200:469-81. [PMID: 25852078 PMCID: PMC4492373 DOI: 10.1534/genetics.115.176842] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 10/29/2014] [Accepted: 04/03/2015] [Indexed: 11/18/2022] Open
Abstract
Admixture between long-separated populations is a defining feature of the genomes of many species. The mosaic block structure of admixed genomes can provide information about past contact events, including the time and extent of admixture. Here, we describe an improved wavelet-based technique that better characterizes ancestry block structure from observed genomic patterns. Principal components analysis is first applied to genomic data to identify the primary population structure, followed by wavelet decomposition to develop a new characterization of local ancestry information along the chromosomes. For testing purposes, this method is applied to human genome-wide genotype data from Indonesia, as well as virtual genetic data generated using genome-scale sequential coalescent simulations under a wide range of admixture scenarios. Time of admixture is inferred using an approximate Bayesian computation framework, providing robust estimates of both admixture times and their associated levels of uncertainty. Crucially, we demonstrate that this revised wavelet approach, which we have released as the R package adwave, provides improved statistical power over existing wavelet-based techniques and can be used to address a broad range of admixture questions.
|
19
|
Bansal V, Libiger O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinformatics 2015; 16:4. [PMID: 25592880 PMCID: PMC4301802 DOI: 10.1186/s12859-014-0418-7] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 10/08/2014] [Accepted: 12/10/2014] [Indexed: 01/18/2023] Open
Abstract
Background Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. Results We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling. Conclusions Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
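The likelihood being maximized in this kind of method can be illustrated with a small Python sketch. It is not the iAdmix implementation: it assumes hard genotype calls rather than genotype likelihoods, and it swaps the BFGS optimizer named in the abstract for a simple EM update, which keeps the proportions non-negative and summing to one without extra machinery. The function name `estimate_admixture` and all frequencies and genotypes are hypothetical.

```python
def estimate_admixture(genotypes, ref_freqs, n_iter=200):
    """Maximum likelihood admixture proportions via EM (illustrative only).

    genotypes: alternate-allele counts (0, 1 or 2), one per SNP.
    ref_freqs: one list per reference population of alternate-allele
               frequencies, each strictly between 0 and 1.
    """
    n_pops = len(ref_freqs)
    n_snps = len(genotypes)
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        for j, g in enumerate(genotypes):
            # Individual-specific allele frequency under the admixture model.
            p_alt = sum(q[k] * ref_freqs[k][j] for k in range(n_pops))
            for k in range(n_pops):
                f = ref_freqs[k][j]
                # Expected number of the two allele copies drawn from population k.
                new_q[k] += g * q[k] * f / p_alt
                new_q[k] += (2 - g) * q[k] * (1 - f) / (1 - p_alt)
        q = [v / (2 * n_snps) for v in new_q]
    return q

# Hypothetical reference frequencies for two populations at four SNPs.
freqs = [[0.9, 0.1, 0.8, 0.2],
         [0.1, 0.9, 0.2, 0.8]]
q_pure = estimate_admixture([2, 0, 2, 0], freqs)  # genotypes typical of population 0
q_mix = estimate_admixture([1, 1, 1, 1], freqs)   # heterozygous at every SNP
```

In this toy setup `q_pure` puts nearly all weight on the first population, while `q_mix` stays at an even split by symmetry.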
Affiliation(s)
- Vikas Bansal
- Department of Pediatrics, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA; Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA.
- Ondrej Libiger
- Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA; Current address: MD Revolution, San Diego, CA, USA.
|
20
|
Accurate inference of local phased ancestry of modern admixed populations. Sci Rep 2014; 4:5800. [PMID: 25052506 PMCID: PMC4107375 DOI: 10.1038/srep05800] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 03/25/2014] [Accepted: 07/07/2014] [Indexed: 01/10/2023] Open
Abstract
Population stratification is a growing concern in genetic-association studies. Averaged ancestry at the genome level (global ancestry) is insufficient for detecting the population substructures and correcting population stratifications in association studies. Local and phase stratification are needed for human genetic studies, but current technologies cannot be applied on the entire genome data due to various technical caveats. Here we developed a novel approach (aMAP, ancestry of Modern Admixed Populations) for inferring local phased ancestry. It took about 3 seconds on a desktop computer to finish a local ancestry analysis for each human genome with 1.4-million SNPs. This method also exhibits the scalability to larger datasets with respect to the number of SNPs, the number of samples, and the size of reference panels. It can detect the lack of the proxy of reference panels. The accuracy was 99.4%. The aMAP software has a capacity for analyzing 6-way admixed individuals. As the biomedical community continues to expand its efforts to increase the representation of diverse populations, and as the number of large whole-genome sequence datasets continues to grow rapidly, there is an increasing demand on rapid and accurate local ancestry analysis in genetics, pharmacogenomics, population genetics, and clinical diagnosis.
|
21
|
Padhukasahasram B. Inferring ancestry from population genomic data and its applications. Front Genet 2014; 5:204. [PMID: 25071832 PMCID: PMC4080679 DOI: 10.3389/fgene.2014.00204] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 04/29/2014] [Accepted: 06/17/2014] [Indexed: 12/26/2022] Open
Abstract
Ancestry inference is a frequently encountered problem and has many applications such as forensic analyses, genetic association studies, and personal genomics. The main goal of ancestry inference is to identify an individual’s population of origin based on our knowledge of natural populations. Because both self-reported ancestry in humans or the sampling location of an organism can be inaccurate for this purpose, the use of genetic markers can facilitate accurate and reliable inference of an individual’s ancestral origins. At a higher level, there are two different paradigms in ancestry inference: global ancestry inference which tries to compute the genome-wide average of the population contributions and local ancestry inference which tries to identify the regional ancestry of a genomic segment. In this mini review, I describe the numerous approaches that are currently available for both kinds of ancestry inference from population genomic datasets. I first describe the general ideas underlying such inference methods and their relationship to one another. Then, I describe practical applications in which inference of ancestry has proven useful. Lastly, I discuss challenges and directions for future research work in this area.
Affiliation(s)
- Badri Padhukasahasram
- Center for Health Policy and Health Services Research, Henry Ford Health System, Detroit, MI USA
|