1
Chan TF, Rui X, Conti DV, Fornage M, Graff M, Haessler J, Haiman C, Highland HM, Jung SY, Kenny EE, Kooperberg C, Le Marchand L, North KE, Tao R, Wojcik G, Gignoux CR, Chiang CWK, Mancuso N. Estimating heritability explained by local ancestry and evaluating stratification bias in admixture mapping from summary statistics. Am J Hum Genet 2023; 110:1853-1862. [PMID: 37875120 PMCID: PMC10645552 DOI: 10.1016/j.ajhg.2023.09.012] [Received: 04/18/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/26/2023]
Abstract
The heritability explained by local ancestry markers in an admixed population ($h^2_\gamma$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h^2_\gamma$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h^2_\gamma$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe $\hat{h}^2_\gamma$ in the 20 phenotypes range from 0.0025 to 0.033 (mean $\hat{h}^2_\gamma$ = 0.012 ± 9.2 × $10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2$ = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
Affiliation(s)
- Tsz Fung Chan
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xinyue Rui
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- David V Conti
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Myriam Fornage
  - Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, TX, USA
- Mariaelisa Graff
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Christopher Haiman
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Heather M Highland
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Su Yon Jung
  - Translational Sciences Section, School of Nursing, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Loic Le Marchand
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC, USA
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA; Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Genevieve Wojcik
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Christopher R Gignoux
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
2
Mooney JA, Agranat-Tamir L, Pritchard JK, Rosenberg NA. On the number of genealogical ancestors tracing to the source groups of an admixed population. Genetics 2023; 224:iyad079. [PMID: 37410594 PMCID: PMC10324943 DOI: 10.1093/genetics/iyad079] [Received: 01/24/2023] [Accepted: 04/05/2023] [Indexed: 07/08/2023]
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
Affiliation(s)
- Jazlyn A Mooney
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jonathan K Pritchard
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Noah A Rosenberg
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
3
Chan TF, Rui X, Conti DV, Fornage M, Graff M, Haessler J, Haiman C, Highland HM, Jung SY, Kenny E, Kooperberg C, Le Marchand L, North KE, Tao R, Wojcik G, Gignoux CR, Chiang CWK, Mancuso N. Estimating heritability explained by local ancestry and evaluating stratification bias in admixture mapping from summary statistics. bioRxiv 2023:2023.04.10.536252. [PMID: 37131817 PMCID: PMC10153181 DOI: 10.1101/2023.04.10.536252] [Indexed: 05/04/2023]
Abstract
The heritability explained by local ancestry markers in an admixed population ($h^2_\gamma$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h^2_\gamma$ can be susceptible to biases due to population structure in ancestral populations. Here, we present a novel approach, Heritability estimation from Admixture Mapping Summary STAtistics (HAMSTA), which uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h^2_\gamma$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe $\hat{h}^2_\gamma$ in the 20 phenotypes range from 0.0025 to 0.033 (mean $\hat{h}^2_\gamma$ = 0.012 ± 9.2 × $10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2$ = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
Affiliation(s)
- Tsz Fung Chan
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Xinyue Rui
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- David V Conti
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Myriam Fornage
  - Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, TX, USA
- Mariaelisa Graff
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Jeffrey Haessler
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Christopher Haiman
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
- Heather M Highland
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Su Yon Jung
  - Translational Sciences Section, School of Nursing, Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, United States
- Eimear Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Loic Le Marchand
  - Epidemiology Program, University of Hawaii Cancer Center, Honolulu, HI, USA
- Kari E North
  - Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Ran Tao
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Genevieve Wojcik
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Christopher R Gignoux
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Charleston W K Chiang
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
- Nicholas Mancuso
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California
  - Department of Quantitative and Computational Biology, University of Southern California
4
Hamid I, Korunes KL, Schrider DR, Goldberg A. Localizing Post-Admixture Adaptive Variants with Object Detection on Ancestry-Painted Chromosomes. Mol Biol Evol 2023; 40:msad074. [PMID: 36947126 PMCID: PMC10116606 DOI: 10.1093/molbev/msad074] [Received: 09/04/2022] [Revised: 03/14/2023] [Accepted: 03/20/2023] [Indexed: 03/23/2023]
Abstract
Gene flow between previously differentiated populations during the founding of an admixed or hybrid population has the potential to introduce adaptive alleles into the new population. If the adaptive allele is common in one source population, but not the other, then as the adaptive allele rises in frequency in the admixed population, genetic ancestry from the source containing the adaptive allele will increase nearby as well. Patterns of genetic ancestry have therefore been used to identify post-admixture positive selection in humans and other animals, including examples in immunity, metabolism, and animal coloration. A common method identifies regions of the genome that have local ancestry "outliers" compared with the distribution across the rest of the genome, considering each locus independently. However, we lack theoretical models for expected distributions of ancestry under various demographic scenarios, resulting in potential false positives and false negatives. Further, ancestry patterns between distant sites are often not independent. As a result, current methods tend to infer wide genomic regions containing many genes as under selection, limiting biological interpretation. Instead, we develop a deep learning object detection method applied to images generated from local ancestry-painted genomes. This approach preserves information from the surrounding genomic context and avoids potential pitfalls of user-defined summary statistics. We find the method is robust to a variety of demographic misspecifications using simulated data. Applied to human genotype data from Cabo Verde, we localize a known adaptive locus to a single narrow region compared with multiple or long windows obtained using two other ancestry-based methods.
Affiliation(s)
- Iman Hamid
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
- Daniel R Schrider
  - Department of Genetics, University of North Carolina, Chapel Hill, NC
- Amy Goldberg
  - Department of Evolutionary Anthropology, Duke University, Durham, NC
5
Lea AJ, Garcia A, Arevalo J, Ayroles JF, Buetow K, Cole SW, Eid Rodriguez D, Gutierrez M, Highland HM, Hooper PL, Justice A, Kraft T, North KE, Stieglitz J, Kaplan H, Trumble BC, Gurven MD. Natural selection of immune and metabolic genes associated with health in two lowland Bolivian populations. Proc Natl Acad Sci U S A 2023; 120:e2207544120. [PMID: 36574663 PMCID: PMC9910614 DOI: 10.1073/pnas.2207544120] [Received: 05/01/2022] [Accepted: 09/21/2022] [Indexed: 12/28/2022]
Abstract
A growing body of work has addressed human adaptations to diverse environments using genomic data, but few studies have connected putatively selected alleles to phenotypes, much less among underrepresented populations such as Amerindians. Studies of natural selection and genotype-phenotype relationships in underrepresented populations hold potential to uncover previously undescribed loci underlying evolutionarily and biomedically relevant traits. Here, we worked with the Tsimane and the Moseten, two Amerindian populations inhabiting the Bolivian lowlands. We focused most intensively on the Tsimane, because long-term anthropological work with this group has shown that they have a high burden of both macro and microparasites, as well as minimal cardiometabolic disease or dementia. We therefore generated genome-wide genotype data for Tsimane individuals to study natural selection, and paired this with blood mRNA-seq as well as cardiometabolic and immune biomarker data generated from a larger sample that included both populations. In the Tsimane, we identified 21 regions that are candidates for selective sweeps, as well as 5 immune traits that show evidence for polygenic selection (e.g., C-reactive protein levels and the response to coronaviruses). Genes overlapping candidate regions were strongly enriched for known involvement in immune-related traits, such as abundance of lymphocytes and eosinophils. Importantly, we were also able to draw on extensive phenotype information for the Tsimane and Moseten and link five regions (containing PSD4, MUC21 and MUC22, TOX2, ANXA6, and ABCA1) with biomarkers of immune and metabolic function. Together, our work highlights the utility of pairing evolutionary analyses with anthropological and biomedical data to gain insight into the genetic basis of health-related traits.
Affiliation(s)
- Amanda J. Lea
  - Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235
- Angela Garcia
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ 85287
- Jesusa Arevalo
  - Department of Medicine, University of California, Los Angeles, CA 90095
- Julien F. Ayroles
  - Department of Ecology and Evolution, Princeton University, Princeton, NJ 08544
  - Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544
- Kenneth Buetow
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ 85287
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287
- Steve W. Cole
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, CA 90095
  - Department of Medicine, University of California, Los Angeles, CA 90095
- Heather M. Highland
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516
- Paul L. Hooper
  - Economic Science Institute, Chapman University, Orange, CA 92866
- Thomas Kraft
  - Department of Anthropology, University of Utah, Salt Lake City, UT 84112
- Kari E. North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516
- Hillard Kaplan
  - Institute for Economics and Society, Chapman University, Orange, CA 92866
- Benjamin C. Trumble
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ 85287
  - School of Human Evolution and Social Change, Arizona State University, Tempe, AZ 85287
- Michael D. Gurven
  - Department of Anthropology, University of California, Santa Barbara, CA 93106
6
Sharko FS, Zhur KV, Trifonov VA, Prokhortchouk EB. Distortion of Population Statistics due to the Use of Different Methodological Approaches to the Construction of Genomic DNA Libraries. Acta Naturae 2023; 15:87-96. [PMID: 37153511 PMCID: PMC10154772 DOI: 10.32607/actanaturae.11898] [Received: 12/26/2022] [Accepted: 02/03/2023] [Indexed: 05/09/2023]
Abstract
Several different methods of DNA library preparation for paleogenetic studies are now available. However, the chemical reactions underlying each of them can affect the primary sequence of ancient DNA (aDNA) in the libraries and distort the results of statistical analysis. In this paper, we compare the results of sequencing aDNA libraries of a Bronze Age sample from burials of the Caucasian burial ground Klady, prepared using three different approaches: (1) shotgun sequencing, (2) strategies for selecting target genomic regions, and (3) strategies for selecting target genomic regions, including DNA pre-treatment with a mixture of uracil-DNA glycosylase (UDG) and endonuclease VIII. We analyzed the impact of these library preparation approaches on the results of secondary statistical analyses, namely F4 statistics, ADMIXTURE, and principal component analysis (PCA). Preparation of genomic libraries without UDG can distort the statistical results because of postmortem chemical modifications of the aDNA. This distortion can be alleviated by analyzing only the single nucleotide polymorphisms caused by transversions in the genome.
Affiliation(s)
- F. S. Sharko
  - Laboratory of vertebrate genomics and epigenomics, Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences, Moscow, 119071 Russian Federation
- K. V. Zhur
  - Laboratory of vertebrate genomics and epigenomics, Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences, Moscow, 119071 Russian Federation
- V. A. Trifonov
  - Laboratory of vertebrate genomics and epigenomics, Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences, Moscow, 119071 Russian Federation
  - Institute for the History of Material Culture of the Russian Academy of Sciences, Saint Petersburg, 191186 Russian Federation
- E. B. Prokhortchouk
  - Laboratory of vertebrate genomics and epigenomics, Federal Research Centre “Fundamentals of Biotechnology” of the Russian Academy of Sciences, Moscow, 119071 Russian Federation
7
Peter BM. A geometric relationship of $F_2$, $F_3$ and $F_4$-statistics with principal component analysis. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200413. [PMID: 35430884 PMCID: PMC9014194 DOI: 10.1098/rstb.2020.0413] [Indexed: 12/16/2022]
Abstract
Principal component analysis (PCA) and F-statistics sensu Patterson are two of the most widely used population genetic tools to study human genetic variation. Here, I derive explicit connections between the two approaches and show that these two methods are closely related. F-statistics have a simple geometrical interpretation in the context of PCA, and orthogonal projections are a key concept to establish this link. I show that for any pair of populations, any population that is admixed as determined by an $F_3$-statistic will lie inside a circle on a PCA plot. Furthermore, the $F_4$-statistic is closely related to an angle measurement, and will be zero if the differences between pairs of populations intersect at a right angle in PCA space. I illustrate my results on two examples, one of Western Eurasian, and one of global human diversity. In both examples, I find that the first few PCs are sufficient to approximate most F-statistics, and that PCA plots are effective at predicting F-statistics. Thus, while F-statistics are commonly understood in terms of discrete populations, the geometric perspective illustrates that they can be viewed in a framework of populations that vary in a more continuous manner.
This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
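The geometric statements in this abstract can be illustrated with a minimal Python sketch. It assumes the standard population-level definitions of the F-statistics as averaged products of allele-frequency differences, ignores finite-sample bias corrections, and works directly in allele-frequency space rather than on PCA coordinates (the two agree only when every PC is retained). The function names and the toy frequencies are hypothetical and are not taken from the paper.

```python
def f2(a, b):
    """Mean squared allele-frequency difference between populations a and b."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def f3(x, a, b):
    """Mean of (x - a) * (x - b); a negative value suggests x is admixed between a and b."""
    return sum((xi - ai) * (xi - bi) for xi, ai, bi in zip(x, a, b)) / len(x)


def f4(a, b, c, d):
    """Mean of (a - b) * (c - d); zero when the two difference vectors are orthogonal."""
    return sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)) / len(a)


def inside_circle(x, a, b):
    """True if x lies strictly inside the sphere whose diameter is the segment from a to b."""
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    dist2 = sum((xi - mi) ** 2 for xi, mi in zip(x, mid))
    radius2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / 4
    return dist2 < radius2


# Toy allele frequencies at two SNPs (illustrative values only).
pop_a = [0.1, 0.3]
pop_b = [0.9, 0.3]
pop_x_in = [0.5, 0.4]   # close to the midpoint of a and b
pop_x_out = [0.5, 0.9]  # far from the segment joining a and b
pop_c = [0.4, 0.1]
pop_d = [0.4, 0.9]      # c - d is perpendicular to a - b

for x in (pop_x_in, pop_x_out):
    print(f3(x, pop_a, pop_b) < 0, inside_circle(x, pop_a, pop_b))
print(f4(pop_a, pop_b, pop_c, pop_d))
```

The two checks agree because (x - a)·(x - b) equals the squared distance from x to the midpoint of a and b minus the squared radius, so the sign of the $F_3$ inner product is exactly the inside/outside test for that circle.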
Affiliation(s)
- Benjamin M. Peter
  - Max-Planck-Institute for Evolutionary Anthropology, Leipzig 04103, Germany
8
Edge MD, Ramachandran S, Rosenberg NA. Celebrating 50 years since Lewontin's apportionment of human diversity. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200405. [PMID: 35430889 PMCID: PMC9014183 DOI: 10.1098/rstb.2020.0405] [Indexed: 12/25/2022]
Affiliation(s)
- Michael D. Edge
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Sohini Ramachandran
  - Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA