1
Añorve-Garibay V, Huerta-Sanchez E, Sohail M, Ortega-Del Vecchyo D. Natural selection acting on complex traits hampers the predictive accuracy of polygenic scores in ancient samples. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.09.10.612181. [PMID: 39314439 PMCID: PMC11419050 DOI: 10.1101/2024.09.10.612181] [Indexed: 09/25/2024]
Abstract
The prediction of phenotypes from ancient humans has gained interest due to its potential to investigate the evolution of complex traits. These predictions are commonly performed using polygenic scores computed with DNA information from ancient humans along with genome-wide association studies (GWAS) data from present-day humans. However, numerous evolutionary processes could impact the prediction of phenotypes from ancient humans based on polygenic scores. In this work we investigate how natural selection impacts phenotypic predictions on ancient individuals using polygenic scores. We use simulations of an additive trait to analyze how natural selection impacts phenotypic predictions with polygenic scores. We simulate a trait evolving under neutrality, stabilizing selection and directional selection. We find that stabilizing and directional selection have contrasting effects on ancient phenotypic predictions. Stabilizing selection accelerates the loss of large-effect alleles contributing to trait variation. Conversely, directional selection accelerates the loss of small and large-effect alleles that drive individuals farther away from the optimal phenotypic value. These effects result in specific shared genetic variation patterns between ancient and modern populations which hamper the accuracy of polygenic scores to predict phenotypes. Furthermore, we conducted simulations that include realistic strengths of stabilizing selection and heritability estimates to show how natural selection could impact the predictive accuracy of ancient polygenic scores for two widely studied traits: height and body mass index. We emphasize the importance of considering how natural selection can decrease the reliability of ancient polygenic scores to perform phenotypic predictions on an ancient population.
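The predictions discussed here rest on the additive polygenic score, $\mathrm{PGS} = \sum_i \hat{\beta}_i g_i$, where $g_i \in \{0, 1, 2\}$ counts effect alleles at locus $i$ and $\hat{\beta}_i$ is a GWAS effect estimate. The Python sketch below is not the authors' simulation, which evolves the trait under neutrality, stabilizing or directional selection. It only illustrates, under simplifying assumptions, why accuracy drops when some causal loci carried by the target individuals have no effect estimate, as can happen when alleles present in an ancient population no longer segregate in the present-day GWAS sample. Assumptions: unlinked loci in Hardy-Weinberg proportions, Gaussian effects known exactly for a fraction `frac_known` of loci and set to zero elsewhere, and a fixed heritability `h2`; all function names are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def polygenic_score(genotypes, effects):
    """Additive PGS: sum over loci of allele count (0, 1 or 2) times effect size."""
    return sum(g * b for g, b in zip(genotypes, effects))


def simulate_accuracy(n_ind=2000, n_loci=200, h2=0.5, frac_known=1.0, seed=1):
    """Hypothetical toy model: returns r^2 between the PGS and the phenotype."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_loci)]
    # Assumption: only the first n_known loci have effect estimates.
    n_known = int(frac_known * n_loci)
    estimated = effects[:n_known] + [0.0] * (n_loci - n_known)

    genetic_values, scores = [], []
    for _ in range(n_ind):
        # Two independent allele draws per locus (Hardy-Weinberg proportions).
        g = [(rng.random() < p) + (rng.random() < p) for p in freqs]
        genetic_values.append(polygenic_score(g, effects))
        scores.append(polygenic_score(g, estimated))

    # Environmental noise scaled so that Var(G) / Var(P) is close to h2.
    var_g = statistics.pvariance(genetic_values)
    sd_env = (var_g * (1.0 - h2) / h2) ** 0.5
    phenotypes = [gv + rng.gauss(0.0, sd_env) for gv in genetic_values]
    return pearson_r(scores, phenotypes) ** 2
```

With these defaults, `simulate_accuracy(frac_known=1.0)` should return an $r^2$ close to the assumed `h2` of 0.5, and smaller values of `frac_known` should lower it roughly in proportion to the genetic variance left without effect estimates.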
Affiliation(s)
- Valeria Añorve-Garibay
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
- Emilia Huerta-Sanchez
- Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, RI, USA
- Mashaal Sohail
- Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos, México
- Diego Ortega-Del Vecchyo
- Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
2
Wang JY, Lin N, Zietz M, Mares J, Narasimhan VM, Rathouz PJ, Harpak A. Three Open Questions in Polygenic Score Portability. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.08.20.608703. [PMID: 39229140 PMCID: PMC11370354 DOI: 10.1101/2024.08.20.608703] [Indexed: 09/05/2024]
Abstract
A major obstacle hindering the broad adoption of polygenic scores (PGS) is their lack of "portability" to people who differ (in genetic ancestry or other characteristics) from the GWAS samples in which genetic effects were estimated. Here, we use the UK Biobank to measure the change in PGS prediction accuracy as a continuous function of individuals' genome-wide genetic dissimilarity to the GWAS sample ("genetic distance"). Our results highlight three gaps in our understanding of PGS portability. First, prediction accuracy is extremely noisy at the individual level and not well predicted by genetic distance. In fact, variance in prediction accuracy is explained comparably well by socioeconomic measures. Second, trends of portability vary across traits. For several immunity-related traits, prediction accuracy drops to near zero quickly, even at intermediate levels of genetic distance. This quick drop may reflect GWAS associations being more ancestry-specific in immunity-related traits than in other traits. Third, we show that even qualitative trends of portability can depend on the measure of prediction accuracy used. For instance, for white blood cell count, a measure of prediction accuracy at the individual level (reduction in mean squared error) increases with genetic distance. Together, our results show that portability cannot be understood through global ancestry groupings alone. There are other, understudied factors influencing portability, such as the specifics of the evolution of the trait and its genetic architecture, social context, and the construction of the polygenic score. Addressing these gaps can aid in the development and application of PGS and inform more equitable genomic research.
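The third point, that even the direction of a portability trend can depend on the accuracy measure, can be made concrete with a toy computation. The sketch below bins individuals by a genetic-distance value and reports two measures per bin: the squared correlation $r^2$ between PGS and phenotype, and a mean reduction in squared error relative to predicting the bin mean. Both definitions are illustrative assumptions, not necessarily those used in the paper; the PGS is assumed to already be on the phenotype scale, and the function names and toy data are hypothetical.

```python
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def accuracy_by_distance(distance, pgs, phenotype, n_bins=2):
    """Return (r^2, mean squared-error reduction) for each genetic-distance bin."""
    order = sorted(range(len(distance)), key=lambda i: distance[i])
    size = len(order) // n_bins
    results = []
    for b in range(n_bins):
        # The last bin absorbs any remainder.
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        x = [pgs[i] for i in idx]
        y = [phenotype[i] for i in idx]
        y_bar = statistics.fmean(y)
        # Individual-level gain: squared error of the bin-mean baseline minus
        # squared error of the PGS prediction, averaged over the bin.
        mse_reduction = statistics.fmean(
            (yi - y_bar) ** 2 - (yi - xi) ** 2 for xi, yi in zip(x, y)
        )
        results.append((pearson_r(x, y) ** 2, mse_reduction))
    return results


distance = [0.1, 0.2, 0.3, 0.4, 0.9, 1.0, 1.1, 1.2]
pgs = [1, 2, 3, 4, 1, 2, 3, 4]
phenotype = [1, 2, 3, 4, 4, 3, 2, 1]
print(accuracy_by_distance(distance, pgs, phenotype))
# [(1.0, 1.25), (1.0, -3.75)]
```

In this toy input $r^2$ is identical in both bins, while the squared-error measure shows the score helping in the near bin and hurting in the far one, because $r^2$ ignores sign and calibration.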
Affiliation(s)
- Joyce Y Wang
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX
- Neeka Lin
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX
- Michael Zietz
- Department of Biomedical Informatics, Columbia University, New York, NY
- Jason Mares
- Department of Neurology, Columbia University, New York, NY
- Vagheesh M Narasimhan
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX
- Department of Statistics and Data Science, The University of Texas at Austin, Austin, TX
- Paul J Rathouz
- Department of Statistics and Data Science, The University of Texas at Austin, Austin, TX
- Department of Population Health, The University of Texas at Austin, Austin, TX
- Arbel Harpak
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX
- Department of Population Health, The University of Texas at Austin, Austin, TX
3
Pankratov V, Mezzavilla M, Aneli S, Kuznetsov IA, Fusco D, Wilson JF, Metspalu M, Provero P, Pagani L, Marnetto D. Ancestral genetic components are consistently associated with the complex trait landscape in European biobanks. Eur J Hum Genet 2024:10.1038/s41431-024-01678-9. [PMID: 39127804 DOI: 10.1038/s41431-024-01678-9] [Received: 12/29/2023] [Revised: 07/23/2024] [Accepted: 07/25/2024] [Indexed: 08/12/2024]
Abstract
The genetic structure in Europe was mostly shaped by admixture between the Western Hunter-Gatherers, Early European Farmers and Steppe Bronze Age ancestral components. Such structure is regarded as a confounder in GWAS and follow-up studies, and gold-standard methods exist to correct for it. However, it is still poorly understood to what extent these ancestral components contribute to complex trait variation in present-day Europe. In this work we harness the UK Biobank to address this question. Using extensive demographic simulations, data on siblings, and previous results we obtained from the Estonian Biobank, we carefully evaluate the significance and scope of our findings. Heart rate, platelet count, bone mineral density and many other traits show stratification similar to that of height and pigmentation traits, which are likely targets of selection and divergence across ancestral groups. We show that the reported ancestry-trait associations are not driven by environmental confounders by confirming our results when using between-sibling differences in ancestry. The consistency of our results across biobanks further supports this and indicates that these genetic predispositions, which derive from post-Neolithic admixture events, act as a source of variability and as potential confounders in Europe as a whole.
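The sibling check works because siblings share parents and, largely, upbringing, so differences in their ancestry proportions arise from random Mendelian segregation rather than from family-level environment. A minimal sketch of such a within-family test follows, assuming one ancestry proportion per individual (for example an estimated Steppe-related fraction), one trait value, and no covariates. It is an illustration, not the authors' model, and the function names are hypothetical.

```python
import math


def pair_differences(pairs):
    """pairs: list of ((ancestry_1, trait_1), (ancestry_2, trait_2)) sibling tuples."""
    ancestry_diffs = [s1[0] - s2[0] for s1, s2 in pairs]
    trait_diffs = [s1[1] - s2[1] for s1, s2 in pairs]
    return ancestry_diffs, trait_diffs


def sibling_ancestry_slope(ancestry_diffs, trait_diffs):
    """Least-squares slope (and standard error) of trait differences on ancestry differences.

    The fit has no intercept: swapping sibling order flips the sign of both
    differences, so an intercept would depend on an arbitrary labeling.
    """
    sxx = sum(a * a for a in ancestry_diffs)
    slope = sum(a * y for a, y in zip(ancestry_diffs, trait_diffs)) / sxx
    residual_ss = sum((y - slope * a) ** 2 for a, y in zip(ancestry_diffs, trait_diffs))
    # One parameter is estimated, so the residual variance has n - 1 degrees of freedom.
    se = math.sqrt(residual_ss / (len(ancestry_diffs) - 1) / sxx)
    return slope, se
```

A slope that remains clearly nonzero in this within-family fit is harder to attribute to environmental confounding than the same association estimated across unrelated individuals.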
Affiliation(s)
- Vasili Pankratov
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia.
- Serena Aneli
- Department of Public Health Sciences and Pediatrics, University of Turin, 10126, Turin, Italy
- Ivan A Kuznetsov
- Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Daniela Fusco
- Department of Neurosciences, University of Turin, 10126, Turin, Italy
- James F Wilson
- Centre for Global Health Research, Usher Institute, University of Edinburgh, Teviot Place, Edinburgh, EH8 9AG, Scotland
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, Scotland
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, Scotland
- Mait Metspalu
- Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Paolo Provero
- Department of Neurosciences, University of Turin, 10126, Turin, Italy
- Center for Omics Sciences, IRCCS San Raffaele Scientific Institute, 20132, Milan, Italy
- Luca Pagani
- Department of Biology, University of Padua, Padua, Italy
- Institute of Genomics, University of Tartu, 51010, Tartu, Estonia
- Davide Marnetto
- Department of Neurosciences, University of Turin, 10126, Turin, Italy.
4
Huang J, Kleman N, Basu S, Shriver MD, Zaidi AA. Interpreting SNP heritability in admixed populations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.08.04.551959. [PMID: 37577588 PMCID: PMC10418213 DOI: 10.1101/2023.08.04.551959] [Indexed: 08/15/2023]
Abstract
SNP heritability ($h^2_{snp}$) is defined as the proportion of phenotypic variance explained by genotyped SNPs and is believed to be a lower bound of heritability ($h^2$), being equal to it if all causal variants are known. Despite the simple intuition behind $h^2_{snp}$, its interpretation and equivalence to $h^2$ are unclear, particularly in the presence of population structure and assortative mating. It is well known that population structure can lead to inflation in $\hat{h}^2_{snp}$ estimates because of confounding due to linkage disequilibrium (LD) or shared environment. Here we use analytical theory and simulations to demonstrate that $h^2_{snp}$ estimates can be biased in admixed populations, even in the absence of confounding and even if all causal variants are known. This is because admixture generates LD, which contributes to the genetic variance, and therefore to heritability. Genome-wide restricted maximum likelihood (GREML) does not capture this contribution, leading to under- or over-estimates of $h^2_{snp}$ relative to $h^2$, depending on the genetic architecture. In contrast, Haseman-Elston (HE) regression exaggerates the LD contribution, leading to biases in the opposite direction. For the same reason, GREML and HE estimates of local ancestry heritability ($h^2_{\gamma}$) are also biased. We describe this bias in $\hat{h}^2_{snp}$ and $\hat{h}^2_{\gamma}$ as a function of admixture history and the genetic architecture of the trait and show that it can be recovered under some conditions. We clarify the interpretation of $\hat{h}^2_{snp}$ in admixed populations and discuss its implication for genome-wide association studies and polygenic prediction.
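Haseman-Elston regression, one of the two estimators contrasted above, regresses products of standardized phenotypes $z_i z_j$ on genetic relatedness $A_{ij}$ over pairs $i < j$; under the model $E[z_i z_j] = h^2_{snp} A_{ij}$, the slope estimates $h^2_{snp}$. The Python sketch below builds a genetic relationship matrix (GRM) from standardized genotypes and computes that slope through the origin. It is a textbook-style illustration assuming unrelated individuals and no covariates; it does not model the admixture-induced LD whose effect on the estimate is the subject of the paper, and the function names are hypothetical.

```python
import random
import statistics


def standardize(values):
    """Center to mean 0 and scale to (population) variance 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def grm(genotypes):
    """GRM from a list of individuals, each a list of allele counts per SNP."""
    n = len(genotypes)
    columns = []
    for k in range(len(genotypes[0])):
        col = [row[k] for row in genotypes]
        if statistics.pvariance(col) > 0:  # monomorphic SNPs carry no information
            columns.append(standardize(col))
    m = len(columns)
    return [[sum(c[i] * c[j] for c in columns) / m for j in range(n)] for i in range(n)]


def he_regression(a, phenotype):
    """HE slope through the origin: regress z_i * z_j on A_ij over pairs i < j."""
    z = standardize(phenotype)
    num = den = 0.0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            num += a[i][j] * z[i] * z[j]
            den += a[i][j] ** 2
    return num / den


# Toy GRM for 20 individuals and 30 random SNPs (illustration only).
rng = random.Random(0)
toy_genotypes = [[rng.randint(0, 2) for _ in range(30)] for _ in range(20)]
toy_grm = grm(toy_genotypes)
```

The pairwise loops are only practical for small toy samples in pure Python; the point is the structure of the estimator, not performance.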
Affiliation(s)
- Jinguo Huang
- Bioinformatics and Genomics, Huck Institutes of the Life Sciences, Pennsylvania State University
- Department of Anthropology, Pennsylvania State University
- Nicole Kleman
- Department of Genetics, Cell Biology, and Development, University of Minnesota
- Saonli Basu
- Department of Biostatistics, University of Minnesota
- Arslan A. Zaidi
- Department of Genetics, Cell Biology, and Development, University of Minnesota
- Institute of Health Informatics, University of Minnesota
5
Blanc J, Berg JJ. Testing for differences in polygenic scores in the presence of confounding. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.03.12.532301. [PMID: 36993707 PMCID: PMC10055004 DOI: 10.1101/2023.03.12.532301] [Indexed: 06/19/2023]
Abstract
Polygenic scores have become an important tool in human genetics, enabling the prediction of individuals' phenotypes from their genotypes. Understanding how the pattern of differences in polygenic score predictions across individuals intersects with variation in ancestry can provide insights into the evolutionary forces acting on the trait in question, and is important for understanding health disparities. However, because most polygenic scores are computed using effect estimates from population samples, they are susceptible to confounding by both genetic and environmental effects that are correlated with ancestry. The extent to which this confounding drives patterns in the distribution of polygenic scores depends on patterns of population structure in both the original estimation panel and in the prediction/test panel. Here, we use theory from population and statistical genetics, together with simulations, to study the procedure of testing for an association between polygenic scores and axes of ancestry variation in the presence of confounding. We use a general model of genetic relatedness to describe how confounding in the estimation panel biases the distribution of polygenic scores in a way that depends on the degree of overlap in population structure between panels. We then show how this confounding can bias tests for associations between polygenic scores and important axes of ancestry variation in the test panel. Specifically, for any given test, there exists a single axis of population structure in the GWAS panel that needs to be controlled for in order to protect the test. Based on this result, we propose a new approach for directly estimating this axis of population structure in the GWAS panel. We then use simulations to compare the performance of this approach to the standard approach in which the principal components of the GWAS panel genotypes are used to control for stratification.
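The test analyzed here can be written as a regression of test-panel polygenic scores on a test vector, for example the first principal component of the test panel. The sketch below computes the ordinary least-squares slope and its $t$ statistic. It deliberately implements only the naive test: the paper's point is that confounding in the GWAS panel can bias this statistic unless the corresponding axis of GWAS-panel structure is controlled for, which the sketch does not attempt. The function name and toy data are hypothetical.

```python
import math
import statistics


def pgs_axis_test(pgs, axis):
    """OLS slope of PGS on a test vector, with its t statistic (n - 2 degrees of freedom)."""
    n = len(pgs)
    mx, my = statistics.fmean(axis), statistics.fmean(pgs)
    sxx = sum((x - mx) ** 2 for x in axis)
    sxy = sum((x - mx) * (y - my) for x, y in zip(axis, pgs))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(axis, pgs))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se
```

If the GWAS effect estimates absorbed a stratification signal aligned with the test vector, a large $t$ here could reflect that confounding rather than genuine differences in genetic values.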
Affiliation(s)
- Jennifer Blanc
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Jeremy J. Berg
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
6
Patel RA, Weiß CL, Zhu H, Mostafavi H, Simons YB, Spence JP, Pritchard JK. Conditional frequency spectra as a tool for studying selection on complex traits in biobanks. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.15.599126. [PMID: 38948697 PMCID: PMC11212903 DOI: 10.1101/2024.06.15.599126] [Indexed: 07/02/2024]
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
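A conditional frequency spectrum, as described here, is the distribution of a variant's frequency in another population given its frequency in the GWAS cohort. A minimal empirical version bins variants by GWAS-cohort frequency and, within each bin, tabulates the normalized histogram of frequencies in the second population. The bin edges and function names are hypothetical, and the sketch ignores the ascertainment and allele-polarization choices a real analysis would need.

```python
from bisect import bisect_right


def bin_index(x, edges):
    """Index k of the bin [edges[k], edges[k + 1]) containing x.

    The last bin also includes its right edge; values outside the edges are
    clamped to the first or last bin.
    """
    k = bisect_right(edges, x) - 1
    return min(max(k, 0), len(edges) - 2)


def conditional_spectrum(gwas_freqs, other_freqs, gwas_edges, other_edges):
    """Row k: normalized histogram of other-population frequencies for GWAS-frequency bin k."""
    counts = [[0] * (len(other_edges) - 1) for _ in range(len(gwas_edges) - 1)]
    for p, q in zip(gwas_freqs, other_freqs):
        counts[bin_index(p, gwas_edges)][bin_index(q, other_edges)] += 1
    # Empty GWAS-frequency bins are reported as all zeros.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


# Toy input: four variants, two frequency bins in each population.
print(conditional_spectrum([0.05, 0.1, 0.6, 0.7], [0.0, 0.6, 0.8, 0.9],
                           [0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))
# [[0.5, 0.5], [0.0, 1.0]]
```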
Affiliation(s)
- Roshni A. Patel
- Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Clemens L. Weiß
- Stanford Cancer Institute Core, Stanford University School of Medicine, Stanford, CA
- Huisheng Zhu
- Department of Biology, Stanford University, Stanford, CA
- Hakhamanesh Mostafavi
- Center for Human Genetics and Genomics, New York University School of Medicine, New York, NY
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY
- Jeffrey P. Spence
- Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Jonathan K. Pritchard
- Department of Genetics, Stanford University School of Medicine, Stanford, CA
- Department of Biology, Stanford University, Stanford, CA
7
Anderson NW, Kirk L, Schraiber JG, Ragsdale AP. A Path Integral Approach for Allele Frequency Dynamics Under Polygenic Selection. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.14.599114. [PMID: 38915613 PMCID: PMC11195211 DOI: 10.1101/2024.06.14.599114] [Indexed: 06/26/2024]
Abstract
Many phenotypic traits have a polygenic genetic basis, making it challenging to learn their genetic architectures and predict individual phenotypes. One promising avenue to resolve the genetic basis of complex traits is through evolve-and-resequence experiments, in which laboratory populations are exposed to some selective pressure and trait-contributing loci are identified by extreme frequency changes over the course of the experiment. However, small laboratory populations experience substantial random genetic drift, and it is difficult to determine whether selection played a role in a given allele frequency change. Predicting how much allele frequencies change under drift and selection had remained an open problem well into the 21st century, even for alleles contributing to simple, monogenic traits. Recently, there have been efforts to apply the path integral, a method borrowed from physics, to solve this problem. So far, this approach has been limited to genic selection, and is therefore inadequate to capture the complexity of the highly polygenic quantitative traits that are commonly studied. Here we extend one of these path integral methods, the perturbation approximation, to selection scenarios that are of interest to quantitative genetics. In particular, we derive analytic expressions for the transition probability (i.e., the probability that an allele will change in frequency from $x$ to $y$ in time $t$) of an allele contributing to a trait subject to stabilizing selection, as well as that of an allele contributing to a trait rapidly adapting to a new phenotypic optimum. We use these expressions to characterize the use of allele frequency change to test for selection, as well as to explore optimal design choices for evolve-and-resequence experiments to uncover the genetic architecture of polygenic traits under selection.
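The transition probability defined above can also be approximated by brute force, which is a useful check on analytic approximations. The sketch below is a Monte Carlo Wright-Fisher simulation, not the path-integral perturbation approach of the paper: each generation applies a deterministic selection step and then binomially resamples $2N$ gene copies. The `stabilizing_delta` form, $\Delta x \approx -\tfrac{s}{2}\,x(1-x)(1-2x)$ with $s$ proportional to the squared effect size, is a commonly used weak-selection approximation for an allele whose trait sits at its optimum; treat it as an assumption here. All function names are hypothetical.

```python
import random


def stabilizing_delta(s):
    """Assumed weak-selection frequency change for an allele whose trait sits at its optimum.

    Selection pushes the allele toward whichever boundary (0 or 1) is nearer.
    """
    return lambda x: -0.5 * s * x * (1.0 - x) * (1.0 - 2.0 * x)


def simulate_final_freqs(x0, t, n, delta=lambda x: 0.0, reps=2000, seed=0):
    """Allele frequencies after t Wright-Fisher generations in a diploid population of size n."""
    rng = random.Random(seed)
    finals = []
    for _ in range(reps):
        x = x0
        for _ in range(t):
            x_sel = min(1.0, max(0.0, x + delta(x)))
            # Binomial sampling of 2N gene copies for the next generation.
            x = sum(rng.random() < x_sel for _ in range(2 * n)) / (2 * n)
        finals.append(x)
    return finals


def transition_probability(finals, lo, hi):
    """Monte Carlo estimate of P(lo <= y < hi) from simulated final frequencies."""
    return sum(lo <= y < hi for y in finals) / len(finals)
```

For example, one can compare `transition_probability(finals, 0.0, 0.05)` for runs started at `x0=0.2` with and without `delta=stabilizing_delta(...)` to gauge how much the assumed selection changes the chance of near-loss relative to drift alone.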
Affiliation(s)
- Nathan W. Anderson
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Lloyd Kirk
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
- Joshua G. Schraiber
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, 90089, USA
- Aaron P. Ragsdale
- Department of Integrative Biology, University of Wisconsin-Madison, Madison, WI, 53706, USA
8
Lewis ACF, Chisholm RL, Connolly JJ, Esplin ED, Glessner J, Gordon A, Green RC, Hakonarson H, Harr M, Holm IA, Jarvik GP, Karlson E, Kenny EE, Kottyan L, Lennon N, Linder JE, Luo Y, Martin LJ, Perez E, Puckelwartz MJ, Rasmussen-Torvik LJ, Sabatello M, Sharp RR, Smoller JW, Sterling R, Terek S, Wei WQ, Fullerton SM. Managing differential performance of polygenic risk scores across groups: Real-world experience of the eMERGE Network. Am J Hum Genet 2024; 111:999-1005. [PMID: 38688278 PMCID: PMC11179244 DOI: 10.1016/j.ajhg.2024.04.005] [Received: 02/27/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 05/02/2024]
Abstract
The differential performance of polygenic risk scores (PRSs) by group is one of the major ethical barriers to their clinical use. It is also one of the main practical challenges for any implementation effort. The social repercussions of how people are grouped in PRS research must be considered in communications with research participants, including return of results. Here, we outline the decisions faced and choices made by a large multi-site clinical implementation study returning PRSs to diverse participants in handling this issue of differential performance. Our approach to managing the complexities associated with the differential performance of PRSs serves as a case study that can help future implementers of PRSs to plot an anticipatory course in response to this issue.
Affiliation(s)
- Anna C F Lewis
- Edmond and Lily Safra Center for Ethics, Harvard University, Cambridge, MA, USA; Department of Genetics, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Harvard Medical School, Boston, MA, USA.
- Rex L Chisholm
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA
- John J Connolly
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Joe Glessner
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Adam Gordon
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Pharmacology, Northwestern University, Evanston, IL, USA
- Robert C Green
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA; Ariadne Labs, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pediatrics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Division of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA, USA; Division of Pulmonary Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Margaret Harr
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingrid A Holm
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA; Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Gail P Jarvik
- Division of Medical Genetics, Department of Medicine and Department of Genome Science, University of Washington Medical Center, Seattle, WA, USA
- Elizabeth Karlson
- Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA; Mass General Brigham Personalized Medicine, Boston, MA, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine, New York City, NY, USA; Center for Clinical Translational Genomics, Icahn School of Medicine, New York City, NY, USA; Division of Genomic Medicine, Department of Medicine, Icahn School of Medicine, New York City, NY, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine, New York City, NY, USA
- Leah Kottyan
- Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Niall Lennon
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jodell E Linder
- Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, TN, USA
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Evanston, IL, USA
- Lisa J Martin
- Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Emma Perez
- Mass General Brigham Personalized Medicine, Boston, MA, USA
- Megan J Puckelwartz
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Pharmacology, Northwestern University, Evanston, IL, USA
- Laura J Rasmussen-Torvik
- Center for Genetic Medicine, Northwestern University, Evanston, IL, USA; Department of Preventive Medicine, Northwestern University, Evanston, IL, USA
- Maya Sabatello
- Center for Precision Medicine and Genomics, Department of Medicine, Columbia University Irving Medical Center, New York City, NY, USA; Division of Ethics, Department of Medical Humanities and Ethics, Columbia University Irving Medical Center, New York City, NY, USA
- Jordan W Smoller
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA; Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA
- Rene Sterling
- Division of Genomics and Society, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shannon Terek
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
9
Gokhman D, Harris KD, Carmi S, Greenbaum G. Predicting the direction of phenotypic difference. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.22.581566. [PMID: 38895291 PMCID: PMC11185551 DOI: 10.1101/2024.02.22.581566] [Indexed: 06/21/2024]
Abstract
Predicting phenotypes from genomic data is a key goal in genetics, but for most complex phenotypes, predictions are hampered by incomplete genotype-to-phenotype mapping. Here, we describe a more attainable approach than quantitative predictions, which is aimed at qualitatively predicting phenotypic differences. Despite incomplete genotype-to-phenotype mapping, we show that it is relatively easy to determine which of two individuals has a greater phenotypic value. This question is central in many scenarios, e.g., comparing disease risk between individuals, the yield of crop strains, or the anatomy of extinct vs extant species. To evaluate prediction accuracy, i.e., the probability that the individual with the greater predicted phenotype indeed has a greater phenotypic value, we developed an estimator of the ratio between known and unknown effects on the phenotype. We evaluated prediction accuracy using human data from tens of thousands of individuals from either the same family or the same population, as well as data from different species. We found that, in many cases, even when only a small fraction of the loci affecting a phenotype is known, the individual with the greater phenotypic value can be identified with over 90% accuracy. Our approach also circumvents some of the limitations in transferring genetic association results across populations. Overall, we introduce an approach that enables accurate predictions of key information on phenotypes - the direction of phenotypic difference - and suggest that more phenotypic information can be extracted from genomic data than previously appreciated.
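Under a simple Gaussian model, the accuracy defined here has a closed form. Suppose the difference between two individuals in the known genetic component and the remaining difference (unknown genetic effects plus environment) are independent and normally distributed, with variance ratio $r$ (known to unknown). The predicted and true differences then have correlation $\rho = \sqrt{r/(1+r)}$, and the probability that they share a sign is $\tfrac{1}{2} + \arcsin(\rho)/\pi$. This is a sketch under those assumptions, not necessarily the estimator developed in the paper; the Monte Carlo function checks the formula, and both function names are hypothetical.

```python
import math
import random


def direction_accuracy(known_to_unknown):
    """P(predicted and true phenotypic differences share a sign) under the Gaussian model."""
    rho = math.sqrt(known_to_unknown / (1.0 + known_to_unknown))
    return 0.5 + math.asin(rho) / math.pi


def simulate_direction_accuracy(known_to_unknown, pairs=20000, seed=0):
    """Monte Carlo check: fraction of pairs in which the predicted difference has the right sign."""
    rng = random.Random(seed)
    sd_known = math.sqrt(known_to_unknown)
    hits = 0
    for _ in range(pairs):
        k = rng.gauss(0.0, sd_known)  # difference in the known (predicted) component
        u = rng.gauss(0.0, 1.0)       # difference in unknown effects, variance fixed at 1
        hits += (k > 0) == (k + u > 0)
    return hits / pairs
```

For instance, when known and unknown effects contribute equal variance ($r = 1$), `direction_accuracy(1.0)` gives 0.75 for a randomly chosen pair; accuracy is higher for pairs whose predicted difference is large, which this unconditional formula does not capture.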
Affiliation(s)
- David Gokhman
- Department of Molecular Genetics, The Weizmann Institute of Science, Rehovot 76100, Israel
- Keith D Harris
- Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Gili Greenbaum
- Department of Ecology, Evolution and Behavior, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
10
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xinran Wang
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Matthew Carr
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artem Kim
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
| | - Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
| | - Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
| | - James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
| | - Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nicholas Mancuso
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
| |
11
Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell 2024; 187:1059-1075. [PMID: 38428388 DOI: 10.1016/j.cell.2024.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 01/16/2024] [Indexed: 03/03/2024]
Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.
Affiliation(s)
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA; Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden.
- Yang I Li
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sohini Ramachandran
- Ecology, Evolution and Organismal Biology, Center for Computational Molecular Biology, and the Data Science Institute, Brown University, Providence, RI 02912, USA
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
12
Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. Proc Natl Acad Sci U S A 2024; 121:e2312377121. [PMID: 38363870 PMCID: PMC10907250 DOI: 10.1073/pnas.2312377121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 01/09/2024] [Indexed: 02/18/2024]
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 y, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
Affiliation(s)
- Alexis Simon
- Center for Population Biology, University of California, Davis, CA 95616
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
- Center for Population Biology, University of California, Davis, CA 95616
- Department of Evolution and Ecology, University of California, Davis, CA 95616
13
Janivara R, Hazra U, Pfennig A, Harlemon M, Kim MS, Eaaswarkhanth M, Chen WC, Ogunbiyi A, Kachambwa P, Petersen LN, Jalloh M, Mensah JE, Adjei AA, Adusei B, Joffe M, Gueye SM, Aisuodionoe-Shadrach OI, Fernandez PW, Rohan TE, Andrews C, Rebbeck TR, Adebiyi AO, Agalliu I, Lachance J. Uncovering the genetic architecture and evolutionary roots of androgenetic alopecia in African men. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.12.575396. [PMID: 38293167 PMCID: PMC10827056 DOI: 10.1101/2024.01.12.575396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Androgenetic alopecia is a highly heritable trait. However, much of our understanding about the genetics of male pattern baldness comes from individuals of European descent. Here, we examined a novel dataset comprising 2,136 men from Ghana, Nigeria, Senegal, and South Africa who were genotyped using a custom array. We first tested how genetic predictions of baldness generalize from Europe to Africa, finding that polygenic scores from European GWAS yielded AUC statistics that ranged from 0.513 to 0.546, indicating that genetic predictions of baldness in African populations performed notably worse than in European populations. Subsequently, we conducted the first African GWAS of androgenetic alopecia, focusing on self-reported baldness patterns at age 45. After correcting for present age, population structure, and study site, we identified 266 moderately significant associations, 51 of which were independent (p-value < 10⁻⁵, r² < 0.2). Most baldness associations were autosomal, and the X chromosome does not appear to have a large impact on baldness in African men. Finally, we examined the evolutionary causes of continental differences in genetic architecture. Although Neanderthal alleles have previously been associated with skin and hair phenotypes, we did not find evidence that European-ascertained baldness hits were enriched for signatures of ancient introgression. Most loci that are associated with androgenetic alopecia are evolving neutrally. However, multiple baldness-associated SNPs near the EDA2R and AR genes have large allele frequency differences between continents. Collectively, our findings illustrate how evolutionary history contributes to the limited portability of genetic predictions across ancestries.
Affiliation(s)
- Rohini Janivara
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Ujani Hazra
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Aaron Pfennig
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Maxine Harlemon
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Biology, Morgan State University, Baltimore, Maryland, USA
- Michelle S Kim
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Wenlong C Chen
- Strengthening Oncology Services Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- National Cancer Registry, National Institute for Communicable Diseases, a Division of the National Health Laboratory Service, Johannesburg, South Africa
- Paidamoyo Kachambwa
- Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Mediclinic Precise Southern Africa, Cape Town, South Africa
- Lindsay N Petersen
- Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Mediclinic Precise Southern Africa, Cape Town, South Africa
- Mohamed Jalloh
- Université Cheikh Anta Diop de Dakar, Dakar, Senegal
- Université Iba Der Thiam de Thiès, Thiès, Senegal
- James E Mensah
- Korle-Bu Teaching Hospital and University of Ghana Medical School, Accra, Ghana
- Andrew A Adjei
- Department of Pathology, University of Ghana Medical School, Accra, Ghana
- Maureen Joffe
- Strengthening Oncology Services Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Oseremen I Aisuodionoe-Shadrach
- College of Health Sciences, University of Abuja, University of Abuja Teaching Hospital and Cancer Science Centre, Abuja, Nigeria
- Pedro W Fernandez
- Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Thomas E Rohan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Timothy R Rebbeck
- Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Ilir Agalliu
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Joseph Lachance
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
14
Simon A, Coop G. The contribution of gene flow, selection, and genetic drift to five thousand years of human allele frequency change. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.11.548607. [PMID: 37503227 PMCID: PMC10370008 DOI: 10.1101/2023.07.11.548607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Genomic time series from experimental evolution studies and ancient DNA datasets offer us a chance to directly observe the interplay of various evolutionary forces. We show how the genome-wide variance in allele frequency change between two time points can be decomposed into the contributions of gene flow, genetic drift, and linked selection. In closed populations, the contribution of linked selection is identifiable because it creates covariances between time intervals, and genetic drift does not. However, repeated gene flow between populations can also produce directionality in allele frequency change, creating covariances. We show how to accurately separate the fraction of variance in allele frequency change due to admixture and linked selection in a population receiving gene flow. We use two human ancient DNA datasets, spanning around 5,000 years, as time transects to quantify the contributions to the genome-wide variance in allele frequency change. We find that a large fraction of genome-wide change is due to gene flow. In both cases, after correcting for known major gene flow events, we do not observe a signal of genome-wide linked selection. Thus despite the known role of selection in shaping long-term polymorphism levels, and an increasing number of examples of strong selection on single loci and polygenic scores from ancient DNA, it appears to be gene flow and drift, and not selection, that are the main determinants of recent genome-wide allele frequency change. Our approach should be applicable to the growing number of contemporary and ancient temporal population genomics datasets.
Affiliation(s)
- Alexis Simon
- Center for Population Biology, University of California, Davis, CA 95616
- Department of Evolution and Ecology, University of California, Davis, CA 95616
- Graham Coop
- Center for Population Biology, University of California, Davis, CA 95616
- Department of Evolution and Ecology, University of California, Davis, CA 95616
15
Gao Z. Unveiling recent and ongoing adaptive selection in human populations. PLoS Biol 2024; 22:e3002469. [PMID: 38236800 PMCID: PMC10796035 DOI: 10.1371/journal.pbio.3002469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Genome-wide scans for signals of selection have become a routine part of the analysis of population genomic variation datasets and have resulted in compelling evidence of selection during recent human evolution. This Essay spotlights methodological innovations that have enabled the detection of selection over very recent timescales, even in contemporary human populations. By harnessing large-scale genomic and phenotypic datasets, these new methods use different strategies to uncover connections between genotype, phenotype, and fitness. This Essay outlines the rationale and key findings of each strategy, discusses challenges in interpretation, and describes opportunities to improve detection and understanding of ongoing selection in human populations.
Affiliation(s)
- Ziyue Gao
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
16
Dutta A, McDonald BA, Croll D. Combined reference-free and multi-reference based GWAS uncover cryptic variation underlying rapid adaptation in a fungal plant pathogen. PLoS Pathog 2023; 19:e1011801. [PMID: 37972199 PMCID: PMC10688896 DOI: 10.1371/journal.ppat.1011801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 11/30/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Microbial pathogens often harbor substantial functional diversity driven by structural genetic variation. Rapid adaptation from such standing variation threatens global food security and human health. Genome-wide association studies (GWAS) provide a powerful approach to identify genetic variants underlying recent pathogen adaptation. However, the reliance on single reference genomes and single nucleotide polymorphisms (SNPs) obscures the true extent of adaptive genetic variation. Here, we show quantitatively how a combination of multiple reference genomes and reference-free approaches captures substantially more relevant genetic variation compared to single reference mapping. We performed reference-genome based association mapping across 19 reference-quality genomes covering the diversity of the species. We contrasted the results with a reference-free (i.e., k-mer) approach using raw whole-genome sequencing data in a panel of 145 strains collected across the global distribution range of the fungal wheat pathogen Zymoseptoria tritici. We mapped the genetic architecture of 49 life history traits including virulence, reproduction and growth in multiple stressful environments. The inclusion of additional reference genome SNP datasets provides a nearly linear increase in additional loci mapped through GWAS. Variants detected through the k-mer approach explained a higher proportion of phenotypic variation than a reference genome-based approach and revealed functionally confirmed loci that classic GWAS approaches failed to map. The power of GWAS in microbial pathogens can be significantly enhanced by comprehensively capturing structural genetic variation. Our approach is generalizable to a large number of species and will uncover novel mechanisms driving rapid adaptation of pathogens.
Affiliation(s)
- Anik Dutta
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Bruce A. McDonald
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
17
Konner M, Eaton SB. Hunter-gatherer diets and activity as a model for health promotion: Challenges, responses, and confirmations. Evol Anthropol 2023; 32:206-222. [PMID: 37417918 DOI: 10.1002/evan.21987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Revised: 07/27/2022] [Accepted: 04/17/2023] [Indexed: 07/08/2023]
Abstract
Beginning in 1985, we and others presented estimates of hunter-gatherer (and ultimately ancestral) diet and physical activity, hoping to provide a model for health promotion. The Hunter-Gatherer Model was designed to offset the apparent mismatch between our genes and the current Western-type lifestyle, a mismatch that arguably affects prevalence of many chronic degenerative diseases. The effort has always been controversial and subject to both scientific and popular critiques. The present article (1) addresses eight such challenges, presenting for each how the model has been modified in response, or how the criticism can be rebutted; (2) reviews new epidemiological and experimental evidence (including especially randomized controlled clinical trials); and (3) shows how official recommendations put forth by governments and health authorities have converged toward the model. Such convergence suggests that evolutionary anthropology can make significant contributions to human health.
Affiliation(s)
- Melvin Konner
- Department of Anthropology, Program in Anthropology and Human Biology, Emory University, Atlanta, Georgia, USA
- S Boyd Eaton
- Department of Radiology, Emory University School of Medicine (Emeritus), Adjunct Lecturer, Department of Anthropology, Emory University, Atlanta, Georgia, USA
18
Raben TG, Lello L, Widen E, Hsu SDH. Biobank-scale methods and projections for sparse polygenic prediction from machine learning. Sci Rep 2023; 13:11662. [PMID: 37468507 PMCID: PMC10356957 DOI: 10.1038/s41598-023-37580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 06/23/2023] [Indexed: 07/21/2023]
Abstract
In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of [Formula: see text] and for height a correlation of [Formula: see text] for a Taiwanese population. This is above the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank trained predictors applied to a European population.
Affiliation(s)
- Timothy G Raben
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Louis Lello
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
- Erik Widen
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
- Stephen D H Hsu
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
19
Reid BN, Star B, Pinsky ML. Detecting parallel polygenic adaptation to novel evolutionary pressure in wild populations: a case study in Atlantic cod (Gadus morhua). Philos Trans R Soc Lond B Biol Sci 2023; 378:20220190. [PMID: 37246382 DOI: 10.1098/rstb.2022.0190] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/26/2022] [Accepted: 02/13/2023] [Indexed: 05/30/2023]
Abstract
Populations can adapt to novel selection pressures through dramatic frequency changes in a few genes of large effect or subtle shifts in many genes of small effect. The latter (polygenic adaptation) is expected to be the primary mode of evolution for many life-history traits but tends to be more difficult to detect than changes in genes of large effect. Atlantic cod (Gadus morhua) were subjected to intense fishing pressure over the twentieth century, leading to abundance crashes and a phenotypic shift toward earlier maturation across many populations. Here, we use spatially replicated temporal genomic data to test for a shared polygenic adaptive response to fishing using methods previously applied to evolve-and-resequence experiments. Cod populations on either side of the Atlantic show covariances in allele frequency change across the genome that are characteristic of recent polygenic adaptation. Using simulations, we demonstrate that the degree of covariance in allele frequency change observed in cod is unlikely to be explained by neutral processes or background selection. As human pressures on wild populations continue to increase, understanding and attributing modes of adaptation using methods similar to those demonstrated here will be important in identifying the capacity for adaptive responses and evolutionary rescue. This article is part of the theme issue 'Detecting and attributing the causes of biodiversity change: needs, gaps and solutions'.
Affiliation(s)
- Brendan N Reid
- Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ 08540, USA
- Bastiaan Star
- Center for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, PO Box 1066, Blindern, 0316 Oslo, Norway
- Malin L Pinsky
- Department of Ecology, Evolution, and Natural Resources, Rutgers University, New Brunswick, NJ 08540, USA
20
Veller C, Coop G. Interpreting population and family-based genome-wide association studies in the presence of confounding. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.26.530052. [PMID: 36909521 PMCID: PMC10002712 DOI: 10.1101/2023.02.26.530052] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 03/02/2023]
Abstract
A central aim of genome-wide association studies (GWASs) is to estimate direct genetic effects: the causal effects on an individual's phenotype of the alleles that they carry. However, estimates of direct effects can be subject to genetic and environmental confounding, and can also absorb the 'indirect' genetic effects of relatives' genotypes. Recently, an important development in controlling for these confounds has been the use of within-family GWASs, which, because of the randomness of Mendelian segregation within pedigrees, are often interpreted as producing unbiased estimates of direct effects. Here, we present a general theoretical analysis of the influence of confounding in standard population-based and within-family GWASs. We show that, contrary to common interpretation, family-based estimates of direct effects can be biased by genetic confounding. In humans, such biases will often be small per-locus, but can be compounded when effect size estimates are used in polygenic scores. We illustrate the influence of genetic confounding on population- and family-based estimates of direct effects using models of assortative mating, population stratification, and stabilizing selection on GWAS traits. We further show how family-based estimates of indirect genetic effects, based on comparisons of parentally transmitted and untransmitted alleles, can suffer substantial genetic confounding. In addition to known biases that can arise in family-based GWASs when interactions between family members are ignored, we show that biases can also arise from gene-by-environment (G×E) interactions when parental genotypes are not distributed identically across interacting environmental and genetic backgrounds. We conclude that, while family-based studies have placed GWAS estimation on a more rigorous footing, they carry subtle issues of interpretation that arise from confounding and interactions.
Affiliation(s)
- Carl Veller
- Department of Evolution and Ecology, and Center for Population Biology, University of California, Davis, CA 95616
- Graham Coop
- Department of Evolution and Ecology, and Center for Population Biology, University of California, Davis, CA 95616
21
Abdellaoui A, Yengo L, Verweij KJH, Visscher PM. 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet 2023; 110:179-194. [PMID: 36634672 PMCID: PMC9943775 DOI: 10.1016/j.ajhg.2022.12.011] [Citation(s) in RCA: 90] [Impact Index Per Article: 90.0] [Indexed: 01/13/2023]
Abstract
It has been 15 years since the advent of the genome-wide association study (GWAS) era. Here, we review how this experimental design has realized its promise by facilitating an impressive range of discoveries with remarkable impact on multiple fields, including population genetics, complex trait genetics, epidemiology, social science, and medicine. We predict that the emergence of large-scale biobanks will continue to expand to more diverse populations and capture more of the allele frequency spectrum through whole-genome sequencing, which will further improve our ability to investigate the causes and consequences of human genetic variation for complex traits and diseases.
Affiliation(s)
- Abdel Abdellaoui
- Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
- Loic Yengo
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Karin J H Verweij
- Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
- Peter M Visscher
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
22
Novembre J, Stein C, Asgari S, Gonzaga-Jauregui C, Landstrom A, Lemke A, Li J, Mighton C, Taylor M, Tishkoff S. Addressing the challenges of polygenic scores in human genetic research. Am J Hum Genet 2022; 109:2095-2100. [PMID: 36459976 PMCID: PMC9808501 DOI: 10.1016/j.ajhg.2022.10.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 12/04/2022]
Abstract
The genotyping of millions of human samples has made it possible to evaluate variants across the human genome for their possible association with risks for numerous diseases and other traits by using genome-wide association studies (GWASs). The associations between phenotype and genotype found in GWASs make possible the construction of polygenic scores (PGSs), which aim to predict a trait or disease outcome in an individual on the basis of their genotype (in the disease case, the term polygenic risk score [PRS] is often used). PGSs have shown promise for studying the biology of complex traits and as a tool for evaluating individual disease risks in clinical settings. Although the quantity and quality of data to compute PGSs are increasing, challenges remain in the technical aspects of developing PGSs and in the ethical and social issues that might arise from their use. This ASHG Guidance emphasizes three major themes for researchers working with or interested in the application of PGSs in their own research: (1) developing diverse research cohorts; (2) fostering robustness in the development, application, and interpretation of PGSs; and (3) improving the communication of PGS results and their implications to broad audiences.
Affiliation(s)
- John Novembre (corresponding author)
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA
- Catherine Stein (corresponding author)
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Samira Asgari
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Claudia Gonzaga-Jauregui
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- Andrew Landstrom
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Department of Pediatrics, Division of Cardiology, Duke University School of Medicine, Durham, NC, USA
- Amy Lemke
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Norton Children’s Research Institute, affiliated with the University of Louisville School of Medicine, Louisville, KY, USA
- Jun Li
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Chloe Mighton
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Genomics Health Services Research Program, St. Michael’s Hospital, Unity Health Toronto, Toronto, ON, Canada
- Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
- Matthew Taylor
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Adult Medical Genetics Program, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Sarah Tishkoff
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville, MD, USA
- Department of Genetics, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biology, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA
23
Novembre J. The background and legacy of Lewontin's apportionment of human genetic diversity. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200406. [PMID: 35430890 PMCID: PMC9014184 DOI: 10.1098/rstb.2020.0406] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/20/2021] [Accepted: 03/18/2022] [Indexed: 12/18/2022]
Abstract
Lewontin's 1972 article 'The apportionment of human diversity' described a key feature of human genetic diversity that would have profound impacts on conversations regarding genetics and race: the typical genetic locus varies much less between classical human race groupings than one might infer from inspecting the features historically used to define those races, like skin pigmentation. From this, Lewontin concluded: 'Human racial classification … is now seen to be of virtually no genetic or taxonomic significance' (p. 397). Here, 50 years after the paper's publication, the goal is to understand the origins and legacy of the paper. Aided by insights from published papers and interviews with several of Lewontin's contemporaries, I review the 1972 paper, asking about the intellectual background that led to the publication of the paper, the development of its impact, the critiques of the work and the work's application and limitations today. The hope is that by gaining a clearer understanding of the origin and reasoning of the paper, we might dispel various confusions about the result and sharpen an understanding of the enduring value and insight the result provides. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Affiliation(s)
- John Novembre
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA
24
Edge MD, Ramachandran S, Rosenberg NA. Celebrating 50 years since Lewontin's apportionment of human diversity. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200405. [PMID: 35430889 PMCID: PMC9014183 DOI: 10.1098/rstb.2020.0405] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Affiliation(s)
- Michael D. Edge
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Sohini Ramachandran
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI 02912, USA
25
Kaplan JM, Fullerton SM. Polygenic risk, population structure and ongoing difficulties with race in human genetics. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200427. [PMID: 35430888 PMCID: PMC9014185 DOI: 10.1098/rstb.2020.0427] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022]
Abstract
‘The Apportionment of Human Diversity’ stands as a noteworthy intervention, both for the field of human population genetics as well as in the annals of public communication of science. Despite the widespread uptake of Lewontin's conclusion that racial classification is of ‘virtually no genetic or taxonomic significance’, the biomedical research community continues to grapple with whether and how best to account for race in its work. Nowhere is this struggle more apparent than in the latest attempts to translate genetic associations with complex disease risk to clinical use in the form of polygenic risk scores, or PRS. In this perspective piece, we trace current challenges surrounding the appropriate development and clinical application of PRS in diverse patient cohorts to ongoing difficulties deciding which facets of population structure matter, and for what reasons, to human health. Despite numerous analytical innovations, there are reasons that emerge from Lewontin's work to remain sceptical that accounting for population structure in the context of polygenic risk estimation will allow us to more effectively identify and intervene on the significant health disparities which plague marginalized populations around the world. This article is part of the theme issue ‘Celebrating 50 years since Lewontin's apportionment of human diversity’.
Affiliation(s)
- Stephanie M. Fullerton
- Department of Bioethics and Humanities, University of Washington School of Medicine, Seattle, WA 98195, USA