1
Ojima T, Namba S, Suzuki K, Yamamoto K, Sonehara K, Narita A, Kamatani Y, Tamiya G, Yamamoto M, Yamauchi T, Kadowaki T, Okada Y. Body mass index stratification optimizes polygenic prediction of type 2 diabetes in cross-biobank analyses. Nat Genet 2024; 56:1100-1109. [PMID: 38862855 DOI: 10.1038/s41588-024-01782-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2022] [Accepted: 04/26/2024] [Indexed: 06/13/2024]
Abstract
Type 2 diabetes (T2D) shows heterogeneous body mass index (BMI) sensitivity. Here, we performed stratification based on BMI to optimize predictions for BMI-related diseases. We obtained BMI-stratified datasets using data from more than 195,000 individuals (n_T2D = 55,284) from BioBank Japan (BBJ) and UK Biobank. T2D heritability in the low-BMI group was greater than that in the high-BMI group. Polygenic predictions of T2D toward low-BMI targets had pseudo-R² values that were more than 22% higher than BMI-unstratified targets. Polygenic risk scores (PRSs) from low-BMI discovery outperformed PRSs from high BMI, while PRSs from BMI-unstratified discovery performed best. Pathway-specific PRSs demonstrated the biological contributions of pathogenic pathways. Low-BMI T2D cases showed higher rates of neuropathy and retinopathy. Combining BMI stratification and a method integrating cross-population effects, T2D predictions showed greater than 37% improvements over unstratified-matched-population prediction. We replicated findings in the Tohoku Medical Megabank (n = 26,000) and the second BBJ cohort (n = 33,096). Our findings suggest that target stratification based on existing traits can improve the polygenic prediction of heterogeneous diseases.
Affiliation(s)
- Takafumi Ojima
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Shinichi Namba
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Ken Suzuki
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Kenichi Yamamoto
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Department of Pediatrics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
- Laboratory of Children's Health and Genetics, Division of Health Science, Osaka University Graduate School of Medicine, Osaka, Japan
- Kyuto Sonehara
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Akira Narita
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Yoichiro Kamatani
- Laboratory of Complex Trait Genomics, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Gen Tamiya
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Center for Advanced Intelligence Project, RIKEN, Tokyo, Japan
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Masayuki Yamamoto
- Graduate School of Medicine, Tohoku University, Sendai, Japan
- Tohoku Medical Megabank Organization, Tohoku University, Sendai, Japan
- Toshimasa Yamauchi
- Department of Diabetes and Metabolic Diseases, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan.
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.
- Department of Genome Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan.
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan.
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Osaka, Japan.
2
Lu Z, Wang X, Carr M, Kim A, Gazal S, Mohammadi P, Wu L, Gusev A, Pirruccello J, Kachuri L, Mancuso N. Improved multi-ancestry fine-mapping identifies cis-regulatory variants underlying molecular traits and disease risk. medRxiv 2024:2024.04.15.24305836. [PMID: 38699369 PMCID: PMC11065034 DOI: 10.1101/2024.04.15.24305836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
Multi-ancestry statistical fine-mapping of cis-molecular quantitative trait loci (cis-molQTL) aims to improve the precision of distinguishing causal cis-molQTLs from tagging variants. However, existing approaches fail to reflect shared genetic architectures. To solve this limitation, we present the Sum of Shared Single Effects (SuShiE) model, which leverages LD heterogeneity to improve fine-mapping precision, infer cross-ancestry effect size correlations, and estimate ancestry-specific expression prediction weights. We apply SuShiE to mRNA expression measured in PBMCs (n=956) and LCLs (n=814) together with plasma protein levels (n=854) from individuals of diverse ancestries in the TOPMed MESA and GENOA studies. We find SuShiE fine-maps cis-molQTLs for 16% more genes compared with baselines while prioritizing fewer variants with greater functional enrichment. SuShiE infers highly consistent cis-molQTL architectures across ancestries on average; however, we also find evidence of heterogeneity at genes with predicted loss-of-function intolerance, suggesting that environmental interactions may partially explain differences in cis-molQTL effect sizes across ancestries. Lastly, we leverage estimated cis-molQTL effect-sizes to perform individual-level TWAS and PWAS on six white blood cell-related traits in AOU Biobank individuals (n=86k), and identify 44 more genes compared with baselines, further highlighting its benefits in identifying genes relevant for complex disease risk. Overall, SuShiE provides new insights into the cis-genetic architecture of molecular traits.
Affiliation(s)
- Zeyun Lu
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Xinran Wang
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Matthew Carr
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Artem Kim
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Steven Gazal
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
- Pejman Mohammadi
- Center for Immunity and Immunotherapies, Seattle Children’s Research Institute, Seattle, WA, USA
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Lang Wu
- Cancer Epidemiology Division, Population Sciences in the Pacific Program, University of Hawaiʻi Cancer Center, University of Hawaiʻi at Mānoa, Honolulu, HI, USA
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
- James Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Nicholas Mancuso
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA
3
Taylor CS, Lawson DJ. Heritability of complex traits in sub-populations experiencing bottlenecks and growth. J Hum Genet 2024:10.1038/s10038-024-01249-2. [PMID: 38589509 DOI: 10.1038/s10038-024-01249-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2023] [Revised: 03/20/2024] [Accepted: 03/23/2024] [Indexed: 04/10/2024]
Abstract
Populations that have experienced a bottleneck are regularly used in Genome Wide Association Studies (GWAS) to investigate variants associated with complex traits. It is generally understood that these isolated sub-populations may experience high frequency of otherwise rare variants with large effect size, and therefore provide a unique opportunity to study said trait. However, the demographic history of the population under investigation affects all SNPs that determine the complex trait genome-wide, changing its heritability and genetic architecture. We use a simulation-based approach to identify the impact of the demographic processes of drift, expansion, and migration on the heritability of a complex trait. We show that demography has considerable impact on complex traits. We then investigate the power to resolve heritability of complex traits in GWAS subjected to demographic effects. We find that demography is an important component for interpreting inference of complex traits and has a nuanced impact on the power of GWAS. We conclude that demographic histories need to be explicitly modelled to properly quantify the history of selection on a complex trait.
Affiliation(s)
- Daniel J Lawson
- School of Mathematics, University of Bristol, Bristol, UK.
- MRC Integrative Epidemiology Unit, Bristol Medical School, University of Bristol, Bristol, UK.
4
Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell 2024; 187:1059-1075. [PMID: 38428388 DOI: 10.1016/j.cell.2024.01.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 01/16/2024] [Indexed: 03/03/2024]
Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.
Affiliation(s)
- Tuuli Lappalainen
- New York Genome Center, New York, NY, USA; Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden.
- Yang I Li
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sohini Ramachandran
- Ecology, Evolution and Organismal Biology, Center for Computational Molecular Biology, and the Data Science Institute, Brown University, Providence, RI 029129, USA
- Alexander Gusev
- Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
5
Kolobkov D, Mishra Sharma S, Medvedev A, Lebedev M, Kosaretskiy E, Vakhitov R. Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project. Front Big Data 2024; 7:1266031. [PMID: 38487517 PMCID: PMC10937521 DOI: 10.3389/fdata.2024.1266031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 01/31/2024] [Indexed: 03/17/2024] Open
Abstract
Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on individual-level genomic data has not been studied. This study lays the groundwork for the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity. Additionally, we investigate how federated model accuracy is affected by communication frequency and suggest approaches to reduce computational complexity or communication costs.
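As an illustration of training "in a decentralized manner", the sketch below shows one widely used aggregation rule, federated averaging, in which each node fits a model locally and only the parameters are pooled, weighted by node sample size. The abstract does not state which aggregation rule the authors used, so this is an assumption for illustration only; the function name `federated_average` and the toy numbers are hypothetical.

```python
def federated_average(node_params, node_sizes):
    """Sample-size-weighted mean of per-node parameter vectors.

    Illustrative only: not taken from the cited study.
    """
    if not node_params or len(node_params) != len(node_sizes):
        raise ValueError("need one sample size per node")
    total = sum(node_sizes)
    dim = len(node_params[0])
    # Average each parameter across nodes, weighting by node sample size.
    return [
        sum(params[i] * size for params, size in zip(node_params, node_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical example: two nodes, two model parameters each.
node_params = [[1.0, 2.0], [3.0, 4.0]]
node_sizes = [1, 3]  # the second node holds three times as many samples
global_params = federated_average(node_params, node_sizes)
print(global_params)
```

Only the parameter vectors leave each node in this scheme; the individual-level records stay local.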
Affiliation(s)
- Dmitry Kolobkov
- GENXT, Hinxton, United Kingdom
- Laboratory of Ecological Genetics, Vavilov Institute of General Genetics, Moscow, Russia
- Satyarth Mishra Sharma
- GENXT, Hinxton, United Kingdom
- Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
- Aleksandr Medvedev
- GENXT, Hinxton, United Kingdom
- Center for Artificial Intelligence Technology, Skolkovo Institute of Science and Technology, Moscow, Russia
6
Janivara R, Hazra U, Pfennig A, Harlemon M, Kim MS, Eaaswarkhanth M, Chen WC, Ogunbiyi A, Kachambwa P, Petersen LN, Jalloh M, Mensah JE, Adjei AA, Adusei B, Joffe M, Gueye SM, Aisuodionoe-Shadrach OI, Fernandez PW, Rohan TE, Andrews C, Rebbeck TR, Adebiyi AO, Agalliu I, Lachance J. Uncovering the genetic architecture and evolutionary roots of androgenetic alopecia in African men. bioRxiv 2024:2024.01.12.575396. [PMID: 38293167 PMCID: PMC10827056 DOI: 10.1101/2024.01.12.575396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Androgenetic alopecia is a highly heritable trait. However, much of our understanding about the genetics of male pattern baldness comes from individuals of European descent. Here, we examined a novel dataset comprising 2,136 men from Ghana, Nigeria, Senegal, and South Africa that were genotyped using a custom array. We first tested how genetic predictions of baldness generalize from Europe to Africa, finding that polygenic scores from European GWAS yielded AUC statistics that ranged from 0.513 to 0.546, indicating that genetic predictions of baldness in African populations performed notably worse than in European populations. Subsequently, we conducted the first African GWAS of androgenetic alopecia, focusing on self-reported baldness patterns at age 45. After correcting for present age, population structure, and study site, we identified 266 moderately significant associations, 51 of which were independent (p-value < 10⁻⁵, r² < 0.2). Most baldness associations were autosomal, and the X chromosome does not appear to have a large impact on baldness in African men. Finally, we examined the evolutionary causes of continental differences in genetic architecture. Although Neanderthal alleles have previously been associated with skin and hair phenotypes, we did not find evidence that European-ascertained baldness hits were enriched for signatures of ancient introgression. Most loci that are associated with androgenetic alopecia are evolving neutrally. However, multiple baldness-associated SNPs near the EDA2R and AR genes have large allele frequency differences between continents. Collectively, our findings illustrate how evolutionary history contributes to the limited portability of genetic predictions across ancestries.
Affiliation(s)
- Rohini Janivara
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Ujani Hazra
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Aaron Pfennig
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Maxine Harlemon
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Biology, Morgan State University, Baltimore, Maryland, USA
- Michelle S Kim
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA
- Wenlong C Chen
- Strengthening Oncology Services Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- National Cancer Registry, National Institute for Communicable Diseases, a Division of the National Health Laboratory Service, Johannesburg, South Africa
- Paidamoyo Kachambwa
- Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Mediclinic Precise Southern Africa, Cape Town, South Africa
- Lindsay N Petersen
- Centre for Proteomic and Genomic Research, Cape Town, South Africa
- Mediclinic Precise Southern Africa, Cape Town, South Africa
- Mohamed Jalloh
- Université Cheikh Anta Diop de Dakar, Dakar, Senegal
- Université Iba Der Thiam de Thiès, Thiès, Senegal
- James E Mensah
- Korle-Bu Teaching Hospital and University of Ghana Medical School, Accra, Ghana
- Andrew A Adjei
- Department of Pathology, University of Ghana Medical School, Accra, Ghana
- Maureen Joffe
- Strengthening Oncology Services Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Oseremen I Aisuodionoe-Shadrach
- College of Health Sciences, University of Abuja, University of Abuja Teaching Hospital and Cancer Science Centre, Abuja, Nigeria
- Pedro W Fernandez
- Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Thomas E Rohan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Timothy R Rebbeck
- Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Ilir Agalliu
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, New York, USA
- Joseph Lachance
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
7
Chapman CR. Ethical, legal, and social implications of genetic risk prediction for multifactorial disease: a narrative review identifying concerns about interpretation and use of polygenic scores. J Community Genet 2023; 14:441-452. [PMID: 36529843 PMCID: PMC10576696 DOI: 10.1007/s12687-022-00625-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/01/2022] [Accepted: 12/04/2022] [Indexed: 12/23/2022] Open
Abstract
Advances in genomics have enabled the development of polygenic scores (PGS), sometimes called polygenic risk scores, in the context of multifactorial diseases and disorders such as cancer, cardiovascular disease, and schizophrenia. PGS estimate an individual's genetic predisposition, as compared to other members of a population, for conditions which are influenced by both genetic and environmental factors. There is significant interest in using genetic risk prediction afforded through PGS in public health, clinical care, and research settings, yet many acknowledge the need to thoughtfully consider and address ethical, legal, and social implications (ELSI). To contribute to this effort, this paper reports on a narrative review of the literature, with the aim of identifying and categorizing ELSI relating to genetic risk prediction in the context of multifactorial disease, which have been raised by scholars in the field. Ninety-two articles, spanning from 1977 to 2021, met the inclusion criteria for this study. Identified ELSI included potential benefits, challenges and risks that focused on concerns about interpretation and use, and ethical obligations to maximize benefits, minimize risks, promote justice, and support autonomy. This research will support geneticists, clinicians, genetic counselors, patients, patient advocates, and policymakers in recognizing and addressing ethical concerns associated with PGS; it will also guide future empirical and normative research.
Affiliation(s)
- Carolyn Riley Chapman
- Department of Population Health (Division of Medical Ethics), NYU Grossman School of Medicine, New York, NY, USA.
- Center for Human Genetics and Genomics, NYU Grossman School of Medicine, Science Building, 435 E. 30th St, 8th Floor, New York, NY, 10016, USA.
8
Abdellaoui A, Yengo L, Verweij KJH, Visscher PM. 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet 2023; 110:179-194. [PMID: 36634672 PMCID: PMC9943775 DOI: 10.1016/j.ajhg.2022.12.011] [Citation(s) in RCA: 50] [Impact Index Per Article: 50.0] [Indexed: 01/13/2023] Open
Abstract
It has been 15 years since the advent of the genome-wide association study (GWAS) era. Here, we review how this experimental design has realized its promise by facilitating an impressive range of discoveries with remarkable impact on multiple fields, including population genetics, complex trait genetics, epidemiology, social science, and medicine. We predict that the emergence of large-scale biobanks will continue to expand to more diverse populations and capture more of the allele frequency spectrum through whole-genome sequencing, which will further improve our ability to investigate the causes and consequences of human genetic variation for complex traits and diseases.
Affiliation(s)
- Abdel Abdellaoui
- Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands.
- Loic Yengo
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
- Karin J H Verweij
- Department of Psychiatry, Amsterdam UMC, University of Amsterdam, Amsterdam, the Netherlands
- Peter M Visscher
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD, Australia
9
Xia X, Zhang Y, Wei Y, Wang MH. Statistical Methods for Disease Risk Prediction with Genotype Data. Methods Mol Biol 2023; 2629:331-347. [PMID: 36929084 DOI: 10.1007/978-1-0716-2986-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
The single-nucleotide polymorphism (SNP) is the basic unit for understanding the heritability of complex traits. One attractive application of susceptibility SNPs is to construct prediction models for assessing disease risk. Here, we introduce prediction methods for human traits using SNP data, including the polygenic risk score (PRS), linear mixed models (LMMs), penalized regressions, and methods for controlling population stratification.
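For orientation, a polygenic risk score in its simplest form is a weighted sum of an individual's effect-allele counts, with one weight per SNP (typically a GWAS effect-size estimate). The sketch below is a minimal illustration under that assumption and is not taken from the chapter; the function name `polygenic_score`, the weights, and the genotypes are all hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2 copies per SNP)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-SNP weights (e.g. effect sizes from a GWAS) for three SNPs.
weights = [0.5, -0.25, 1.0]

# Hypothetical genotypes: effect-allele counts at the same three SNPs.
individuals = {
    "A": [0, 1, 2],
    "B": [2, 2, 0],
}

scores = {name: polygenic_score(g, weights) for name, g in individuals.items()}
print(scores)
```

In practice the raw score is usually standardized against a reference population before it is interpreted, which this sketch leaves out.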
Affiliation(s)
- Xiaoxuan Xia
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Yingying Wei
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Maggie Haitian Wang
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong.
- CUHK Shenzhen Institute, Shenzhen, China.
10
O'Sullivan JW, Raghavan S, Marquez-Luna C, Luzum JA, Damrauer SM, Ashley EA, O'Donnell CJ, Willer CJ, Natarajan P. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation 2022; 146:e93-e118. [PMID: 35862132 PMCID: PMC9847481 DOI: 10.1161/cir.0000000000001077] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Indexed: 01/23/2023]
Abstract
Cardiovascular disease is the leading contributor to years lost due to disability or premature death among adults. Current efforts focus on risk prediction and risk factor mitigation, which have been recognized for the past half-century. However, despite advances, risk prediction remains imprecise with persistently high rates of incident cardiovascular disease. Genetic characterization has been proposed as an approach to enable earlier and potentially tailored prevention. Rare mendelian pathogenic variants predisposing to cardiometabolic conditions have long been known to contribute to disease risk in some families. However, twin and familial aggregation studies imply that diverse cardiovascular conditions are heritable in the general population. Significant technological and methodological advances since the Human Genome Project are facilitating population-based comprehensive genetic profiling at decreasing costs. Genome-wide association studies from such endeavors continue to elucidate causal mechanisms for cardiovascular diseases. Systematic cataloging for cardiovascular risk alleles also enabled the development of polygenic risk scores. Genetic profiling is becoming widespread in large-scale research, including in health care-associated biobanks, randomized controlled trials, and direct-to-consumer profiling in tens of millions of people. Thus, individuals and their physicians are increasingly presented with polygenic risk scores for cardiovascular conditions in clinical encounters. In this scientific statement, we review the contemporary science, clinical considerations, and future challenges for polygenic risk scores for cardiovascular diseases. We selected 5 cardiometabolic diseases (coronary artery disease, hypercholesterolemia, type 2 diabetes, atrial fibrillation, and venous thromboembolic disease) and response to drug therapy and offer provisional guidance to health care professionals, researchers, policymakers, and patients.
11
Yair S, Coop G. Population differentiation of polygenic score predictions under stabilizing selection. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200416. [PMID: 35430887 PMCID: PMC9014188 DOI: 10.1098/rstb.2020.0416] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 09/06/2021] [Accepted: 03/08/2022] [Indexed: 12/15/2022] Open
Abstract
Given the many small-effect loci uncovered by genome-wide association studies (GWAS), polygenic scores have become central to genomic medicine, and have found application in diverse settings including evolutionary studies of adaptation. Despite their promise, polygenic scores have been found to suffer from limited portability across human populations. This at first seems in conflict with the observation that most common genetic variation is shared among populations. We investigate one potential cause of this discrepancy: stabilizing selection on complex traits. Counterintuitively, while stabilizing selection constrains phenotypic evolution, it accelerates the loss and fixation of alleles underlying trait variation within populations (GWAS loci). Thus even when populations share an optimum phenotype, stabilizing selection erodes the variance contributed by their shared GWAS loci, such that predictions from GWAS in one population explain less of the phenotypic variation in another. We develop theory to quantify how stabilizing selection is expected to reduce the prediction accuracy of polygenic scores in populations not represented in GWAS samples. In addition, we find that polygenic scores can substantially overstate average genetic differences of phenotypes among populations. We emphasize stabilizing selection around a common optimum as a useful null model to connect patterns of allele frequency and polygenic score differentiation. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Affiliation(s)
- Sivan Yair
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Graham Coop
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
12
Hayeck TJ, Stong N, Baugh E, Dhindsa R, Turner TN, Malakar A, Mosbruger TL, Shaw GTW, Duan Y, Ionita-Laza I, Goldstein D, Allen AS. Ancestry adjustment improves genome-wide estimates of regional intolerance. Genetics 2022; 221:iyac050. [PMID: 35385101 PMCID: PMC9157129 DOI: 10.1093/genetics/iyac050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Accepted: 02/24/2022] [Indexed: 11/12/2022] Open
Abstract
Genomic regions subject to purifying selection are more likely to carry disease-causing mutations than regions not under selection. Cross-species conservation is often used to identify such regions but with limited resolution to detect selection on short evolutionary timescales such as that occurring in only one species. In contrast, genetic intolerance looks for depletion of variation relative to expectation within a species, allowing species-specific features to be identified. When estimating the intolerance of noncoding sequence, methods strongly leverage variant frequency distributions. As the expected distributions depend on ancestry, if not properly controlled for, ancestral population source may obfuscate signals of selection. We demonstrate that properly incorporating ancestry in intolerance estimation greatly improved variant classification. We provide a genome-wide intolerance map that is conditional on ancestry and likely to be particularly valuable for variant prioritization.
Affiliation(s)
- Tristan J Hayeck
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nicholas Stong
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Evan Baugh
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Ryan Dhindsa
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Tychele N Turner
  - Department of Genetics, Washington University in St. Louis, St. Louis, MO 63110, USA
- Ayan Malakar
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Timothy L Mosbruger
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Grace Tzun-Wen Shaw
  - Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Yuncheng Duan
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
- David Goldstein
  - Institute for Genomic Medicine, Columbia University Medical Center, New York, NY 10032, USA
- Andrew S Allen
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710, USA
13
Polygenic score accuracy in ancient samples: Quantifying the effects of allelic turnover. PLoS Genet 2022; 18:e1010170. [PMID: 35522704 PMCID: PMC9116686 DOI: 10.1371/journal.pgen.1010170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 05/18/2022] [Accepted: 03/26/2022] [Indexed: 11/19/2022] Open
Abstract
Polygenic scores link the genotypes of ancient individuals to their phenotypes, which are often unobservable, offering a tantalizing opportunity to reconstruct complex trait evolution. In practice, however, interpretation of ancient polygenic scores is subject to numerous assumptions. For one, the genome-wide association (GWA) studies from which polygenic scores are derived can only estimate effect sizes for loci segregating in contemporary populations. Therefore, a GWA study may not correctly identify all loci relevant to trait variation in the ancient population. In addition, the frequencies of trait-associated loci may have changed in the intervening years. Here, we devise a theoretical framework to quantify the effect of this allelic turnover on the statistical properties of polygenic scores as functions of population genetic dynamics, trait architecture, power to detect significant loci, and the age of the ancient sample. We model the allele frequencies of loci underlying trait variation using the Wright-Fisher diffusion, and employ the spectral representation of its transition density to find analytical expressions for several error metrics, including the expected sample correlation between the polygenic scores of ancient individuals and their true phenotypes, referred to as polygenic score accuracy. Our theory also applies to a two-population scenario and demonstrates that allelic turnover alone may explain a substantial percentage of the reduced accuracy observed in cross-population predictions, akin to those performed in human genetics. Finally, we use simulations to explore the effects of recent directional selection, a bias-inducing process, on the statistics of interest. We find that even in the presence of bias, weak selection induces minimal deviations from our neutral expectations for the decay of polygenic score accuracy.
By quantifying the limitations of polygenic scores in an explicit evolutionary context, our work lays the foundation for the development of more sophisticated statistical procedures to analyze both temporally and geographically resolved polygenic scores.
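As a rough illustration of the allelic turnover described in this abstract, the sketch below simulates neutral drift with the discrete Wright-Fisher model (binomial resampling of allele copies each generation) rather than the diffusion approximation and spectral methods the authors use. The function names, population size, and number of loci are hypothetical choices made only for this example; it simply tracks what fraction of loci that were segregating at the start are still segregating after a given number of generations.

```python
import random


def wright_fisher_step(freq, pop_size, rng):
    """One generation of neutral drift: resample 2N allele copies binomially."""
    copies = 2 * pop_size
    count = sum(1 for _ in range(copies) if rng.random() < freq)
    return count / copies


def fraction_segregating(n_loci, pop_size, generations, start_freq=0.5, seed=0):
    """Fraction of loci neither lost nor fixed after `generations` of drift.

    All loci start at `start_freq` and evolve independently (an assumption
    of this sketch, not a claim about the paper's model).
    """
    rng = random.Random(seed)
    still_segregating = 0
    for _ in range(n_loci):
        freq = start_freq
        for _ in range(generations):
            freq = wright_fisher_step(freq, pop_size, rng)
            if freq in (0.0, 1.0):  # allele lost or fixed: no further change
                break
        if 0.0 < freq < 1.0:
            still_segregating += 1
    return still_segregating / n_loci


if __name__ == "__main__":
    # Hypothetical parameters: 100 loci, diploid population size 25.
    for gens in (0, 25, 100):
        print(gens, fraction_segregating(100, 25, gens))
```

Loci that have been lost or fixed contribute no variance to a polygenic score, which is one intuitive reason accuracy decays with the time separating the GWA sample from the ancient sample.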
14
Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, Kenny E, Gignoux C, Wojcik G, Crawford L, Ramachandran S. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am J Hum Genet 2022; 109:871-884. [PMID: 35349783 PMCID: PMC9118115 DOI: 10.1016/j.ajhg.2022.03.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 03/02/2022] [Indexed: 12/12/2022] Open
Abstract
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
Affiliation(s)
- Samuel Pattillo Smith
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Sahar Shahamatdar
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Wei Cheng
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Selena Zhang
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Joseph Paik
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Misa Graff
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher Haiman
  - Department of Preventative Medicine, University of Southern California, Los Angeles, CA 90089, USA
- T C Matise
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eimear Kenny
  - The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Chris Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO 80204, USA
- Genevieve Wojcik
  - Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21287, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Biostatistics, Brown University, Providence, RI 02906, USA; Microsoft Research New England, Cambridge, MA 02142, USA
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA; Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA; Data Science Initiative, Brown University, Providence, RI 02912, USA
15
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Martin AR, Finucane HK, Price AL. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 2022; 54:450-458. [PMID: 35393596 PMCID: PMC9009299 DOI: 10.1038/s41588-022-01036-9] [Citation(s) in RCA: 98] [Impact Index Per Article: 49.0] [Received: 01/10/2021] [Accepted: 02/25/2022] [Indexed: 01/25/2023]
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
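The abstract describes PolyPred as a combination of two predictors but does not spell out how they are combined. One simple way to combine two polygenic predictors, shown here purely as an assumption-laden sketch and not as the authors' procedure, is a linear mixture whose weights are fit by ordinary least squares on a small set of target-population individuals with known phenotypes. All names below (`fit_mixing_weights`, `combine`, the toy scores) are hypothetical, and the toy data are constructed for illustration only.

```python
def fit_mixing_weights(pred_a, pred_b, phenotype):
    """Least-squares weights (w_a, w_b) for phenotype ~ w_a*pred_a + w_b*pred_b.

    Solves the 2x2 normal equations directly; no intercept is fit, so inputs
    are assumed to be mean-centered (an assumption of this sketch).
    """
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    say = sum(a * y for a, y in zip(pred_a, phenotype))
    sby = sum(b * y for b, y in zip(pred_b, phenotype))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("predictors are collinear; weights are not identifiable")
    w_a = (say * sbb - sby * sab) / det
    w_b = (sby * saa - say * sab) / det
    return w_a, w_b


def combine(pred_a, pred_b, w_a, w_b):
    """Apply the fitted mixing weights to produce the combined score."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    # Toy scores from two hypothetical predictors for four individuals.
    score_a = [1.0, 0.0, 2.0, 1.0]
    score_b = [0.0, 1.0, 1.0, 3.0]
    # Toy phenotype built as an exact 0.7/0.3 mixture of the two scores.
    pheno = [0.7 * a + 0.3 * b for a, b in zip(score_a, score_b)]
    w_a, w_b = fit_mixing_weights(score_a, score_b, pheno)
    print(w_a, w_b)
    print(combine(score_a, score_b, w_a, w_b))
```

Fitting only two weights keeps the number of parameters estimated in the target population small, which matters when the target-population training sample is limited.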
Affiliation(s)
- Omer Weissbrod
  - Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Masahiro Kanai
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Huwenbo Shi
  - Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
  - OMNI Bioinformatics, San Francisco, CA, USA
- Steven Gazal
  - Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Wouter J Peyrot
  - Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
  - Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
- Amit V Khera
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Verve Therapeutics, Cambridge, MA, USA
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hilary K Finucane
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alkes L Price
  - Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
16
de Miguel M, Rodríguez-Quilón I, Heuertz M, Hurel A, Grivet D, Jaramillo-Correa JP, Vendramin GG, Plomion C, Majada J, Alía R, Eckert AJ, González-Martínez SC. Polygenic adaptation and negative selection across traits, years and environments in a long-lived plant species (Pinus pinaster Ait., Pinaceae). Mol Ecol 2022; 31:2089-2105. [PMID: 35075727 DOI: 10.1111/mec.16367] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/12/2021] [Revised: 11/30/2021] [Accepted: 01/11/2022] [Indexed: 11/26/2022]
Abstract
A decade of genetic association studies in multiple organisms suggests that most complex traits are polygenic, i.e., they have a genetic architecture determined by numerous loci, each with a small effect size. Thus, determining the degree of polygenicity and its variation across traits, environments and time is crucial to understand the genetic basis of phenotypic variation. We applied multilocus approaches to estimate the degree of polygenicity of fitness-related traits in a long-lived plant (Pinus pinaster Ait., maritime pine) and to analyze this variation across environments and years. We evaluated five categories of fitness-related traits (survival, height, phenology, functional, and biotic-stress response traits) in a clonal common-garden network, planted in contrasting environments (over 12,500 trees). Most of the analyzed traits showed evidence of local adaptation based on Qst-Fst comparisons. We further observed a remarkably stable degree of polygenicity, averaging 6% (range of 0-27%), across traits, environments and years. We detected evidence of negative selection, which could explain, at least partially, the high degree of polygenicity. Because polygenic adaptation can occur rapidly, our results suggest that current predictions on the capacity of natural forest tree populations to adapt to new environments should be revised, especially in the current context of climate change.
Affiliation(s)
- Marina de Miguel
  - INRAE, Univ. Bordeaux, BIOGECO, F-33610, Cestas, France; EGFV, Univ. Bordeaux, Bordeaux Sciences Agro, INRAE, ISVV, F-33882, Villenave d'Ornon, France
- Isabel Rodríguez-Quilón
  - Department of Forest Ecology and Genetics, Forest Research Centre, INIA, Carretera de la Coruña km 7.5, 28040, Madrid, Spain
- Agathe Hurel
  - INRAE, Univ. Bordeaux, BIOGECO, F-33610, Cestas, France
- Delphine Grivet
  - Department of Forest Ecology and Genetics, Forest Research Centre, INIA, Carretera de la Coruña km 7.5, 28040, Madrid, Spain
- Juan-Pablo Jaramillo-Correa
  - Department of Evolutionary Ecology, Institute of Ecology, Universidad Nacional Autónoma de México, AP 70-275, México City, CDMX 04510, Mexico
- Giovanni G Vendramin
  - Institute of Biosciences and Bioresources, Division of Florence, National Research Council, 50019, Sesto Fiorentino (FI), Italy
- Juan Majada
  - Sección Forestal, SERIDA, Finca Experimental "La Mata", 33820, Grado, Principado de Asturias, Spain
- Ricardo Alía
  - EGFV, Univ. Bordeaux, Bordeaux Sciences Agro, INRAE, ISVV, F-33882, Villenave d'Ornon, France
- Andrew J Eckert
  - Department of Biology, Virginia Commonwealth University, Richmond, VA, 23284, USA
17
Natri HM, Hudjashov G, Jacobs G, Kusuma P, Saag L, Darusallam CC, Metspalu M, Sudoyo H, Cox MP, Gallego Romero I, Banovich NE. Genetic architecture of gene regulation in Indonesian populations identifies QTLs associated with global and local ancestries. Am J Hum Genet 2022; 109:50-65. [PMID: 34919805 PMCID: PMC8764200 DOI: 10.1016/j.ajhg.2021.11.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2021] [Accepted: 11/16/2021] [Indexed: 02/07/2023] Open
Abstract
Lack of diversity in human genomics limits our understanding of the genetic underpinnings of complex traits, hinders precision medicine, and contributes to health disparities. To map genetic effects on gene regulation in the underrepresented Indonesian population, we have integrated genotype, gene expression, and CpG methylation data from 115 participants across three island populations that capture the major sources of genomic diversity in the region. In a comparison with European datasets, we identify eQTLs shared between Indonesia and Europe as well as population-specific eQTLs that exhibit differences in allele frequencies and/or overall expression levels between populations. By combining local ancestry and archaic introgression inference with eQTLs and methylQTLs, we identify regulatory loci driven by modern Papuan ancestry as well as introgressed Denisovan and Neanderthal variation. GWAS colocalization connects QTLs detected here to hematological traits, and further comparison with European datasets reflects the poor overall transferability of GWAS statistics across diverse populations. Our findings illustrate how population-specific genetic architecture, local ancestry, and archaic introgression drive variation in gene regulation across genetically distinct and admixed populations and highlight the need for performing association studies on non-European populations.
Affiliation(s)
- Heini M Natri
  - Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA; The Translational Genomics Research Institute, Phoenix, AZ 85004, USA
- Georgi Hudjashov
  - Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North 4410, New Zealand; Centre for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Guy Jacobs
  - Leverhulme Centre for Human Evolutionary Studies, Department of Archaeology, University of Cambridge, Cambridge CB2 1QH, UK; Complexity Institute, Nanyang Technological University, Singapore, 637460
- Pradiptajati Kusuma
  - Complexity Institute, Nanyang Technological University, Singapore, 637460; Laboratory of Genome Diversity and Disease, Eijkman Institute for Molecular Biology, Jakarta 10430, Indonesia
- Lauri Saag
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Chelzie Crenna Darusallam
  - Laboratory of Genome Diversity and Disease, Eijkman Institute for Molecular Biology, Jakarta 10430, Indonesia
- Mait Metspalu
  - Institute of Genomics, University of Tartu, Tartu 51010, Estonia
- Herawati Sudoyo
  - Laboratory of Genome Diversity and Disease, Eijkman Institute for Molecular Biology, Jakarta 10430, Indonesia
- Murray P Cox
  - Statistics and Bioinformatics Group, School of Fundamental Sciences, Massey University, Palmerston North 4410, New Zealand
- Irene Gallego Romero
  - Centre for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu 51010, Estonia; Melbourne Integrative Genomics, University of Melbourne, Parkville, VIC 3010, Australia; School of BioSciences, University of Melbourne, Parkville, VIC 3010, Australia; Centre for Stem Cell Systems, University of Melbourne, Parkville, VIC 3010, Australia
18
Sohail M, Izarraras-Gomez A, Ortega-Del Vecchyo D. Populations, Traits, and Their Spatial Structure in Humans. Genome Biol Evol 2021; 13:evab272. [PMID: 34894236 PMCID: PMC8715524 DOI: 10.1093/gbe/evab272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2021] [Indexed: 11/16/2022] Open
Abstract
The spatial distribution of genetic variants is jointly determined by geography, past demographic processes, natural selection, and its interplay with environmental variation. A fraction of these genetic variants are "causal alleles" that affect the manifestation of a complex trait. The effect exerted by these causal alleles on complex traits can be independent or dependent on the environment. Understanding the evolutionary processes that shape the spatial structure of causal alleles is key to comprehending the spatial distribution of complex traits. Natural selection, past population size changes, range expansions, consanguinity, assortative mating, archaic introgression, admixture, and the environment can alter the frequencies, effect sizes, and heterozygosities of causal alleles. This provides a genetic axis along which complex traits can vary. However, complex traits also vary along biogeographical and sociocultural axes which are often correlated with genetic axes in complex ways. The purpose of this review is to consider these genetic and environmental axes in concert and examine the ways they can help us decipher the variation in complex traits that is visible in humans today. This initiative necessarily implies a discussion of populations, traits, the ability to infer and interpret "genetic" components of complex traits, and how these have been impacted by adaptive events. In this review, we provide a history-aware discussion on these topics using both the recent and more distant past of our academic discipline and its relevant contexts.
Affiliation(s)
- Mashaal Sohail
  - Department of Human Genetics, University of Chicago, USA
  - Centro de Ciencias Genómicas (CCG), Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Morelos, México
- Alan Izarraras-Gomez
  - Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
- Diego Ortega-Del Vecchyo
  - Laboratorio Internacional de Investigación sobre el Genoma Humano (LIIGH), Universidad Nacional Autónoma de México (UNAM), Juriquilla, Querétaro, México
19
Irving-Pease EK, Muktupavela R, Dannemann M, Racimo F. Quantitative Human Paleogenetics: What can Ancient DNA Tell us About Complex Trait Evolution? Front Genet 2021; 12:703541. [PMID: 34422004 PMCID: PMC8371751 DOI: 10.3389/fgene.2021.703541] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 04/30/2021] [Accepted: 07/08/2021] [Indexed: 12/13/2022] Open
Abstract
Genetic association data from national biobanks and large-scale association studies have provided new prospects for understanding the genetic evolution of complex traits and diseases in humans. In turn, genomes from ancient human archaeological remains are now easier than ever to obtain, and provide a direct window into changes in frequencies of trait-associated alleles in the past. This has generated a new wave of studies aiming to analyse the genetic component of traits in historic and prehistoric times using ancient DNA, and to determine whether any such traits were subject to natural selection. In humans, however, issues about the portability and robustness of complex trait inference across different populations are particularly concerning when predictions are extended to individuals that died thousands of years ago, and for which little, if any, phenotypic validation is possible. In this review, we discuss the advantages of incorporating ancient genomes into studies of trait-associated variants, the need for models that can better accommodate ancient genomes into quantitative genetic frameworks, and the existing limits to inferences about complex trait evolution, particularly with respect to past populations.
Affiliation(s)
- Evan K. Irving-Pease
  - Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Rasa Muktupavela
  - Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Michael Dannemann
  - Center for Genomics, Evolution and Medicine, Institute of Genomics, University of Tartu, Tartu, Estonia
- Fernando Racimo
  - Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
20
Tellier LCAM, Eccles J, Treff NR, Lello L, Fishel S, Hsu S. Embryo Screening for Polygenic Disease Risk: Recent Advances and Ethical Considerations. Genes (Basel) 2021; 12:1105. [PMID: 34440279 PMCID: PMC8393569 DOI: 10.3390/genes12081105] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/08/2021] [Revised: 06/25/2021] [Accepted: 07/06/2021] [Indexed: 11/16/2022] Open
Abstract
Machine learning methods applied to large genomic datasets (such as those used in GWAS) have led to the creation of polygenic risk scores (PRSs) that can be used to identify individuals who are at highly elevated risk for important disease conditions, such as coronary artery disease (CAD), diabetes, hypertension, breast cancer, and many more. PRSs have been validated in large population groups across multiple continents and are under evaluation for widespread clinical use in adult health. It has been shown that PRSs can be used to identify which of two individuals is at a lower disease risk, even when these two individuals are siblings from a shared family environment. The relative risk reduction (RRR) from choosing an embryo with a lower PRS (with respect to one chosen at random) can be quantified by using these sibling results. New technology for precise embryo genotyping allows more sophisticated preimplantation ranking with better results than the current method of selection that is based on morphology. We review the advances described above and discuss related ethical considerations.
Affiliation(s)
- Laurent C. A. M. Tellier
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA
- Jennifer Eccles
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA
- Nathan R. Treff
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA
- Louis Lello
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA
- Simon Fishel
  - CARE Fertility Group, Nottingham NG8 6PZ, UK
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool L2 2QP, UK
- Stephen Hsu
  - Department of Physics and Astronomy, Michigan State University, East Lansing, MI 48824, USA
  - Genomic Prediction, Inc., North Brunswick, NJ 08902, USA
21
Davies RW, Kucka M, Su D, Shi S, Flanagan M, Cunniff CM, Chan YF, Myers S. Rapid genotype imputation from sequence with reference panels. Nat Genet 2021; 53:1104-1111. [PMID: 34083788 PMCID: PMC7611184 DOI: 10.1038/s41588-021-00877-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/14/2020] [Accepted: 04/23/2021] [Indexed: 12/30/2022]
Abstract
Inexpensive genotyping methods are essential to modern genomics. Here we present QUILT, which performs diploid genotype imputation using low-coverage whole genome sequence data. QUILT employs Gibbs sampling to partition reads into maternal and paternal sets, facilitating rapid haploid imputation using large reference panels. We show this partitioning to be accurate over many megabases, enabling highly accurate imputation close to theoretical limits and outperforming existing methods. Moreover, QUILT can impute accurately using diverse technologies, including using long reads from Oxford Nanopore Technologies, and a novel form of low-cost barcoded Illumina sequencing called haplotagging, with the latter showing improved accuracy at low coverages. Relative to DNA genotyping microarrays, QUILT offers improved accuracy at reduced cost, particularly for diverse populations that are traditionally underserved in modern genomic analyses, with accuracy nearly doubling at rare SNPs. Finally, QUILT can accurately impute (4-digit) HLA types, the first such method from low-coverage sequence data.
Affiliation(s)
- Marek Kucka
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Dingwen Su
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Sinan Shi
  - Department of Statistics, University of Oxford, Oxford, UK
- Maeve Flanagan
  - Department of Pediatrics, Weill Cornell Medical College, New York, NY, USA
- Simon Myers
  - Department of Statistics, University of Oxford, Oxford, UK; The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
22
Martin AR, Gignoux CR, Walters RK, Wojcik GL, Neale BM, Gravel S, Daly MJ, Bustamante CD, Kenny EE. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet 2020; 107:788-789. [PMID: 33007199 PMCID: PMC7536609 DOI: 10.1016/j.ajhg.2020.08.020] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 01/02/2023] Open