1
Moreno-Grau S, Vernekar M, Lopez-Pineda A, Mas-Montserrat D, Barrabés M, Quinto-Cortés CD, Moatamed B, Lee MTM, Yu Z, Numakura K, Matsuda Y, Wall JD, Ioannidis AG, Katsanis N, Takano T, Bustamante CD. Polygenic risk score portability for common diseases across genetically diverse populations. Hum Genomics 2024; 18:93. [PMID: 39218908; PMCID: PMC11367857; DOI: 10.1186/s40246-024-00664-y] [Received: 06/15/2024] [Accepted: 08/19/2024] [Indexed: 09/04/2024] [Open access]
Abstract
BACKGROUND Polygenic risk scores (PRS) derived from European individuals have reduced portability across global populations, limiting their clinical implementation worldwide. Here, we investigate the performance of a wide range of PRS models across four ancestry groups (Africans, Europeans, East Asians, and South Asians) for 14 conditions of high medical interest. METHODS To select the best-performing model per trait, we first compared the performance of publicly available scores and constructed new models using different methods (LDpred2, PRS-CSx and SNPnet). We used 285K European individuals from the UK Biobank (UKBB) for training and 18K individuals, including those of diverse ancestries, for testing. We then evaluated the portability of the best-performing models in Europeans and compared their accuracies with those of the best PRS per ancestry. Finally, we validated the selected PRS models in an independent set of 8,417 individuals from Biobank of the Americas-Genomelink (BbofA-GL) and performed a PRS phenome-wide association study (PRS-PheWAS). RESULTS We confirmed a decay in PRS performance relative to Europeans when the evaluation used the best PRS model for Europeans (51.3% for South Asians, 46.6% for East Asians and 39.4% for Africans). PRS performance improved when ancestry-specific PRS models were selected (phenotype variance increase: 1.62 for Africans, 1.40 for South Asians and 0.96 for East Asians). Additionally, when we selected the optimal model conditional on ancestry for CAD, HDL-C and LDL-C, hypertension, hypothyroidism and T2D, PRS performance in the studied populations was more comparable to that observed in Europeans. Finally, we independently validated the tested models for Europeans and conducted a PRS-PheWAS, identifying cross-trait interplay between cardiometabolic conditions and between immune-mediated components.
CONCLUSION Our work comprehensively evaluated PRS accuracy across a wide range of phenotypes, reducing uncertainty about which PRS model to choose and in which ancestry group. This evaluation allowed us to identify specific conditions where risk-prioritization strategies could have practical utility across diverse ancestral groups, contributing to democratizing the implementation of PRS.
Affiliation(s)
- Sonia Moreno-Grau
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road, Stanford, CA, 94305, USA
- Manvi Vernekar
  - Genomelink, Inc, 2150 Shattuck Avenue, Berkeley, CA, 94704, USA
- Arturo Lopez-Pineda
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
  - Amphora Health, Batallon Independencia 80, Morelia, Michoacan, 58260, Mexico
  - Escuela Nacional de Estudios Superiores, Unidad Morelia, Universidad Nacional Autónoma de México, Antigua Carretera a Pátzcuaro No. 8701, Col. Ex Hacienda de San José de la Huerta, Morelia, Michoacán, C.P. 58190, Mexico
- Míriam Barrabés
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
- Babak Moatamed
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
- Zhenning Yu
  - Genomelink, Inc, 2150 Shattuck Avenue, Berkeley, CA, 94704, USA
- Yuta Matsuda
  - Genomelink, Inc, 2150 Shattuck Avenue, Berkeley, CA, 94704, USA
- Jeffrey D Wall
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
- Alexander G Ioannidis
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road, Stanford, CA, 94305, USA
  - University of California Santa Cruz, 1156 High Street, Santa Cruz, CA, 95064, USA
- Tomohiro Takano
  - Genomelink, Inc, 2150 Shattuck Avenue, Berkeley, CA, 94704, USA
  - Awakens Japan K.K. (Japanese subsidiary of Genomelink, Inc.), 2-11-3, Meguro, Meguro-ku, 1530063, Tokyo, Japan
- Carlos D Bustamante
  - Galatea Bio, Inc, 14350 Commerce Way, Miami Lakes, FL, 33146, USA
  - Department of Biomedical Data Science, Stanford University School of Medicine, 1265 Welch Road, Stanford, CA, 94305, USA
2
Barreiro RAS, de Almeida TF, Gomes C, Monfardini F, de Farias AA, Tunes GC, de Souza GM, Duim E, de Sá Correia J, Campos Coelho AV, Caraciolo MP, Oliveira Duarte YA, Zatz M, Amaro E, Oliveira JB, Bitarello BD, Brentani H, Naslavsky MS. Assessing the Risk Stratification of Breast Cancer Polygenic Risk Scores in a Brazilian Cohort. J Mol Diagn 2024; 26:825-831. [PMID: 38972593; DOI: 10.1016/j.jmoldx.2024.06.002] [Received: 10/09/2023] [Revised: 05/10/2024] [Accepted: 06/11/2024] [Indexed: 07/09/2024] [Open access]
Abstract
Polygenic risk scores (PRSs) for breast cancer have clear clinical utility in risk prediction. PRS transferability across populations and ancestry groups is hampered by population-specific factors, such as linkage disequilibrium and allele frequency differences, that ultimately lead to differences in variant effects. Thus, locally sourced population-based phenotypic and genomic data sets are essential to assess the validity of PRSs derived from signals detected across populations. This study assesses the transferability of a breast cancer PRS composed of 313 risk variants (313-PRS) in the Rare Genomes Project, a whole-genome-sequenced Brazilian cohort of trihybrid admixed ancestry (European, African, and Native American). The 313-PRS was computed in the Rare Genomes Project (n = 853) using the UK Biobank (UKBB; n = 264,307) as reference. The Brazilian cohorts have a high European ancestry (EA) component, with allele frequencies and, to a lesser extent, linkage disequilibrium patterns similar to those found in EA populations. The 313-PRS distribution was inflated compared with that of the UKBB, leading to potential overestimation of PRS-based risk if EA is taken as the standard. However, case-control analyses yielded predictive power equivalent to that of UKBB-EA samples, with area under the receiver operating characteristic curve values of 0.62 to 0.66, compared with 0.63 for UKBB.
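How an inflated score distribution can coexist with unchanged discrimination is easiest to see in a small sketch. It assumes the usual additive PRS form (effect-allele dosages times per-variant weights such as log odds ratios), standardization against a reference panel's mean and SD, and AUC computed as the Mann-Whitney probability that a case outscores a control. All names are hypothetical; this is not the study's pipeline.

```python
from statistics import mean, stdev


def polygenic_score(dosages: list[float], weights: list[float]) -> float:
    """Additive score: effect-allele dosages (0, 1 or 2) times per-variant weights."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: list[float], reference: list[float]) -> list[float]:
    """Z-scores of `scores` relative to a reference distribution (e.g., a UKBB-like panel)."""
    mu, sd = mean(reference), stdev(reference)
    return [(s - mu) / sd for s in scores]


def auc(cases: list[float], controls: list[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(case > control), ties count 1/2."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

If a cohort's raw scores sit above the reference mean because of allele-frequency differences rather than higher risk, `standardize` returns inflated z-scores, whereas `auc` depends only on the ranking of cases versus controls within the cohort and is unaffected by such a uniform shift.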
Affiliation(s)
- Rodrigo A S Barreiro
  - Department of Biochemistry, University of São Paulo, São Paulo, Brazil
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
- Catarina Gomes
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
  - Institute of Psychiatry, University of São Paulo, Medical School, São Paulo, Brazil
- Etienne Duim
  - Big Data and Analytics Department, Hospital Israelita Albert Einstein, São Paulo, Brazil
- Yeda A Oliveira Duarte
  - Medical-Surgical Nursing Department, School of Nursing, University of São Paulo, São Paulo, Brazil
  - Epidemiology Department, Public Health School, University of São Paulo, São Paulo, Brazil
- Mayana Zatz
  - Human Genome and Stem Cell Research Center, University of São Paulo, São Paulo, Brazil
  - Department of Genetics and Evolutionary Biology, Biosciences Institute, University of São Paulo, São Paulo, Brazil
- Edson Amaro
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
- Helena Brentani
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
  - Institute of Psychiatry, University of São Paulo, Medical School, São Paulo, Brazil
- Michel S Naslavsky
  - Hospital Israelita Albert Einstein, São Paulo, Brazil
  - Human Genome and Stem Cell Research Center, University of São Paulo, São Paulo, Brazil
  - Department of Genetics and Evolutionary Biology, Biosciences Institute, University of São Paulo, São Paulo, Brazil
3
Cho HW, Jin HS, Kim SS, Eom YB. Forensic height estimation using polygenic score in Korean population. Mol Genet Genomics 2024; 299:78. [PMID: 39120737; DOI: 10.1007/s00438-024-02172-z] [Received: 07/31/2023] [Accepted: 07/30/2024] [Indexed: 08/10/2024]
Abstract
Height is a classic heritable trait controlled by complex polygenic factors, and numerous height-associated genetic variants across the genome have been identified. It is also a representative externally visible characteristic (EVC) for predicting appearance in forensic science. When biological evidence at a crime scene is insufficient to identify an individual, forensic DNA phenotyping using a set of genetic variants may be considered. In this study, we aimed to predict height, a representative forensic phenotype, using a small number of genetic variants for cases in which short tandem repeat (STR) analysis is difficult because biological samples are insufficient. Our results not only replicated previous genetic signals but also showed an upward trend in polygenic score (PGS) with increasing height in the validation and replication stages for both sexes. These results demonstrate that the SNP sets established in this study could be used for height estimation in the Korean population. Specifically, because the PGS model constructed in this study targets only a small number of SNPs, it helps enable forensic DNA phenotyping even at crime scenes with minimal biological evidence. To the best of our knowledge, this is the first study to evaluate a PGS model for height estimation in the Korean population using GWAS signals. Our study offers insight into the polygenic effect of height in East Asians, incorporating genetic variants from non-Asian populations.
Affiliation(s)
- Hye-Won Cho
  - Department of Medical Sciences, Graduate School, Soonchunhyang University, Asan, 31538, Chungnam, Republic of Korea
- Hyun-Seok Jin
  - Department of Biomedical Laboratory Science, College of Life and Health Sciences, Hoseo University, Asan, 31499, Chungnam, Republic of Korea
- Sung-Soo Kim
  - Department of Biomedical Laboratory Science, College of Life and Health Sciences, Hoseo University, Asan, 31499, Chungnam, Republic of Korea
- Yong-Bin Eom
  - Department of Medical Sciences, Graduate School, Soonchunhyang University, Asan, 31538, Chungnam, Republic of Korea
  - Department of Biomedical Laboratory Science, College of Medical Sciences, Soonchunhyang University, 22 Soonchunhyang-ro, Sinchang-myeon, Asan-si, 31538, Chungcheongnam-do, Republic of Korea
4
Hou K, Xu Z, Ding Y, Mandla R, Shi Z, Boulier K, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. Nat Genet 2024; 56:1386-1396. [PMID: 38886587; DOI: 10.1038/s41588-024-01792-w] [Received: 08/03/2023] [Accepted: 05/08/2024] [Indexed: 06/20/2024]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields. We show that PGS performance varies broadly across contexts and biobanks. Contexts such as age, sex and income can affect PGS accuracy with magnitudes similar to that of genetic ancestry. Here we introduce an approach (CalPred) that models all contexts jointly to produce prediction intervals that vary across contexts to achieve calibration (i.e., they include the true trait value with 90% probability), whereas existing methods are miscalibrated. In analyses of 72 traits across large and diverse biobanks (All of Us and UK Biobank), we find that prediction intervals required adjustment by up to 80% for quantitative traits. For disease traits, PGS-based predictions were miscalibrated across socioeconomic contexts such as annual household income levels, further highlighting the need to account for context information in PGS-based prediction across diverse populations.
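The core idea of intervals whose width depends on context can be sketched with a deliberately simplified stand-in: estimate the residual spread separately within each discrete context and build a Gaussian interval with the target coverage. This is not CalPred's model, which handles all contexts jointly; the discrete context labels, Gaussian residuals and function names here are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import NormalDist, pstdev


def residual_sd_by_context(residuals: list[float], contexts: list[str]) -> dict[str, float]:
    """Residual standard deviation estimated separately within each discrete context."""
    groups: dict[str, list[float]] = defaultdict(list)
    for r, c in zip(residuals, contexts):
        groups[c].append(r)
    return {c: pstdev(rs) for c, rs in groups.items()}


def prediction_interval(point: float, sd: float, coverage: float = 0.90) -> tuple[float, float]:
    """Symmetric Gaussian interval around `point` with the requested coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.645 for 90% coverage
    return point - z * sd, point + z * sd


def empirical_coverage(y: list[float], points: list[float], sds: list[float],
                       coverage: float = 0.90) -> float:
    """Fraction of observed values that fall inside their prediction intervals."""
    hits = 0
    for yi, p, s in zip(y, points, sds):
        lo, hi = prediction_interval(p, s, coverage)
        hits += lo <= yi <= hi
    return hits / len(y)
```

Comparing `empirical_coverage` within each context against the nominal 0.90 is one way to check calibration. This sketch only rescales interval width per context and does not correct a context-specific bias in the point prediction.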
Affiliation(s)
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Ziqi Xu
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, USA
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Ravi Mandla
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Zhuozheng Shi
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Kristin Boulier
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Arbel Harpak
  - Department of Population Health, The University of Texas at Austin, Austin, TX, USA
  - Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  - Institute for Precision Health, University of California Los Angeles, Los Angeles, CA, USA
5
Schwantes-An TH, Whitfield JB, Aithal GP, Atkinson SR, Bataller R, Botwin G, Chalasani NP, Cordell HJ, Daly AK, Darlay R, Day CP, Eyer F, Foroud T, Gawrieh S, Gleeson D, Goldman D, Haber PS, Jacquet JM, Lammert CS, Liang T, Liangpunsakul S, Masson S, Mathurin P, Moirand R, McQuillin A, Moreno C, Morgan MY, Mueller S, Müllhaupt B, Nagy LE, Nahon P, Nalpas B, Naveau S, Perney P, Pirmohamed M, Seitz HK, Soyka M, Stickel F, Thompson A, Thursz MR, Trépo E, Morgan TR, Seth D. A polygenic risk score for alcohol-associated cirrhosis among heavy drinkers with European ancestry. Hepatol Commun 2024; 8:e0431. [PMID: 38727677; PMCID: PMC11093576; DOI: 10.1097/hc9.0000000000000431] [Received: 07/10/2023] [Accepted: 11/01/2023] [Indexed: 05/16/2024] [Open access]
Abstract
BACKGROUND Polygenic risk scores (PRS) based on results from genome-wide association studies offer the prospect of risk stratification for many common and complex diseases. We developed a PRS for alcohol-associated cirrhosis (ALC) by comparing single-nucleotide polymorphisms (SNPs) between patients with ALC and drinkers without evidence of liver fibrosis or cirrhosis. METHODS Using a data-driven approach, a PRS for ALC was generated from a meta-analysis of genome-wide association studies of ALC (N=4305) and an independent cohort of heavy drinkers with ALC and without significant liver disease (N=3037). It was validated in 2 additional independent cohorts: UK Biobank participants with diagnosed ALC (N=467) and high-risk drinking controls (N=8981), and participants in the Indiana Biobank Liver cohort with alcohol-associated liver disease (N=121) and controls without liver disease (N=3239). RESULTS A 20-SNP PRS for ALC (PRSALC) was generated that stratified risk for ALC when comparing the top and bottom deciles of PRS in the 2 validation cohorts (ORs: 2.83 [95% CI: 1.82-4.39] in the UK Biobank; 4.40 [1.56-12.44] in the Indiana Biobank Liver cohort). Furthermore, PRSALC improved the prediction of ALC risk when added to models of clinically known predictors of ALC risk. It also stratified the risk for metabolic dysfunction-associated steatotic liver disease cirrhosis (3.94 [2.23-6.95]) in an exploratory analysis based on the Indiana Biobank Liver cohort. CONCLUSIONS PRSALC incorporates 20 SNPs, predicts increased risk for ALC, and improves risk stratification for ALC compared with models that include only clinical risk factors. This new score has the potential for early detection of heavy drinking patients who are at high risk for ALC.
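The top-versus-bottom-decile comparison reported in this abstract can be sketched as a crude 2x2 odds ratio with a Woolf (log-scale) 95% confidence interval. This is an illustration under stated assumptions: the study's odds ratios may come from covariate-adjusted logistic regression, and the function names are hypothetical.

```python
import math


def top_bottom_decile_counts(scores: list[float], is_case: list[int]) -> tuple[int, int, int, int]:
    """Return (cases_top, controls_top, cases_bottom, controls_bottom) for the
    top and bottom 10% of individuals ranked by score."""
    n10 = len(scores) // 10
    if n10 == 0:
        raise ValueError("need at least 10 individuals")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom, top = order[:n10], order[-n10:]
    a = sum(is_case[i] for i in top)
    c = sum(is_case[i] for i in bottom)
    return a, len(top) - a, c, len(bottom) - c


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> tuple[float, float, float]:
    """Crude odds ratio (top vs. bottom decile) with a Woolf log-scale confidence interval."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

For example, with 20 cases and 80 controls in the top decile and 10 cases and 90 controls in the bottom decile, the crude odds ratio is (20 x 90) / (80 x 10) = 2.25.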
Affiliation(s)
- Tae-Hwi Schwantes-An
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis IN, USA
- John B. Whitfield
  - Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Queensland 4029, Australia
- Guruprasad P. Aithal
  - NIHR Nottingham Biomedical Research Centre, Nottingham University Hospitals and the University of Nottingham, Nottingham NG7 2UH, UK
- Stephen R. Atkinson
  - Department of Metabolism, Digestion & Reproduction, Imperial College London, UK
- Ramon Bataller
  - Center for Liver Diseases, University of Pittsburgh Medical Center, 3471 Fifth Avenue, Pittsburgh, PA 15213, USA
- Greg Botwin
  - Department of Veterans Affairs, VA Long Beach Healthcare System, 5901 East Seventh Street, Long Beach, CA 90822, USA
  - F. Widjaja Family Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Naga P. Chalasani
  - Department of Medicine, Indiana University, Indianapolis, IN 46202-5175, USA
- Heather J. Cordell
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne NE1 3BZ, UK
- Ann K. Daly
  - Faculty of Medical Sciences, Newcastle University Medical School, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Rebecca Darlay
  - Population Health Sciences Institute, Faculty of Medical Sciences, Newcastle University, International Centre for Life, Central Parkway, Newcastle upon Tyne NE1 3BZ, UK
- Christopher P. Day
  - Newcastle University, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Florian Eyer
  - Division of Clinical Toxicology, Department of Internal Medicine 2, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Ismaninger Str. 22, 81675 Munich, Germany
- Tatiana Foroud
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis IN, USA
- Samer Gawrieh
  - Department of Medicine, Indiana University, Indianapolis, IN 46202-5175, USA
- Dermot Gleeson
  - Liver Unit, Sheffield Teaching Hospitals, AO Floor Robert Hadfield Building, Northern General Hospital, Sheffield S5 7AU, UK
- David Goldman
  - Office of the Clinical Director and Laboratory of Neurogenetics, NIAAA, Bethesda, MD 20952, USA
- Paul S. Haber
  - Edith Collins Centre (Translational Research in Alcohol Drugs and Toxicology), Sydney Local Health District, Missenden Road, Camperdown, NSW 2050, Australia
  - Faculty of Medicine and Health, the University of Sydney, Sydney, NSW 2006, Australia
- Craig S. Lammert
  - Department of Medicine, Indiana University, Indianapolis, IN 46202-5175, USA
- Tiebing Liang
  - Department of Medicine, Indiana University, Indianapolis, IN 46202-5175, USA
- Suthat Liangpunsakul
  - Division of Gastroenterology and Hepatology, Department of Medicine, Indiana University and Roudebush Veterans Administration Medical Center, Indianapolis, USA
- Steven Masson
  - Faculty of Medical Sciences, Newcastle University Medical School, Framlington Place, Newcastle upon Tyne NE2 4HH, UK
- Philippe Mathurin
  - CHRU de Lille, Hôpital Claude Huriez, Rue M. Polonovski CS 70001, 59 037 Lille Cedex, France
- Romain Moirand
  - Univ Rennes, INRA, INSERM, CHU Rennes, Institut NUMECAN (Nutrition Metabolisms and Cancer), F-35000 Rennes, France
- Andrew McQuillin
  - Molecular Psychiatry Laboratory, Division of Psychiatry, University College London, London WC1E 6DE, UK
- Christophe Moreno
  - CUB Hôpital Erasme, Université Libre de Bruxelles, clinique d’Hépatologie, Brussels, Belgium
  - Laboratory of Experimental Gastroenterology, Université Libre de Bruxelles, Brussels, Belgium
- Marsha Y. Morgan
  - UCL Institute for Liver & Digestive Health, Division of Medicine, Royal Free Campus, University College London, London NW3 2PF, UK
- Sebastian Mueller
  - Department of Internal Medicine, Salem Medical Center and Center for Alcohol Research, University of Heidelberg, Zeppelinstraße 11-33, 69121 Heidelberg, Germany
- Beat Müllhaupt
  - Department of Gastroenterology and Hepatology, University Hospital Zurich, Rämistrasse 100, CH-8901 Zurich, Switzerland
- Laura E. Nagy
  - Lerner Research Institute, 9500 Euclid Avenue, Cleveland, OH 44195, USA
- Pierre Nahon
  - Service d'Hépatologie, APHP Hôpital Avicenne et Université Paris 13, Bobigny, France
  - University Paris 13, Bobigny, France
  - Inserm U1162 Génomique fonctionnelle des tumeurs solides, Paris, France
- Bertrand Nalpas
  - Service Addictologie, CHRU Caremeau, 30029 Nîmes, France
  - DISC, Inserm, 75013 Paris, France
- Sylvie Naveau
  - Hôpital Antoine-Béclère, 157 Rue de la Porte de Trivaux, 92140 Clamart, France
- Pascal Perney
  - Hôpital Universitaire Caremeau, Place du Pr. Robert Debre, 30029 Nîmes, France
- Munir Pirmohamed
  - MRC Centre for Drug Safety Science, Liverpool Centre for Alcohol Research, University of Liverpool, The Royal Liverpool and Broadgreen University Hospitals NHS Trust, and Liverpool Health Partners, Liverpool, L69 3GL, UK
- Helmut K. Seitz
  - Department of Internal Medicine, Salem Medical Center and Center for Alcohol Research, University of Heidelberg, Zeppelinstraße 11-33, 69121 Heidelberg, Germany
- Michael Soyka
  - Psychiatric Hospital University of Munich, Nussbaumstr. 7, 80336 Munich, Germany
- Felix Stickel
  - Department of Gastroenterology and Hepatology, University Hospital Zurich, Rämistrasse 100, CH-8901 Zurich, Switzerland
- Andrew Thompson
  - MRC Centre for Drug Safety Science, Liverpool Centre for Alcohol Research, University of Liverpool, The Royal Liverpool and Broadgreen University Hospitals NHS Trust, and Liverpool Health Partners, Liverpool, L69 3GL, UK
  - Health Analytics, Lane Clark & Peacock LLP, London, UK
- Mark R. Thursz
  - Department of Metabolism, Digestion & Reproduction, Imperial College London, UK
- Eric Trépo
  - CUB Hôpital Erasme, Université Libre de Bruxelles, clinique d’Hépatologie, Brussels, Belgium
  - Laboratory of Experimental Gastroenterology, Université Libre de Bruxelles, Brussels, Belgium
- Timothy R. Morgan
  - Department of Medicine, University of California, Irvine, USA
  - Department of Veterans Affairs, VA Long Beach Healthcare System, 5901 East Seventh Street, Long Beach, CA 90822, USA
- Devanshi Seth
  - Edith Collins Centre (Translational Research in Alcohol Drugs and Toxicology), Sydney Local Health District, Missenden Road, Camperdown, NSW 2050, Australia
  - Faculty of Medicine and Health, the University of Sydney, Sydney, NSW 2006, Australia
  - Centenary Institute of Cancer Medicine and Cell Biology, the University of Sydney, Sydney, NSW 2006, Australia
6
Garro-Núñez D, Picado-Martínez MJ, Espinoza-Campos E, Ugalde-Araya D, Macaya G, Raventós H, Chavarría-Soley G. Systematic exploration of a decade of publications on psychiatric genetics in Latin America. Am J Med Genet B Neuropsychiatr Genet 2024; 195:e32960. [PMID: 37860990; DOI: 10.1002/ajmg.b.32960] [Received: 09/07/2022] [Revised: 08/08/2023] [Accepted: 09/29/2023] [Indexed: 10/21/2023]
Abstract
Psychiatric disorders have a great impact in terms of mortality, morbidity, and disability across the lifespan. Considerable effort has been devoted to understanding their complex and heterogeneous genetic architecture, including in populations of diverse ancestry. Our aim was to review the psychiatric genetics research published with Latin American populations from 2010 to 2019 and to classify it according to country of origin, type of analysis, source of funding, and other variables. We found that most publications came from Brazil, Mexico, and Colombia. Local funds are generally not large enough for genome-wide studies in Latin America, with the exception of Brazil and Mexico; larger studies are often done in collaboration with international partners and are mostly funded by US agencies. In most of the larger studies, the participants are individuals of Latin American ancestry living in the United States, which limits the potential for exploring complex gene-environment interactions. Family studies, traditionally strong in Latin America, represent about 30% of the total research publications. Scarce local research resources in Latin America have probably been an important limitation for conducting larger and more complex studies, contributing to the reduced representation of these populations in global psychiatric genetics studies. Increasing diversity must be a goal in order to improve generalizability and applicability in clinical settings.
Affiliation(s)
- Daniela Ugalde-Araya
  - Center for Research in Cellular and Molecular Biology, Universidad de Costa Rica, San José, Costa Rica
- Gabriel Macaya
  - Center for Research in Cellular and Molecular Biology, Universidad de Costa Rica, San José, Costa Rica
- Henriette Raventós
  - Biology School, Universidad de Costa Rica, San José, Costa Rica
  - Center for Research in Cellular and Molecular Biology, Universidad de Costa Rica, San José, Costa Rica
- Gabriela Chavarría-Soley
  - Biology School, Universidad de Costa Rica, San José, Costa Rica
  - Center for Research in Cellular and Molecular Biology, Universidad de Costa Rica, San José, Costa Rica
7
Hou K, Gogarten S, Kim J, Hua X, Dias JA, Sun Q, Wang Y, Tan T, Atkinson EG, Martin A, Shortt J, Hirbo J, Li Y, Pasaniuc B, Zhang H. Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations. Bioinformatics 2024; 40:btae148. [PMID: 38490256; PMCID: PMC10980565; DOI: 10.1093/bioinformatics/btae148] [Received: 09/30/2023] [Revised: 02/08/2024] [Accepted: 03/13/2024] [Indexed: 03/17/2024] [Open access]
Abstract
SUMMARY Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored to the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION The admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, a pipeline designed for admixed genotype simulation is available at https://github.com/UW-GAC/admix-kit_workflow.
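To make "admixed genotype simulation" concrete, here is a toy sketch using only the standard library. It assumes each haplotype is a mosaic of ancestry tracts, with a new tract starting at each marker with a fixed probability (a crude stand-in for recombination since admixture), and that alleles are drawn from ancestry-specific allele frequencies. It does not use or reproduce admix-kit's API or its simulation method; every name here is hypothetical.

```python
import random


def simulate_haplotype(freqs: list[list[float]], admix_props: list[float],
                       switch_prob: float, rng: random.Random) -> tuple[list[int], list[int]]:
    """Simulate one admixed haplotype.

    freqs[k][j]: alternate-allele frequency of marker j in ancestral population k.
    admix_props[k]: global ancestry proportion of population k.
    switch_prob: per-marker probability that a new ancestry tract starts.
    Returns (local_ancestry, alleles).
    """
    pops = list(range(len(freqs)))
    ancestry = rng.choices(pops, weights=admix_props)[0]
    local, alleles = [], []
    for j in range(len(freqs[0])):
        if j > 0 and rng.random() < switch_prob:
            ancestry = rng.choices(pops, weights=admix_props)[0]  # start a new tract
        local.append(ancestry)
        alleles.append(1 if rng.random() < freqs[ancestry][j] else 0)
    return local, alleles


def simulate_diploid(freqs, admix_props, switch_prob, seed=0):
    """Two independent haplotypes plus the diploid alternate-allele dosage (0, 1 or 2)."""
    rng = random.Random(seed)
    anc1, hap1 = simulate_haplotype(freqs, admix_props, switch_prob, rng)
    anc2, hap2 = simulate_haplotype(freqs, admix_props, switch_prob, rng)
    return anc1, anc2, [a + b for a, b in zip(hap1, hap2)]
```

Keeping local ancestry alongside genotypes matters because downstream admixture-aware methods, such as association tests or ancestry-specific polygenic scoring, condition on it.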
Affiliation(s)
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Stephanie Gogarten
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, United States
- Joohyun Kim
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Xing Hua
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States
- Julie-Alexia Dias
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02120, United States
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Ying Wang
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Taotao Tan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Elizabeth G Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Alicia Martin
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Jonathan Shortt
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, United States
- Jibril Hirbo
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States
8
Lappalainen T, Li YI, Ramachandran S, Gusev A. Genetic and molecular architecture of complex traits. Cell 2024; 187:1059-1075. [PMID: 38428388; DOI: 10.1016/j.cell.2024.01.023] [Received: 10/03/2023] [Revised: 12/20/2023] [Accepted: 01/16/2024] [Indexed: 03/03/2024]
Abstract
Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology and statistical methods, together with the growing scale of research efforts, have provided many insights into the processes that gave rise to current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of the molecular and cellular effects of genetic variants have provided insights into the biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increasingly uses genetic data both to understand the history of our species and to improve human health.
Affiliation(s)
- Tuuli Lappalainen
  - New York Genome Center, New York, NY, USA
  - Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Stockholm, Sweden
- Yang I Li
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Sohini Ramachandran
  - Ecology, Evolution and Organismal Biology, Center for Computational Molecular Biology, and the Data Science Institute, Brown University, Providence, RI 02912, USA
- Alexander Gusev
  - Harvard Medical School and Dana-Farber Cancer Institute, Boston, MA, USA
9
|
Sun Q, Rowland BT, Chen J, Mikhaylova AV, Avery C, Peters U, Lundin J, Matise T, Buyske S, Tao R, Mathias RA, Reiner AP, Auer PL, Cox NJ, Kooperberg C, Thornton TA, Raffield LM, Li Y. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat Commun 2024; 15:1016. [PMID: 38310129 PMCID: PMC10838303 DOI: 10.1038/s41467-024-45135-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods focus only on participants with distinct primary continental ancestry without accommodating recently-admixed individuals with mosaic continental ancestry backgrounds for different segments of their genomes. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses for traits with associated variants exhibiting ancestral-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared to alternative methods, and even outperforms PRS-CSx with large European GWAS for some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
Affiliation(s)
- Quan Sun
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Bryce T Rowland
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jiawen Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Anna V Mikhaylova
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Christy Avery
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Ulrike Peters
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jessica Lundin
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Tara Matise
- Department of Genetics, Rutgers University, New Brunswick, NJ, 08901, USA
- Steve Buyske
- Department of Statistics, Rutgers University, New Brunswick, NJ, 08901, USA
- Ran Tao
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Rasika A Mathias
- Department of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
- Paul L Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Nancy J Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Timothy A Thornton
- Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
10
Liao K, Zöllner S. A Stacking Framework for Polygenic Risk Prediction in Admixed Individuals. medRxiv 2024:2024.01.31.24302103. [PMID: 38434717 PMCID: PMC10907988 DOI: 10.1101/2024.01.31.24302103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2024]
Abstract
Polygenic risk scores (PRS) are summaries of an individual's personalized genetic risk for a trait or disease. However, PRS often perform poorly for phenotype prediction when the ancestry of the target population does not match the population in which GWAS effect sizes were estimated. For many populations this can be addressed by performing GWAS in the target population. However, admixed individuals (whose genomes can be traced to multiple ancestral populations) lie on an ancestry continuum and are not easily represented as a discrete population. Here, we propose slaPRS (stacking local ancestry PRS), which incorporates multiple ancestry GWAS to alleviate the ancestry dependence of PRS in admixed samples. slaPRS uses ensemble learning (stacking) to combine local population specific PRS in regions across the genome. We compare slaPRS to single population PRS and a method that combines single population PRS globally. In simulations, slaPRS outperformed existing approaches and reduced the ancestry dependence of PRS in African Americans. In lipid traits from African British individuals (UK Biobank), slaPRS again improved on single population PRS while performing comparably to the globally combined PRS. slaPRS provides a data-driven and flexible framework to incorporate multiple population-specific GWAS and local ancestry in samples of admixed ancestry.
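The stacking step can be illustrated with a minimal linear sketch. It assumes that, for each individual in a held-out tuning set, one PRS has already been computed per (genomic region, source GWAS) pair, and it learns ordinary least-squares weights for those scores. This is a simplified stand-in for the idea, not the slaPRS implementation: the function names are hypothetical, and there is no regularization, cross-validation or explicit local-ancestry input.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("stacking features are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_stacking_weights(features: List[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares meta-learner weights (intercept first).

    features[i][k] is individual i's PRS for the k-th (region, GWAS) pair.
    """
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def stacked_prs(weights: Sequence[float], features: List[List[float]]) -> List[float]:
    """Combine the per-region scores with the learned stacking weights."""
    return [weights[0] + sum(w * f for w, f in zip(weights[1:], row)) for row in features]
```

A usage sketch: with two made-up region-level scores per person, `fit_stacking_weights` returns an intercept and one weight per score, and `stacked_prs` applies them to new individuals. A linear meta-learner keeps the combined score interpretable as a weighted sum of local scores.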
Affiliation(s)
- Kevin Liao
- University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- Sebastian Zöllner
- University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- University of Michigan, Department of Psychiatry, Ann Arbor, MI, 48109, USA
11
Aw AJ, McRae J, Rahmani E, Song YS. Highly parameterized polygenic scores tend to overfit to population stratification via random effects. bioRxiv 2024:2024.01.27.577589. [PMID: 38352303 PMCID: PMC10862757 DOI: 10.1101/2024.01.27.577589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Polygenic scores (PGSs), increasingly used in clinical settings, frequently include many genetic variants, with performance typically peaking at thousands of variants. Such highly parameterized PGSs often include variants that do not pass a genome-wide significance threshold. We propose a mathematical perspective that renders the effects of many of these non-significant variants random rather than causal, with the randomness capturing population structure. We devise methods to assess variant effect randomness and population stratification bias. Applying these methods to 141 traits from the UK Biobank, we find that, for many PGSs, the effects of non-significant variants are considerably random, with the extent of randomness associated with the degree of overfitting to population structure of the discovery cohort. Our findings explain why highly parameterized PGSs simultaneously have superior cohort-specific performance and limited generalizability, suggesting the critical need for variant randomness tests in PGS evaluation. Supporting code and a dashboard are available at https://github.com/songlab-cal/StratPGS.
Affiliation(s)
- Alan J. Aw
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Artificial Intelligence Laboratory, Illumina Inc
- Jeremy McRae
- Artificial Intelligence Laboratory, Illumina Inc
- Elior Rahmani
- Department of Computational Medicine, University of California, Los Angeles
- Yun S. Song
- Department of Statistics, University of California, Berkeley
- Center for Computational Biology, University of California, Berkeley
- Computer Science Division, University of California, Berkeley
12
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 45] [Impact Index Per Article: 45.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
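In its simplest form, a PRS is a weighted sum of effect-allele dosages, with per-variant weights taken from GWAS effect-size estimates. The minimal sketch below assumes 0/1/2 dosages and already-chosen weights; it scores individuals in each group, computes the squared Pearson correlation (R^2) with the phenotype, and expresses each group's R^2 relative to a reference group, one common way of summarizing the reduced transferability described above. All names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between scores x and phenotype y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores and phenotype must both vary")
    return sxy * sxy / (sxx * syy)


Group = Tuple[List[List[float]], List[float]]  # (genotype rows, phenotypes)


def relative_accuracy(groups: Dict[str, Group], weights: Sequence[float],
                      reference: str) -> Dict[str, float]:
    """R^2 of the same PRS in each group, divided by R^2 in the reference group."""
    r2 = {}
    for name, (genotypes, phenotypes) in groups.items():
        scores = [prs(g, weights) for g in genotypes]
        r2[name] = r_squared(scores, phenotypes)
    return {name: value / r2[reference] for name, value in r2.items()}
```

A ratio below 1 for a target group means the same weights explain less phenotypic variance there than in the reference group.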
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jibril Hirbo
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
13
Maldonado BL, Piqué DG, Kaplan RC, Claw KG, Gignoux CR. Genetic risk prediction in Hispanics/Latinos: milestones, challenges, and social-ethical considerations. J Community Genet 2023; 14:543-553. [PMID: 37962783 PMCID: PMC10725387 DOI: 10.1007/s12687-023-00686-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 10/18/2023] [Indexed: 11/15/2023]
Abstract
Genome-wide association studies (GWAS) have allowed the identification of disease-associated variants, which can be leveraged to build polygenic scores (PGSs). Even though PGSs can be a valuable tool in personalized medicine, their predictive power is limited in populations of non-European ancestry, particularly in admixed populations. Recent efforts have focused on increasing racial and ethnic diversity in GWAS, thus, addressing some of the limitations of genetic risk prediction in these populations. Even with these efforts, few studies focus exclusively on Hispanics/Latinos. Additionally, Hispanic/Latino populations are often considered a single population despite varying admixture proportions between and within ethnic groups, diverse genetic heterogeneity, and demographic history. Combined with highly heterogeneous environmental and socioeconomic exposures, this diversity can reduce the transferability of genetic risk prediction models. Given the recent increase of genomic studies that include Hispanics/Latinos, we review the milestones and efforts that focus on genetic risk prediction, summarize the potential for improving PGS transferability, and highlight the challenges yet to be addressed. Additionally, we summarize social-ethical considerations and provide ideas to promote genetic risk prediction models that can be implemented equitably.
Affiliation(s)
- Betzaida L Maldonado
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Daniel G Piqué
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Section of Genetics and Metabolism, Department of Pediatrics, Children's Hospital Colorado, Aurora, CO, USA
- Robert C Kaplan
- Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Katrina G Claw
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Christopher R Gignoux
- Human Medical Genetics & Genomics Graduate Program, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Colorado Center for Personalized Medicine, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
- Department of Biomedical Informatics, University of Colorado-Anschutz Medical Campus, Aurora, CO, USA
14
Gouveia MH, Bentley AR, Leal TP, Tarazona-Santos E, Bustamante CD, Adeyemo AA, Rotimi CN, Shriner D. Unappreciated subcontinental admixture in Europeans and European Americans and implications for genetic epidemiology studies. Nat Commun 2023; 14:6802. [PMID: 37935687 PMCID: PMC10630423 DOI: 10.1038/s41467-023-42491-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 10/12/2023] [Indexed: 11/09/2023]
Abstract
European-ancestry populations are recognized as stratified but not as admixed, implying that residual confounding by locus-specific ancestry can affect studies of association, polygenic adaptation, and polygenic risk scores. We integrate individual-level genome-wide data from ~19,000 European-ancestry individuals across 79 European populations and five European American cohorts. We generate a new reference panel that captures ancestral diversity missed by both the 1000 Genomes and Human Genome Diversity Projects. Both Europeans and European Americans are admixed at the subcontinental level, with admixture dates differing among subgroups of European Americans. After adjustment for both genome-wide and locus-specific ancestry, associations between a highly differentiated variant in LCT (rs4988235) and height or LDL-cholesterol were confirmed to be false positives whereas the association between LCT and body mass index was genuine. We provide formal evidence of subcontinental admixture in individuals with European ancestry, which, if not properly accounted for, can produce spurious results in genetic epidemiology studies.
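How unadjusted ancestry produces a false-positive association can be shown with a toy regression. In the hypothetical data below, the phenotype depends only on an ancestry proportion, while the genotype at the test variant is correlated with ancestry; regressing the phenotype on genotype alone gives a clearly non-zero slope, which vanishes once ancestry enters as a covariate. This is a simplified sketch with a single global ancestry covariate and ordinary least squares, not the genome-wide plus locus-specific ancestry adjustment used in the study.

```python
from typing import List, Sequence


def ols(columns: List[Sequence[float]], y: Sequence[float]) -> List[float]:
    """OLS coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    x = [[1.0] + [float(col[i]) for col in columns] for i in range(n)]
    p = len(x[0])
    # Augmented normal equations [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in x) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(x, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(m[k][c]))
        m[c], m[piv] = m[piv], m[c]
        for k in range(p):
            if k != c:
                f = m[k][c] / m[c][c]
                m[k] = [a - f * b for a, b in zip(m[k], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


# Hypothetical toy data: the phenotype depends on ancestry only.
ancestry = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]  # e.g. a global ancestry proportion
genotype = [0, 0, 1, 1, 2, 1]              # allele count, correlated with ancestry
phenotype = [1.0 + 3.0 * a for a in ancestry]

marginal = ols([genotype], phenotype)            # phenotype ~ genotype
adjusted = ols([genotype, ancestry], phenotype)  # phenotype ~ genotype + ancestry
print(f"genotype slope, unadjusted: {marginal[1]:.3f}")
print(f"genotype slope, ancestry-adjusted: {adjusted[1]:.3f}")
```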
Affiliation(s)
- Mateus H Gouveia
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Amy R Bentley
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Thiago P Leal
- Department of Genomic Medicine, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, 44197, USA
- Eduardo Tarazona-Santos
- Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, 31270-910, Brazil
- Carlos D Bustamante
- Center for Computational, Evolutionary and Human Genomics (CEHG), Stanford University, Stanford, CA, 94305, USA
- Adebowale A Adeyemo
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Charles N Rotimi
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Daniel Shriner
- Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
15
Tanigawa Y, Kellis M. Power of inclusion: Enhancing polygenic prediction with admixed individuals. Am J Hum Genet 2023; 110:1888-1902. [PMID: 37890495 PMCID: PMC10645553 DOI: 10.1016/j.ajhg.2023.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2023] [Revised: 09/22/2023] [Accepted: 09/22/2023] [Indexed: 10/29/2023]
Abstract
Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R2 = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R2 = 0.115), which exceeds the best predictive performance for the White British group (R2 = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
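Fitting a penalized regression directly on individual-level genotypes is what lets admixed individuals enter training without an ancestry-matched LD reference panel. A minimal sketch of that general idea follows, assuming a lasso (L1) penalty solved by cyclic coordinate descent on standardized genotype columns. It is a textbook lasso, not the iPGS software; the function names, penalty value and toy data are hypothetical.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(x: List[List[float]], y: Sequence[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes every column of x (individuals x variants) is centred with
    sum(x_ij^2) / n == 1, and that y is centred, so no intercept is fitted.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(x[i][j] * residual[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= x[i][j] * delta
                beta[j] = new
    return beta
```

Because the L1 penalty sets weak effects exactly to zero, the fitted score keeps only variants whose contribution survives shrinkage; raising `lam` yields a sparser score.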
Affiliation(s)
- Yosuke Tanigawa
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Manolis Kellis
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA; Broad Institute of MIT and Harvard, Cambridge, MA, USA
16
Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, Gaynor SM, Joseph T, Zou Y, Liu D, Wade R, Staples J, Panea R, Popov A, Bai X, Balasubramanian S, Habegger L, Lanche R, Lopez A, Maxwell E, Jones M, García-Ortiz H, Ramirez-Reyes R, Santacruz-Benítez R, Nag A, Smith KR, Damask A, Lin N, Paulding C, Reppell M, Zöllner S, Jorgenson E, Salerno W, Petrovski S, Overton J, Reid J, Thornton TA, Abecasis G, Berumen J, Orozco-Orozco L, Collins R, Baras A, Hill MR, Emberson JR, Marchini J, Kuri-Morales P, Tapia-Conyer R. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 2023; 622:784-793. [PMID: 37821707 PMCID: PMC10600010 DOI: 10.1038/s41586-023-06595-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 06/27/2022] [Accepted: 08/31/2023] [Indexed: 10/13/2023]
Abstract
The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.
Affiliation(s)
- Jason Torres
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jesús Alegre-Díaz
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Michael Turner
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Oxford Kidney Unit, Churchill Hospital, Oxford, UK
- Yuxin Zou
- Regeneron Genetics Center, Tarrytown, NY, USA
- Daren Liu
- Regeneron Genetics Center, Tarrytown, NY, USA
- Rachel Wade
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Alex Popov
- Regeneron Genetics Center, Tarrytown, NY, USA
- Alex Lopez
- Regeneron Genetics Center, Tarrytown, NY, USA
- Raul Ramirez-Reyes
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rogelio Santacruz-Benítez
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Abhishek Nag
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Katherine R Smith
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Amy Damask
- Regeneron Genetics Center, Tarrytown, NY, USA
- Nan Lin
- Regeneron Genetics Center, Tarrytown, NY, USA
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, Research and Development Biopharmaceuticals, AstraZeneca, Cambridge, UK
- Jaime Berumen
- Experimental Research Unit from the Faculty of Medicine (UIME), National Autonomous University of Mexico (UNAM), Mexico City, Mexico
- Rory Collins
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
- Michael R Hill
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Jonathan R Emberson
- Clinical Trial Service Unit and Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Pablo Kuri-Morales
- Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
- Roberto Tapia-Conyer
- Faculty of Medicine, National Autonomous University of Mexico, Mexico City, Mexico
17
Zhao W, Zhang Z, Wang Z, Ma P, Pan Y, Wang Q, Zhang Z. Factors affecting the accuracy of genomic prediction in joint pig populations. Animal 2023; 17:100980. [PMID: 37797495 DOI: 10.1016/j.animal.2023.100980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 08/28/2023] [Accepted: 08/31/2023] [Indexed: 10/07/2023]
Abstract
Genomic prediction (GP) has greatly advanced animal and plant breeding over the past two decades. GP in joint populations is a feasible method to improve the accuracy of genomic estimated breeding values in small populations. However, there is still a need to understand the factors that influence GP in joint populations. This study used simulated data and real data from Duroc pig populations to examine the impact of linkage disequilibrium (LD), causal variant effect sizes (CVESs), and minor allele frequencies (MAF) of SNPs on the accuracy of genomic prediction in joint populations. Three prediction methods were used: genomic best linear unbiased prediction (GBLUP), single-step GBLUP and multi-trait GBLUP. Results from the simulated datasets showed that the accuracies of GP in joint populations were always higher than those in a single population when only LD inconsistencies existed. However, single-step GBLUP accuracy in joint populations decreased as the correlation of MAF between populations decreased, while GBLUP accuracy remained consistently higher in joint populations than in a single population. As the correlation of CVES between populations decreased, the accuracy of both GBLUP and single-step GBLUP in joint populations declined. Analysis of the real Duroc populations showed low genetic correlation, similar to the simulated relationship between the most distant populations. In most cases in the Duroc populations, GP had higher accuracy in joint populations than in individual populations. In conclusion, the consistency of CVES plays a more important role in multi-population GP. The genetic relatedness of the Duroc populations is so weak that the prediction accuracy of GP in joint populations is reduced for some traits. Multi-trait GBLUP is a competitive method for joint breeding evaluation.
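GBLUP relies on a genomic relationship matrix (GRM) built from SNP genotypes. One widely used construction, VanRaden's first method, centres each SNP's 0/1/2 genotype by twice the allele frequency and scales Z Z' by 2 * sum of p(1 - p) over SNPs. The sketch below computes that matrix for a toy genotype table; it assumes allele frequencies are estimated from the same sample, and the study may have used a different GRM variant. The function name is hypothetical.

```python
from typing import List


def vanraden_grm(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes[i][j] is the count (0, 1 or 2) of the counted allele for
    individual i at SNP j; p_j is that allele's frequency in this sample.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    # Centre each SNP column by its expected count 2 * p_j.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)]
            for i in range(n)]
```

In GBLUP this matrix takes the place of the pedigree-based relationship matrix in the mixed model, so genomic breeding values borrow information from individuals in proportion to their realized genomic similarity.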
Affiliation(s)
- Wei Zhao
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
- Zhenyang Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Zhen Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
- Peipei Ma
- Department of Animal Science, School of Agriculture and Biology, Shanghai Jiaotong University, 800 Dongchuan Road, Shanghai 200240, China
- Yuchun Pan
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Qishan Wang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China; Hainan Institute, Zhejiang University, Yongyou Industrial Park, Yazhou Bay Sci-Tech City, Sanya 572000, China
- Zhe Zhang
- Department of Animal Science, College of Animal Sciences, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, China
18
Gyawali PK, Le Guen Y, Liu X, Belloy ME, Tang H, Zou J, He Z. Improving genetic risk prediction across diverse population by disentangling ancestry representations. Commun Biol 2023; 6:964. [PMID: 37736834 PMCID: PMC10517023 DOI: 10.1038/s42003-023-05352-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/20/2022] [Accepted: 09/12/2023] [Indexed: 09/23/2023]
Abstract
Risk prediction models using genetic data have gained increasing traction in genomics. However, most polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can bias the risk predictors, resulting in poor generalization when they are applied to minority populations and admixed individuals such as African Americans. Because this bias arises largely from the underlying population structure, we propose a deep-learning framework that leverages data from diverse populations and disentangles ancestry from phenotype-relevant information in its representation. The ancestry-disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without requiring self-reported ancestry information.
Affiliation(s)
- Prashnna K Gyawali
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA.
- Yann Le Guen
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Institut du Cerveau-Paris Brain Institute-ICM, Paris, France
- Xiaoxia Liu
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E Belloy
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA, USA
- James Zou
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA.
- Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA.
19
Park DK, Chen M, Kim S, Joo YY, Loving RK, Kim HS, Cha J, Yoo S, Kim JH. Overestimated prediction using polygenic prediction derived from summary statistics. BMC Genom Data 2023; 24:52. [PMID: 37710206 PMCID: PMC10500750 DOI: 10.1186/s12863-023-01151-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 08/16/2023] [Indexed: 09/16/2023]
Abstract
BACKGROUND When a polygenic risk score (PRS) is derived from summary statistics, independence between the discovery and test sets cannot be monitored. We compared two types of PRS: those derived from raw genetic data (denoted rPRS) and those derived from the IGAP summary statistics (sPRS). RESULTS Two highly heritable traits in the UK Biobank, hypertension and height, are used to derive a reference scale for PRS effects. sPRS without APOE, derived from the International Genomics of Alzheimer's Project (IGAP), yields ΔAUC and ΔR² of 0.051 ± 0.013 and 0.063 ± 0.015 for the Alzheimer's Disease Sequencing Project (ADSP), and 0.060 and 0.086 for the Accelerating Medicine Partnership - Alzheimer's Disease (AMP-AD). In the UK Biobank, rPRS performance for hypertension, assuming discovery and test sets of similar size, is 0.0036 ± 0.0027 (ΔAUC) and 0.0032 ± 0.0028 (ΔR²). For height, ΔR² is 0.029 ± 0.0037. CONCLUSION Considering the high heritability of hypertension and height and the sample size of the UK Biobank, sPRS results from AD databases are inflated. Independence between discovery and test sets is a well-known basic requirement for PRS studies. However, many PRS studies cannot meet this requirement because the samples cannot be compared directly when only summary statistics are available. Thus, for sPRS, potential sample duplication should be carefully considered within the same ethnic group.
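The independence requirement can only be checked when individual-level genotypes are available, as in the rPRS setting. Below is a minimal sketch of such a check. It assumes both sets are coded as 0/1/2 allele counts over the same ordered SNP panel, with `None` for missing calls. It flags any test individual whose genotype concordance with a discovery individual reaches a threshold; 0.99 here is an illustrative choice that tolerates a little genotyping error.

Unrelated individuals share identical genotypes at far fewer SNPs, so duplicates stand out. However, the all-pairs loop is quadratic, so dedicated kinship tools (for example KING) are the practical choice at biobank scale. The names and data format are hypothetical, not taken from the study.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format: sample ID -> 0/1/2 allele counts over the same
# ordered SNP panel, with None for a missing call.
Genotypes = Dict[str, Sequence[Optional[int]]]


def concordance(a: Sequence[Optional[int]], b: Sequence[Optional[int]]) -> float:
    """Fraction of SNPs called in both samples whose genotypes are identical."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def find_overlap(discovery: Genotypes, test: Genotypes,
                 threshold: float = 0.99) -> List[Tuple[str, str, float]]:
    """Return (test_id, discovery_id, concordance) for likely duplicate samples."""
    hits = []
    for tid, tg in test.items():
        for did, dg in discovery.items():
            c = concordance(tg, dg)
            if c >= threshold:
                hits.append((tid, did, c))
    return hits


if __name__ == "__main__":
    discovery = {"D1": [0, 1, 2, 1, 0, 2], "D2": [2, 2, 0, 1, 1, 0]}
    test = {"T1": [0, 1, 2, 1, None, 2], "T2": [1, 0, 1, 2, 2, 1]}
    print(find_overlap(discovery, test))  # [('T1', 'D1', 1.0)]
```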
Affiliation(s)
- David Keetae Park
- Department of Biomedical Engineering, Columbia University, New York, USA
- Mingshen Chen
- Department of Applied Mathematics & Statistics, Stony Brook University, New York, USA
- Seungsoo Kim
- Department of Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY, USA
- Yoonjung Yoonie Joo
- Samsung Advanced Institute for Health Sciences & Technology (SAHIST), Sungkyunkwan University, Samsung Medical Center, Seoul, South Korea
- Rebekah K Loving
- Department of Biology, California Institute of Technology, Pasadena, USA
- Hyoung Seop Kim
- Department of Physical Medicine and Rehabilitation, Dementia Center, National Health Insurance Service Ilsan Hospital, Goyang, South Korea
- Jiook Cha
- Department of Psychology, Brain and Cognitive Sciences, AI Institute, Seoul National University, Seoul, South Korea
- Shinjae Yoo
- Computational Science Initiative, Brookhaven National Lab. Computer Science and Math, Building 725, Room 2-189, Upton, NY, 11973, USA.
- Jong Hun Kim
- Department of Neurology, Dementia Center, National Health Insurance Service Ilsan Hospital, 100 Ilsan-ro Ilsandong-gu, Goyang, Gyeonggi-Do, 10444, South Korea.
20
Tan T, Atkinson EG. Strategies for the Genomic Analysis of Admixed Populations. Annu Rev Biomed Data Sci 2023; 6:105-127. [PMID: 37127050 PMCID: PMC10871708 DOI: 10.1146/annurev-biodatasci-020722-014310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities both in the understanding of the genetic structure and history of diverse cohorts and in the performance of genomic medicine across populations. Admixed populations pose particular statistical challenges because they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in genomics analyses. Here, we survey such computational strategies for the informed consideration of admixture, to allow the well-calibrated inclusion of mixed-ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.
Affiliation(s)
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
21
Hou K, Xu Z, Ding Y, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. medRxiv: The Preprint Server for Health Sciences 2023:2023.07.24.23293056. [PMID: 37546999 PMCID: PMC10402211 DOI: 10.1101/2023.07.24.23293056] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/08/2023]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields, from agriculture to personalized medicine. We analyze data from two large biobanks, in the US (All of Us) and the UK (UK Biobank), and find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, affect PGS accuracy with magnitudes similar to that of genetic ancestry. PGSs trained in single-ancestry versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well calibrated (containing the trait value with 90% probability in all contexts), whereas methods that ignore context are miscalibrated. We show that prediction intervals need to be adjusted for all considered traits, by amounts ranging from 10% for diastolic blood pressure to 80% for waist circumference. The required adjustment depends on the dataset; for example, prediction intervals for years of education need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path toward using PGS as a prediction tool for all individuals regardless of their contexts, while highlighting the importance of comprehensive profiling of context information in study design and data collection.
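The central idea here is prediction intervals whose width depends on context. It can be illustrated with split-conformal intervals computed separately for each context group. Within a group, the interval half-width is an empirical quantile of the absolute calibration residuals |y - ŷ|. If the calibration and new individuals in a group are exchangeable, such intervals reach at least the nominal coverage (90% here, matching the target stated above).

This is a simplified stand-in, not the authors' joint model, which handles all contexts together. The context labels, function names and coverage default are illustrative assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def conformal_halfwidth(abs_residuals: Sequence[float], coverage: float) -> float:
    """Split-conformal half-width: the ceil((n + 1) * coverage)-th smallest |residual|."""
    r = sorted(abs_residuals)
    k = math.ceil((len(r) + 1) * coverage)
    if k > len(r):
        return math.inf  # too few calibration samples for the requested coverage
    return r[k - 1]


def fit_context_halfwidths(pred: Sequence[float], obs: Sequence[float],
                           ctx: Sequence[Hashable],
                           coverage: float = 0.9) -> Dict[Hashable, float]:
    """Estimate one interval half-width per context from a calibration set."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for p, y, c in zip(pred, obs, ctx):
        groups[c].append(abs(y - p))
    return {c: conformal_halfwidth(res, coverage) for c, res in groups.items()}


def predict_interval(pgs_pred: float, context: Hashable,
                     halfwidths: Dict[Hashable, float]) -> Tuple[float, float]:
    """Symmetric interval around a PGS-based prediction, widened per context."""
    w = halfwidths[context]
    return (pgs_pred - w, pgs_pred + w)
```

Usage: fit the half-widths on a held-out calibration set labelled by context (for example an age band or income bracket). Then call `predict_interval` for new individuals. Contexts in which the PGS predicts poorly receive wider intervals automatically.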
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Ziqi Xu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles
22
Raben TG, Lello L, Widen E, Hsu SDH. Biobank-scale methods and projections for sparse polygenic prediction from machine learning. Sci Rep 2023; 13:11662. [PMID: 37468507 PMCID: PMC10356957 DOI: 10.1038/s41598-023-37580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 06/23/2023] [Indexed: 07/21/2023]
Abstract
In this paper, we characterize the performance of linear models trained via widely used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance depends most strongly on the size of the training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate the performance of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing), we show that performance metrics and predictor sparsity agree with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative can achieve an AUC of [Formula: see text] for asthma and a correlation of [Formula: see text] for height in a Taiwanese population. These values exceed the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank-trained predictors applied to a European population.
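LASSO fits a linear predictor by minimizing (1/2n)||y - Xβ||² + λ||β||₁. The L1 penalty sets most SNP effects exactly to zero, which is the sparsity that the compressed-sensing analysis above refers to.

Below is a minimal pure-Python sketch using cyclic coordinate descent with soft-thresholding. It assumes standardized genotype columns and a centred phenotype, so no intercept is fitted. The penalty λ, the iteration count and the simulated data are illustrative. Biobank-scale fits would use an optimized solver and choose λ by validation rather than this loop.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def standardise(x: Matrix) -> Matrix:
    """Centre each column and scale it to mean square 1 (monomorphic columns -> 0)."""
    n, p = len(x), len(x[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in x]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for i in range(n):
            out[i][j] = (col[i] - mu) / sd if sd > 0 else 0.0
    return out


def soft_threshold(z: float, t: float) -> float:
    """Shrink z toward zero by t, returning exactly 0.0 inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(x: Matrix, y: List[float], lam: float, n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes x is standardised (see above) and y is centred, so no intercept.
    """
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, updated in place
    for _ in range(n_iter):
        for j in range(p):
            # With mean-square-1 columns, the one-dimensional least-squares
            # solution for b_j is x_j . resid / n + b_j.
            rho = sum(x[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= x[i][j] * delta
                beta[j] = new
    return beta


if __name__ == "__main__":
    random.seed(0)
    n, p = 200, 20
    # Toy genotypes: allele counts with allele frequency 0.3 (hypothetical).
    geno = [[float((random.random() < 0.3) + (random.random() < 0.3)) for _ in range(p)]
            for _ in range(n)]
    x = standardise(geno)
    y = [0.5 * row[0] - 0.4 * row[1] + random.gauss(0.0, 0.5) for row in x]
    ybar = sum(y) / n
    beta = lasso_cd(x, [v - ybar for v in y], lam=0.1)
    print([round(b, 2) for b in beta])
```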
Affiliation(s)
- Timothy G Raben
- Department of Physics and Astronomy, Michigan State University, Michigan, USA.
- Louis Lello
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
- Erik Widen
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
- Stephen D H Hsu
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
23
Smith JL, Schaid DJ, Kullo IJ. Implementing Reporting Standards for Polygenic Risk Scores for Atherosclerotic Cardiovascular Disease. Curr Atheroscler Rep 2023; 25:323-330. [PMID: 37223852 PMCID: PMC10495216 DOI: 10.1007/s11883-023-01104-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/13/2023] [Indexed: 05/25/2023]
Abstract
PURPOSE OF REVIEW There is considerable interest in using polygenic risk scores (PRSs) to assess risk of atherosclerotic cardiovascular disease (ASCVD). A barrier to the clinical use of PRSs is heterogeneity in how PRS studies are reported. In this review, we summarize approaches to establishing a uniform reporting framework for PRSs for coronary heart disease (CHD), the most common form of ASCVD. RECENT FINDINGS Reporting standards for PRSs need to be contextualized for disease-specific applications. In addition to metrics of predictive performance, reporting standards for PRSs for CHD should include how cases and controls were ascertained, the degree of adjustment for conventional CHD risk factors, portability to diverse genetic ancestry groups and admixed individuals, and quality control measures for clinical deployment. Such a framework will enable PRSs to be optimized and benchmarked for clinical use.
Affiliation(s)
- Johanna L Smith
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Daniel J Schaid
- Department of Quantitative Health Sciences, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
- Gonda Vascular Center, Rochester, MN, USA.
24
Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, Pasaniuc B. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 2023; 618:774-781. [PMID: 37198491 PMCID: PMC10284707 DOI: 10.1038/s41586-023-06079-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 65.0] [Received: 09/28/2022] [Accepted: 04/12/2023] [Indexed: 05/19/2023]
Abstract
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use[1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²)[4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank[5] (ATLAS, n = 36,778) along with the UK Biobank[6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries[7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
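The GD-decile comparison described above can be sketched as follows. As a simplification, the sketch assumes that GD is the Euclidean distance of an individual's top genetic principal components from the centroid of the training cohort. It summarizes accuracy within each GD bin as the squared Pearson correlation between PGS and phenotype. The study itself models accuracy at the individual level, which this binned summary only approximates. Function names and the simulated data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def genetic_distance(pcs: Sequence[float], centroid: Sequence[float]) -> float:
    """Assumed GD: Euclidean distance in PC space from the training-cohort centroid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pcs, centroid)))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_by_gd_bin(gd: Sequence[float], pgs: Sequence[float],
                       pheno: Sequence[float], n_bins: int = 10) -> List[Tuple[float, float]]:
    """Sort individuals by GD, split them into equal-sized bins (deciles by
    default) and return (mean GD, squared PGS-phenotype correlation) per bin."""
    order = sorted(range(len(gd)), key=lambda i: gd[i])
    out = []
    for k in range(n_bins):
        idx = order[k * len(order) // n_bins:(k + 1) * len(order) // n_bins]
        r = pearson([pgs[i] for i in idx], [pheno[i] for i in idx])
        out.append((sum(gd[i] for i in idx) / len(idx), r * r))
    return out


if __name__ == "__main__":
    # Toy data (hypothetical): the PGS predicts the phenotype less well as GD grows.
    random.seed(1)
    n = 500
    gd = [genetic_distance((0.01 * i, 0.0), (0.0, 0.0)) for i in range(n)]
    pgs = [random.gauss(0.0, 1.0) for _ in range(n)]
    pheno = [pgs[i] + random.gauss(0.0, 0.1 + 0.008 * i) for i in range(n)]
    bins = accuracy_by_gd_bin(gd, pgs, pheno)
    print(pearson([m for m, _ in bins], [acc for _, acc in bins]))
```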
Affiliation(s)
- Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Aditya Pimplaskar
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ella Petter
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Kristin Boulier
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Florian Privé
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bjarni J Vilhjálmsson
- National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA.
- Institute for Precision Health, UCLA, Los Angeles, CA, USA.
25
Ahern J, Thompson W, Fan CC, Loughnan R. Comparing Pruning and Thresholding with Continuous Shrinkage Polygenic Score Methods in a Large Sample of Ancestrally Diverse Adolescents from the ABCD Study®. Behav Genet 2023; 53:292-309. [PMID: 37017779 PMCID: PMC10655749 DOI: 10.1007/s10519-023-10139-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 02/28/2023] [Indexed: 04/06/2023]
Abstract
Using individuals' genetic data, researchers can generate polygenic scores (PS) that predict risk for diseases, variability in different behaviors, and anthropometric measures. This is achieved by leveraging models learned from previously published large genome-wide association studies (GWASs) that associate locations in the genome with a phenotype of interest. Previous GWASs have predominantly been performed in individuals of European ancestry. This is of concern because PS generated in samples whose ancestry differs from that of the original training GWAS have been shown to have lower performance and limited portability, and many efforts are now underway to collect genetic databases on individuals of diverse ancestries. In this study, we compare multiple methods of generating PS, including pruning and thresholding and Bayesian continuous shrinkage models, to determine which of them is best able to overcome these limitations. To do this, we use the ABCD Study, a longitudinal cohort with deep phenotyping on individuals of diverse ancestry. We generate PS for anthropometric and psychiatric phenotypes using previously published GWAS summary statistics and examine their performance in three subsamples of ABCD: African ancestry individuals (n = 811), European ancestry individuals (n = 6703), and admixed ancestry individuals (n = 3664). We find that the single-ancestry continuous shrinkage method, PRScs (CS), and the multi-ancestry meta method, PRScsx Meta (CSx Meta), show the best performance across ancestries and phenotypes.
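Pruning and thresholding (P+T) is the simpler of the compared approaches. It keeps approximately independent SNPs, applies a p-value cutoff, and then sums allele dosages weighted by their effect sizes.

The sketch below makes two assumptions:

- A precomputed table of pairwise r² values is available. Pairs absent from the table are treated as unlinked, standing in for the physical-distance window that real clumping uses.
- The thresholds are illustrative, not the configuration used in the ABCD analysis.

In practice, clumping is done with tools such as PLINK's `--clump`. The continuous-shrinkage methods (PRScs, PRScsx) instead shrink all effects through a Bayesian prior, which is not shown here.

```python
from typing import Dict, FrozenSet, List

# Hypothetical LD input: r^2 keyed by frozenset({snp_a, snp_b}); absent pairs are
# treated as unlinked (r^2 = 0), standing in for a physical-distance window.
LDTable = Dict[FrozenSet[str], float]


def clump(pvals: Dict[str, float], r2: LDTable, r2_max: float = 0.1) -> List[str]:
    """Greedy clumping: visit SNPs from smallest to largest p-value and keep a SNP
    only if its r^2 with every SNP already kept is below r2_max."""
    kept: List[str] = []
    for snp in sorted(pvals, key=lambda s: pvals[s]):
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_max for k in kept):
            kept.append(snp)
    return kept


def pt_score(dosages: Dict[str, float], betas: Dict[str, float],
             pvals: Dict[str, float], r2: LDTable, p_max: float,
             r2_max: float = 0.1) -> float:
    """P+T score for one individual: clump, keep SNPs with p <= p_max, sum beta * dosage."""
    selected = [s for s in clump(pvals, r2, r2_max) if pvals[s] <= p_max]
    return sum(betas[s] * dosages[s] for s in selected)


if __name__ == "__main__":
    pvals = {"rs1": 1e-10, "rs2": 1e-6, "rs3": 0.01, "rs4": 0.5}
    betas = {"rs1": 0.2, "rs2": 0.15, "rs3": -0.1, "rs4": 0.05}
    ld = {frozenset(("rs1", "rs2")): 0.8}
    dosages = {"rs1": 2, "rs2": 2, "rs3": 1, "rs4": 0}
    print(clump(pvals, ld))  # ['rs1', 'rs3', 'rs4']: rs2 is in LD with rs1
    print(pt_score(dosages, betas, pvals, ld, p_max=0.05))
```

Scoring the same individual at several values of `p_max` and keeping the cutoff that performs best in a validation set is the usual way the "thresholding" step is tuned.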
Affiliation(s)
- Jonathan Ahern
- Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
- Center for Human Development, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92161, USA.
- Wesley Thompson
- Herbert Wertheim School of Public Health and Human Longevity Science, University of California San Diego, 9500 Gilman Drive, La Jolla, San Diego, CA, 92161, USA
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, 74103, USA
- Chun Chieh Fan
- Center for Population Neuroscience and Genetics, Laureate Institute for Brain Research, Tulsa, OK, 74103, USA
- Department of Radiology, University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA, 92037, USA
- Robert Loughnan
- Department of Cognitive Science, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Center for Human Development, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92161, USA
26
Majara L, Kalungi A, Koen N, Tsuo K, Wang Y, Gupta R, Nkambule LL, Zar H, Stein DJ, Kinyanda E, Atkinson EG, Martin AR. Low and differential polygenic score generalizability among African populations due largely to genetic diversity. HGG Advances 2023; 4:100184. [PMID: 36873096 PMCID: PMC9982687 DOI: 10.1016/j.xhgg.2023.100184] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 10/10/2022] [Accepted: 02/04/2023] [Indexed: 02/15/2023]
Abstract
African populations are vastly underrepresented in genetic studies but have the most genetic variation and face wide-ranging environmental exposures globally. Because systematic evaluations of genetic prediction had not yet been conducted in ancestries that span African diversity, we calculated polygenic risk scores (PRSs) in simulations across Africa and in empirical data from South Africa, Uganda, and the United Kingdom to better understand the generalizability of genetic studies. PRS accuracy improves more with ancestry-matched discovery cohorts than with ancestry-mismatched studies. Within ancestrally and ethnically diverse South African individuals, we find that PRS accuracy is low for all traits but varies across groups. Differences in African ancestries contribute more to variability in PRS accuracy than the other large cohort differences we considered between individuals in the United Kingdom and Uganda. We computed PRS in African ancestry populations using existing European-only versus ancestrally diverse genetic studies; the increased diversity produced the largest accuracy gains for hemoglobin concentration and white blood cell count, reflecting large-effect ancestry-enriched variants in genes known to influence sickle cell anemia and the allergic response, respectively. Differences in PRS accuracy across African ancestries originating from diverse regions are as large as those across out-of-Africa continental ancestries, requiring commensurate nuance.
Affiliation(s)
- Lerato Majara
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- MRC Human Genetics Research Unit, Division of Human Genetics, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Observatory 7925, South Africa
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Allan Kalungi
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Department of Psychiatry, College of Health Sciences, Makerere University, Kampala, Uganda
- Department of Psychiatry, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa
- Mental Health Project, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI) & London School of Hygiene and Tropical Medicine (LSHTM), Uganda Research Unit, Entebbe, Uganda
- Nastassja Koen
- Global Initiative for Neuropsychiatric Genetics Education in Research (GINGER), Harvard T.H. Chan School of Public Health, Department of Epidemiology, Boston, MA, USA
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, Cape Town, South Africa
- Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Rahul Gupta
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
- Lethukuthula L. Nkambule
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Heather Zar
- Department of Paediatrics and Child Health, Red Cross Children’s Hospital and Medical Research Council Unit on Child and Adolescent Health, University of Cape Town, Cape Town, South Africa
- Dan J. Stein
- Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC) Unit on Risk and Resilience in Mental Disorders, Cape Town, South Africa
- Eugene Kinyanda
- Mental Health Project, Medical Research Council/Uganda Virus Research Institute (MRC/UVRI) & London School of Hygiene and Tropical Medicine (LSHTM), Uganda Research Unit, Entebbe, Uganda
- Elizabeth G. Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Alicia R. Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
27
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 02/10/2023] [Indexed: 06/06/2023]
Abstract
Polygenic risk score (PRS) analysis is a method that predicts an individual's genetic risk for target traits. Even when there are no significant markers, it provides evidence of a genetic effect beyond the results of genome-wide association studies (GWAS). Moreover, it incorporates single nucleotide polymorphisms (SNPs) with small effect sizes that contribute to disease, making it more precise for individual-level risk prediction. PRS analysis addresses this shortfall of GWAS by taking into account SNPs/alleles that have small effects but nonetheless play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications in investigating the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that achieving high predictive power with PRS analysis is challenging for non-Europeans. In this manuscript, we review conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data, for the clinical evaluation of PRSs of interest, and for predicting rare diseases.
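At its simplest, the score discussed in this review is a weighted sum over SNPs, PRS_i = Σ_j β_j G_ij. Here G_ij counts the effect alleles that individual i carries at SNP j, and β_j is the GWAS effect estimate (a log odds ratio for binary traits).

Below is a minimal sketch with hypothetical SNP IDs and weights. It also aligns alleles so that each dosage counts the effect allele. It assumes biallelic SNPs reported on the same strand and simply skips SNPs that are missing from the genotype data.

```python
from typing import Dict, Tuple

# Hypothetical inputs. weights: SNP -> (effect allele, GWAS beta).
# genotype: SNP -> (allele that was counted, count of that allele, 0/1/2).
Weights = Dict[str, Tuple[str, float]]
Genotype = Dict[str, Tuple[str, int]]


def additive_prs(weights: Weights, genotype: Genotype) -> float:
    """PRS = sum_j beta_j * (number of effect alleles carried at SNP j).

    Assumes biallelic SNPs on the same strand: if the counted allele is not the
    effect allele, it is taken to be the other allele, so dosage = 2 - count.
    SNPs missing from the genotype are skipped (some tools impute instead).
    """
    score = 0.0
    for snp, (effect_allele, beta) in weights.items():
        if snp not in genotype:
            continue
        counted_allele, count = genotype[snp]
        dosage = count if counted_allele == effect_allele else 2 - count
        score += beta * dosage
    return score


if __name__ == "__main__":
    w = {"rs1": ("A", 0.1), "rs2": ("G", -0.2), "rs3": ("T", 0.5)}
    person = {"rs1": ("A", 2), "rs2": ("C", 0)}  # rs2 counts C: two copies of G
    print(additive_prs(w, person))
```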
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
- Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
28
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/10/2023] [Indexed: 11/23/2023]
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts an individual's genetic risk for targeted traits. Even when there are no significant markers, it provides evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it incorporates single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect sizes, making it more precise for individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account SNPs/alleles that have low effect sizes but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications in investigating the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining high predictive power from PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data, for the clinical evaluation of PRSs of interest, and for predicting rare diseases.
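To make the idea concrete, a PRS is commonly computed as a weighted sum of an individual's effect-allele dosages, using per-SNP effect sizes from a GWAS as the weights. The Python sketch below is a minimal illustration under stated assumptions: the SNP identifiers, weights and genotypes are hypothetical, dosages are coded 0, 1 or 2, and no linkage-disequilibrium adjustment is applied (methods such as clumping and thresholding, LDpred2 or PRS-CSx refine the weights before this step).

```python
from typing import Dict


def polygenic_score(dosages: Dict[str, float], betas: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages: sum over SNPs j of beta_j * dosage_j.

    Only SNPs present in both the weight set and the genotype data contribute;
    SNPs missing from either side are skipped (no imputation is attempted).
    """
    shared = dosages.keys() & betas.keys()
    return sum(betas[snp] * dosages[snp] for snp in shared)


# Hypothetical GWAS weights (per-allele effect sizes) and one individual's dosages.
gwas_betas = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
individual = {"rs0001": 2, "rs0002": 1, "rs0004": 1}

# rs0003 is not genotyped and rs0004 has no weight, so only rs0001 and rs0002 count.
print(round(polygenic_score(individual, gwas_betas), 2))  # prints 0.19
```

Because only overlapping SNPs contribute, variants that are absent or poorly imputed in a target cohort silently drop out of the score, which is one practical source of reduced accuracy outside the discovery population.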
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
- Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
29
Hou K, Ding Y, Xu Z, Wu Y, Bhattacharya A, Mester R, Belbin GM, Buyske S, Conti DV, Darst BF, Fornage M, Gignoux C, Guo X, Haiman C, Kenny EE, Kim M, Kooperberg C, Lange L, Manichaikul A, North KE, Peters U, Rasmussen-Torvik LJ, Rich SS, Rotter JI, Wheeler HE, Wojcik GL, Zhou Y, Sankararaman S, Pasaniuc B. Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals. Nat Genet 2023; 55:549-558. [PMID: 36941441 PMCID: PMC11120833 DOI: 10.1038/s41588-023-01338-6] [Citation(s) in RCA: 32] [Impact Index Per Article: 32.0] [Received: 08/10/2022] [Accepted: 02/16/2023] [Indexed: 03/23/2023]
Abstract
Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate the correlation of causal genetic effects (r_admix) across local ancestries, and we analyze 38 complex traits in African-European admixed individuals (N = 53,001), observing very high correlations (meta-analysis r_admix = 0.95, 95% credible interval 0.93-0.97), much higher than the correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Yi Ding
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ziqi Xu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Yue Wu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Arjun Bhattacharya
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Rachel Mester
- Graduate Program in Biomathematics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Gillian M Belbin
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Charles Bronfman Institute of Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Steve Buyske
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
- David V Conti
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Burcu F Darst
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Myriam Fornage
- Brown Foundation Institute for Molecular Medicine, The University of Texas Health Science Center, Houston, TX, USA
- Chris Gignoux
- Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO, USA
- Xiuqing Guo
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Christopher Haiman
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michelle Kim
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Charles Kooperberg
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Leslie Lange
- Department of Medicine, University of Colorado, Aurora, CO, USA
- Ani Manichaikul
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Kari E North
- Department of Statistics, Rutgers University, Piscataway, NJ, USA
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ulrike Peters
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Stephen S Rich
- Center for Public Health Genomics, Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
- Heather E Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
- Genevieve L Wojcik
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA
- Ying Zhou
- Division of Public Health Science, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Sriram Sankararaman
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
30
Conery M, Grant SFA. Human height: a model common complex trait. Ann Hum Biol 2023; 50:258-266. [PMID: 37343163 PMCID: PMC10368389 DOI: 10.1080/03014460.2023.2215546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 04/10/2023] [Accepted: 05/09/2023] [Indexed: 06/23/2023]
Abstract
CONTEXT Like other complex phenotypes, human height reflects a combination of environmental and genetic factors, but is notable for being exceptionally easy to measure. Height has therefore been commonly used to make observations later generalised to other phenotypes, though the appropriateness of such generalisations is not always considered. OBJECTIVES We aimed to assess height's suitability as a model for other complex phenotypes and to review recent advances in height genetics with regard to their implications for complex phenotypes more broadly. METHODS We conducted a comprehensive literature search in PubMed and Google Scholar for articles relevant to the genetics of height and its comparability to other phenotypes. RESULTS Height is broadly similar to other phenotypes apart from its high heritability and ease of measurement. Recent genome-wide association studies (GWAS) have identified over 12,000 independent signals associated with height and, in individuals similar to European reference populations, have saturated the common single nucleotide polymorphism (SNP)-based heritability of height within a subset of the genome. CONCLUSIONS Given the similarity of height to other complex traits, the saturation of GWAS's ability to discover additional height-associated variants signals potential limitations of the omnigenic model of complex-phenotype inheritance, indicates the likely future power of polygenic scores and risk scores, and highlights the increasing need for large-scale variant-to-gene mapping efforts.
Affiliation(s)
- Mitchell Conery
- Division of Human Genetics, Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine at the University of PA, Philadelphia, PA, USA
- Department of Pharmacology, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Struan F A Grant
- Division of Human Genetics, Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pediatrics, Perelman School of Medicine at the University of PA, Philadelphia, PA, USA
- Division of Diabetes and Endocrinology, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Institute for Diabetes, Obesity, and Metabolism, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
31
Ndong Sima CAA, Smith D, Petersen DC, Schurz H, Uren C, Möller M. The immunogenetics of tuberculosis (TB) susceptibility. Immunogenetics 2022; 75:215-230. [DOI: 10.1007/s00251-022-01290-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022]
32
Pena SDJ, Tarazona-Santos E. Clinical genomics and precision medicine. Genet Mol Biol 2022; 45:e20220150. [PMID: 36218382 PMCID: PMC9555143 DOI: 10.1590/1678-4685-gmb-2022-0150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 07/12/2022] [Indexed: 11/04/2022]
Abstract
Precision Medicine emerges from the genomic paradigm of health and disease. For precise molecular diagnoses of genetic diseases, we must analyze the Whole Exome (WES) or the Whole Genome (WGS). Because it does not require exon capture, WGS is more powerful for detecting single nucleotide variants and copy number variants. In healthy individuals, we can observe highly penetrant monogenic variants, which may be causally responsible for diseases, and also susceptibility variants associated with common polygenic diseases. But penetrance remains a major problem. Thus, there is the question of whether it is worthwhile to perform WGS in all healthy individuals as a step towards Precision Medicine. The genetic architecture of diseases is consistent with all of them being polygenic. Moreover, ancestry adds another layer of complexity. We are now capable of obtaining Polygenic Risk Scores for all complex diseases using only data from next-generation sequencing. Yet a review of the available evidence does not at present favor the idea that WGS analyses are sufficiently developed to allow reliable predictions of the risk components for monogenic and polygenic hereditary diseases in healthy individuals. For now, it is probably still better to reserve WGS for the diagnosis of pathogenic variants of Mendelian diseases.
Affiliation(s)
- Sérgio D. J. Pena
- Universidade Federal de Minas Gerais, Instituto de Ciências Biológicas, Departamento de Bioquímica e Imunologia, Belo Horizonte, MG, Brazil
- Núcleo de Genética Médica, Belo Horizonte, MG, Brazil
- Eduardo Tarazona-Santos
- Universidade Federal de Minas Gerais, Instituto de Ciências Biológicas, Departamento de Genética, Ecologia e Evolução, Belo Horizonte, MG, Brazil
33
O'Sullivan JW, Raghavan S, Marquez-Luna C, Luzum JA, Damrauer SM, Ashley EA, O'Donnell CJ, Willer CJ, Natarajan P. Polygenic Risk Scores for Cardiovascular Disease: A Scientific Statement From the American Heart Association. Circulation 2022; 146:e93-e118. [PMID: 35862132 PMCID: PMC9847481 DOI: 10.1161/cir.0000000000001077] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Indexed: 01/23/2023]
Abstract
Cardiovascular disease is the leading contributor to years lost due to disability or premature death among adults. Current efforts focus on risk prediction and risk factor mitigation, which have been recognized for the past half-century. However, despite advances, risk prediction remains imprecise, with persistently high rates of incident cardiovascular disease. Genetic characterization has been proposed as an approach to enable earlier and potentially tailored prevention. Rare Mendelian pathogenic variants predisposing to cardiometabolic conditions have long been known to contribute to disease risk in some families. However, twin and familial aggregation studies imply that diverse cardiovascular conditions are heritable in the general population. Significant technological and methodological advances since the Human Genome Project are facilitating population-based comprehensive genetic profiling at decreasing costs. Genome-wide association studies from such endeavors continue to elucidate causal mechanisms for cardiovascular diseases. Systematic cataloging of cardiovascular risk alleles has also enabled the development of polygenic risk scores. Genetic profiling is becoming widespread in large-scale research, including in health care-associated biobanks, randomized controlled trials, and direct-to-consumer profiling in tens of millions of people. Thus, individuals and their physicians are increasingly presented with polygenic risk scores for cardiovascular conditions in clinical encounters. In this scientific statement, we review the contemporary science, clinical considerations, and future challenges for polygenic risk scores for cardiovascular diseases. We selected 5 cardiometabolic diseases (coronary artery disease, hypercholesterolemia, type 2 diabetes, atrial fibrillation, and venous thromboembolic disease) and response to drug therapy, and offer provisional guidance to health care professionals, researchers, policymakers, and patients.
34
Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores. Annu Rev Biomed Data Sci 2022; 5:293-320. [PMID: 35576555 PMCID: PMC9828290 DOI: 10.1146/annurev-biodatasci-111721-074830] [Citation(s) in RCA: 53] [Impact Index Per Article: 26.5] [Indexed: 01/11/2023]
Abstract
Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA
- Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
35
Patel RA, Musharoff SA, Spence JP, Pimentel H, Tcheandjieu C, Mostafavi H, Sinnott-Armstrong N, Clarke SL, Smith CJ, Durda PP, Taylor KD, Tracy R, Liu Y, Johnson WC, Aguet F, Ardlie KG, Gabriel S, Smith J, Nickerson DA, Rich SS, Rotter JI, Tsao PS, Assimes TL, Pritchard JK. Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Am J Hum Genet 2022; 109:1286-1297. [PMID: 35716666 PMCID: PMC9300878 DOI: 10.1016/j.ajhg.2022.05.014] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 11/20/2021] [Accepted: 05/26/2022] [Indexed: 01/09/2023]
Abstract
Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.
Affiliation(s)
- Roshni A Patel
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Shaila A Musharoff
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Jeffrey P Spence
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Harold Pimentel
- Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA
- Catherine Tcheandjieu
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University School of Medicine, Stanford, CA, USA
- Nasa Sinnott-Armstrong
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Shoa L Clarke
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University School of Medicine, Stanford, CA, USA
- Courtney J Smith
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Peter P Durda
- The Robert Larner, M.D. College of Medicine at The University of Vermont, Burlington, VT, USA
- Kent D Taylor
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Russell Tracy
- The Robert Larner, M.D. College of Medicine at The University of Vermont, Burlington, VT, USA
- Yongmei Liu
- Duke University School of Medicine, Durham, NC, USA
- Josh Smith
- Genome Sciences, University of Washington, Seattle, WA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Jerome I Rotter
- Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Philip S Tsao
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University School of Medicine, Stanford, CA, USA
- Themistocles L Assimes
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University School of Medicine, Stanford, CA, USA
- Jonathan K Pritchard
- Genetics, Stanford University School of Medicine, Stanford, CA, USA
- Biology, Stanford University, Stanford, CA, USA
36
Lai D, Schwantes-An TH, Abreu M, Chan G, Hesselbrock V, Kamarajan C, Liu Y, Meyers JL, Nurnberger JI, Plawecki MH, Wetherill L, Schuckit M, Zhang P, Edenberg HJ, Porjesz B, Agrawal A, Foroud T. Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans. Transl Psychiatry 2022; 12:266. [PMID: 35790736 PMCID: PMC9256707 DOI: 10.1038/s41398-022-02029-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/19/2022] [Revised: 06/13/2022] [Accepted: 06/16/2022] [Indexed: 11/09/2022]
Abstract
Genome-wide association studies (GWAS) in admixed populations such as African Americans (AA) have limited sample sizes, resulting in poor performance of polygenic risk scores (PRS). Based on the observations that many disease-causing genes are shared between AA and European ancestry (EA) populations, and that some disease-causing variants are located within the boundaries of these genes, we proposed a novel gene-based PRS framework (PRSgene) that uses variants located within disease-associated genes. Using the AA GWAS of alcohol use disorder (AUD) from the Million Veteran Program and the EA GWAS of problematic alcohol use as the discovery GWAS, we identified 858 variants from 410 genes that were AUD-related in both AA and EA. PRSgene calculated using these variants were significantly associated with AUD in three AA target datasets (P-values ranged from 7.61E-05 to 6.27E-03; Betas ranged from 0.15 to 0.21) and outperformed PRS calculated using all variants (P-values ranged from 7.28E-03 to 0.16; Betas ranged from 0.06 to 0.18). PRSgene were also associated with AUD in an EA target dataset (P-value = 0.02, Beta = 0.11). In AA, individuals in the highest PRSgene decile had an odds ratio of 1.76 (95% CI: 1.32-2.34) for developing AUD compared with those in the lowest decile. The 410 genes were enriched in 54 Gene Ontology biological processes, including ethanol oxidation and processes involving the synaptic system, which are known to be AUD-related. In addition, 26 genes were targets of drugs used to treat AUD or other diseases that might be considered for repurposing to treat AUD. Our study demonstrated that the gene-based PRS had improved performance in evaluating AUD risk in AA and provided new insight into AUD genetics.
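The gene-based restriction described above, keeping only variants that lie within the boundaries of disease-associated genes before scoring, can be sketched as a simple interval filter, followed by a top-versus-bottom decile comparison expressed as an odds ratio from a 2x2 table of counts. This is a hypothetical Python illustration: the gene intervals, variant positions and counts are invented for exposition, and it omits the study's other criteria (for example, requiring variants to be AUD-related in both AA and EA GWAS) as well as any covariate adjustment the authors may have applied.

```python
from typing import List, Tuple

# Hypothetical record layouts: a variant is (chromosome, position, effect size);
# a gene is (chromosome, start, end) with 1-based inclusive coordinates.
Variant = Tuple[str, int, float]
Gene = Tuple[str, int, int]


def variants_in_genes(variants: List[Variant], genes: List[Gene]) -> List[Variant]:
    """Keep only variants that fall inside at least one gene's boundaries."""
    return [
        v for v in variants
        if any(v[0] == chrom and start <= v[1] <= end for chrom, start, end in genes)
    ]


def odds_ratio(cases_top: int, controls_top: int,
               cases_bottom: int, controls_bottom: int) -> float:
    """Odds of being a case in the top score decile divided by the odds in the bottom decile."""
    return (cases_top / controls_top) / (cases_bottom / controls_bottom)


# Hypothetical inputs for illustration only.
genes = [("4", 100_000, 150_000), ("11", 2_000, 9_000)]
variants = [("4", 120_500, 0.08), ("4", 300_000, 0.02), ("11", 5_000, -0.03)]

kept = variants_in_genes(variants, genes)
print(len(kept))                              # 2: the chr4 variant at 300,000 lies outside both genes
print(round(odds_ratio(60, 40, 45, 55), 2))   # 1.83
```

The retained variants would then be scored exactly like a genome-wide PRS (a weighted sum of dosages); only the variant set changes.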
Affiliation(s)
- Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Tae-Hwi Schwantes-An
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Marco Abreu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Grace Chan
- Department of Psychiatry, University of Connecticut School of Medicine, Farmington, CT, USA
- Department of Psychiatry, University of Iowa, Carver College of Medicine, Iowa City, IA, USA
- Victor Hesselbrock
- Department of Psychiatry, University of Connecticut School of Medicine, Farmington, CT, USA
- Chella Kamarajan
- Henri Begleiter Neurodynamics Lab, Department of Psychiatry, State University of New York, Downstate Medical Center, Brooklyn, NY, USA
- Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Jacquelyn L Meyers
- Henri Begleiter Neurodynamics Lab, Department of Psychiatry, State University of New York, Downstate Medical Center, Brooklyn, NY, USA
- John I Nurnberger
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN, USA
- Martin H Plawecki
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN, USA
- Leah Wetherill
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Marc Schuckit
- Department of Psychiatry, University of California, San Diego Medical School, San Diego, CA, USA
- Pengyue Zhang
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Howard J Edenberg
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
- Bernice Porjesz
- Henri Begleiter Neurodynamics Lab, Department of Psychiatry, State University of New York, Downstate Medical Center, Brooklyn, NY, USA
- Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, USA
- Tatiana Foroud
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
37
Yair S, Coop G. Population differentiation of polygenic score predictions under stabilizing selection. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200416. [PMID: 35430887 PMCID: PMC9014188 DOI: 10.1098/rstb.2020.0416] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/06/2021] [Accepted: 03/08/2022] [Indexed: 12/15/2022]
Abstract
Given the many small-effect loci uncovered by genome-wide association studies (GWAS), polygenic scores have become central to genomic medicine, and have found application in diverse settings including evolutionary studies of adaptation. Despite their promise, polygenic scores have been found to suffer from limited portability across human populations. This at first seems in conflict with the observation that most common genetic variation is shared among populations. We investigate one potential cause of this discrepancy: stabilizing selection on complex traits. Counterintuitively, while stabilizing selection constrains phenotypic evolution, it accelerates the loss and fixation of alleles underlying trait variation within populations (GWAS loci). Thus even when populations share an optimum phenotype, stabilizing selection erodes the variance contributed by their shared GWAS loci, such that predictions from GWAS in one population explain less of the phenotypic variation in another. We develop theory to quantify how stabilizing selection is expected to reduce the prediction accuracy of polygenic scores in populations not represented in GWAS samples. In addition, we find that polygenic scores can substantially overstate average genetic differences of phenotypes among populations. We emphasize stabilizing selection around a common optimum as a useful null model to connect patterns of allele frequency and polygenic score differentiation. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
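A toy calculation makes the argument concrete. Assuming per-allele effects a are shared across populations (a common optimum) and ignoring LD, environmental variance, and estimation noise, each diploid locus under Hardy-Weinberg equilibrium contributes 2·a²·p·(1 − p) to additive variance. Loci discovered in one population therefore carry little variance in another if their frequencies there have drifted toward loss or fixation. The sketch below is a simplified illustration under those assumptions, not the authors' theory, and all frequencies are made up.

```python
def additive_variance(effects: list[float], freqs: list[float]) -> float:
    """Sum of 2 * a^2 * p * (1 - p): additive variance of diploid loci under HWE."""
    return sum(2 * a * a * p * (1 - p) for a, p in zip(effects, freqs))


def fraction_captured(effects: list[float], freqs_target: list[float],
                      discovered: list[bool]) -> float:
    """Share of a target population's additive variance carried by discovered loci."""
    total = additive_variance(effects, freqs_target)
    captured = additive_variance(
        [a for a, d in zip(effects, discovered) if d],
        [p for p, d in zip(freqs_target, discovered) if d],
    )
    return captured / total if total > 0 else 0.0


# Made-up example: loci 0 and 1 were discovered in population A, where they are
# common; in population B the same loci have drifted toward loss, while
# loci 2 and 3 (rare in A, hence undiscovered) segregate at intermediate frequency.
effects = [1.0, 1.0, 1.0, 1.0]
discovered = [True, True, False, False]
freqs_a = [0.5, 0.5, 0.01, 0.01]
freqs_b = [0.05, 0.05, 0.5, 0.5]
share_in_a = fraction_captured(effects, freqs_a, discovered)
share_in_b = fraction_captured(effects, freqs_b, discovered)
```

With these toy frequencies the discovered loci account for most of the additive variance in population A but only a minority in population B, even though every effect size is identical in both.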
Affiliation(s)
- Sivan Yair
  - Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Graham Coop
  - Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
38
Gopalan S, Smith SP, Korunes K, Hamid I, Ramachandran S, Goldberg A. Human genetic admixture through the lens of population genomics. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200410. [PMID: 35430881 PMCID: PMC9014191 DOI: 10.1098/rstb.2020.0410] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/10/2021] [Accepted: 03/24/2022] [Indexed: 12/13/2022]
Abstract
Over the past 50 years, geneticists have made great strides in understanding how our species' evolutionary history gave rise to current patterns of human genetic diversity classically summarized by Lewontin in his 1972 paper, 'The Apportionment of Human Diversity'. One evolutionary process that requires special attention in both population genetics and statistical genetics is admixture: gene flow between two or more previously separated source populations to form a new admixed population. The admixture process introduces ancestry-based structure into patterns of genetic variation within and between populations, which in turn influences the inference of demographic histories, identification of genetic targets of selection and prediction of complex traits. In this review, we outline some challenges for admixture population genetics, including limitations of applying methods designed for populations without recent admixture to the study of admixed populations. We highlight recent studies and methodological advances that aim to overcome such challenges, leveraging genomic signatures of admixture that occurred in the past tens of generations to gain insights into human history, natural selection and complex trait architecture. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Affiliation(s)
- Shyamalika Gopalan
  - Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Samuel Pattillo Smith
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, RI 02912, USA
- Katharine Korunes
  - Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Iman Hamid
  - Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution and Organismal Biology, Brown University, Providence, RI 02912, USA
  - Data Science Initiative, Brown University, Providence, RI 02912, USA
- Amy Goldberg
  - Department of Evolutionary Anthropology, Duke University, Durham, NC 27708, USA
39
Matthews LJ, Turkheimer E. Three legs of the missing heritability problem. Stud Hist Philos Sci 2022; 93:183-191. [PMID: 35533541 PMCID: PMC9172633 DOI: 10.1016/j.shpsa.2022.04.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 11/30/2020] [Revised: 03/07/2022] [Accepted: 04/20/2022] [Indexed: 05/31/2023]
Abstract
The so-called 'missing heritability problem' is often characterized by behavior geneticists as a numerical discrepancy between alternative kinds of heritability. For example, while 'traditional heritability' derived from twin and family studies indicates that approximately 50% of variation in intelligence is attributable to genetics, 'SNP heritability' derived from genome-wide association studies indicates that only about 10% of variation in intelligence is attributable to genetics. This 40% gap in variance accounted for by alternative kinds of heritability is frequently referred to as what's "missing." Philosophers have picked up on this reading, suggesting that "dissolving" the missing heritability problem is merely a matter of closing the numerical gap between traditional and molecular kinds of heritability. We argue that this framing of the problem undervalues the severity of the many challenges to scientific understanding of the "heritability" of human behavior. On our view, resolving the numerical discrepancies between alternative kinds of heritability will do little to advance scientific explanation and understanding of behavior genetics. Thus, we propose a new conceptual framework of the missing heritability problem that comprises three independent methodological and explanatory challenges: the numerical gap, the prediction gap, and the mechanism gap.
40
Smith SP, Shahamatdar S, Cheng W, Zhang S, Paik J, Graff M, Haiman C, Matise TC, North KE, Peters U, Kenny E, Gignoux C, Wojcik G, Crawford L, Ramachandran S. Enrichment analyses identify shared associations for 25 quantitative traits in over 600,000 individuals from seven diverse ancestries. Am J Hum Genet 2022; 109:871-884. [PMID: 35349783 PMCID: PMC9118115 DOI: 10.1016/j.ajhg.2022.03.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Accepted: 03/02/2022] [Indexed: 12/12/2022]
Abstract
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan, as well as 44,348 admixed individuals from the PAGE consortium, including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits, as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
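Once gene-level P values exist for each ancestry cohort, counting associations that are significant in at least two cohorts is a simple filter. The sketch below assumes those P values were already produced by some LD-aware gene-level method (not shown here) and uses a per-cohort Bonferroni threshold purely as an illustrative assumption; the cohort labels, gene names, and numbers are hypothetical.

```python
def shared_gene_hits(pvals_by_cohort: dict[str, dict[str, float]],
                     alpha: float = 0.05, min_cohorts: int = 2) -> dict[str, list[str]]:
    """Genes passing a per-cohort Bonferroni threshold in at least `min_cohorts` cohorts."""
    hits: dict[str, list[str]] = {}
    for cohort, gene_p in pvals_by_cohort.items():
        threshold = alpha / len(gene_p)  # Bonferroni over genes tested in this cohort (assumption)
        for gene, p in gene_p.items():
            if p < threshold:
                hits.setdefault(gene, []).append(cohort)
    return {gene: sorted(cohorts) for gene, cohorts in hits.items() if len(cohorts) >= min_cohorts}


# Hypothetical gene-level P values for three cohorts.
example = {
    "EUR": {"G1": 1e-8, "G2": 0.5},
    "EAS": {"G1": 1e-6, "G2": 0.01},
    "AFR": {"G1": 0.3, "G2": 0.2},
}
shared = shared_gene_hits(example)
```

Here G2 passes the threshold in one cohort only, so just G1 is reported as shared.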
Affiliation(s)
- Samuel Pattillo Smith
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Sahar Shahamatdar
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Wei Cheng
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
- Selena Zhang
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Joseph Paik
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
- Misa Graff
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Christopher Haiman
  - Department of Preventative Medicine, University of Southern California, Los Angeles, CA 90089, USA
- T C Matise
  - Department of Genetics, Rutgers University, Piscataway, NJ 08854, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, Chapel Hill, NC 27599, USA
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Eimear Kenny
  - The Center for Genomic Health, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Chris Gignoux
  - Division of Biomedical Informatics and Personalized Medicine, University of Colorado, Denver, CO 80204, USA
- Genevieve Wojcik
  - Department of Epidemiology, Johns Hopkins University, Baltimore, MD 21287, USA
- Lorin Crawford
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Biostatistics, Brown University, Providence, RI 02906, USA
  - Microsoft Research New England, Cambridge, MA 02142, USA
- Sohini Ramachandran
  - Center for Computational Molecular Biology, Brown University, Providence, RI 02912, USA
  - Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02912, USA
  - Data Science Initiative, Brown University, Providence, RI 02912, USA
41
Carlson MO, Rice DP, Berg JJ, Steinrücken M. Polygenic score accuracy in ancient samples: Quantifying the effects of allelic turnover. PLoS Genet 2022; 18:e1010170. [PMID: 35522704 PMCID: PMC9116686 DOI: 10.1371/journal.pgen.1010170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Revised: 05/18/2022] [Accepted: 03/26/2022] [Indexed: 11/19/2022]
Abstract
Polygenic scores link the genotypes of ancient individuals to their phenotypes, which are often unobservable, offering a tantalizing opportunity to reconstruct complex trait evolution. In practice, however, interpretation of ancient polygenic scores is subject to numerous assumptions. For one, the genome-wide association (GWA) studies from which polygenic scores are derived can only estimate effect sizes for loci segregating in contemporary populations. Therefore, a GWA study may not correctly identify all loci relevant to trait variation in the ancient population. In addition, the frequencies of trait-associated loci may have changed in the intervening years. Here, we devise a theoretical framework to quantify the effect of this allelic turnover on the statistical properties of polygenic scores as functions of population genetic dynamics, trait architecture, power to detect significant loci, and the age of the ancient sample. We model the allele frequencies of loci underlying trait variation using the Wright-Fisher diffusion, and employ the spectral representation of its transition density to find analytical expressions for several error metrics, including the expected sample correlation between the polygenic scores of ancient individuals and their true phenotypes, referred to as polygenic score accuracy. Our theory also applies to a two-population scenario and demonstrates that allelic turnover alone may explain a substantial percentage of the reduced accuracy observed in cross-population predictions, akin to those performed in human genetics. Finally, we use simulations to explore the effects of recent directional selection, a bias-inducing process, on the statistics of interest. We find that even in the presence of bias, weak selection induces minimal deviations from our neutral expectations for the decay of polygenic score accuracy.
By quantifying the limitations of polygenic scores in an explicit evolutionary context, our work lays the foundation for the development of more sophisticated statistical procedures to analyze both temporally and geographically resolved polygenic scores.
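Allelic turnover can be illustrated with a discrete, neutral Wright-Fisher simulation: loci that segregated when an ancient individual lived may have been lost or fixed by the present, so a contemporary GWA study could not estimate their effects. This Monte Carlo sketch is for intuition only; the authors work analytically with the Wright-Fisher diffusion and its spectral representation, and the population size, number of generations, and starting frequencies used here are arbitrary assumptions.

```python
import random


def wright_fisher(p0: float, two_n: int, generations: int, rng: random.Random) -> float:
    """Neutral Wright-Fisher drift: binomially resample 2N gene copies each generation."""
    p = p0
    for _ in range(generations):
        if p in (0.0, 1.0):  # loss and fixation are absorbing states
            break
        count = sum(rng.random() < p for _ in range(two_n))
        p = count / two_n
    return p


def still_segregating_fraction(freqs_ancient: list[float], two_n: int,
                               generations: int, seed: int = 0) -> float:
    """Fraction of loci segregating in the ancient sample that still segregate today."""
    rng = random.Random(seed)
    present = [wright_fisher(p, two_n, generations, rng) for p in freqs_ancient]
    return sum(0.0 < p < 1.0 for p in present) / len(present)


# A small population drifting for many generations loses most of its ancient polymorphism.
recent = still_segregating_fraction([0.5] * 50, two_n=200, generations=10, seed=1)
distant = still_segregating_fraction([0.5] * 50, two_n=20, generations=500, seed=1)
```

Loci counted as no longer segregating would contribute to an ancient individual's true genetic value but would be absent from any present-day GWA study, which is the turnover effect the abstract quantifies.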
Affiliation(s)
- Maryn O. Carlson
  - Committee on Genetics, Genomics, & Systems Biology, University of Chicago, Chicago, Illinois, United States of America
- Daniel P. Rice
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Jeremy J. Berg
  - Committee on Genetics, Genomics, & Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Matthias Steinrücken
  - Committee on Genetics, Genomics, & Systems Biology, University of Chicago, Chicago, Illinois, United States of America
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
  - Department of Ecology & Evolution, University of Chicago, Chicago, Illinois, United States of America
42
Schultz LM, Merikangas AK, Ruparel K, Jacquemont S, Glahn DC, Gur RE, Barzilay R, Almasy L. Stability of polygenic scores across discovery genome-wide association studies. HGG Adv 2022; 3:100091. [PMID: 35199043 PMCID: PMC8841810 DOI: 10.1016/j.xhgg.2022.100091] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/23/2021] [Accepted: 01/18/2022] [Indexed: 01/19/2023]
Abstract
Polygenic scores (PGS) are commonly evaluated in terms of their predictive accuracy at the population level by the proportion of phenotypic variance they explain. To be useful for precision medicine applications, they also need to be evaluated at the individual level when phenotypes are not necessarily already known. We investigated the stability of PGS in European American (EUR) and African American (AFR)-ancestry individuals from the Philadelphia Neurodevelopmental Cohort and the Adolescent Brain Cognitive Development study using different discovery genome-wide association study (GWAS) results for post-traumatic stress disorder (PTSD), type 2 diabetes (T2D), and height. We found that pairs of EUR-ancestry GWAS for the same trait had genetic correlations >0.92. However, PGS calculated from pairs of same-ancestry and different-ancestry GWAS had correlations that ranged from <0.01 to 0.74. PGS stability was greater for height than for PTSD or T2D. A series of height GWAS in the UK Biobank suggested that correlation between PGS is strongly dependent on the extent of sample overlap between the discovery GWAS. Focusing on the upper end of the PGS distribution, different discovery GWAS do not consistently identify the same individuals in the upper quantiles, with the best case being 60% of individuals above the 80th percentile of PGS overlapping from one height GWAS to another. The degree of overlap decreases sharply as higher quantiles, less heritable traits, and different-ancestry GWAS are considered. PGS computed from different discovery GWAS have only modest correlation at the individual level, underscoring the need to proceed cautiously with integrating PGS into precision medicine applications.
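The top-quantile comparison described above can be sketched by ranking the same individuals under two PGS and measuring what fraction of one score's top group also appears in the other's. The quantile convention (the top k = n − round(q·n) individuals, with ties broken arbitrarily) and the toy IDs are assumptions for illustration, not the authors' exact procedure.

```python
def top_overlap(pgs_a: dict[str, float], pgs_b: dict[str, float], quantile: float = 0.8) -> float:
    """Fraction of individuals above the `quantile` cutoff under PGS A who are also there under PGS B.

    Both dictionaries are assumed to score the same set of individual IDs.
    """
    ids = sorted(pgs_a)
    n = len(ids)
    k = n - int(round(quantile * n))  # size of the top group
    if k == 0:
        return 0.0
    top_a = set(sorted(ids, key=lambda i: pgs_a[i], reverse=True)[:k])
    top_b = set(sorted(ids, key=lambda i: pgs_b[i], reverse=True)[:k])
    return len(top_a & top_b) / k


# Toy scores for ten hypothetical individuals.
scores_a = {f"id{i}": float(i) for i in range(10)}
scores_b = dict(scores_a)
scores_b["id0"] = 100.0  # one individual jumps to the top under the second score
```

Two scores can correlate strongly across the whole sample and still disagree about who sits in the upper tail, which is the individual-level instability the abstract highlights.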
Affiliation(s)
- Laura M. Schultz
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
- Alison K. Merikangas
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Kosha Ruparel
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sébastien Jacquemont
  - UHC Sainte-Justine Research Center, Université de Montréal, Montréal, QC H3T 1C5, Canada
  - Department of Pediatrics, Université de Montréal, Montréal, QC H3T 1C5, Canada
- David C. Glahn
  - Tommy Fuss Center for Neuropsychiatric Disease Research, Boston Children's Hospital, Boston, MA, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
- Raquel E. Gur
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Child Adolescent Psychiatry and Behavioral Sciences, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Ran Barzilay
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
  - Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Child Adolescent Psychiatry and Behavioral Sciences, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Laura Almasy
  - Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
43
Mars N, Kerminen S, Feng YCA, Kanai M, Läll K, Thomas LF, Skogholt AH, della Briotta Parolo P, Neale BM, Smoller JW, Gabrielsen ME, Hveem K, Mägi R, Matsuda K, Okada Y, Pirinen M, Palotie A, Ganna A, Martin AR, Ripatti S. Genome-wide risk prediction of common diseases across ancestries in one million people. Cell Genom 2022; 2:100118. [PMID: 35591975 PMCID: PMC9010308 DOI: 10.1016/j.xgen.2022.100118] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 01/21/2021] [Revised: 08/24/2021] [Accepted: 03/18/2022] [Indexed: 12/14/2022]
Abstract
Polygenic risk scores (PRS) measure genetic disease susceptibility by combining risk effects across the genome. For coronary artery disease (CAD), type 2 diabetes (T2D), and breast and prostate cancer, we performed cross-ancestry evaluation of genome-wide PRSs in six biobanks in Europe, the United States, and Asia. We studied transferability of these highly polygenic, genome-wide PRSs across global ancestries, within European populations with different health-care systems, and local population substructures in a population isolate. All four PRSs had similar accuracy across European and Asian populations, with poorer transferability in the smaller group of individuals of African ancestry. The PRSs had highly similar effect sizes in different populations of European ancestry, and in early- and late-settlement regions with different recent population bottlenecks in Finland. Comparing genome-wide PRSs to PRSs containing a smaller number of variants, the highly polygenic, genome-wide PRSs generally displayed higher effect sizes and better transferability across global ancestries. Our findings indicate that in the populations investigated, the current genome-wide polygenic scores for common diseases have potential for clinical utility within different health-care settings for individuals of European ancestry, but that the utility in individuals of African ancestry is currently much lower.
Affiliation(s)
- Nina Mars
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
- Sini Kerminen
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
- Yen-Chen A. Feng
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kristi Läll
  - Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Laurent F. Thomas
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
  - K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
  - BioCore - Bioinformatics Core Facility, Norwegian University of Science and Technology, Trondheim, Norway
- Anne Heidi Skogholt
  - K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
- Pietro della Briotta Parolo
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
- Benjamin M. Neale
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Jordan W. Smoller
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Maiken E. Gabrielsen
  - K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
  - HUNT Research Center, Department of Public Health and Nursing, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Kristian Hveem
  - K. G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, Faculty of Medicine and Health, Norwegian University of Science and Technology, Trondheim, Norway
- Reedik Mägi
  - Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Koichi Matsuda
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
  - Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Kanagawa, Japan
- Matti Pirinen
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
  - Department of Public Health, University of Helsinki, Helsinki, Finland
  - Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Aarno Palotie
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrea Ganna
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Alicia R. Martin
  - Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Samuli Ripatti (corresponding author)
  - Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Biomedicum 2U, Tukholmankatu 8, 00290 Helsinki, Finland
  - Department of Public Health, University of Helsinki, Helsinki, Finland
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
44
Marciniak S, Bergey CM, Silva AM, Hałuszko A, Furmanek M, Veselka B, Velemínský P, Vercellotti G, Wahl J, Zariņa G, Longhi C, Kolář J, Garrido-Pena R, Flores-Fernández R, Herrero-Corral AM, Simalcsik A, Müller W, Sheridan A, Miliauskienė Ž, Jankauskas R, Moiseyev V, Köhler K, Király Á, Gamarra B, Cheronet O, Szeverényi V, Kiss V, Szeniczey T, Kiss K, Zoffmann ZK, Koós J, Hellebrandt M, Maier RM, Domboróczki L, Virag C, Novak M, Reich D, Hajdu T, von Cramon-Taubadel N, Pinhasi R, Perry GH. An integrative skeletal and paleogenomic analysis of stature variation suggests relatively reduced health for early European farmers. Proc Natl Acad Sci U S A 2022; 119:e2106743119. [PMID: 35389750 PMCID: PMC9169634 DOI: 10.1073/pnas.2106743119] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 04/09/2021] [Accepted: 02/24/2022] [Indexed: 12/02/2022]
Abstract
Human culture, biology, and health were shaped dramatically by the onset of agriculture ∼12,000 y B.P. This shift is hypothesized to have resulted in increased individual fitness and population growth as evidenced by archaeological and population genomic data alongside a decline in physiological health as inferred from skeletal remains. Here, we consider osteological and ancient DNA data from the same prehistoric individuals to study human stature variation as a proxy for health across a transition to agriculture. Specifically, we compared “predicted” genetic contributions to height from paleogenomic data and “achieved” adult osteological height estimated from long bone measurements for 167 individuals across Europe spanning the Upper Paleolithic to Iron Age (∼38,000 to 2,400 B.P.). We found that individuals from the Neolithic were shorter than expected (given their individual polygenic height scores) by an average of −3.82 cm relative to individuals from the Upper Paleolithic and Mesolithic (P = 0.040) and −2.21 cm shorter relative to post-Neolithic individuals (P = 0.068), with osteological vs. expected stature steadily increasing across the Copper (+1.95 cm relative to the Neolithic), Bronze (+2.70 cm), and Iron (+3.27 cm) Ages. These results were attenuated when we additionally accounted for genome-wide genetic ancestry variation: for example, with Neolithic individuals −2.82 cm shorter than expected on average relative to pre-Neolithic individuals (P = 0.120). We also incorporated observations of paleopathological indicators of nonspecific stress that can persist from childhood to adulthood in skeletal remains into our model. Overall, our work highlights the potential of integrating disparate datasets to explore proxies of health in prehistory.
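The "achieved versus expected" comparison can be illustrated by fitting osteological height on polygenic height score across all individuals and then averaging the residuals within each period. This is a simplified sketch under stated assumptions (one pooled least-squares fit, no ancestry, sex, or stress covariates, made-up toy data); it is not the published model, and the toy offset is not meant to reproduce the reported estimates.

```python
def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Simple least-squares fit y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def mean_residual_by_period(pgs: list[float], height: list[float],
                            period: list[str]) -> dict[str, float]:
    """Mean of (achieved - PGS-expected) height within each period label."""
    intercept, slope = ols_fit(pgs, height)
    sums: dict[str, float] = {}
    counts: dict[str, int] = {}
    for x, y, g in zip(pgs, height, period):
        residual = y - (intercept + slope * x)
        sums[g] = sums.get(g, 0.0) + residual
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


# Toy data: identical score distributions, with the second group 3 cm shorter than expected.
pgs = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
period = ["Mesolithic"] * 3 + ["Neolithic"] * 3
height = [165 + 5 * x for x in pgs[:3]] + [165 + 5 * x - 3.0 for x in pgs[3:]]
residuals = mean_residual_by_period(pgs, height, period)
gap = residuals["Neolithic"] - residuals["Mesolithic"]
```

Because the two toy groups have the same mean score, the difference in mean residuals recovers the built-in 3 cm shortfall exactly; with real data, differences in ancestry and score distributions are why the abstract reports attenuated estimates after ancestry adjustment.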
Affiliation(s)
- Stephanie Marciniak
- Department of Anthropology, Pennsylvania State University, University Park, PA 16802
- Christina M. Bergey
- Department of Anthropology, Pennsylvania State University, University Park, PA 16802
- Department of Genetics, Human Genetics Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854
- Ana Maria Silva
- Research Centre for Anthropology and Health (Centro de Investigação em Antropologia e Saúde - CIAS), Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Coimbra 3000-456, Portugal
- Archeology Center of the University of Lisbon (UNIARQ), University of Lisbon, Lisbon 1600-214, Portugal
- Agata Hałuszko
- Institute of Archaeology, University of Wrocław, Wrocław 50-139, Poland
- Archeolodzy.org Foundation, Wrocław 50-316, Poland
- Mirosław Furmanek
- Institute of Archaeology, University of Wrocław, Wrocław 50-139, Poland
- Barbara Veselka
- Department of Chemistry, Analytical Environmental and Geo-Chemistry Research Unit, Vrije Universiteit Brussel, Brussels 1050, Belgium
- Department of Art Studies and Archaeology, Maritime Cultures Research Institute, Vrije Universiteit Brussel, Brussels 1050, Belgium
- Petr Velemínský
- Department of Anthropology, National Museum, Prague 115-79, Czech Republic
- Giuseppe Vercellotti
- Department of Anthropology, Ohio State University, Columbus, OH 43210
- Institute for Research and Learning in Archaeology and Bioarchaeology, Columbus, OH 43215
- Joachim Wahl
- Institute for Scientific Archaeology, Working Group Palaeoanthropology, University of Tübingen, Tübingen 72074, Germany
- Gunita Zariņa
- Institute of Latvian History, University of Latvia, Riga 1050, Latvia
- Cristina Longhi
- Soprintendenza Archeologia, Belle Arti e Paesaggio, Rome 00186, Italy
- Jan Kolář
- Department of Vegetation Ecology, Institute of Botany of the Czech Academy of Sciences, Průhonice 252-43, Czech Republic
- Institute of Archaeology and Museology, Masaryk University, Brno 602-00, Czech Republic
- Rafael Garrido-Pena
- Department of Prehistory and Archaeology, Universidad Autónoma de Madrid, Madrid 28049, Spain
- Angela Simalcsik
- Olga Necrasov Center for Anthropological Research, Romanian Academy - Iasi Branch, Iasi 700481, Romania
- Orheiul Vechi Cultural-Natural Reserve, Orhei 3506, Republic of Moldova
- Werner Müller
- Laboratoire d'archéozoologie, Université de Neuchâtel, Neuchâtel 2000, Switzerland
- Alison Sheridan
- Department of Scottish History & Archaeology, National Museums Scotland, Edinburgh EH1 1JF, Scotland
- Žydrūnė Miliauskienė
- Department of Anatomy, Histology and Anthropology, Vilnius University, Vilnius 01513, Lithuania
- Rimantas Jankauskas
- Department of Anatomy, Histology and Anthropology, Vilnius University, Vilnius 01513, Lithuania
- Vyacheslav Moiseyev
- Department of Physical Anthropology, Peter the Great Museum of Anthropology and Ethnography (Kunstkamera), Russian Academy of Sciences, St. Petersburg 199034, Russia
- Kitti Köhler
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest 1097, Hungary
- Ágnes Király
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest 1097, Hungary
- Beatriz Gamarra
- Institut Català de Paleoecologia Humana i Evolució Social, Tarragona 43007, Spain
- Departament d’Història i Història de l’Art, Universitat Rovira i Virgili, Tarragona 43003, Spain
- Olivia Cheronet
- Department of Evolutionary Anthropology, University of Vienna, Vienna 1030, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna 1030, Austria
- Vajk Szeverényi
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest 1097, Hungary
- Department of Archaeology, Déri Múzeum, Debrecen 4026, Hungary
- Viktória Kiss
- Institute of Archaeology, Research Centre for the Humanities, Eötvös Loránd Research Network, Budapest 1097, Hungary
- Tamás Szeniczey
- Department of Biological Anthropology, Eötvös Loránd University, Budapest 1053, Hungary
- Krisztián Kiss
- Department of Biological Anthropology, Eötvös Loránd University, Budapest 1053, Hungary
- Department of Anthropology, Hungarian Natural History Museum, Budapest 1083, Hungary
- Judit Koós
- Department of Archaeology, Herman Ottó Museum, Miskolc 3530, Hungary
- Robert M. Maier
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138
- László Domboróczki
- Department of Archaeology, István Dobó Castle Museum, Eger 3300, Hungary
- Cristian Virag
- Department of Archaeology, Satu Mare County Museum, Satu Mare 440031, Romania
- Mario Novak
- Centre for Applied Bioanthropology, Institute for Anthropological Research, Zagreb 10000, Croatia
- David Reich
- Department of Genetics, Harvard Medical School, Boston, MA 02115
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138
- The Max Planck–Harvard Research Center for the Archaeoscience of the Ancient Mediterranean, Boston, MA 02115
- Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142
- HHMI, Harvard Medical School, Cambridge, MA 02138
- Tamás Hajdu
- Department of Biological Anthropology, Eötvös Loránd University, Budapest 1053, Hungary
- Noreen von Cramon-Taubadel
- Buffalo Human Evolutionary Morphology Lab, Department of Anthropology, University at Buffalo, Buffalo, NY 14261-0026
- Ron Pinhasi
- Department of Evolutionary Anthropology, University of Vienna, Vienna 1030, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna 1030, Austria
- George H. Perry
- Department of Anthropology, Pennsylvania State University, University Park, PA 16802
- Department of Biology, Pennsylvania State University, University Park, PA 16802
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802
- Deutsche Forschungsgemeinschaft (DFG) Center for Advanced Studies, University of Tübingen, Tübingen 72074, Germany
45
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Martin AR, Finucane HK, Price AL. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 2022; 54:450-458. [PMID: 35393596 PMCID: PMC9009299 DOI: 10.1038/s41588-022-01036-9] [Citation(s) in RCA: 108] [Impact Index Per Article: 54.0] [Received: 01/10/2021] [Accepted: 02/25/2022] [Indexed: 01/25/2023]
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
Affiliation(s)
- Omer Weissbrod
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Huwenbo Shi
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- OMNI Bioinformatics, San Francisco, CA, USA
- Steven Gazal
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Wouter J Peyrot
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
- Amit V Khera
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Verve Therapeutics, Cambridge, MA, USA
- Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alkes L Price
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
46
Coon H, Shabalin A, Bakian AV, DiBlasi E, Monson ET, Kirby A, Chen D, Fraser A, Yu Z, Staley M, Callor WB, Christensen ED, Crowell SE, Gray D, Crockett DK, Li QS, Keeshin B, Docherty AR. Extended familial risk of suicide death is associated with younger age at death and elevated polygenic risk of suicide. Am J Med Genet B Neuropsychiatr Genet 2022; 189:60-73. [PMID: 35212135 PMCID: PMC9149029 DOI: 10.1002/ajmg.b.32890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 11/19/2021] [Accepted: 01/31/2022] [Indexed: 12/12/2022]
Abstract
Suicide accounts for >800,000 deaths annually worldwide; prevention is an urgent public health issue. Identification of risk factors remains challenging due to complexity and heterogeneity. The study of suicide deaths with increased extended familial risk provides an avenue to reduce etiological heterogeneity and explore traits associated with increased genetic liability. Using extensive genealogical records, we identified high-risk families where distant relatedness of suicides implicates genetic risk. We compared phenotypic and polygenic risk score (PRS) data between suicides in high-risk extended families (high familial risk (HFR), n = 1,634), suicides linked to genealogical data not in any high-risk families (low familial risk (LFR), n = 147), and suicides not linked to genealogical data with unknown familial risk (UFR, n = 1,865). HFR suicides were associated with lower age at death (mean = 39.34 years), more suicide attempts, and more PTSD and trauma diagnoses. For PRS tests, we included only suicides with >90% European ancestry and adjusted for residual ancestry effects. HFR suicides showed markedly higher PRS of suicide death (calculated using cross-validation), supporting specific elevation of genetic risk of suicide in this subgroup, and also showed increased PRS of PTSD, suicide attempt, and risk taking. LFR suicides were substantially older at death (mean = 49.10 years), had fewer psychiatric diagnoses of depression and pain, and significantly lower PRS of depression. Results suggest extended familiality and trauma/PTSD may provide specificity in identifying individuals at genetic risk for suicide death, especially among younger ages, and that LFR of suicide warrants further study regarding the contribution of demographic and medical risks.
Affiliation(s)
- Hilary Coon
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Andrey Shabalin
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Amanda V. Bakian
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Emily DiBlasi
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Eric T. Monson
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Anne Kirby
- Department of Occupational Therapy, University of Utah, Salt Lake City, Utah, USA
- Danli Chen
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Alison Fraser
- Pedigree & Population Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Zhe Yu
- Pedigree & Population Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Michael Staley
- Utah State Office of the Medical Examiner, Utah Department of Health, Salt Lake City, Utah, USA
- Erik D. Christensen
- Utah State Office of the Medical Examiner, Utah Department of Health, Salt Lake City, Utah, USA
- Douglas Gray
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Qingqin S. Li
- Neuroscience Therapeutic Area, Janssen Research & Development LLC, Titusville, New Jersey, USA
- Brooks Keeshin
- Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
- Primary Children's Hospital Center for Safe and Healthy Families, Salt Lake City, Utah, USA
- Anna R. Docherty
- Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
47
Whole-genome sequencing of 1,171 elderly admixed individuals from São Paulo, Brazil. Nat Commun 2022; 13:1004. [PMID: 35246524 PMCID: PMC8897431 DOI: 10.1038/s41467-022-28648-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 09/30/2020] [Accepted: 01/21/2022] [Indexed: 02/07/2023]
Abstract
As whole-genome sequencing (WGS) becomes the gold standard tool for studying population genomics and medical applications, data on diverse non-European and admixed individuals are still scarce. Here, we present a high-coverage WGS dataset of 1,171 highly admixed elderly Brazilians from a census-based cohort, providing over 76 million variants, of which ~2 million are absent from large public databases. WGS enables identification of ~2,000 previously undescribed mobile element insertions, nearly 5 Mb of genomic segments absent from the human genome reference, and over 140 alleles from HLA genes absent from public resources. We reclassify and curate pathogenicity assertions for nearly four hundred variants in genes associated with dominantly inherited Mendelian disorders and calculate the incidence for selected recessive disorders, demonstrating the clinical usefulness of the present study. Finally, we observe that whole-genome and HLA imputation could be significantly improved compared to available datasets since rare variation represents the largest proportion of input from WGS. These results demonstrate that even smaller sample sizes of underrepresented populations bring relevant data for genomic studies, especially when exploring analyses allowed only by WGS. Whole genome sequencing (WGS) data on non-European and admixed individuals remains scarce. Here, the authors analyse WGS data from 1,171 admixed elderly Brazilians from a census cohort, characterising population-specific genetic variation and exploring the clinical utility of this expanded dataset.
48
Matthews LJ. Half a century later and we're back where we started: How the problem of locality turned in to the problem of portability. Stud Hist Philos Sci 2022; 91:1-9. [PMID: 34781197 PMCID: PMC8837680 DOI: 10.1016/j.shpsa.2021.10.021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2020] [Revised: 10/23/2021] [Accepted: 10/30/2021] [Indexed: 05/10/2023]
Abstract
In the 1970s, Lewontin sparked a debate about a problem of locality, by making the case that any given heritability estimate is local to the original population and environment studied, and could not be generalized to other populations and environments. Nearly 50 years later, a new problem of portability has emerged: the predictive accuracy of polygenic scores diminishes when applied to populations whose characteristics are different from the original population sample. This paper briefly reviews the nature of each problem and analyzes their similarities and differences in three areas: 1) conceptual underpinnings, 2) causal explanations, and 3) practical, social, and political implications. Although conceptually and methodologically different from the problem of locality in important respects, the problem of portability facing contemporary genomics today should come as no surprise, as it is an inevitable outcome of the kinds of problematic inferences detailed by Lewontin nearly half a century ago.
49
Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O'Reilly PF, Vilhjálmsson BJ. Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort. Am J Hum Genet 2022; 109:12-23. [PMID: 34995502 PMCID: PMC8764121 DOI: 10.1016/j.ajhg.2021.11.008] [Citation(s) in RCA: 118] [Impact Index Per Article: 59.0] [Received: 05/26/2021] [Accepted: 11/04/2021] [Indexed: 12/25/2022]
Abstract
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
Affiliation(s)
- Florian Privé
- National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Clive Hoggart
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paul F O'Reilly
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Bjarni J Vilhjálmsson
- National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark
- Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark
50
Pärna K, Nolte IM, Snieder H, Fischer K, Marnetto D, Pagani L. A Principal Component Informed Approach to Address Polygenic Risk Score Transferability Across European Cohorts. Front Genet 2022; 13:899523. [PMID: 35923706 PMCID: PMC9340200 DOI: 10.3389/fgene.2022.899523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 05/26/2022] [Indexed: 11/16/2022]
Abstract
One important confounder in genome-wide association studies (GWASs) is population genetic structure, which may generate spurious associations if not properly accounted for. This may ultimately result in a biased polygenic risk score (PRS) prediction, especially when applied to another population. To explore this matter, we focused on principal component analysis (PCA) and asked whether a population genetics informed strategy focused on PCs derived from an external reference population helps in mitigating this PRS transferability issue. Throughout the study, we used two complex model traits, height and body mass index, and samples from UK and Estonian Biobanks. We aimed to investigate 1) whether using a reference population (1000G) for computation of the PCs adjusted for in the discovery cohort improves the resulting PRS performance in a target set from another population and 2) whether adjusting the validation model for PCs is required at all. Our results showed that any other set of PCs performed worse than the one computed on samples from the same population as the discovery dataset. Furthermore, we show that PC correction in GWAS cannot prevent residual population structure information in the PRS, also for non-structured traits. Therefore, we confirm the utility of PC correction in the validation model when the investigated trait shows an actual correlation with population genetic structure, to account for the residual confounding effect when evaluating the predictive value of PRS.
Affiliation(s)
- Katri Pärna
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Epidemiology, University of Groningen, Groningen, Netherlands
- Ilja M Nolte
- Department of Epidemiology, University of Groningen, Groningen, Netherlands
- Harold Snieder
- Department of Epidemiology, University of Groningen, Groningen, Netherlands
- Krista Fischer
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
- Davide Marnetto
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Neurosciences "Rita Levi Montalcini", University of Turin, Torino, Italy
- Luca Pagani
- Institute of Genomics, University of Tartu, Tartu, Estonia
- Department of Biology, University of Padova, Padova, Italy