1
Jee YH, Thibord F, Dominguez A, Sept C, Boulier K, Venkateswaran V, Ding Y, Cherlin T, Verma SS, Faro VL, Bartz TM, Boland A, Brody JA, Deleuze JF, Emmerich J, Germain M, Johnson AD, Kooperberg C, Morange PE, Pankratz N, Psaty BM, Reiner AP, Smadja DM, Sitlani CM, Suchon P, Tang W, Trégouët DA, Zöllner S, Pasaniuc B, Damrauer SM, Sanna S, Snieder H, Kabrhel C, Smith NL, Kraft P. Multi-ancestry polygenic risk scores for venous thromboembolism. Hum Mol Genet 2024:ddae097. [PMID: 38879759 DOI: 10.1093/hmg/ddae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 05/29/2024] [Accepted: 06/03/2024] [Indexed: 06/25/2024] Open
Abstract
Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRSs constructed using more sophisticated methods and more diverse training data can enhance predictive ability and utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium genome-wide association study meta-analyses of European- (71 771 cases and 1 059 740 controls) and African-ancestry samples (7482 cases and 129 975 controls). We used LDpred2 and PRS-CSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in independent European- (6781 cases and 103 016 controls) and African-ancestry samples (1385 cases and 12 569 controls). Multi-ancestry PRSs with weights tuned in European-ancestry samples slightly outperformed ancestry-specific PRSs in European-ancestry test samples (e.g. the area under the receiver operating characteristic curve [AUC] was 0.609 for PRS-CSx_combinedEUR and 0.608 for PRS-CSxEUR [P = 0.00029]). Multi-ancestry PRSs with weights tuned in African-ancestry samples also outperformed ancestry-specific PRSs in African-ancestry test samples (PRS-CSxAFR: AUC = 0.58, PRS-CSx_combinedAFR: AUC = 0.59), although this difference was not statistically significant (P = 0.34). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. These findings suggest that the multi-ancestry PRS might be used to improve performance across diverse populations to identify individuals at highest risk for VTE.
Affiliation(s)
- Yon Ho Jee
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, United States
- Florian Thibord
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, 31 Center Drive, Bethesda, MD 20892, United States
  - Framingham Heart Study, Boston University and National Heart, Lung, and Blood Institute, Framingham, 73 Mt. Wayte Ave, Suite #2, Framingham, MA 01702, United States
- Alicia Dominguez
  - Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, United States
- Corriene Sept
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, United States
- Kristin Boulier
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, 611 Charles E. Young Drive East, Los Angeles, CA 90095-1570, United States
- Vidhya Venkateswaran
  - Department of Oral Biology, University of California Los Angeles School of Dentistry, 13-089 CHS, Box 951668, Box 951570, Los Angeles, CA 90095-1668, United States
- Yi Ding
  - Bioinformatics Interdepartmental Program, University of California Los Angeles, 611 Charles E. Young Drive East, Los Angeles, CA 90095-1570, United States
- Tess Cherlin
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St., Philadelphia, PA 19104-4238, United States
- Shefali Setia Verma
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, 3400 Spruce St., Philadelphia, PA 19104-4238, United States
- Valeria Lo Faro
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands
  - Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Dag Hammarskjölds väg 20, 751 85 Uppsala, Sweden
- Traci M Bartz
  - Cardiovascular Health Research Unit, Departments of Biostatistics and Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Anne Boland
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
  - Laboratory of Excellence in Medical Genomics, GENMED, F-91057 Evry, France
- Jennifer A Brody
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Jean-Francois Deleuze
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine, 91057 Evry, France
  - Laboratory of Excellence in Medical Genomics, GENMED, F-91057 Evry, France
  - Centre d'Etude du Polymorphisme Humain, Fondation Jean Dausset, 27 rue Juliette Dodu, 75010 Paris, France
- Joseph Emmerich
  - Department of Vascular Medicine, Paris Saint-Joseph Hospital Group, University of Paris, 75014 Paris, France
  - INSERM CRESS UMR 1153, F-75005, Paris, France
- Marine Germain
  - Bordeaux Population Health Research Center, University of Bordeaux, INSERM, UMR 1219, Bordeaux, France
- Andrew D Johnson
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, 31 Center Drive, Bethesda, MD 20892, United States
  - Framingham Heart Study, Boston University and National Heart, Lung, and Blood Institute, Framingham, 73 Mt. Wayte Ave, Suite #2, Framingham, MA 01702, United States
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, PO Box 19024, Seattle, WA 98109, United States
- Pierre-Emmanuel Morange
  - Aix-Marseille University, INSERM, INRAE, Centre de Recherche en CardioVasculaire et Nutrition, Laboratory of Haematology, CRB Assistance Publique - Hôpitaux de Marseille, HemoVasc, 27, boulevard Jean Moulin, 13005 Marseille, France
- Nathan Pankratz
  - Department of Laboratory Medicine and Pathology, University of Minnesota, 420 Delaware Street SE, Minneapolis, MN 55455, United States
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
  - Department of Epidemiology, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
  - Department of Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Alexander P Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, PO Box 19024, Seattle, WA 98109, United States
  - Department of Epidemiology, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- David M Smadja
  - Innovative Therapies in Hemostasis, Université de Paris, INSERM, F-75006, Paris, France
  - Hematology Department and Biosurgical Research Lab (Carpentier Foundation), Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), F-75015, Paris, France
- Colleen M Sitlani
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
- Pierre Suchon
  - Aix-Marseille University, INSERM, INRAE, Centre de Recherche en CardioVasculaire et Nutrition, Laboratory of Haematology, CRB Assistance Publique - Hôpitaux de Marseille, HemoVasc, 27, boulevard Jean Moulin, 13005 Marseille, France
- Weihong Tang
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, 1300 S. 2nd St., Minneapolis, MN 55454, United States
- David-Alexandre Trégouët
  - Bordeaux Population Health Research Center, University of Bordeaux, INSERM, UMR 1219, Bordeaux, France
- Sebastian Zöllner
  - Department of Biostatistics, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, United States
- Bogdan Pasaniuc
  - Department of Oral Biology, University of California Los Angeles School of Dentistry, 13-089 CHS, Box 951668, Box 951570, Los Angeles, CA 90095-1668, United States
- Scott M Damrauer
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, 415 Curie Blvd, Philadelphia, PA 19104, United States
  - Department of Surgery, Department of Genetics, and Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Boulevard, Building 421, Philadelphia, PA 19104, United States
  - Department of Surgery, Corporal Michael Crescenz VA Medical Center, 3900 Woodland Ave, Philadelphia, PA 19104, United States
- Serena Sanna
  - Department of Genetics, University of Groningen, University Medical Center Groningen (UMCG), PO Box 30.001, 9700 RB Groningen, The Netherlands
  - Institute for Genetics and Biomedical Research, National Research Council, SS 554 Km 4,500, 09042 Monserrato CA, Italy
- Harold Snieder
  - Department of Epidemiology, University of Groningen, University Medical Center Groningen, PO Box 30.001, 9700 RB Groningen, The Netherlands
- Christopher Kabrhel
  - Center for Vascular Emergencies, Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, 55 Fruit Street, Boston, MA 02114, United States
- Nicholas L Smith
  - Department of Health Systems and Population Health, University of Washington, 4333 Brooklyn Ave, Seattle, WA 98195, United States
  - Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, 1730 Minor Ave, Seattle, WA 98101, United States
  - Department of Veterans Affairs Office of Research and Development, Seattle Epidemiologic Research and Information Center, 1660 S Columbian Way, S-152-E, Seattle, WA 98108, United States
- Peter Kraft
  - Transdivisional Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Dr, Rockville, MD 20850, United States
2
Teng J, Zhai T, Zhang X, Zhao C, Wang W, Tang H, Wang D, Shang Y, Ning C, Zhang Q. Improving multi-population genomic prediction accuracy using multi-trait GBLUP models which incorporate global or local genetic correlation information. Brief Bioinform 2024; 25:bbae276. [PMID: 38856170 PMCID: PMC11163384 DOI: 10.1093/bib/bbae276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 05/05/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] Open
Abstract
In the application of genomic prediction (GP), a situation often faced is that GP needs to be conducted in multiple populations. A common way to handle multi-population GP is simply to combine the multiple populations into a single population. However, since these populations may be subject to different environments, there may exist genotype-environment interactions which may affect the accuracy of genomic prediction. In this study, we demonstrated that multi-trait genomic best linear unbiased prediction (MTGBLUP) can be used for multi-population genomic prediction, whereby the performances of a trait in different populations are regarded as different traits, and thus multi-population prediction is regarded as multi-trait prediction by employing the between-population genetic correlation. Using real datasets, we showed that MTGBLUP outperformed the conventional multi-population model that simply combines different populations together. We further proposed that MTGBLUP can be improved by partitioning the global between-population genetic correlation into local genetic correlations (LGC). We suggested two LGC models, LGC-model-1 and LGC-model-2, which partition the genome into regions with and without significant LGC (LGC-model-1) or regions with and without strong LGC (LGC-model-2). In analysis of real datasets, we demonstrated that the LGC models could universally increase the prediction accuracy, and the relative improvement over MTGBLUP reached up to 163.86% (25.64% on average).
Affiliation(s)
- Jun Teng
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
  - Shandong Futeng Food Co. Ltd., Zaozhuang 277500, Shandong, China
- Tingting Zhai
  - National Key Laboratory of Wheat Improvement, College of Life Science, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Xinyi Zhang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Changheng Zhao
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Wenwen Wang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Hui Tang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Dan Wang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Yingli Shang
  - College of Veterinary Medicine, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Chao Ning
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
- Qin Zhang
  - Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Technology, Shandong Agricultural University, Tai'an 271018, Shandong, China
3
Momin MM, Zhou X, Hyppönen E, Benyamin B, Lee SH. Cross-ancestry genetic architecture and prediction for cholesterol traits. Hum Genet 2024; 143:635-648. [PMID: 38536467 DOI: 10.1007/s00439-024-02660-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 02/13/2024] [Indexed: 05/18/2024]
Abstract
While cholesterol is essential, a high level of cholesterol is associated with the risk of cardiovascular diseases. Genome-wide association studies (GWASs) have proven successful in identifying genetic variants that are linked to cholesterol levels, predominantly in white European populations. However, the extent to which genetic effects on cholesterol vary across different ancestries remains largely unexplored. Here, we estimate cross-ancestry genetic correlation to address questions on how genetic effects are shared across ancestries. We find significant genetic heterogeneity between ancestries for cholesterol traits. Furthermore, we demonstrate that single nucleotide polymorphisms (SNPs) with concordant effects across ancestries for cholesterol are more frequently found in regulatory regions compared to other genomic regions. Indeed, the positive genetic covariance between ancestries is mostly driven by the effects of the concordant SNPs, whereas the genetic heterogeneity is attributed to the discordant SNPs. We also show that the predictive ability of the concordant SNPs is significantly higher than the discordant SNPs in the cross-ancestry polygenic prediction. The list of concordant SNPs for cholesterol is available in GWAS Catalog. These findings have relevance for the understanding of shared genetic architecture across ancestries, contributing to the development of clinical strategies for polygenic prediction of cholesterol in cross-ancestral settings.
Affiliation(s)
- Md Moksedul Momin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Khulshi, Chattogram, 4225, Bangladesh
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Xuan Zhou
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Elina Hyppönen
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia
- Beben Benyamin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
4
Zhang T, Zhou G, Klei L, Liu P, Chouldechova A, Zhao H, Roeder K, G'Sell M, Devlin B. Evaluating and improving health equity and fairness of polygenic scores. HGG ADVANCES 2024; 5:100280. [PMID: 38402414 PMCID: PMC10937319 DOI: 10.1016/j.xhgg.2024.100280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 02/14/2024] [Accepted: 02/14/2024] [Indexed: 02/26/2024] Open
Abstract
Polygenic scores (PGSs) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single-nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWASs, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum (JLS). In the simulation settings we explore, JLS provides more accurate PGSs compared to other methods, especially when measured in terms of fairness. In analyses of UK Biobank data, JLS was computationally more efficient but slightly less accurate than a Bayesian comparator, SDPRX. Like all PGS methods, JLS requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how JLS can help mitigate fairness-related harms that might result from the use of PGSs in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWASs for different ancestries, JLS is an effective approach for enhancing portability and reducing predictive bias.
Affiliation(s)
- Tianyu Zhang
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Geyu Zhou
  - Department of Biostatistics, Yale University, New Haven, CT 06511, USA
- Lambertus Klei
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Peng Liu
  - Merck Research Laboratories, Merck & Co., Inc., Rahway, NJ 07065, USA
- Alexandra Chouldechova
  - Microsoft Research NYC, New York, NY 10012, USA; Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale University, New Haven, CT 06511, USA
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Max G'Sell
  - Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Bernie Devlin
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
5
Imamura M, Maeda S. Perspectives on genetic studies of type 2 diabetes from the genome-wide association studies era to precision medicine. J Diabetes Investig 2024; 15:410-422. [PMID: 38259175 PMCID: PMC10981147 DOI: 10.1111/jdi.14149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 12/24/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024] Open
Abstract
Genome-wide association studies (GWAS) have facilitated a substantial and rapid increase in the number of confirmed genetic susceptibility variants for complex diseases. Approximately 700 variants predisposing individuals to the risk for type 2 diabetes have been identified through GWAS until 2023. From 2018 to 2022, hundreds of type 2 diabetes susceptibility loci with smaller effect sizes were identified through large-scale GWAS with sample sizes of 200,000 to >1 million. The clinical translation of genetic information for type 2 diabetes includes the development of novel therapeutics and risk predictions. Although drug discovery based on loci identified in GWAS remains challenging owing to the difficulty of functional annotation, global efforts have been made to identify novel biological mechanisms and therapeutic targets by applying multi-omics approaches or searching for disease-associated coding variants in isolated founder populations. Polygenic risk scores (PRSs), comprising up to millions of associated variants, can identify individuals with higher disease risk than those in the general population. In populations of European descent, PRSs constructed from base GWAS data with a sample size of approximately 450,000 have predicted the onset of diseases well. However, European GWAS-derived PRSs have limited predictive performance in non-European populations. The predictive accuracy of a PRS largely depends on the sample size of the base GWAS data. The results of GWAS meta-analyses for multi-ethnic groups as base GWAS data and cross-population polygenic prediction methodology have been applied to establish a universal PRS applicable to small isolated ethnic populations.
Affiliation(s)
- Minako Imamura
  - Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
  - Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
- Shiro Maeda
  - Department of Advanced Genomic and Laboratory Medicine, Graduate School of Medicine, University of the Ryukyus, Nishihara-Cho, Japan
  - Division of Clinical Laboratory and Blood Transfusion, University of the Ryukyus Hospital, Nishihara-Cho, Japan
6
Hou K, Gogarten S, Kim J, Hua X, Dias JA, Sun Q, Wang Y, Tan T, Atkinson EG, Martin A, Shortt J, Hirbo J, Li Y, Pasaniuc B, Zhang H. Admix-kit: an integrated toolkit and pipeline for genetic analyses of admixed populations. Bioinformatics 2024; 40:btae148. [PMID: 38490256 PMCID: PMC10980565 DOI: 10.1093/bioinformatics/btae148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 02/08/2024] [Accepted: 03/13/2024] [Indexed: 03/17/2024] Open
Abstract
SUMMARY: Admixed populations, with their unique and diverse genetic backgrounds, are often underrepresented in genetic studies. This oversight not only limits our understanding but also exacerbates existing health disparities. One major barrier has been the lack of efficient tools tailored for the special challenges of genetic studies of admixed populations. Here, we present admix-kit, an integrated toolkit and pipeline for genetic analyses of admixed populations. Admix-kit implements a suite of methods to facilitate genotype and phenotype simulation, association testing, genetic architecture inference, and polygenic scoring in admixed populations. AVAILABILITY AND IMPLEMENTATION: The admix-kit package is open-source and available at https://github.com/KangchengHou/admix-kit. Additionally, users can use the pipeline designed for admixed genotype simulation available at https://github.com/UW-GAC/admix-kit_workflow.
Affiliation(s)
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Stephanie Gogarten
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, United States
- Joohyun Kim
  - Vanderbilt Genetics Institute and Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Xing Hua
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States
- Julie-Alexia Dias
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02120, United States
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Ying Wang
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Taotao Tan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Elizabeth G Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, United States
- Alicia Martin
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, United States
- Jonathan Shortt
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, United States
- Jibril Hirbo
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, United States
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, United States
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, 90095, United States
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, 20892, United States
7
Xiang R, Kelemen M, Xu Y, Harris LW, Parkinson H, Inouye M, Lambert SA. Recent advances in polygenic scores: translation, equitability, methods and FAIR tools. Genome Med 2024; 16:33. [PMID: 38373998 PMCID: PMC10875792 DOI: 10.1186/s13073-024-01304-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 02/07/2024] [Indexed: 02/21/2024] Open
Abstract
Polygenic scores (PGS) can be used for risk stratification by quantifying individuals' genetic predisposition to disease, and many potentially clinically useful applications have been proposed. Here, we review the latest potential benefits of PGS in the clinic and challenges to implementation. PGS could augment risk stratification through combined use with traditional risk factors (demographics, disease-specific risk factors, family history, etc.), to support diagnostic pathways, to predict groups with therapeutic benefits, and to increase the efficiency of clinical trials. However, there exist challenges to maximizing the clinical utility of PGS, including FAIR (Findable, Accessible, Interoperable, and Reusable) use and standardized sharing of the genomic data needed to develop and recalculate PGS, the equitable performance of PGS across populations and ancestries, the generation of robust and reproducible PGS calculations, and the responsible communication and interpretation of results. We outline how these challenges may be overcome analytically and with more diverse data as well as highlight sustained community efforts to achieve equitable, impactful, and responsible use of PGS in healthcare.
Affiliation(s)
- Ruidong Xiang
  - Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Martin Kelemen
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Yu Xu
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
  - Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- Laura W Harris
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Michael Inouye
  - Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
  - Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
  - British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Samuel A Lambert
  - Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
  - Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
Collapse
|
8
|
Sun Q, Rowland BT, Chen J, Mikhaylova AV, Avery C, Peters U, Lundin J, Matise T, Buyske S, Tao R, Mathias RA, Reiner AP, Auer PL, Cox NJ, Kooperberg C, Thornton TA, Raffield LM, Li Y. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat Commun 2024; 15:1016. [PMID: 38310129 PMCID: PMC10838303 DOI: 10.1038/s41467-024-45135-z] [Received: 10/07/2022] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
Polygenic risk scores (PRS) have shown successes in clinics, but most PRS methods focus only on participants with distinct primary continental ancestry without accommodating recently-admixed individuals with mosaic continental ancestry backgrounds for different segments of their genomes. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. We demonstrate marked advantages of GAUDI over other methods through comprehensive simulation and real data analyses for traits with associated variants exhibiting ancestral-differential effects. Leveraging data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared to alternative methods, and even outperforms PRS-CSx with large European GWAS for some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
Affiliation(s)
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Bryce T Rowland
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jiawen Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Anna V Mikhaylova
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Christy Avery
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jessica Lundin
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Tara Matise
  - Department of Genetics, Rutgers University, New Brunswick, NJ, 08901, USA
- Steve Buyske
  - Department of Statistics, Rutgers University, New Brunswick, NJ, 08901, USA
- Ran Tao
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Rasika A Mathias
  - Department of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Alexander P Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
- Paul L Auer
  - Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Nancy J Cox
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
|
9
|
Liao K, Zöllner S. A Stacking Framework for Polygenic Risk Prediction in Admixed Individuals. medRxiv 2024:2024.01.31.24302103. [PMID: 38434717 PMCID: PMC10907988 DOI: 10.1101/2024.01.31.24302103] [Indexed: 03/05/2024]
Abstract
Polygenic risk scores (PRS) are summaries of an individual's personalized genetic risk for a trait or disease. However, PRS often perform poorly for phenotype prediction when the ancestry of the target population does not match the population in which GWAS effect sizes were estimated. For many populations this can be addressed by performing GWAS in the target population. However, admixed individuals (whose genomes can be traced to multiple ancestral populations) lie on an ancestry continuum and are not easily represented as a discrete population. Here, we propose slaPRS (stacking local ancestry PRS), which incorporates multiple ancestry GWAS to alleviate the ancestry dependence of PRS in admixed samples. slaPRS uses ensemble learning (stacking) to combine local population specific PRS in regions across the genome. We compare slaPRS to single population PRS and a method that combines single population PRS globally. In simulations, slaPRS outperformed existing approaches and reduced the ancestry dependence of PRS in African Americans. In lipid traits from African British individuals (UK Biobank), slaPRS again improved on single population PRS while performing comparably to the globally combined PRS. slaPRS provides a data-driven and flexible framework to incorporate multiple population-specific GWAS and local ancestry in samples of admixed ancestry.
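The abstract contrasts slaPRS with "a method that combines single population PRS globally". A minimal sketch of that simpler global baseline, under stated assumptions, is a single mixing weight chosen on a tuning sample. This is not the authors' slaPRS, which stacks local, region-level scores. The function names, the grid search, and the use of Pearson correlation as the tuning criterion are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_global_mix(
    prs_a: Sequence[float],
    prs_b: Sequence[float],
    phenotype: Sequence[float],
    steps: int = 10,
) -> Tuple[float, float]:
    """Grid-search w in [0, 1] so that w*prs_a + (1-w)*prs_b best tracks the phenotype."""
    best_w, best_r = 0.0, float("-inf")
    for i in range(steps + 1):
        w = i / steps
        combined = [w * a + (1 - w) * b for a, b in zip(prs_a, prs_b)]
        r = pearson(combined, phenotype)
        if r > best_r:  # strict '>' keeps the smallest w among ties
            best_w, best_r = w, r
    return best_w, best_r


# Toy tuning sample: the phenotype coincides with the first score.
w, r = best_global_mix([1, 2, 3, 4], [2, 1, 4, 3], [1, 2, 3, 4])
```

In practice the weight would be tuned on one sample and evaluated on an independent one, as the abstracts in this list describe.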
Affiliation(s)
- Kevin Liao
  - University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- Sebastian Zöllner
  - University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
  - University of Michigan, Department of Psychiatry, Ann Arbor, MI, 48109, USA
|
10
|
Miao J, Wu Y, Lu Q. Statistical methods for gene-environment interaction analysis. Wiley Interdisciplinary Reviews: Computational Statistics 2024; 16:e1635. [PMID: 38699459 PMCID: PMC11064894 DOI: 10.1002/wics.1635] [Received: 06/08/2023] [Accepted: 09/12/2023] [Indexed: 05/05/2024]
Abstract
Most human complex phenotypes result from multiple genetic and environmental factors and their interactions. Understanding the mechanisms by which genetic and environmental factors interact offers valuable insights into the genetic architecture of complex traits and holds great potential for advancing precision medicine. The emergence of large population biobanks has led to the development of numerous statistical methods aiming at identifying gene-environment interactions (G × E). In this review, we present state-of-the-art statistical methodologies for G × E analysis. We will survey a spectrum of approaches for single-variant G × E mapping, followed by various techniques for polygenic G × E analysis. We conclude this review with a discussion on the future directions and challenges in G × E research.
Affiliation(s)
- Jiacheng Miao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, USA
- Yixuan Wu
  - University of Wisconsin–Madison, Madison, Wisconsin, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, USA
  - Department of Statistics, University of Wisconsin–Madison, Madison, Wisconsin, USA
  - Center for Demography of Health and Aging, University of Wisconsin–Madison, Madison, Wisconsin, USA
|
11
|
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
Affiliation(s)
- Linda Kachuri
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jibril Hirbo
  - Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Daniel J Schaid
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Iman Martin
  - Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
- Iftikhar J Kullo
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Eimear E Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bogdan Pasaniuc
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- John S Witte
  - Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
  - Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Tian Ge
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
12
|
Khan A, Shang N, Nestor JG, Weng C, Hripcsak G, Harris PC, Gharavi AG, Kiryluk K. Polygenic risk alters the penetrance of monogenic kidney disease. Nat Commun 2023; 14:8318. [PMID: 38097619 PMCID: PMC10721887 DOI: 10.1038/s41467-023-43878-9] [Received: 07/11/2023] [Accepted: 11/22/2023] [Indexed: 12/17/2023]
Abstract
Chronic kidney disease (CKD) is determined by an interplay of monogenic, polygenic, and environmental risks. Autosomal dominant polycystic kidney disease (ADPKD) and COL4A-associated nephropathy (COL4A-AN) represent the most common forms of monogenic kidney diseases. These disorders have incomplete penetrance and variable expressivity, and we hypothesize that polygenic factors explain some of this variability. By combining SNP array, exome/genome sequence, and electronic health record data from the UK Biobank and All-of-Us cohorts, we demonstrate that the genome-wide polygenic score (GPS) significantly predicts CKD among ADPKD monogenic variant carriers. Compared to the middle tertile of the GPS for noncarriers, ADPKD variant carriers in the top tertile have a 54-fold increased risk of CKD, while ADPKD variant carriers in the bottom tertile have only a 3-fold increased risk of CKD. Similarly, the GPS significantly predicts CKD in COL4A-AN carriers. The carriers in the top tertile of the GPS have a 2.5-fold higher risk of CKD, while the risk for carriers in the bottom tertile is not different from the average population risk. These results suggest that accounting for polygenic risk improves risk stratification in monogenic kidney disease.
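The tertile comparisons quoted above follow a common pattern: rank individuals by score, split them into thirds, and compare each third with a reference stratum. The sketch below shows that pattern with a crude risk ratio. It is an assumption-laden illustration, not the study's analysis, which compares carriers against noncarriers and would use adjusted models. All names are hypothetical.

```python
from typing import Dict, List, Sequence


def assign_tertiles(scores: Sequence[float]) -> List[str]:
    """Label each individual 'bottom', 'middle', or 'top' by score rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # stable for ties
    labels = [""] * n
    for rank, idx in enumerate(order):
        if rank * 3 < n:
            labels[idx] = "bottom"
        elif rank * 3 < 2 * n:
            labels[idx] = "middle"
        else:
            labels[idx] = "top"
    return labels


def risk_ratios(
    labels: Sequence[str], is_case: Sequence[int], reference: str = "middle"
) -> Dict[str, float]:
    """Crude (unadjusted) risk in each stratum divided by risk in the reference."""
    totals: Dict[str, int] = {}
    cases: Dict[str, int] = {}
    for label, case in zip(labels, is_case):
        totals[label] = totals.get(label, 0) + 1
        cases[label] = cases.get(label, 0) + int(case)
    ref_risk = cases[reference] / totals[reference]
    if ref_risk == 0:
        raise ValueError("reference stratum has no cases")
    return {k: (cases[k] / totals[k]) / ref_risk for k in totals}


# Toy data: six individuals, 1 = case, 0 = control.
labels = assign_tertiles([0.1, 0.9, 0.5, 0.3, 0.7, 1.1])
ratios = risk_ratios(labels, [0, 1, 1, 0, 0, 1])
```

Using the middle stratum as the reference mirrors the abstract's wording; other references, such as the bottom stratum or the whole population, give different ratios for the same data.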
Affiliation(s)
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Ning Shang
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Jordan G Nestor
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Chunhua Weng
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- George Hripcsak
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Peter C Harris
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, USA
- Ali G Gharavi
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
- Krzysztof Kiryluk
  - Division of Nephrology, Department of Medicine, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY, USA
|
13
|
Cai M, Wang Z, Xiao J, Hu X, Chen G, Yang C. XMAP: Cross-population fine-mapping by leveraging genetic diversity and accounting for confounding bias. Nat Commun 2023; 14:6870. [PMID: 37898663 PMCID: PMC10613261 DOI: 10.1038/s41467-023-42614-7] [Received: 02/04/2023] [Accepted: 10/17/2023] [Indexed: 10/30/2023]
Abstract
Fine-mapping prioritizes risk variants identified by genome-wide association studies (GWASs), serving as a critical step to uncover biological mechanisms underlying complex traits. However, several major challenges still remain for existing fine-mapping methods. First, the strong linkage disequilibrium among variants can limit the statistical power and resolution of fine-mapping. Second, it is computationally expensive to simultaneously search for multiple causal variants. Third, the confounding bias hidden in GWAS summary statistics can produce spurious signals. To address these challenges, we develop a statistical method for cross-population fine-mapping (XMAP) by leveraging genetic diversity and accounting for confounding bias. By using cross-population GWAS summary statistics from global biobanks and genomic consortia, we show that XMAP can achieve greater statistical power, better control of false positive rate, and substantially higher computational efficiency for identifying multiple causal signals, compared to existing methods. Importantly, we show that the output of XMAP can be integrated with single-cell datasets, which greatly improves the interpretation of putative causal variants in their cellular context at single-cell resolution.
Affiliation(s)
- Mingxuan Cai
  - Department of Biostatistics, City University of Hong Kong, Hong Kong SAR, China
- Zhiwei Wang
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Jiashun Xiao
  - Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
- Xianghong Hu
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Gang Chen
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
  - WeGene, Shenzhen Zaozhidao Technology Co., Ltd, Shenzhen, 518040, China
  - Graduate Affairs, Faculty of Medicine, Chulalongkorn University, 10330, Bangkok, Thailand
- Can Yang
  - Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou, 511458, China
  - Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
|
14
|
Chen T, Zhang H, Mazumder R, Lin X. Ensembled best subset selection using summary statistics for polygenic risk prediction. bioRxiv 2023:2023.09.25.559307. [PMID: 37886515 PMCID: PMC10602024 DOI: 10.1101/2023.09.25.559307] [Indexed: 10/28/2023]
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based $L_0L_2$ penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
|
15
|
Zhang T, Klei L, Liu P, Chouldechova A, Roeder K, G'Sell M, Devlin B. Evaluating and Improving Health Equity and Fairness of Polygenic Scores. bioRxiv 2023:2023.09.22.559051. [PMID: 37790341 PMCID: PMC10542523 DOI: 10.1101/2023.09.22.559051] [Indexed: 10/05/2023]
Abstract
Polygenic scores (PGS) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWAS, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum. In the simulation settings we explore, Joint-Lassosum provides more accurate PGS compared with other methods, especially when measured in terms of fairness. Like all PGS methods, Joint-Lassosum requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how Joint-Lassosum can help mitigate fairness-related harms that might result from the use of PGS scores in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWAS for different ancestries, Joint-Lassosum is an effective approach for enhancing portability and reducing predictive bias.
|
16
|
Zhang C, Zhang Y, Zhang Y, Zhao H. Benchmarking of local genetic correlation estimation methods using summary statistics from genome-wide association studies. Brief Bioinform 2023; 24:bbad407. [PMID: 37974509 PMCID: PMC10654488 DOI: 10.1093/bib/bbad407] [Received: 03/29/2023] [Revised: 10/06/2023] [Accepted: 10/24/2023] [Indexed: 11/19/2023]
Abstract
Local genetic correlation evaluates the correlation of additive genetic effects between different traits across the same genetic variants at a genomic locus. It has been proven informative for understanding the genetic similarities of complex traits beyond that captured by global genetic correlation calculated across the whole genome. Several summary-statistics-based approaches have been developed for estimating local genetic correlation, including $\rho$-hess, SUPERGNOVA and LAVA. However, there has not been a comprehensive evaluation of these methods to offer practical guidelines on the choices of these methods. In this study, we conduct benchmark comparisons of the performance of these three methods through extensive simulation and real data analyses. We focus on two technical difficulties in estimating local genetic correlation: sample overlaps across traits and local linkage disequilibrium (LD) estimates when only the external reference panels are available. Our simulations suggest the likelihood of incorrectly identifying correlated regions and local correlation estimation accuracy are highly dependent on the estimation of the local LD matrix. These observations are corroborated by real data analyses of 31 complex traits. Overall, our findings illuminate the distinct results yielded by different methods applied in post-genome-wide association studies (post-GWAS) local correlation studies. We underscore the sensitivity of local genetic correlation estimates and inferences to the precision of local LD estimation. These observations accentuate the vital need for ongoing refinement in methodologies.
Affiliation(s)
- Chi Zhang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Yiliang Zhang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Yunxuan Zhang
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
|
17
|
Gyawali PK, Le Guen Y, Liu X, Belloy ME, Tang H, Zou J, He Z. Improving genetic risk prediction across diverse population by disentangling ancestry representations. Commun Biol 2023; 6:964. [PMID: 37736834 PMCID: PMC10517023 DOI: 10.1038/s42003-023-05352-6] [Received: 11/20/2022] [Accepted: 09/12/2023] [Indexed: 09/23/2023]
Abstract
Risk prediction models using genetic data have seen increasing traction in genomics. However, most polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can bias the risk predictors, resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans. To address this issue, which arises largely because the prediction models are biased by the underlying population structure, we propose a deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry-disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without needing self-reported ancestry information.
Affiliation(s)
- Prashnna K Gyawali
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Yann Le Guen
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Institut du Cerveau-Paris Brain Institute-ICM, Paris, France
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
|