51
Dapas M, Lee YL, Wentworth-Sheilds W, Im HK, Ober C, Schoettler N. Revealing polygenic pleiotropy using genetic risk scores for asthma. HGG Advances 2023; 4:100233. [PMID: 37663543 PMCID: PMC10474095 DOI: 10.1016/j.xhgg.2023.100233] [Received: 06/02/2023] [Accepted: 08/11/2023] [Indexed: 09/05/2023]
Abstract
In this study we examined how genetic risk for asthma associates with different features of the disease and with other medical conditions and traits. Using summary statistics from two multi-ancestry genome-wide association studies of asthma, we modeled polygenic risk scores (PRSs) and validated their predictive performance in the UK Biobank. We then performed phenome-wide association studies of the asthma PRSs with 371 heritable traits in the UK Biobank. We identified 228 total significant associations across a variety of organ systems, including associations that varied by PRS model, sex, age of asthma onset, ancestry, and human leukocyte antigen region alleles. Our results highlight pervasive pleiotropy between asthma and numerous other traits and conditions and elucidate pathways that contribute to asthma and its comorbidities.
Affiliation(s)
- Matthew Dapas
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Yu Lin Lee
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
  - Biological Sciences Collegiate Division, University of Chicago, Chicago, IL, USA
- Hae Kyung Im
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Carole Ober
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
- Nathan Schoettler
  - Section of Pulmonary and Critical Care Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
52
Wang Y, Kanai M, Tan T, Kamariza M, Tsuo K, Yuan K, Zhou W, Okada Y, Huang H, Turley P, Atkinson EG, Martin AR. Polygenic prediction across populations is influenced by ancestry, genetic architecture, and methodology. Cell Genomics 2023; 3:100408. [PMID: 37868036 PMCID: PMC10589629 DOI: 10.1016/j.xgen.2023.100408] [Received: 01/31/2023] [Revised: 06/21/2023] [Accepted: 08/22/2023] [Indexed: 10/24/2023]
Abstract
Polygenic risk scores (PRSs) developed from multi-ancestry genome-wide association studies (GWASs), PRSmulti, hold promise for improving PRS accuracy and generalizability across populations. To establish best practices for leveraging the increasing diversity of genomic studies, we investigated how various factors affect the performance of PRSmulti compared with PRSs constructed from single-ancestry GWASs (PRSsingle). Through extensive simulations and empirical analyses, we showed that PRSmulti overall outperformed PRSsingle in understudied populations, except when the understudied population represented a small proportion of the multi-ancestry GWAS. Furthermore, integrating PRSs based on local ancestry-informed GWASs and large-scale, European-based PRSs improved predictive performance in understudied African populations, especially for less polygenic traits with large-effect ancestry-enriched variants. Our work highlights the importance of diversifying genomic studies to achieve equitable PRS performance across ancestral populations and provides guidance for developing PRSs from multiple studies.
Affiliation(s)
- Ying Wang
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Masahiro Kanai
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Taotao Tan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Kristin Tsuo
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kai Yuan
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Wei Zhou
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Yukinori Okada
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Center for Infectious Disease Education and Research (CiDER), and Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
  - Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
- the BioBank Japan Project
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
  - Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
  - Society of Fellows, Harvard University, Cambridge, MA 02138, USA
  - Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Center for Infectious Disease Education and Research (CiDER), and Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
  - Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
  - Department of Economics, and Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Hailiang Huang
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Patrick Turley
  - Department of Economics, and Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
- Elizabeth G. Atkinson
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Alicia R. Martin
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
53
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023]
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Affiliation(s)
- Chang Xu
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Santhi K Ganesh
  - Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
54
Zhang H, Zhan J, Jin J, Zhang J, Lu W, Zhao R, Ahearn TU, Yu Z, O'Connell J, Jiang Y, Chen T, Okuhara D, Garcia-Closas M, Lin X, Koelsch BL, Chatterjee N. A new method for multiancestry polygenic prediction improves performance across diverse populations. Nat Genet 2023; 55:1757-1768. [PMID: 37749244 PMCID: PMC10923245 DOI: 10.1038/s41588-023-01501-z] [Received: 03/31/2022] [Accepted: 08/16/2023] [Indexed: 09/27/2023]
Abstract
Polygenic risk scores (PRSs) increasingly predict complex traits; however, suboptimal performance in non-European populations raises concerns about clinical applications and health inequities. We developed CT-SLEB, a powerful and scalable method to calculate PRSs, using ancestry-specific genome-wide association study summary statistics from multiancestry training samples, integrating clumping and thresholding, empirical Bayes and superlearning. We evaluated CT-SLEB and nine alternative methods with large-scale simulated genome-wide association studies (~19 million common variants) and datasets from 23andMe, Inc., the Global Lipids Genetics Consortium, All of Us and UK Biobank, involving 5.1 million individuals of diverse ancestry, with 1.18 million individuals from four non-European populations across 13 complex traits. Results demonstrated that CT-SLEB significantly improves PRS performance in non-European populations compared with simple alternatives, with comparable or superior performance to a recent, computationally intensive method. Moreover, our simulation studies offered insights into sample size requirements and SNP density effects on multiancestry risk prediction.
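The clumping-and-thresholding step named in this abstract can be illustrated with a minimal sketch. This is not the CT-SLEB implementation: the empirical Bayes and superlearning steps are not shown, real clumping usually also applies a physical distance window, and the function name `clump_and_threshold`, its parameters, and the toy inputs are all assumptions made for illustration.

```python
def clump_and_threshold(pvalues, ld_r2, p_threshold, r2_cutoff):
    """Greedy clumping and thresholding on GWAS summary statistics.

    pvalues:     per-variant association p-values.
    ld_r2:       square matrix (list of lists) of pairwise LD r^2 values.
    p_threshold: variants with a larger p-value are discarded.
    r2_cutoff:   a variant is dropped if its r^2 with an already kept
                 variant exceeds this value.
    Returns the indices of the kept (index) variants, most significant first.
    """
    # Visit variants from the smallest to the largest p-value.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    kept = []
    for idx in order:
        if pvalues[idx] > p_threshold:
            break  # every remaining variant has an even larger p-value
        # Keep the variant only if it is not in LD with any kept variant.
        if all(ld_r2[idx][k] <= r2_cutoff for k in kept):
            kept.append(idx)
    return kept


# Toy example (illustrative values only): variants 0 and 1 are in strong LD.
toy_pvalues = [1e-10, 1e-9, 1e-3, 1e-8]
toy_ld_r2 = [
    [1.00, 0.80, 0.00, 0.00],
    [0.80, 1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00, 0.05],
    [0.00, 0.00, 0.05, 1.00],
]
selected = clump_and_threshold(toy_pvalues, toy_ld_r2, p_threshold=1e-5, r2_cutoff=0.1)
```

In practice several combinations of `p_threshold` and `r2_cutoff` would be tried and compared on held-out data; the abstract describes an ensemble step for this purpose, which the sketch does not attempt.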
Affiliation(s)
- Haoyu Zhang
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jin Jin
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingning Zhang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Wenxuan Lu
  - Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA
- Ruzhang Zhao
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Thomas U Ahearn
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Zhi Yu
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Tony Chen
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Montserrat Garcia-Closas
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
  - Division of Genetics and Epidemiology, Institute of Cancer Research, London, UK
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Statistics, Harvard University, Cambridge, MA, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
55
Chen T, Zhang H, Mazumder R, Lin X. Ensembled best subset selection using summary statistics for polygenic risk prediction. bioRxiv 2023:2023.09.25.559307. [PMID: 37886515 PMCID: PMC10602024 DOI: 10.1101/2023.09.25.559307] [Indexed: 10/28/2023]
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, yet existing methods face a tradeoff between predictive power and computational efficiency. We introduce ALL-Sum, a fast and scalable PRS method that combines an efficient summary statistic-based L0L2-penalized regression algorithm with an ensembling step that aggregates estimates from different tuning parameters for improved prediction performance. In extensive large-scale simulations across a wide range of polygenicity and genome-wide association studies (GWAS) sample sizes, ALL-Sum consistently outperforms popular alternative methods in terms of prediction accuracy, runtime, and memory usage. We analyze 27 published GWAS summary statistics for 11 complex traits from 9 reputable data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen, evaluated using individual-level UKBB data. ALL-Sum achieves the highest accuracy for most traits, particularly for GWAS with large sample sizes. We provide ALL-Sum as a user-friendly command-line software with pre-computed reference data for streamlined user-end analysis.
56
Sun Y, McDonald T, Baur A, Xu H, Bateman NB, Shen Y, Li C, Ye K. Fish oil supplementation modifies the genetic potential for blood lipids. medRxiv 2023:2023.09.22.23295987. [PMID: 37808791 PMCID: PMC10557817 DOI: 10.1101/2023.09.22.23295987] [Indexed: 10/10/2023]
Abstract
Background: Dyslipidemia is a well-known risk factor for cardiovascular disease, which has been the leading cause of mortality worldwide. Although habitual intake of fish oil has been implicated in offering cardioprotective effects through triglyceride reduction, the interactions of fish oil with the genetic predisposition to dysregulated lipids remain elusive. Objectives: We examined whether fish oil supplementation can modify the genetic potential for the circulating levels of four lipids, including total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides. Methods: A total of 441,985 participants with complete genetic and phenotypic data from the UK Biobank were included in our study. Polygenic scores (PGS) were calculated in participants of diverse ancestries. Multivariable linear regression models were used to assess associations with adjustment for relevant risk factors. Results: Fish oil supplementation mitigated genetic susceptibility to elevated levels of total cholesterol, LDL-C, and triglycerides, while amplifying genetic potential for increased HDL-C among 424,090 participants of European ancestry (P interaction < 0.05). Consistent significant findings were obtained using PGS calculated based on multiple genome-wide association studies or alternative PGS methods. We also showed that fish oil significantly attenuated genetic predisposition to high triglycerides in African-ancestry participants. Conclusions: Fish oil supplementation attenuated the genetic susceptibility to elevated blood levels of total cholesterol, LDL-C, and triglycerides, while accentuating genetic potential for higher HDL-C. These results suggest that fish oil may have a beneficial impact on modifying genome-wide genetic effects on elevated lipid levels in the general population.
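The idea of an exposure modifying a genetic effect can be sketched as a score-by-exposure interaction. In an unadjusted linear model with a binary exposure, the interaction coefficient equals the difference between the outcome-on-PGS slopes fitted separately in exposed and unexposed participants, and the sketch below uses that simplification. It omits the covariate adjustment described in the abstract, and the function names and toy data are assumptions made for illustration, not the study's analysis code.

```python
def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def interaction_effect(pgs, exposed, outcome):
    """Difference in PGS slope between exposed and unexposed groups.

    Equals the PGS-by-exposure coefficient of the unadjusted model
    outcome ~ pgs + exposed + pgs * exposed when `exposed` is binary.
    """
    groups = {True: ([], []), False: ([], [])}
    for p, e, y in zip(pgs, exposed, outcome):
        xs, ys = groups[bool(e)]
        xs.append(p)
        ys.append(y)
    return ols_slope(*groups[True]) - ols_slope(*groups[False])


# Toy, noise-free data (illustrative values only): the PGS slope is 0.5
# without the exposure and 0.2 with it, so the interaction is -0.3.
toy_pgs = [-2, -1, 0, 1, 2] * 2
toy_exposed = [0] * 5 + [1] * 5
toy_outcome = [1 + 0.5 * p + 0.2 * e - 0.3 * p * e
               for p, e in zip(toy_pgs, toy_exposed)]
toy_interaction = interaction_effect(toy_pgs, toy_exposed, toy_outcome)
```

A negative value here corresponds to the exposure attenuating the genetic effect on the outcome, and a positive value to the exposure amplifying it.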
Affiliation(s)
- Yitang Sun
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Tryggvi McDonald
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Abigail Baur
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Huifang Xu
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Naveen Brahman Bateman
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
- Ye Shen
  - Department of Epidemiology and Biostatistics, College of Public Health, University of Georgia, Athens, GA, USA
- Changwei Li
  - Department of Epidemiology, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, USA
- Kaixiong Ye
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Athens, GA, USA
  - Institute of Bioinformatics, University of Georgia, Athens, GA, USA
57
Zhang T, Klei L, Liu P, Chouldechova A, Roeder K, G'Sell M, Devlin B. Evaluating and Improving Health Equity and Fairness of Polygenic Scores. bioRxiv 2023:2023.09.22.559051. [PMID: 37790341 PMCID: PMC10542523 DOI: 10.1101/2023.09.22.559051] [Indexed: 10/05/2023]
Abstract
Polygenic scores (PGS) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWAS, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum. In the simulation settings we explore, Joint-Lassosum provides more accurate PGS compared with other methods, especially when measured in terms of fairness. Like all PGS methods, Joint-Lassosum requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how Joint-Lassosum can help mitigate fairness-related harms that might result from the use of PGS scores in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWAS for different ancestries, Joint-Lassosum is an effective approach for enhancing portability and reducing predictive bias.
58
Gyawali PK, Le Guen Y, Liu X, Belloy ME, Tang H, Zou J, He Z. Improving genetic risk prediction across diverse population by disentangling ancestry representations. Commun Biol 2023; 6:964. [PMID: 37736834 PMCID: PMC10517023 DOI: 10.1038/s42003-023-05352-6] [Received: 11/20/2022] [Accepted: 09/12/2023] [Indexed: 09/23/2023]
Abstract
Risk prediction models using genetic data have seen increasing traction in genomics. However, most of the polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can lead to biases in the risk predictors, resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans. To address this issue, which arises largely because the prediction models are biased by the underlying population structure, we propose a deep-learning framework that leverages data from diverse populations and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry-disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Compared with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without needing self-reported ancestry information.
Affiliation(s)
- Prashnna K Gyawali
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Yann Le Guen
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Institut du Cerveau-Paris Brain Institute-ICM, Paris, France
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Michael E Belloy
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
- Hua Tang
  - Department of Genetics, Stanford University, Stanford, CA, USA
- James Zou
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Zihuai He
  - Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
  - Quantitative Sciences Unit, Department of Medicine (Biomedical Informatics Research), Stanford University, Stanford, CA, USA
59
Jin J, Zhan J, Zhang J, Zhao R, O’Connell J, Jiang Y, Buyske S, Gignoux C, Haiman C, Kenny EE, Kooperberg C, North K, Koelsch BL, Wojcik G, Zhang H, Chatterjee N. MUSSEL: Enhanced Bayesian Polygenic Risk Prediction Leveraging Information across Multiple Ancestry Groups. bioRxiv 2023:2023.04.12.536510. [PMID: 37090648 PMCID: PMC10120638 DOI: 10.1101/2023.04.12.536510] [Indexed: 04/25/2023]
Abstract
Polygenic risk scores (PRS) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across different populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in the summary statistics from genome-wide association studies (GWAS) across multiple ancestry groups. MUSSEL conducts Bayesian hierarchical modeling under a MUltivariate Spike-and-Slab model for effect-size distribution and incorporates an Ensemble Learning step using super learner to combine information across different tuning parameter settings and ancestry groups. In our simulation studies and data analyses of 16 traits across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. The method, for example, has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African Ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, underlying trait architecture, and the choice of reference samples for LD estimation, and thus ultimately, a combination of methods may be needed to generate the most robust PRS across diverse populations.
Affiliation(s)
- Jin Jin
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jingning Zhang
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Ruzhang Zhao
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Steven Buyske
  - Department of Statistics, Rutgers University, New Brunswick, NJ, USA
- Christopher Gignoux
  - Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Christopher Haiman
  - Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Eimear E. Kenny
  - Icahn Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Kari North
  - Department of Epidemiology, University of North Carolina Chapel Hill, Chapel Hill, NC, USA
- Genevieve Wojcik
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
60
Forrest IS, O’Neal AJ, Pedra JHF, Do R. Cholesterol Contributes to Risk, Severity, and Machine Learning-Driven Diagnosis of Lyme Disease. Clin Infect Dis 2023; 77:839-847. [PMID: 37227948 PMCID: PMC10506776 DOI: 10.1093/cid/ciad307] [Received: 03/01/2023] [Revised: 05/09/2023] [Accepted: 05/18/2023] [Indexed: 05/27/2023]
Abstract
BACKGROUND: Lyme disease is the most prevalent vector-borne disease in the US, yet its host factors are poorly understood and diagnostic tests are limited. We evaluated patients in a large health system to uncover cholesterol's role in the susceptibility, severity, and machine learning-based diagnosis of Lyme disease. METHODS: A longitudinal health system cohort comprised 1,019,175 individuals with electronic health record data and 50,329 with linked genetic data. Associations of blood cholesterol level, cholesterol genetic scores comprising common genetic variants, and burden of rare loss-of-function (LoF) variants in cholesterol metabolism genes with Lyme disease were investigated. A portable machine learning model was constructed and tested to predict Lyme disease using routine lipid and clinical measurements. RESULTS: There were 3832 cases of Lyme disease. Increasing cholesterol was associated with greater risk of Lyme disease and hypercholesterolemia was more prevalent in Lyme disease cases than in controls. Cholesterol genetic scores and rare LoF variants in CD36 and LDLR were associated with Lyme disease risk. Serological profiling of cases revealed parallel trajectories of rising cholesterol and immunoglobulin levels over the disease course, including marked increases in individuals with LoF variants and high cholesterol genetic scores. The machine learning model predicted Lyme disease solely using routine lipid panel, blood count, and metabolic measurements. CONCLUSIONS: These results demonstrate the value of large-scale genetic and clinical data to reveal host factors underlying infectious disease biology, risk, and prognosis and the potential for their clinical translation to machine learning diagnostics that do not need specialized assays.
Affiliation(s)
- Iain S Forrest
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Medical Scientist Training Program, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Anya J O’Neal
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Joao H F Pedra
  - Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, Maryland, USA
- Ron Do
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
61
Abstract
Since the publication of the first genome-wide association study for cancer in 2007, thousands of common alleles that are associated with the risk of cancer have been identified. The relative risk associated with individual variants is small and of limited clinical significance. However, the combined effect of multiple risk variants as captured by polygenic scores (PGSs) may be much greater and therefore provide risk discrimination that is clinically useful. We review the considerable research efforts over the past 15 years for developing statistical methods for PGSs and their application in large-scale genome-wide association studies to develop PGSs for various cancers. We review the predictive performance of these PGSs and the multiple challenges currently limiting the clinical application of PGSs. Despite this, PGSs are beginning to be incorporated into clinical multifactorial risk prediction models to stratify risk in both clinical trials and clinical implementation studies.
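As a rough illustration of how a polygenic score combines many small per-variant effects, the sketch below computes a score as a weighted sum of effect-allele dosages. The function name `polygenic_score`, the dosage coding (0, 1, or 2 copies of the effect allele), and the example weights are assumptions made for illustration; they are not taken from the review.

```python
def polygenic_score(dosages, weights):
    """Return the weighted sum of effect-allele dosages for one individual.

    dosages: copies of the effect allele per variant (0, 1, or 2; imputed
             values may be fractional).
    weights: per-variant effect sizes, e.g. log odds ratios from a GWAS.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative values only).
example_weights = [0.10, -0.20, 0.30]
example_score = polygenic_score([0, 1, 2], example_weights)
```

Each individual weight is small, but the sum over many variants can spread individuals across a wider range of predicted risk, which is the property the abstract refers to.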
Affiliation(s)
- Xin Yang
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Siddhartha Kar
- MRC Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
- Early Cancer Institute, Department of Oncology, University of Cambridge, Cambridge, UK
| | - Antonis C Antoniou
- Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
| | - Paul D P Pharoah
- Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
| |
|
62
|
Amariuta T, Siewert-Rocks K, Price AL. Modeling tissue co-regulation estimates tissue-specific contributions to disease. Nat Genet 2023; 55:1503-1511. [PMID: 37580597 PMCID: PMC10904330 DOI: 10.1038/s41588-023-01474-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/25/2022] [Accepted: 07/13/2023] [Indexed: 08/16/2023]
Abstract
Integrative analyses of genome-wide association studies and gene expression data have implicated many disease-critical tissues. However, co-regulation of genetic effects on gene expression across tissues impedes distinguishing biologically causal tissues from tagging tissues. In the present study, we introduce tissue co-regulation score regression (TCSC), which disentangles causal tissues from tagging tissues by regressing gene-disease association statistics (from transcriptome-wide association studies) on tissue co-regulation scores, reflecting correlations of predicted gene expression across genes and tissues. We applied TCSC to 78 diseases/traits (average n = 302,000) and gene expression prediction models for 48 GTEx tissues. TCSC identified 21 causal tissue-trait pairs at a 5% false discovery rate (FDR), including well-established findings, biologically plausible new findings (for example, aorta artery and glaucoma) and increased specificity of known tissue-trait associations (for example, subcutaneous adipose, but not visceral adipose, and high-density lipoprotein). TCSC also identified 17 causal tissue-trait covariance pairs at 5% FDR. In conclusion, TCSC is a precise method for distinguishing causal tissues from tagging tissues.
Affiliation(s)
- Tiffany Amariuta
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Katherine Siewert-Rocks
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
63
|
Gregga I, Pharoah PDP, Gayther SA, Manichaikul A, Im HK, Kar SP, Schildkraut JM, Wheeler HE. Predicted Proteome Association Studies of Breast, Prostate, Ovarian, and Endometrial Cancers Implicate Plasma Protein Regulation in Cancer Susceptibility. Cancer Epidemiol Biomarkers Prev 2023; 32:1198-1207. [PMID: 37409955 PMCID: PMC10528410 DOI: 10.1158/1055-9965.epi-23-0309] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 05/30/2023] [Accepted: 06/28/2023] [Indexed: 07/07/2023] Open
Abstract
BACKGROUND Predicting protein levels from genotypes for proteome-wide association studies (PWAS) may provide insight into the mechanisms underlying cancer susceptibility. METHODS We performed PWAS of breast, endometrial, ovarian, and prostate cancers and their subtypes in several large European-ancestry discovery consortia (effective sample size: 237,483 cases/317,006 controls) and tested the results for replication in an independent European-ancestry GWAS (31,969 cases/410,350 controls). We performed PWAS using the cancer GWAS summary statistics and two sets of plasma protein prediction models, followed by colocalization analysis. RESULTS Using Atherosclerosis Risk in Communities (ARIC) models, we identified 93 protein-cancer associations [false discovery rate (FDR) < 0.05]. We then performed a meta-analysis of the discovery and replication PWAS, resulting in 61 significant protein-cancer associations (FDR < 0.05). Ten of 15 protein-cancer pairs that could be tested using Trans-Omics for Precision Medicine (TOPMed) protein prediction models replicated with the same directions of effect in both cancer GWAS (P < 0.05). To further support our results, we applied Bayesian colocalization analysis and found colocalized SNPs for SERPINA3 protein levels and prostate cancer (posterior probability, PP = 0.65) and SNUPN protein levels and breast cancer (PP = 0.62). CONCLUSIONS We used PWAS to identify potential biomarkers of hormone-related cancer risk. SNPs in SERPINA3 and SNUPN did not reach genome-wide significance for cancer in the original GWAS, highlighting the power of PWAS for novel locus discovery, with the added advantage of providing directions of protein effect. IMPACT PWAS and colocalization are promising methods to identify potential molecular mechanisms underlying complex traits.
Affiliation(s)
- Isabelle Gregga
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
| | - Paul D. P. Pharoah
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Simon A. Gayther
- Center for Bioinformatics and Functional Genomics, Department of Biomedical Science, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA
| | - Siddhartha P. Kar
- Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK
| | - Joellen M. Schildkraut
- Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Heather E. Wheeler
- Department of Biology, Loyola University Chicago, Chicago, IL, USA
- Program in Bioinformatics, Loyola University Chicago, Chicago, IL, USA
| |
|
64
|
Salehi Nowbandegani P, Wohns AW, Ballard JL, Lander ES, Bloemendal A, Neale BM, O'Connor LJ. Extremely sparse models of linkage disequilibrium in ancestrally diverse association studies. Nat Genet 2023; 55:1494-1502. [PMID: 37640881 DOI: 10.1038/s41588-023-01487-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2022] [Accepted: 07/24/2023] [Indexed: 08/31/2023]
Abstract
Linkage disequilibrium (LD) is the correlation among nearby genetic variants. In genetic association studies, LD is often modeled using large correlation matrices, but this approach is inefficient, especially in ancestrally diverse studies. In the present study, we introduce LD graphical models (LDGMs), which are an extremely sparse and efficient representation of LD. LDGMs are derived from genome-wide genealogies; statistical relationships among alleles in the LDGM correspond to genealogical relationships among haplotypes. We published LDGMs and ancestry-specific LDGM precision matrices for 18 million common variants (minor allele frequency >1%) in five ancestry groups, validated their accuracy and demonstrated order-of-magnitude improvements in runtime for commonly used LD matrix computations. We implemented an extremely fast multiancestry polygenic prediction method, BLUPx-ldgm, which performs better than a similar method based on the reference LD correlation matrix. LDGMs will enable sophisticated methods that scale to ancestrally diverse genetic association data across millions of variants and individuals.
Affiliation(s)
- Pouria Salehi Nowbandegani
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| | - Anthony Wilder Wohns
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
- Stanford University School of Medicine, Stanford, CA, USA.
| | - Jenna L Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graduate Group in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA
| | - Eric S Lander
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biology, MIT, Cambridge, MA, USA
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Alex Bloemendal
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Luke J O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
| |
|
65
|
Tan T, Atkinson EG. Strategies for the Genomic Analysis of Admixed Populations. Annu Rev Biomed Data Sci 2023; 6:105-127. [PMID: 37127050 PMCID: PMC10871708 DOI: 10.1146/annurev-biodatasci-020722-014310] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Admixed populations constitute a large portion of global human genetic diversity, yet they are often left out of genomics analyses. This exclusion is problematic, as it leads to disparities in the understanding of the genetic structure and history of diverse cohorts and the performance of genomic medicine across populations. Admixed populations have particular statistical challenges, as they inherit genomic segments from multiple source populations, which is the primary reason they have historically been excluded from genetic studies. In recent years, however, an increasing number of statistical methods and software tools have been developed to account for and leverage admixture in the context of genomics analyses. Here, we provide a survey of such computational strategies for the informed consideration of admixture to allow for the well-calibrated inclusion of mixed ancestry populations in large-scale genomics studies, and we detail persisting gaps in existing tools.
Affiliation(s)
- Taotao Tan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| |
|
66
|
Gao Y, Sharma T, Cui Y. Addressing the Challenge of Biomedical Data Inequality: An Artificial Intelligence Perspective. Annu Rev Biomed Data Sci 2023; 6:153-171. [PMID: 37104653 PMCID: PMC10529864 DOI: 10.1146/annurev-biodatasci-020722-020704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.
Affiliation(s)
- Yan Gao
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Teena Sharma
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Yan Cui
- Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| |
|
67
|
Hou K, Xu Z, Ding Y, Harpak A, Pasaniuc B. Calibrated prediction intervals for polygenic scores across diverse contexts. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.24.23293056. [PMID: 37546999 PMCID: PMC10402211 DOI: 10.1101/2023.07.24.23293056] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 08/08/2023]
Abstract
Polygenic scores (PGS) have emerged as the tool of choice for genomic prediction in a wide range of fields from agriculture to personalized medicine. We analyze data from two large biobanks in the US (All of Us) and the UK (UK Biobank) to find widespread variability in PGS performance across contexts. Many contexts, including age, sex, and income, impact PGS accuracies with similar magnitudes as genetic ancestry. PGSs trained in single versus multi-ancestry cohorts show similar context-specificity in their accuracies. We introduce trait prediction intervals that are allowed to vary across contexts as a principled approach to account for context-specific PGS accuracy in genomic prediction. We model the impact of all contexts in a joint framework to enable PGS-based trait predictions that are well-calibrated (contain the trait value with 90% probability in all contexts), whereas methods that ignore context are mis-calibrated. We show that prediction intervals need to be adjusted for all considered traits ranging from 10% for diastolic blood pressure to 80% for waist circumference. Adjustment of prediction intervals depends on the dataset; for example, prediction intervals for education years need to be adjusted by 90% in All of Us versus 8% in UK Biobank. Our results provide a path forward towards utilization of PGS as a prediction tool across all individuals regardless of their contexts while highlighting the importance of comprehensive profile of context information in study design and data collection.
Affiliation(s)
- Kangcheng Hou
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Ziqi Xu
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
| | - Yi Ding
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
| | - Arbel Harpak
- Department of Population Health, The University of Texas at Austin, Austin, TX, USA
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Bogdan Pasaniuc
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, Los Angeles
| |
|
68
|
The Impact of Genomic Variation on Function (IGVF) Consortium. ARXIV 2023:arXiv:2307.13708v1. [PMID: 37547663 PMCID: PMC10402186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Our genomes influence nearly every aspect of human biology from molecular and cellular functions to phenotypes in health and disease. Human genetics studies have now associated hundreds of thousands of differences in our DNA sequence ("genomic variation") with disease risk and other phenotypes, many of which could reveal novel mechanisms of human biology and uncover the basis of genetic predispositions to diseases, thereby guiding the development of new diagnostics and therapeutics. Yet, understanding how genomic variation alters genome function to influence phenotype has proven challenging. To unlock these insights, we need a systematic and comprehensive catalog of genome function and the molecular and cellular effects of genomic variants. Toward this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations, and predictive modeling to investigate the relationships among genomic variation, genome function, and phenotypes. Through systematic comparisons and benchmarking of experimental and computational methods, we aim to create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how both coding and noncoding variants may connect through gene regulatory and protein interaction networks. These experimental data, computational predictions, and accompanying standards and pipelines will be integrated into an open resource that will catalyze community efforts to explore genome function and the impact of genetic variation on human biology and disease across populations.
|
69
|
Raben TG, Lello L, Widen E, Hsu SDH. Biobank-scale methods and projections for sparse polygenic prediction from machine learning. Sci Rep 2023; 13:11662. [PMID: 37468507 PMCID: PMC10356957 DOI: 10.1038/s41598-023-37580-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 06/23/2023] [Indexed: 07/21/2023] Open
Abstract
In this paper we characterize the performance of linear models trained via widely-used sparse machine learning algorithms. We build polygenic scores and examine performance as a function of training set size, genetic ancestral background, and training method. We show that predictor performance is most strongly dependent on size of training data, with smaller gains from algorithmic improvements. We find that LASSO generally performs as well as the best methods, judged by a variety of metrics. We also investigate performance characteristics of predictors trained on one genetic ancestry group when applied to another. Using LASSO, we develop a novel method for projecting AUC and correlation as a function of data size (i.e., for new biobanks) and characterize the asymptotic limit of performance. Additionally, for LASSO (compressed sensing) we show that performance metrics and predictor sparsity are in agreement with theoretical predictions from the Donoho-Tanner phase transition. Specifically, a future predictor trained in the Taiwan Precision Medicine Initiative for asthma can achieve an AUC of [Formula: see text] and for height a correlation of [Formula: see text] for a Taiwanese population. This is above the measured values of [Formula: see text] and [Formula: see text], respectively, for UK Biobank trained predictors applied to a European population.
Affiliation(s)
- Timothy G Raben
- Department of Physics and Astronomy, Michigan State University, Michigan, USA.
| | - Louis Lello
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
| | - Erik Widen
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
| | - Stephen D H Hsu
- Department of Physics and Astronomy, Michigan State University, Michigan, USA
- Genomic Prediction, Inc., North Brunswick, NJ, USA
| |
|
70
|
Bocher O, Gilly A, Park YC, Zeggini E, Morris AP. Bridging the diversity gap: Analytical and study design considerations for improving the accuracy of trans-ancestry genetic prediction. HGG ADVANCES 2023; 4:100214. [PMID: 37448981 PMCID: PMC10336686 DOI: 10.1016/j.xhgg.2023.100214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023] Open
Abstract
Genetic prediction of common complex disease risk is an essential component of precision medicine. Currently, genome-wide association studies (GWASs) are mostly composed of European-ancestry samples and resulting polygenic scores (PGSs) have been shown to poorly transfer to other ancestries partly due to heterogeneity of allelic effects between populations. Fixed-effects (FETA) and random-effects (RETA) trans-ancestry meta-analyses do not model such ancestry-related heterogeneity, while ancestry-specific (AS) scores may suffer from low power due to low sample sizes. In contrast, trans-ancestry meta-regression (TAMR) builds ancestry-aware PGS that account for more complex trans-ancestry architectures. Here, we examine the predictive performance of these four PGSs under multiple genetic architectures and ancestry configurations. We show that the predictive performance of FETA and RETA is strongly affected by cross-ancestry genetic heterogeneity, while AS PGS performance decreases in under-represented target populations. TAMR PGS is also impacted by heterogeneity but maintains good prediction performance in most situations, especially in ancestry-diverse scenarios. In simulations of human complex traits, TAMR scores currently explain 25% more phenotypic variance than AS in triglyceride levels and 33% more phenotypic variance than FETA in type 2 diabetes in most non-European populations. Importantly, a high proportion of non-European-ancestry individuals is needed to reach prediction levels that are comparable in those populations to the one observed in European-ancestry studies. Our results highlight the need to rebalance the ancestral composition of GWAS to enable accurate prediction in non-European-ancestry groups, and demonstrate the relevance of meta-regression approaches for compensating some of the current population biases in GWAS.
Affiliation(s)
| | | | | | - Eleftheria Zeggini
- ITG, Helmholtz Zentrum München, Munich, Germany
- Technical University of Munich, Munich, Germany
- Klinikum Rechts der Isar, Munich, Germany
| | - Andrew P. Morris
- ITG, Helmholtz Zentrum München, Munich, Germany
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
| |
|
71
|
Jeong R, Bulyk ML. Blood cell traits' GWAS loci colocalization with variation in PU.1 genomic occupancy prioritizes causal noncoding regulatory variants. CELL GENOMICS 2023; 3:100327. [PMID: 37492098 PMCID: PMC10363807 DOI: 10.1016/j.xgen.2023.100327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 02/10/2023] [Accepted: 04/25/2023] [Indexed: 07/27/2023]
Abstract
Genome-wide association studies (GWASs) have uncovered numerous trait-associated loci across the human genome, most of which are located in noncoding regions, making interpretation difficult. Moreover, causal variants are hard to statistically fine-map at many loci because of widespread linkage disequilibrium. To address this challenge, we present a strategy utilizing transcription factor (TF) binding quantitative trait loci (bQTLs) for colocalization analysis to identify trait associations likely mediated by TF occupancy variation and to pinpoint likely causal variants using motif scores. We applied this approach to PU.1 bQTLs in lymphoblastoid cell lines and blood cell trait GWAS data. Colocalization analysis revealed 69 blood cell trait GWAS loci putatively driven by PU.1 occupancy variation. We nominate PU.1 motif-altering variants as the likely shared causal variants at 51 loci. Such integration of TF bQTL data with other GWAS data may reveal transcriptional regulatory mechanisms and causal noncoding variants underlying additional complex traits.
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
| | - Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
| |
|
72
|
Lehmann B, Mackintosh M, McVean G, Holmes C. Optimal strategies for learning multi-ancestry polygenic scores vary across traits. Nat Commun 2023; 14:4023. [PMID: 37419925 PMCID: PMC10328935 DOI: 10.1038/s41467-023-38930-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 05/22/2023] [Indexed: 07/09/2023] Open
Abstract
Polygenic scores (PGSs) are individual-level measures that aggregate the genome-wide genetic predisposition to a given trait. As PGS have predominantly been developed using European-ancestry samples, trait prediction using such European ancestry-derived PGS is less accurate in non-European ancestry individuals. Although there has been recent progress in combining multiple PGS trained on distinct populations, the problem of how to maximize performance given a multiple-ancestry cohort is largely unexplored. Here, we investigate the effect of sample size and ancestry composition on PGS performance for fifteen traits in UK Biobank. For some traits, PGS estimated using a relatively small African-ancestry training set outperformed, on an African-ancestry test set, PGS estimated using a much larger European-ancestry only training set. We observe similar, but not identical, results when considering other minority-ancestry groups within UK Biobank. Our results emphasise the importance of targeted data collection from underrepresented groups in order to address existing disparities in PGS performance.
Affiliation(s)
- Brieuc Lehmann
- Department of Statistical Science, University College London, London, UK.
| | | | - Gil McVean
- Big Data Institute, University of Oxford, Oxford, UK
| | - Chris Holmes
- The Alan Turing Institute, London, UK
- Big Data Institute, University of Oxford, Oxford, UK
- Department of Statistics, University of Oxford, Oxford, UK
| |
|
73
|
Patel AP, Wang M, Ruan Y, Koyama S, Clarke SL, Yang X, Tcheandjieu C, Agrawal S, Fahed AC, Ellinor PT, Tsao PS, Sun YV, Cho K, Wilson PWF, Assimes TL, van Heel DA, Butterworth AS, Aragam KG, Natarajan P, Khera AV. A multi-ancestry polygenic risk score improves risk prediction for coronary artery disease. Nat Med 2023; 29:1793-1803. [PMID: 37414900 PMCID: PMC10353935 DOI: 10.1038/s41591-023-02429-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 50.0] [Received: 08/26/2022] [Accepted: 05/30/2023] [Indexed: 07/08/2023]
Abstract
Identification of individuals at highest risk of coronary artery disease (CAD), ideally before onset, remains an important public health need. Prior studies have developed genome-wide polygenic scores to enable risk stratification, reflecting the substantial inherited component to CAD risk. Here we develop a new and significantly improved polygenic score for CAD, termed GPSMult, that incorporates genome-wide association data across five ancestries for CAD (>269,000 cases and >1,178,000 controls) and ten CAD risk factors. GPSMult strongly associated with prevalent CAD (odds ratio per standard deviation 2.14, 95% confidence interval 2.10-2.19, P < 0.001) in UK Biobank participants of European ancestry, identifying 20.0% of the population with 3-fold increased risk and conversely 13.9% with 3-fold decreased risk as compared with those in the middle quintile. GPSMult was also associated with incident CAD events (hazard ratio per standard deviation 1.73, 95% confidence interval 1.70-1.76, P < 0.001), identifying 3% of healthy individuals with risk of future CAD events equivalent to those with existing disease and significantly improving risk discrimination and reclassification. Across multiethnic, external validation datasets inclusive of 33,096, 124,467, 16,433 and 16,874 participants of African, European, Hispanic and South Asian ancestry, respectively, GPSMult demonstrated increased strength of associations across all ancestries and outperformed all available previously published CAD polygenic scores. These data contribute a new GPSMult for CAD to the field and provide a generalizable framework for how large-scale integration of genetic association data for CAD and related traits from diverse populations can meaningfully improve polygenic risk prediction.
Affiliation(s)
- Aniruddh P Patel
- Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Minxian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China.
| | - Yunfeng Ruan
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Satoshi Koyama
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Veteran Affairs Boston Healthcare System, Boston, MA, USA
| | - Shoa L Clarke
- Stanford University School of Medicine, Palo Alto, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Xiong Yang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, China
| | - Saaket Agrawal
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Akl C Fahed
- Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Patrick T Ellinor
- Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Philip S Tsao
- Stanford University School of Medicine, Palo Alto, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - Yan V Sun
- Veteran Affairs Atlanta Healthcare System, Decatur, GA, USA
| | - Kelly Cho
- Veteran Affairs Boston Healthcare System, Boston, MA, USA
| | - Themistocles L Assimes
- Stanford University School of Medicine, Palo Alto, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
| | - David A van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Adam S Butterworth
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, and Centre of Research Excellence, University of Cambridge, Cambridge, UK
| | - Krishna G Aragam
- Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Pradeep Natarajan
- Division of Cardiology, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Amit V Khera
- Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Verve Therapeutics, Boston, MA, USA
74
Herrera-Rivero M, Gutiérrez-Fragoso K, Thalamuthu A, Amare AT, Adli M, Akiyama K, Akula N, Ardau R, Arias B, Aubry JM, Backlund L, Bellivier F, Benabarre A, Bengesser S, Abesh B, Biernacka J, Birner A, Cearns M, Cervantes P, Chen HC, Chillotti C, Cichon S, Clark S, Colom F, Cruceanu C, Czerski P, Dalkner N, Degenhardt F, Del Zompo M, DePaulo JR, Etain B, Falkai P, Ferensztajn-Rochowiak E, Forstner AJ, Frank J, Frisen L, Frye M, Fullerton J, Gallo C, Gard S, Garnham J, Goes F, Grigoroiu-Serbanescu M, Grof P, Hashimoto R, Hasler R, Hauser J, Heilbronner U, Herms S, Hoffmann P, Hou L, Hsu Y, Jamain S, Jiménez E, Kahn JP, Kassem L, Kato T, Kelsoe J, Kittel-Schneider S, Kuo PH, Kurtz J, Kusumi I, König B, Laje G, Landén M, Lavebratt C, Leboyer M, Leckband S, Maj M, Manchia M, Marie-Claire C, Martinsson L, McCarthy M, McElroy SL, Millischer V, Mitjans M, Mondimore F, Monteleone P, Nievergelt C, Novak T, Nöthen M, Odonovan C, Ozaki N, Papiol S, Pfennig A, Pisanu C, Potash J, Reif A, Reininghaus E, Richard-Lepouriel H, Roberts G, Rouleau G, Rybakowski JK, Schalling M, Schofield P, Schubert KO, Schulte E, Schweizer B, Severino G, Shekhtman T, Shilling P, Shimoda K, Simhandl C, Slaney C, Squassina A, Stamm T, Stopkova P, Streit F, Ayele F, Tortorella A, Turecki G, Veeh J, Vieta E, Viswanath B, Witt S, Zandi P, Alda M, Bauer M, McMahon F, Mitchell P, Rietschel M, Schulze T, Baune B. Immunogenetics of lithium response and psychiatric phenotypes in patients with bipolar disorder. RESEARCH SQUARE 2023:rs.3.rs-3068352. [PMID: 37461719 PMCID: PMC10350128 DOI: 10.21203/rs.3.rs-3068352/v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/25/2023]
Abstract
The link between bipolar disorder (BP) and immune dysfunction remains controversial. While epidemiological studies have long suggested an association, recent research has found only limited evidence of such a relationship. To clarify this, we investigated the contributions of immune-relevant genetic factors to the response to lithium (Li) treatment and the clinical presentation of BP. First, we assessed the association of a large collection of immune-related genes (4,925) with Li response, defined by the Retrospective Assessment of the Lithium Response Phenotype Scale (Alda scale), and clinical characteristics in patients with BP from the International Consortium on Lithium Genetics (ConLi+Gen, N = 2,374). Second, we calculated previously published polygenic scores (PGSs) for immune-related traits and evaluated their associations with Li response and clinical features. We found several genes associated with Li response at p < 1×10⁻⁴, including HAS3, CNTNAP5 and NFIB. Network and functional enrichment analyses uncovered an overrepresentation of pathways involved in cell adhesion and intercellular communication, which appear to converge on the well-known Li-induced inhibition of GSK-3β. We also found various genes associated with BP's age-at-onset, number of mood episodes, and presence of psychosis, substance abuse and/or suicidal ideation at the exploratory threshold. These included RTN4, XKR4, NRXN1, NRG1/3 and GRK5. Additionally, PGS analyses suggested serum FAS, ECP, TRANCE and cytokine ligands, amongst others, might represent potential circulating biomarkers of Li response and clinical presentation. Taken together, our results support the notion of a relatively weak association between immunity and clinically relevant features of BP at the genetic level.
Affiliation(s)
- Kazufumi Akiyama
- Department of Biological Psychiatry and Neuroscience, Dokkyo Medical University
| | - Nirmala Akula
- National Institutes of Health, US Dept of Health & Human Services
| | - Bárbara Arias
- Facultat de Biologia and Institut de Biomedicina (IBUB), Universitat de Barcelona, CIBERSAM
| | - Urs Heilbronner
- Institute of Psychiatric Phenomics and Genomics, University Hospital, LMU Munich
| | - Liping Hou
- National Institute of Mental Health Intramural Research Program, National Institutes of Health
| | - Tadafumi Kato
- Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Brain Science Institute, Wako, Saitama 351-0198, Japan
| | - Po-Hsiu Kuo
- College of Public Health, National Taiwan University, Taipei, Taiwan
| | - Marina Mitjans
- Max Planck Institute of Experimental Medicine, Göttingen, Germany
| | - Tomas Novak
- National Institute of Mental Health, Klecany
| | - Thomas Stamm
- Charité - Universitätsmedizin Berlin, Campus Charité Mitte
| | - Gustavo Turecki
- Douglas Institute, Department of Psychiatry, McGill University
| | - Biju Viswanath
- National Institute of Mental Health and Neuro Sciences, Bengaluru, Karnataka, India
| | - Francis McMahon
- National Institute of Mental Health Intramural Research Program; National Institutes of Health
75
Li C, Pan Y, Zhang R, Huang Z, Li D, Han Y, Larkin C, Rao V, Sun X, Kelly TN. Genomic Innovation in Early Life Cardiovascular Disease Prevention and Treatment. Circ Res 2023; 132:1628-1647. [PMID: 37289909 PMCID: PMC10328558 DOI: 10.1161/circresaha.123.321999] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Cardiovascular disease (CVD) is a leading cause of morbidity and mortality globally. Although CVD events do not typically manifest until older adulthood, CVD develops gradually across the life-course, beginning with the elevation of risk factors observed as early as childhood or adolescence and the emergence of subclinical disease that can occur in young adulthood or midlife. Genomic background, which is determined at zygote formation, is among the earliest risk factors for CVD. With major advances in molecular technology, including the emergence of gene-editing techniques, along with deep whole-genome sequencing and high-throughput array-based genotyping, scientists now have the opportunity to not only discover genomic mechanisms underlying CVD but use this knowledge for the life-course prevention and treatment of these conditions. The current review focuses on innovations in the field of genomics and their applications to monogenic and polygenic CVD prevention and treatment. With respect to monogenic CVD, we discuss how the emergence of whole-genome sequencing technology has accelerated the discovery of disease-causing variants, allowing comprehensive screening and early, aggressive CVD mitigation strategies in patients and their families. We further describe advances in gene editing technology, which might soon make possible cures for CVD conditions once thought untreatable. In relation to polygenic CVD, we focus on recent innovations that leverage findings of genome-wide association studies to identify druggable gene targets and develop predictive genomic models of disease, which are already facilitating breakthroughs in the life-course treatment and prevention of CVD. Gaps in current research and future directions of genomics studies are also discussed. 
In aggregate, we hope to underline the value of leveraging genomics and broader multiomics information for characterizing CVD conditions, work which promises to expand precision approaches for the life-course prevention and treatment of CVD.
Affiliation(s)
- Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
| | - Yang Pan
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
| | - Ruiyuan Zhang
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
| | - Zhijie Huang
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
| | - Davey Li
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
| | - Yunan Han
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
| | - Claire Larkin
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
| | - Varun Rao
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
| | - Xiao Sun
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA (C. Li, R.Z., Z.H., X.S.)
| | - Tanika N Kelly
- Division of Nephrology, Department of Medicine, College of Medicine, University of Illinois Chicago (Y.P., D.L., Y.H., C.L., V.R., T.N.K.)
76
Smith JL, Tcheandjieu C, Dikilitas O, Iyer K, Miyazawa K, Hilliard A, Lynch J, Rotter JI, Chen YDI, Sheu WHH, Chang KM, Kanoni S, Tsao P, Ito K, Kosel M, Clarke SL, Schaid DJ, Assimes TL, Kullo IJ. A Multi-Ancestry Polygenic Risk Score for Coronary Heart Disease Based on an Ancestrally Diverse Genome-Wide Association Study and Population-Specific Optimization. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.06.02.23290896. [PMID: 37609230 PMCID: PMC10441485 DOI: 10.1101/2023.06.02.23290896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/24/2023]
Abstract
Background Predictive performance of polygenic risk scores (PRS) varies across populations. To facilitate equitable clinical use, we developed PRS for coronary heart disease (PRSCHD) for 5 genetic ancestry groups. Methods We derived ancestry-specific and multi-ancestry PRSCHD based on pruning and thresholding (PRSP+T) and continuous shrinkage priors (PRSCSx) applied on summary statistics from the largest multi-ancestry genome-wide meta-analysis for CHD to date, including 1.1 million participants from 5 continental populations. Following training and optimization of PRSCHD in the Million Veteran Program, we evaluated predictive performance of the best performing PRSCHD in 176,988 individuals across 9 cohorts of diverse genetic ancestry. Results Multi-ancestry PRSP+T outperformed ancestry-specific PRSP+T across a range of tuning values. In the training stage, for all ancestry groups, PRSCSx performed better than PRSP+T and multi-ancestry PRS outperformed ancestry-specific PRS. In independent validation cohorts, the selected multi-ancestry PRSP+T demonstrated the strongest association with CHD in individuals of South Asian (SAS) and European (EUR) ancestry (OR per 1 SD [95% CI]: 2.75 [2.41-3.14], 1.65 [1.59-1.72]), followed by East Asian (EAS) (1.56 [1.50-1.61]), Hispanic/Latino (HIS) (1.38 [1.24-1.54]), and weakest in African (AFR) ancestry (1.16 [1.11-1.21]). The selected multi-ancestry PRSCSx showed stronger association with CHD in comparison within each ancestry group, where the association was strongest in SAS (2.67 [2.38-3.00]) and EUR (1.65 [1.59-1.71]), progressively decreasing in EAS (1.59 [1.54-1.64]), HIS (1.51 [1.35-1.69]), and lowest in AFR (1.20 [1.15-1.26]). Conclusions Utilizing diverse summary statistics from a large multi-ancestry genome-wide meta-analysis led to improved performance of PRSCHD in most ancestry groups compared to single-ancestry methods. Improvement of predictive performance was limited, specifically in AFR and HIS, despite use of one of the largest and most diverse sets of training and validation cohorts to date. This highlights the need for larger GWAS datasets of AFR and HIS individuals to enhance performance of PRSCHD.
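The pruning-and-thresholding approach named in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy genotypes, the effect sizes, and the cutoffs (`p_max`, `r2_max`) are all hypothetical, and linkage disequilibrium is approximated here by the squared correlation of dosages in the toy sample, whereas real analyses typically estimate it from a reference panel within genomic windows.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)


def prune_and_threshold(genotypes, pvalues, p_max=0.05, r2_max=0.2):
    """Return indices of variants kept after thresholding and greedy pruning.

    genotypes: one list of allele dosages (0, 1, 2) per variant.
    """
    # Thresholding: keep variants below the p-value cutoff, most significant first.
    candidates = sorted(
        (i for i, p in enumerate(pvalues) if p <= p_max),
        key=lambda i: pvalues[i],
    )
    kept = []
    for i in candidates:
        # Pruning: drop a variant that is too correlated with one already kept.
        if all(r_squared(genotypes[i], genotypes[j]) <= r2_max for j in kept):
            kept.append(i)
    return kept


def polygenic_score(genotypes, betas, kept):
    """Per-individual score: sum of effect size times dosage over kept variants."""
    n_individuals = len(genotypes[0])
    return [
        sum(betas[i] * genotypes[i][k] for i in kept)
        for k in range(n_individuals)
    ]


# Hypothetical toy data: 4 variants (rows) by 4 individuals (columns).
genotypes = [[0, 1, 2, 1], [0, 1, 2, 2], [2, 0, 1, 0], [1, 1, 0, 2]]
pvalues = [1e-8, 1e-4, 0.01, 0.5]
betas = [0.2, 0.1, -0.1, 0.3]

kept = prune_and_threshold(genotypes, pvalues)
scores = polygenic_score(genotypes, betas, kept)
```

In this toy example the second variant is pruned because it is strongly correlated with the more significant first variant, and the fourth is removed by the p-value threshold. Tuning `p_max` and `r2_max` on a training cohort corresponds loosely to the "range of tuning values" mentioned in the abstract.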
Affiliation(s)
- Johanna L. Smith
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Catherine Tcheandjieu
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Gladstone Institute of Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Ozan Dikilitas
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Kruthika Iyer
- Stanford University School of Medicine, Palo Alto, CA, USA
| | - Kazuo Miyazawa
- Riken Ctr. for Integrative Medical Sciences, Yokohama City, Japan
| | - Austin Hilliard
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University School of Medicine, Palo Alto, CA, USA
| | - Julie Lynch
- Salt Lake City VA Met CTR., Salt Lake City, UT, USA
| | - Jerome I. Rotter
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Yii-Der Ida Chen
- Institute for Translational Genomics and Population Sciences, Department of Pediatrics, Lundquist Institute at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Wayne Huey-Herng Sheu
- Institute of Molecular and Genomic Medicine, National Health Research Institutes, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
| | - Kyong-Mi Chang
- Corporal Michael J Crescenz VA Medical Ctr. Philadelphia, PA, USA
| | - Phil Tsao
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University, Stanford, CA, USA
| | - Kaoru Ito
- Riken Ctr. for Integrative Medical Sciences, Yokohama City, Japan
| | - Matthew Kosel
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Shoa L. Clarke
- VA Palo Alto Health Care System, Palo Alto, CA, USA
- Stanford University, Stanford, CA, USA
| | - Daniel J. Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Iftikhar J. Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
77
Akhtari FS, Lloyd D, Burkholder A, Tong X, House JS, Lee EY, Buse J, Schurman SH, Fargo DC, Schmitt CP, Hall J, Motsinger-Reif AA. Questionnaire-Based Polyexposure Assessment Outperforms Polygenic Scores for Classification of Type 2 Diabetes in a Multiancestry Cohort. Diabetes Care 2023; 46:929-937. [PMID: 36383734 PMCID: PMC10154656 DOI: 10.2337/dc22-0295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 10/23/2022] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Environmental exposures may have greater predictive power for type 2 diabetes than polygenic scores (PGS). Studies examining environmental risk factors, however, have included only individuals with European ancestry, limiting the applicability of results. We conducted an exposome-wide association study in the multiancestry Personalized Environment and Genes Study to assess the effects of environmental factors on type 2 diabetes. RESEARCH DESIGN AND METHODS Using logistic regression for single-exposure analysis, we identified exposures associated with type 2 diabetes, adjusting for age, BMI, household income, and self-reported sex and race. To compare cumulative genetic and environmental effects, we computed an overall clinical score (OCS) as a weighted sum of BMI and prediabetes, hypertension, and high cholesterol status and a polyexposure score (PXS) as a weighted sum of 13 environmental variables. Using UK Biobank data, we developed a multiancestry PGS and calculated it for participants. RESULTS We found 76 significant associations with type 2 diabetes, including novel associations of asbestos and coal dust exposure. OCS, PXS, and PGS were significantly associated with type 2 diabetes. PXS had moderate power to determine associations, with larger effect size and greater power and reclassification improvement than PGS. For all scores, the results differed by race. CONCLUSIONS Our findings in a multiancestry cohort elucidate how type 2 diabetes odds can be attributed to clinical, genetic, and environmental factors and emphasize the need for exposome data in disease-risk association studies. Race-based differences in predictive scores highlight the need for genetic and exposome-wide studies in diverse populations.
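The scores described in the abstract above (OCS and PXS) are weighted sums of per-participant variables. A minimal sketch of that construction follows; the variable names and weights are hypothetical placeholders rather than the values used in the study, and how the weights were estimated is not stated in the abstract.

```python
def weighted_sum_score(values, weights):
    """Weighted sum of a participant's variables.

    values:  mapping of variable name to the participant's value
    weights: mapping of variable name to its weight (hypothetical here)
    """
    missing = set(weights) - set(values)
    if missing:
        # Fail loudly instead of silently treating missing data as zero.
        raise KeyError(f"missing variables: {sorted(missing)}")
    return sum(weights[name] * values[name] for name in weights)


# Hypothetical clinical-score weights and one participant's data.
ocs_weights = {"bmi": 0.5, "prediabetes": 1.0, "hypertension": 0.25, "high_cholesterol": 0.25}
participant = {"bmi": 30.0, "prediabetes": 1, "hypertension": 0, "high_cholesterol": 1}

ocs = weighted_sum_score(participant, ocs_weights)
```

The same function could serve for a polyexposure score by passing the 13 environmental variables and their weights in place of the clinical ones.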
Affiliation(s)
- Farida S. Akhtari
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
- Clinical Research Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - Dillon Lloyd
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - Adam Burkholder
- Office of the Director, National Institute of Environmental Health Sciences, Durham, NC
| | - Xiaoran Tong
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - John S. House
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - Eunice Y. Lee
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - John Buse
- Department of Medicine, University of North Carolina, Chapel Hill, NC
| | - Shepherd H. Schurman
- Clinical Research Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - David C. Fargo
- Office of the Director, National Institute of Environmental Health Sciences, Durham, NC
| | - Charles P. Schmitt
- Office of Data Science, National Institute of Environmental Health Science, Durham, NC
| | - Janet Hall
- Clinical Research Branch, National Institute of Environmental Health Sciences, Durham, NC
| | - Alison A. Motsinger-Reif
- Biostatistics & Computational Biology Branch, National Institute of Environmental Health Sciences, Durham, NC
78
Clark R, Lee SSY, Du R, Wang Y, Kneepkens SCM, Charng J, Huang Y, Hunter ML, Jiang C, Tideman JWL, Melles RB, Klaver CCW, Mackey DA, Williams C, Choquet H, Ohno-Matsui K, Guggenheim JA. A new polygenic score for refractive error improves detection of children at risk of high myopia but not the prediction of those at risk of myopic macular degeneration. EBioMedicine 2023; 91:104551. [PMID: 37055258 PMCID: PMC10203044 DOI: 10.1016/j.ebiom.2023.104551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 03/17/2023] [Accepted: 03/17/2023] [Indexed: 04/15/2023] Open
Abstract
BACKGROUND High myopia (HM), defined as a spherical equivalent refractive error (SER) ≤ -6.00 diopters (D), is a leading cause of sight impairment, through myopic macular degeneration (MMD). We aimed to derive an improved polygenic score (PGS) for predicting children at risk of HM and to test if a PGS is predictive of MMD after accounting for SER. METHODS The PGS was derived from genome-wide association studies in participants of UK Biobank, CREAM Consortium, and Genetic Epidemiology Research on Adult Health and Aging. MMD severity was quantified by a deep learning algorithm. Prediction of HM was quantified as the area under the receiver operating characteristic curve (AUROC). Prediction of severe MMD was assessed by logistic regression. FINDINGS In independent samples of European, African, South Asian and East Asian ancestry, the PGS explained 19% (95% confidence interval 17-21%), 2% (1-3%), 8% (7-10%) and 6% (3-9%) of the variation in SER, respectively. The AUROC for HM in these samples was 0.78 (0.75-0.81), 0.58 (0.53-0.64), 0.71 (0.69-0.74) and 0.67 (0.62-0.72), respectively. The PGS was not associated with the risk of MMD after accounting for SER: OR = 1.07 (0.92-1.24). INTERPRETATION Performance of the PGS approached the level required for clinical utility in Europeans but not in other ancestries. A PGS for refractive error was not predictive of MMD risk once SER was accounted for. FUNDING Supported by the Welsh Government and Fight for Sight (24WG201).
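The AUROC reported in the abstract above can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. The sketch below computes it directly from that pairwise definition. It is an illustration only: the function name and the toy scores are hypothetical, and it assumes the score has been oriented so that higher values indicate higher risk.

```python
def auroc(scores, labels):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    scores: risk scores, higher meaning more likely to be a case
    labels: 1 for cases, 0 for non-cases
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    # Count case/non-case pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: two cases and two non-cases.
toy_auc = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the abstract's figures of 0.58 to 0.78 should be read. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for large cohorts.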
Affiliation(s)
- Rosie Clark
- School of Optometry & Vision Sciences, Cardiff University, Maindy Road, Cardiff, CF24 4HQ, UK
| | - Samantha Sze-Yee Lee
- University of Western Australia, Centre for Ophthalmology and Visual Science (incorporating the Lions Eye Institute), Perth, Western Australia, Australia
| | - Ran Du
- Department of Ophthalmology and Visual Science, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 1138510, Japan; Department of Ophthalmology, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing, 100045, China
| | - Yining Wang
- Department of Ophthalmology and Visual Science, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 1138510, Japan
| | - Sander C M Kneepkens
- Department of Ophthalmology, Erasmus University Medical Center, Rotterdam, the Netherlands; Department of Epidemiology, Erasmus University Medical Center, Rotterdam, the Netherlands; Generation R Study Group, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Jason Charng
- University of Western Australia, Centre for Ophthalmology and Visual Science (incorporating the Lions Eye Institute), Perth, Western Australia, Australia; Department of Optometry, School of Allied Health, University of Western Australia, Perth, Australia
| | - Yu Huang
- Department of Ophthalmology, Guangdong Eye Institute, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
| | - Michael L Hunter
- Busselton Health Study Centre, Busselton Population Medical Research Institute, Busselton, Western Australia; School of Population and Global Health, University of Western Australia, Perth, Western Australia
| | - Chen Jiang
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - J Willem L Tideman
- Department of Ophthalmology, Martini Hospital, Groningen, the Netherlands; Department of Ophthalmology, Erasmus University Medical Center, Rotterdam, the Netherlands
| | - Ronald B Melles
- Department of Ophthalmology Kaiser Permanente Northern California, Redwood City, CA, USA
| | - Caroline C W Klaver
- Department of Ophthalmology, Erasmus University Medical Center, Rotterdam, the Netherlands; Department of Epidemiology, Erasmus University Medical Center, Rotterdam, the Netherlands; Generation R Study Group, Erasmus University Medical Center, Rotterdam, the Netherlands; Institute of Molecular and Clinical Ophthalmology, Basel, Switzerland; Department of Ophthalmology, Radboud University Medical Center, Nijmegen, the Netherlands
| | - David A Mackey
- University of Western Australia, Centre for Ophthalmology and Visual Science (incorporating the Lions Eye Institute), Perth, Western Australia, Australia; Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, East Melbourne, Victoria, Australia; School of Medicine, Menzies Research Institute Tasmania, University of Tasmania, Hobart, Tasmania, Australia
| | - Cathy Williams
- Centre for Academic Child Health, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS81NU, UK
| | - Hélène Choquet
- Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
| | - Kyoko Ohno-Matsui
- Department of Ophthalmology and Visual Science, Tokyo Medical and Dental University, 1-5-45 Yushima, Bunkyo-ku, Tokyo, 1138510, Japan
| | - Jeremy A Guggenheim
- School of Optometry & Vision Sciences, Cardiff University, Maindy Road, Cardiff, CF24 4HQ, UK
79
Liu J, Zhang C, Song J, Zhang Q, Zhang R, Zhang M, Han D, Tan W. Unlocking Genetic Profiles with a Programmable DNA-Powered Decoding Circuit. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023:e2206343. [PMID: 37116171 PMCID: PMC10369254 DOI: 10.1002/advs.202206343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 04/12/2023] [Indexed: 06/19/2023]
Abstract
Human genetic architecture provides remarkable insights into disease risk prediction and personalized medication. Advances in genomics have boosted the fine-mapping of disease-associated genetic variants across the human genome. In healthcare practice, interpreting intricate genetic profiles into actionable medical decisions can improve health outcomes but remains challenging. Here an intelligent genetic decoder is engineered with programmable DNA computation to automate clinical analyses and interpretations. The DNA-based decoder recognizes multiplex genetic information by one-pot ligase-dependent reactions and interprets implicit genetic profiles into explicit decision reports. It is shown that the DNA decoder implements intended computation on genetic profiles and outputs a corresponding answer within hours. Effectiveness in 30 human genomic samples is validated and it is shown that it achieves desirable performance on the interpretation of CYP2C19 genetic profiles into drug responses, with accuracy equivalent to that of Sanger sequencing. Circuit modules of the DNA decoder can also be readily reprogrammed to interpret other pharmacogenetic genes, provide drug dosing recommendations, and implement reliable molecular calculation of polygenic risk score (PRS) and PRS-informed cancer risk assessment. The DNA-powered intelligent decoder provides a general solution to the translation of complex genetic profiles into actionable healthcare decisions and will facilitate personalized healthcare in primary care.
Affiliation(s)
- Junlan Liu
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Chao Zhang
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Jinxing Song
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Qing Zhang
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Rongjun Zhang
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Mingzhi Zhang
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Da Han
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang, 310022, China
| | - Weihong Tan
- Institute of Molecular Medicine (IMM), Renji Hospital, School of Medicine, and College of Chemistry and Chemical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China
- The Key Laboratory of Zhejiang Province for Aptamers and Theranostics, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, Hangzhou, Zhejiang, 310022, China
- Molecular Science and Biomedicine Laboratory (MBL), State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, College of Biology, Aptamer Engineering Center of Hunan Province, Hunan University, Changsha, Hunan, 410082, China
| |
|
80
|
Sullivan PF, Meadows JRS, Gazal S, Phan BN, Li X, Genereux DP, Dong MX, Bianchi M, Andrews G, Sakthikumar S, Nordin J, Roy A, Christmas MJ, Marinescu VD, Wang C, Wallerman O, Xue J, Yao S, Sun Q, Szatkiewicz J, Wen J, Huckins LM, Lawler A, Keough KC, Zheng Z, Zeng J, Wray NR, Li Y, Johnson J, Chen J, Paten B, Reilly SK, Hughes GM, Weng Z, Pollard KS, Pfenning AR, Forsberg-Nilsson K, Karlsson EK, Lindblad-Toh K. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 2023; 380:eabn2937. [PMID: 37104612 PMCID: PMC10259825 DOI: 10.1126/science.abn2937] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2021] [Accepted: 02/09/2023] [Indexed: 04/29/2023]
Abstract
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
Affiliation(s)
- Patrick F. Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 17177 Stockholm, Sweden
| | - Jennifer R. S. Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - BaDoi N. Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Xue Li
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Diane P. Genereux
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Michael X. Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Sharadha Sakthikumar
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Jessika Nordin
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Ananya Roy
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
| | - Matthew J. Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Voichita D. Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Chao Wang
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
| | - James Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Shuyang Yao
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 17177 Stockholm, Sweden
| | - Quan Sun
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jin Szatkiewicz
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jia Wen
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Laura M. Huckins
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Alyssa Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kathleen C. Keough
- Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94158, USA
| | - Zhili Zheng
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Jian Zeng
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Naomi R. Wray
- Institute for Molecular Bioscience, University of Queensland, Brisbane, QLD 4072, Australia
| | - Yun Li
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | - Jessica Johnson
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Jiawen Chen
- Department of Biostatistics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, Santa Cruz, CA 95064, USA
| | - Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
| | - Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA 94158, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - Andreas R. Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Karin Forsberg-Nilsson
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, 75185 Uppsala, Sweden
- Biodiscovery Institute, University of Nottingham, Nottingham NG7 2RD, UK
| | - Elinor K. Karlsson
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 75132 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| |
|
81
|
Dron JS. The clinical utility of polygenic risk scores for combined hyperlipidemia. Curr Opin Lipidol 2023; 34:44-51. [PMID: 36602940 DOI: 10.1097/mol.0000000000000865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
PURPOSE OF REVIEW: Combined hyperlipidemia is the most common lipid disorder and is strongly polygenic. Given its prevalence and associated risk for atherosclerotic cardiovascular disease, this review describes the potential for utilizing polygenic risk scores for risk prediction and management of combined hyperlipidemia. RECENT FINDINGS: Different diagnostic criteria have led to inconsistent prevalence estimates and missed diagnoses. Given that individuals with combined hyperlipidemia have risk estimates for incident coronary artery disease similar to those of individuals with familial hypercholesterolemia, early identification and therapeutic management of those affected is crucial. With diagnostic criteria including traits such as apolipoprotein B, low-density lipoprotein cholesterol, and triglyceride, polygenic risk scores for these traits strongly associate with combined hyperlipidemia and could be used in combination in clinical risk prediction models and in developing specific treatment plans for patients. SUMMARY: Polygenic risk scores are effective tools in risk prediction of combined hyperlipidemia, can provide insight into disease pathophysiology, and may be useful in managing and guiding treatment plans for patients. However, efforts to ensure equitable polygenic risk score performance across different genetic ancestry groups are necessary before clinical implementation in order to prevent the exacerbation of racial disparities in the clinic.
Affiliation(s)
- Jacqueline S Dron
- Center for Genomic Medicine, Massachusetts General Hospital, Boston
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| |
|
82
|
Jeong R, Bulyk ML. Colocalization of blood cell traits GWAS associations and variation in PU.1 genomic occupancy prioritizes causal noncoding regulatory variants. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.29.534582. [PMID: 37034747 PMCID: PMC10081269 DOI: 10.1101/2023.03.29.534582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Genome-wide association studies (GWAS) have uncovered numerous trait-associated loci across the human genome, most of which are located in noncoding regions, making interpretations difficult. Moreover, causal variants are hard to statistically fine-map at many loci because of widespread linkage disequilibrium. To address this challenge, we present a strategy utilizing transcription factor (TF) binding quantitative trait loci (bQTLs) for colocalization analysis to identify trait associations likely mediated by TF occupancy variation and to pinpoint likely causal variants using motif scores. We applied this approach to PU.1 bQTLs in lymphoblastoid cell lines and blood cell traits GWAS data. Colocalization analysis revealed 69 blood cell trait GWAS loci putatively driven by PU.1 occupancy variation. We nominate PU.1 motif-altering variants as the likely shared causal variants at 51 loci. Such integration of TF bQTL data with other GWAS data may reveal transcriptional regulatory mechanisms and causal noncoding variants underlying additional complex traits.
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
| | - Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
| |
|
83
|
Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, van Heel DA, Wang Y, Martin AR, Lee SH, Natarajan P. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.02.21.23286110. [PMID: 36865265 PMCID: PMC9980241 DOI: 10.1101/2023.02.21.23286110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Polygenic risk scores (PRS) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. Validation and transferability of existing PRS across independent datasets and diverse ancestries are limited, which hinders their practical utility and exacerbates health disparities. We propose PRSmix, a framework that evaluates and leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture. We applied PRSmix to 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% CI: [1.10; 1.3]; P-value = 9.17 × 10⁻⁵) and 1.19-fold (95% CI: [1.11; 1.27]; P-value = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI: [1.40; 2.04]; P-value = 7.58 × 10⁻⁶) and 1.42-fold (95% CI: [1.25; 1.59]; P-value = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to the previously established cross-trait-combination method with scores from pre-defined correlated traits, we demonstrated that our method can improve prediction accuracy for coronary artery disease up to 3.27-fold (95% CI: [2.1; 4.44]; P-value after FDR correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
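The idea of combining several existing scores for one trait, and reporting the gain as a fold improvement in prediction accuracy, can be illustrated with a minimal sketch. This is not the PRSmix implementation: the simulated data, the plain least-squares combination, and all function and variable names below are assumptions made purely for illustration.

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_linear(X, y):
    """Ordinary least squares with an intercept; the intercept is coef[0]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply fitted coefficients to a 2-D matrix of scores."""
    return coef[0] + X @ coef[1:]

# Simulated data (hypothetical): three noisy scores tracking one latent liability.
rng = np.random.default_rng(0)
n = 2000
liability = rng.normal(size=n)
scores = np.column_stack(
    [liability + rng.normal(scale=s, size=n) for s in (1.0, 1.5, 2.0)]
)
y = liability + rng.normal(size=n)

train, test = slice(0, 1000), slice(1000, None)

# Accuracy of each score on its own, with weights fitted on the training half.
single_r2 = []
for j in range(scores.shape[1]):
    coef = fit_linear(scores[train, j:j + 1], y[train])
    single_r2.append(r_squared(y[test], predict_linear(coef, scores[test, j:j + 1])))

# Accuracy of a linear combination of all scores, evaluated on the held-out half.
coef_mix = fit_linear(scores[train], y[train])
mix_r2 = r_squared(y[test], predict_linear(coef_mix, scores[test]))

# Fold improvement relative to the best single score.
fold = mix_r2 / max(single_r2)
print(f"best single R2 = {max(single_r2):.3f}, combined R2 = {mix_r2:.3f}, fold = {fold:.2f}")
```

Fitting the combination weights on one half of the samples and evaluating on the other keeps the reported fold improvement from being inflated by overfitting.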
Affiliation(s)
- Buu Truong
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Leland E. Hull
- Division of General Internal Medicine, 100 Cambridge Street, Massachusetts General Hospital, Boston, MA, 02114
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115
| | - Yunfeng Ruan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - Whitney Hornsby
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Hilary Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - David A. van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Ying Wang
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Alicia R. Martin
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - S. Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, 5000, Australia
| | - Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115
| |
|
84
|
Thareja G, Belkadi A, Arnold M, Albagha OME, Graumann J, Schmidt F, Grallert H, Peters A, Gieger C, Consortium TQGPR, Suhre K. Differences and commonalities in the genetic architecture of protein quantitative trait loci in European and Arab populations. Hum Mol Genet 2023; 32:907-916. [PMID: 36168886 PMCID: PMC9990988 DOI: 10.1093/hmg/ddac243] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/10/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 11/12/2022] Open
Abstract
Polygenic scores (PGS) can identify individuals at risk of adverse health events and guide genetics-based personalized medicine. However, it is not clear how well PGS translate between different populations, limiting their application to well-studied ethnicities. Proteins are intermediate traits linking genetic predisposition and environmental factors to disease, with numerous blood-circulating protein levels representing functional readouts of disease-related processes. We hypothesized that studying the genetic architecture of a comprehensive set of blood-circulating proteins between a European and an Arab population could shed fresh light on the translatability of PGS to understudied populations. We therefore conducted a genome-wide association study with whole-genome sequencing data using 1301 proteins measured on the SOMAscan aptamer-based affinity proteomics platform in 2935 samples of Qatar Biobank and evaluated the replication of protein quantitative trait loci (pQTLs) from European studies in an Arab population. Then, we investigated the colocalization of shared pQTL signals between the two populations. Finally, we compared the performance of protein PGS derived from a Caucasian population in a European and an Arab cohort. We found that the majority of shared pQTL signals (81.8%) colocalized between both populations. About one-third of the genetic protein heritability was explained by protein PGS derived from a European cohort, with protein PGS performing ~20% better in Europeans when compared to Arabs. Our results are relevant for the translation of PGS to non-Caucasian populations, as well as for future efforts to extend genetic research to understudied populations.
Affiliation(s)
- Gaurav Thareja
- Bioinformatics Core, Weill Cornell Medicine-Qatar, Education City, 24144 Doha, Qatar
- Department of Biophysics and Physiology, Weill Cornell Medicine, NY 10065, New York, USA
| | - Aziz Belkadi
- Bioinformatics Core, Weill Cornell Medicine-Qatar, Education City, 24144 Doha, Qatar
- Department of Biophysics and Physiology, Weill Cornell Medicine, NY 10065, New York, USA
| | - Matthias Arnold
- Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- Department of Psychiatry and Behavioral Sciences, Duke University, NC 27710, USA
| | - Omar M E Albagha
- College of Health and Life Sciences, Hamad Bin Khalifa University, 34110 Doha, Qatar
- Centre for Genomic and Experimental Medicine, Institute of Genetics and Cancer, University of Edinburgh, EH4 2XU, Edinburgh, UK
| | - Johannes Graumann
- Institute of Translational Proteomics, Department of Medicine, Philipps-Universität Marburg, Marburg, Germany
| | - Frank Schmidt
- Proteomics Core, Weill Cornell Medicine-Qatar, Education City, 24144 Doha, Qatar
| | - Harald Grallert
- Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, Neuherberg 85764, Germany
| | - Annette Peters
- Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
- Department of Epidemiology, Institute for Medical Information Processing, Biometry, and Epidemiology, Ludwig-Maximilians-University Munich, 81377 Munich, Germany
| | - Christian Gieger
- Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg 85764, Germany
- German Center for Diabetes Research (DZD), Ingolstädter Landstraße 1, Neuherberg 85764, Germany
| | | | - Karsten Suhre
- Bioinformatics Core, Weill Cornell Medicine-Qatar, Education City, 24144 Doha, Qatar
- Department of Biophysics and Physiology, Weill Cornell Medicine, NY 10065, New York, USA
| |
|
85
|
Tervo-Clemmens B, Marek S, Chauvin RJ, Van AN, Kay BP, Laumann TO, Thompson WK, Nichols TE, Yeo BTT, Barch DM, Luna B, Fair DA, Dosenbach NUF. Reply to: Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023; 615:E8-E12. [PMID: 36890374 PMCID: PMC9995264 DOI: 10.1038/s41586-023-05746-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/10/2023]
Affiliation(s)
- Brenden Tervo-Clemmens
- Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
| | - Scott Marek
- Department of Radiology, Washington University School of Medicine, St Louis, MO, USA.
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA.
| | - Roselyne J Chauvin
- Department of Neurology, Washington University School of Medicine, St Louis, MO, USA
| | - Andrew N Van
- Department of Neurology, Washington University School of Medicine, St Louis, MO, USA
- Department of Biomedical Engineering, Washington University in St Louis, St Louis, MO, USA
| | - Benjamin P Kay
- Department of Neurology, Washington University School of Medicine, St Louis, MO, USA
| | - Timothy O Laumann
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - Wesley K Thompson
- Division of Biostatistics, University of California San Diego, La Jolla, CA, USA
| | - Thomas E Nichols
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - B T Thomas Yeo
- Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
- Centre for Sleep and Cognition, National University of Singapore, Singapore, Singapore
- Centre for Translational MR Research, National University of Singapore, Singapore, Singapore
- N.1 Institute for Health, Institute for Digital Medicine, National University of Singapore, Singapore, Singapore
- Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore
- Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA
| | - Deanna M Barch
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
- Department of Psychological and Brain Sciences, Washington University in St Louis, St Louis, MO, USA
| | - Beatriz Luna
- Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
| | - Damien A Fair
- Masonic Institute for the Developing Brain, University of Minnesota Medical School, Minneapolis, MN, USA.
- Department of Pediatrics, University of Minnesota Medical School, Minneapolis, MN, USA.
- Institute of Child Development, University of Minnesota Medical School, Minneapolis, MN, USA.
| | - Nico U F Dosenbach
- Department of Radiology, Washington University School of Medicine, St Louis, MO, USA.
- Department of Neurology, Washington University School of Medicine, St Louis, MO, USA.
- Department of Biomedical Engineering, Washington University in St Louis, St Louis, MO, USA.
- Department of Psychological and Brain Sciences, Washington University in St Louis, St Louis, MO, USA.
- Program in Occupational Therapy, Washington University School of Medicine, St Louis, MO, USA.
- Department of Pediatrics, Washington University School of Medicine, St Louis, MO, USA.
| |
|
86
|
Breedon JR, Marshall CR, Giovannoni G, van Heel DA, Dobson R, Jacobs BM. Polygenic risk score prediction of multiple sclerosis in individuals of South Asian ancestry. Brain Commun 2023; 5:fcad041. [PMID: 37006331 PMCID: PMC10053643 DOI: 10.1093/braincomms/fcad041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 10/12/2022] [Accepted: 02/21/2023] [Indexed: 02/24/2023] Open
Abstract
Polygenic risk scores aggregate an individual's burden of risk alleles to estimate the overall genetic risk for a specific trait or disease. Polygenic risk scores derived from genome-wide association studies of European populations perform poorly for other ancestral groups. Given the potential for future clinical utility, underperformance of polygenic risk scores in South Asian populations has the potential to reinforce health inequalities. To determine whether European-derived polygenic risk scores underperform at multiple sclerosis prediction in a South Asian-ancestry population compared with a European-ancestry cohort, we used data from two longitudinal genetic cohort studies: Genes & Health (2015-present), a study of ∼50 000 British-Bangladeshi and British-Pakistani individuals, and UK Biobank (2006-present), which is comprised of ∼500 000 predominantly White British individuals. We compared individuals with and without multiple sclerosis in both studies (Genes & Health: N_cases = 42, N_control = 40 490; UK Biobank: N_cases = 2091, N_control = 374 866). Polygenic risk scores were calculated using clumping and thresholding with risk allele effect sizes obtained from the largest multiple sclerosis genome-wide association study to date. Scores were calculated with and without the major histocompatibility complex region, the most influential locus in determining multiple sclerosis risk. Polygenic risk score prediction was evaluated using Nagelkerke's pseudo-R² metric adjusted for case ascertainment, age, sex and the first four genetic principal components. We found that, as expected, European-derived polygenic risk scores perform poorly in the Genes & Health cohort, explaining 1.1% (including the major histocompatibility complex) and 1.5% (excluding the major histocompatibility complex) of disease risk. In contrast, multiple sclerosis polygenic risk scores explained 4.8% (including the major histocompatibility complex) and 2.8% (excluding the major histocompatibility complex) of disease risk in European-ancestry UK Biobank participants. These findings suggest that polygenic risk score prediction of multiple sclerosis based on European genome-wide association study results is less accurate in a South Asian population. Genetic studies of ancestrally diverse populations are required to ensure that polygenic risk scores can be useful across ancestries.
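As a rough illustration of the scoring step described in this abstract, a polygenic risk score can be computed as a weighted sum of risk-allele dosages, restricted to variants that pass a p-value threshold. The sketch below is an assumption-laden toy: the function name, the threshold value, and the example numbers are hypothetical, and the clumping step (pruning variants in linkage disequilibrium) is assumed to have been done beforehand.

```python
import numpy as np

def polygenic_score(dosages, betas, pvalues, p_threshold=5e-8):
    """Weighted sum of risk-allele dosages over variants passing a p-value threshold.

    dosages: (n_individuals, n_variants) array of allele counts (0, 1 or 2)
    betas:   per-variant effect sizes from GWAS summary statistics
    pvalues: per-variant association p-values
    Assumes the variants have already been clumped (LD-pruned).
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) < p_threshold
    return dosages[:, keep] @ betas[keep]

# Toy example (made-up numbers): two individuals, three variants.
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.10, -0.20, 0.30])
pvalues = np.array([1e-9, 1e-3, 1e-10])  # the second variant fails the threshold

scores = polygenic_score(dosages, betas, pvalues)
# Individual 1: 0 * 0.10 + 2 * 0.30 = 0.60
# Individual 2: 2 * 0.10 + 1 * 0.30 = 0.50
print(scores)
```

Computing the score with and without a genomic region, as done here for the major histocompatibility complex, would amount to masking the variants in that region before taking the sum.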
Affiliation(s)
- Joshua R Breedon
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
| | - Charles R Marshall
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
| | - Gavin Giovannoni
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
- Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - David A van Heel
- Blizard Institute, Queen Mary University of London, London E1 2AT, UK
| | - Ruth Dobson
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
| | - Benjamin M Jacobs
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University of London, London EC1M 6BQ, UK
- Department of Neurology, Royal London Hospital, London E1 1FR, UK
| |
|
87
|
Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics. Nat Commun 2023; 14:832. [PMID: 36788230 PMCID: PMC9929290 DOI: 10.1038/s41467-023-36544-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 06/01/2022] [Accepted: 02/07/2023] [Indexed: 02/16/2023] Open
Abstract
Polygenic risk scores (PRS) calculated from genome-wide association studies (GWAS) of Europeans are known to have substantially reduced predictive accuracy in non-European populations, limiting their clinical utility and raising concerns about health disparities across ancestral populations. Here, we introduce a statistical framework named X-Wing to improve predictive performance in ancestrally diverse populations. X-Wing quantifies local genetic correlations for complex traits between populations, employs an annotation-dependent estimation procedure to amplify correlated genetic effects between populations, and combines multiple population-specific PRS into a unified score with GWAS summary statistics alone as input. Through extensive benchmarking, we demonstrate that X-Wing pinpoints portable genetic effects and substantially improves PRS performance in non-European populations, showing 14.1%-119.1% relative gain in predictive R2 compared to state-of-the-art methods based on GWAS summary statistics. Overall, X-Wing addresses critical limitations in existing approaches and may have broad applications in cross-population polygenic risk prediction.
|
88
|
Kafyra M, Kalafati IP, Dimitriou M, Grigoriou E, Kokkinos A, Rallidis L, Kolovou G, Trovas G, Marouli E, Deloukas P, Moulos P, Dedoussis GV. Robust Bioinformatics Approaches Result in the First Polygenic Risk Score for BMI in Greek Adults. J Pers Med 2023; 13:jpm13020327. [PMID: 36836561 PMCID: PMC9960517 DOI: 10.3390/jpm13020327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 01/29/2023] [Accepted: 02/10/2023] [Indexed: 02/17/2023] Open
Abstract
Quantifying the role of genetics via construction of polygenic risk scores (PRSs) is deemed a resourceful tool to enable and promote effective obesity prevention strategies. The present paper proposes a novel methodology for PRS extraction and presents the first PRS for body mass index (BMI) in a Greek population. A novel pipeline for PRS derivation was used to analyze genetic data from a unified database of three cohorts of Greek adults. The pipeline spans various steps of the process, from iterative dataset splitting to training and test partitions, calculation of summary statistics and PRS extraction, up to PRS aggregation and stabilization, achieving higher evaluation metrics. Using data from 2185 participants, implementation of the pipeline enabled consecutive repetitions in splitting training and testing samples and resulted in a 343-single nucleotide polymorphism PRS yielding an R² = 0.3241 (beta = 1.011, p-value = 4 × 10⁻¹⁹³) for BMI. PRS-included variants displayed a variety of associations with known traits (i.e., blood cell count, gut microbiome, lifestyle parameters). The proposed methodology led to creation of the first-ever PRS for BMI in Greek adults and aims at promoting a facilitating approach to reliable PRS development and integration in healthcare practice.
Affiliation(s)
- Maria Kafyra
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece
| | - Ioanna Panagiota Kalafati
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece
- Department of Nutrition and Dietetics, School of Physical Education, Sport Science and Dietetics, University of Thessaly, 42132 Trikala, Greece
| | - Maria Dimitriou
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece
- Department of Nutritional Science and Dietetics, School of Health Science, University of the Peloponnese, Antikalamos, 24100 Kalamata, Greece
| | - Effimia Grigoriou
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece
| | - Alexandros Kokkinos
- First Department of Propaedeutic and Internal Medicine, Laiko General Hospital, Athens University Medical School, 11527 Athens, Greece
| | - Loukianos Rallidis
- Second Department of Cardiology, Medical School, National and Kapodistrian University of Athens, Attikon Hospital, 12462 Athens, Greece
| | - Genovefa Kolovou
- Cardiometabolic Center, Metropolitan Hospital, 18547 Piraeus, Greece
| | - Georgios Trovas
- Laboratory for the Research of Musculoskeletal System “Th. Garofalidis”, School of Medicine, National and Kapodistrian University of Athens, KAT General Hospital, Athinas 10th Str., 14561 Athens, Greece
| | - Eirini Marouli
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Panos Deloukas
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
| | - Panagiotis Moulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center ‘Alexander Fleming’, 16672 Vari, Greece
| | - George V. Dedoussis
- Department of Nutrition and Dietetics, School of Health Science and Education, Harokopio University, 17671 Athens, Greece
- Genome Analysis, 17671 Athens, Greece
| |
|
89
|
Abstract
Polygenic scores quantify inherited risk by integrating information from many common sites of DNA variation into a single number. Rapid increases in the scale of genetic association studies and new statistical algorithms have enabled development of polygenic scores that meaningfully measure, as early as birth, the risk of coronary artery disease. These newer-generation polygenic scores identify up to 8% of the population with triple the normal risk based on genetic variation alone, and these individuals cannot be identified on the basis of family history or clinical risk factors alone. For those identified with increased genetic risk, evidence supports risk reduction with at least two interventions, adherence to a healthy lifestyle and cholesterol-lowering therapies, that can substantially reduce risk. Alongside considerable enthusiasm for the potential of polygenic risk estimation to enable a new era of preventive clinical medicine is recognition of a need for ongoing research into how best to ensure equitable performance across diverse ancestries, how and in whom to assess the scores in clinical practice, as well as randomized trials to confirm clinical utility.
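The idea of integrating many variants into a single number can be sketched as a weighted sum of allele dosages. The example below is a minimal illustration under stated assumptions: the function name, the dosage matrix, and the per-variant weights are all hypothetical toy values, and real scores involve far more variants plus the weighting algorithms that the abstract only alludes to.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for each individual.

    dosages: array of shape (n_individuals, n_variants), entries 0, 1, or 2
             (copies of the effect allele).
    weights: array of shape (n_variants,), per-variant effect sizes
             (hypothetical values here, normally taken from a GWAS).
    """
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants)")
    return dosages @ weights


# Two individuals, three variants; all numbers are made up for illustration.
toy_dosages = np.array([[0, 1, 2],
                        [2, 0, 1]])
toy_weights = np.array([0.10, -0.05, 0.20])
scores = polygenic_score(toy_dosages, toy_weights)
print(scores)
```

Scores of this kind are usually standardized against a reference population before individuals are ranked, which is how statements about a top percentage of the population are derived.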
Affiliation(s)
- Aniruddh P Patel
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Amit V Khera
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA; Verve Therapeutics, Cambridge, Massachusetts, USA
| |
|
90
|
Wang Y, Namba S, Lopera E, Kerminen S, Tsuo K, Läll K, Kanai M, Zhou W, Wu KH, Favé MJ, Bhatta L, Awadalla P, Brumpton B, Deelen P, Hveem K, Lo Faro V, Mägi R, Murakami Y, Sanna S, Smoller JW, Uzunovic J, Wolford BN, Willer C, Gamazon ER, Cox NJ, Surakka I, Okada Y, Martin AR, Hirbo J. Global Biobank analyses provide lessons for developing polygenic risk scores across diverse cohorts. CELL GENOMICS 2023; 3:100241. [PMID: 36777179 PMCID: PMC9903818 DOI: 10.1016/j.xgen.2022.100241] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 12/01/2021] [Revised: 08/28/2022] [Accepted: 12/03/2022] [Indexed: 01/06/2023]
Abstract
Polygenic risk scores (PRSs) have been widely explored in precision medicine. However, few studies have thoroughly investigated their best practices in global populations across different diseases. We here utilized data from Global Biobank Meta-analysis Initiative (GBMI) to explore methodological considerations and PRS performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRSs using pruning and thresholding (P + T) and PRS-continuous shrinkage (CS). For both methods, using a European-based linkage disequilibrium (LD) reference panel resulted in comparable or higher prediction accuracy compared with several other non-European-based panels. PRS-CS overall outperformed the classic P + T method, especially for endpoints with higher SNP-based heritability. Notably, prediction accuracy is heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using GBMI resources and highlight the importance of best practices for PRS in the biobank-scale genomics era.
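The pruning and thresholding (P + T) method named in this abstract can be sketched as a greedy selection: keep variants that pass a p-value threshold, and drop any variant in high linkage disequilibrium with a more significant one already kept. This is a simplified illustration, not the pipeline used in the study. The function name, the default thresholds, and the toy inputs are assumptions, and practical implementations also restrict LD comparisons to a physical distance window, which is omitted here.

```python
import numpy as np


def prune_and_threshold(pvalues, ld_r2, p_max=5e-8, r2_max=0.1):
    """Return sorted indices of variants retained by a simplified P + T.

    pvalues: array of shape (n_variants,), GWAS association p-values.
    ld_r2:   symmetric array of shape (n_variants, n_variants) with
             pairwise LD r^2 from a reference panel.
    p_max:   p-value threshold (assumed default).
    r2_max:  maximum allowed r^2 with any already-kept variant (assumed default).
    """
    pvalues = np.asarray(pvalues, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)
    kept = []
    # Visit variants from most to least significant.
    for idx in np.argsort(pvalues):
        if pvalues[idx] > p_max:
            break  # all remaining variants fail the threshold
        if all(ld_r2[idx, k] <= r2_max for k in kept):
            kept.append(int(idx))
    return sorted(kept)


# Toy input: variants 0 and 1 are in strong LD; variant 2 is not significant.
toy_p = np.array([1e-10, 1e-9, 0.5, 1e-8])
toy_ld = np.eye(4)
toy_ld[0, 1] = toy_ld[1, 0] = 0.8
selected = prune_and_threshold(toy_p, toy_ld)
print(selected)
```

The LD matrix is where the choice of reference panel discussed in the abstract enters: a different panel yields different r² values and therefore a different retained set.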
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Shinichi Namba
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
| | - Esteban Lopera
- Department of Genetics, UMCG, University of Groningen, Groningen, the Netherlands
| | - Sini Kerminen
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
| | - Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kristi Läll
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Wei Zhou
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kuan-Han Wu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48103, USA
| | | | - Laxmi Bhatta
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
| | - Philip Awadalla
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Ben Brumpton
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
- HUNT Research Centre, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7600 Levanger, Norway
- Clinic of Medicine, St. Olav’s Hospital, Trondheim University Hospital, 7030 Trondheim, Norway
| | - Patrick Deelen
- Department of Genetics, UMCG, University of Groningen, Groningen, the Netherlands
- Oncode Institute, Utrecht, the Netherlands
| | - Kristian Hveem
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
- HUNT Research Centre, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7600 Levanger, Norway
| | - Valeria Lo Faro
- Department of Ophthalmology, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Department of Clinical Genetics, Amsterdam University Medical Center (AMC), Amsterdam, the Netherlands
- Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Reedik Mägi
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Yoshinori Murakami
- Division of Molecular Pathology, Institute of Medical Science, the University of Tokyo, Tokyo, Japan
| | - Serena Sanna
- Department of Genetics, UMCG, University of Groningen, Groningen, the Netherlands
- Institute for Genetics and Biomedical Research (IRGB), National Research Council (CNR), 09100 Cagliari, Italy
| | - Jordan W. Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
| | | | - Brooke N. Wolford
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48103, USA
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
| | - Cristen Willer
- K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, 7030 Trondheim, Norway
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biostatistics and Center for Statistical Genetics, and Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Eric R. Gamazon
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Nancy J. Cox
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Ida Surakka
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC) and Center for Infectious Disease Education and Research (CiDER), Osaka University, Suita 565-0871, Japan
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-0033, Japan
| | - Alicia R. Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Jibril Hirbo
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
91
|
Novembre J, Stein C, Asgari S, Gonzaga-Jauregui C, Landstrom A, Lemke A, Li J, Mighton C, Taylor M, Tishkoff S. Addressing the challenges of polygenic scores in human genetic research. Am J Hum Genet 2022; 109:2095-2100. [PMID: 36459976 PMCID: PMC9808501 DOI: 10.1016/j.ajhg.2022.10.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022] Open
Abstract
The genotyping of millions of human samples has made it possible to evaluate variants across the human genome for their possible association with risks for numerous diseases and other traits by using genome-wide association studies (GWASs). The associations between phenotype and genotype found in GWASs make possible the construction of polygenic scores (PGSs), which aim to predict a trait or disease outcome in an individual on the basis of their genotype (in the disease case, the term polygenic risk score [PRS] is often used). PGSs have shown promise for studying the biology of complex traits and as a tool for evaluating individual disease risks in clinical settings. Although the quantity and quality of data to compute PGSs are increasing, challenges remain in the technical aspects of developing PGSs and in the ethical and social issues that might arise from their use. This ASHG Guidance emphasizes three major themes for researchers working with or interested in the application of PGSs in their own research: (1) developing diverse research cohorts; (2) fostering robustness in the development, application, and interpretation of PGSs; and (3) improving the communication of PGS results and their implications to broad audiences.
Affiliation(s)
- John Novembre
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Department of Human Genetics, University of Chicago, Chicago, IL, USA; Department of Ecology and Evolution, University of Chicago, Chicago, IL, USA; Corresponding author
| | - Catherine Stein
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA; Corresponding author
| | - Samira Asgari
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Claudia Gonzaga-Jauregui
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; International Laboratory for Human Genome Research, Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
| | - Andrew Landstrom
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Department of Pediatrics, Division of Cardiology, Duke University School of Medicine, Durham, NC, USA
| | - Amy Lemke
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Norton Children’s Research Institute, affiliated with the University of Louisville School of Medicine, Louisville, KY, USA
| | - Jun Li
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Chloe Mighton
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Genomics Health Services Research Program, St. Michael’s Hospital, Unity Health Toronto, Toronto, ON, Canada; Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Matthew Taylor
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Adult Medical Genetics Program, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Sarah Tishkoff
- Professional Practice and Social Implications Committee Polygenic Scores Guidance Writing Group, American Society of Human Genetics, Rockville MD, USA; Department of Genetics, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA; Department of Biology, Center for Global Genomics and Health Equity, University of Pennsylvania, Philadelphia, PA, USA
| |
|
92
|
Maj C, Staerk C, Borisov O, Klinkhammer H, Wai Yeung M, Krawitz P, Mayr A. Statistical learning for sparser fine-mapped polygenic models: The prediction of LDL-cholesterol. Genet Epidemiol 2022; 46:589-603. [PMID: 35938382 DOI: 10.1002/gepi.22495] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/24/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/10/2022]
Abstract
Polygenic risk scores quantify the individual genetic predisposition regarding a particular trait. We propose and illustrate the application of existing statistical learning methods to derive sparser models for genome-wide data with a polygenic signal. Our approach is based on three consecutive steps. First, potentially informative loci are identified by a marginal screening approach. Then, fine-mapping is independently applied for blocks of variants in linkage disequilibrium, where informative variants are retrieved by using variable selection methods including boosting with probing and stochastic searches with the Adaptive Subspace method. Finally, joint prediction models with the selected variants are derived using statistical boosting. In contrast to alternative approaches relying on univariate summary statistics from genome-wide association studies, our three-step approach makes it possible to select and fit multivariable regression models on large-scale genotype data. Based on UK Biobank data, we develop prediction models for LDL-cholesterol as a continuous trait. Additionally, we consider a recent scalable algorithm for the Lasso. Results show that statistical learning approaches based on fine-mapping of genetic signals result in a competitive prediction performance compared to classical polygenic risk approaches, while yielding sparser risk models.
Affiliation(s)
- Carlo Maj
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Centre for Human Genetics, University of Marburg, Marburg, Germany
| | - Christian Staerk
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
| | - Oleg Borisov
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
| | - Hannah Klinkhammer
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
| | - Ming Wai Yeung
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Department of Cardiology, University of Groningen, Groningen, The Netherlands
| | - Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
| | - Andreas Mayr
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
| |
|
93
|
McDonald MLN, Lakshman Kumar P, Srinivasasainagendra V, Nair A, Rocco AP, Wilson AC, Chiles JW, Richman JS, Pinson SA, Dennis RA, Jagadale V, Brown CJ, Pyarajan S, Tiwari HK, Bamman MM, Singh JA. Novel genetic loci associated with osteoarthritis in multi-ancestry analyses in the Million Veteran Program and UK Biobank. Nat Genet 2022; 54:1816-1826. [PMID: 36411363 DOI: 10.1038/s41588-022-01221-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/22/2021] [Accepted: 10/05/2022] [Indexed: 11/22/2022]
Abstract
Osteoarthritis is a common progressive joint disease. As no effective medical interventions are available, osteoarthritis often progresses to the end stage, in which only surgical options such as total joint replacement are available. A more thorough understanding of genetic influences of osteoarthritis is essential to develop targeted personalized approaches to treatment, ideally long before the end stage is reached. To date, there have been no large multiancestry genetic studies of osteoarthritis. Here, we leveraged the unique resources of 484,374 participants in the Million Veteran Program and UK Biobank to address this gap. Analyses included participants of European, African, Asian and Hispanic descent. We discovered osteoarthritis-associated genetic variation at 10 loci and replicated findings from previous osteoarthritis studies. We also present evidence that some osteoarthritis-associated regions are robust to population ancestry. Drug repurposing analyses revealed enrichment of targets of several medication classes and provide potential insight into the etiology of beneficial effects of antiepileptics on osteoarthritis pain.
Affiliation(s)
- Merry-Lynn N McDonald
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Preeti Lakshman Kumar
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
| | - Vinodh Srinivasasainagendra
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Ashwathy Nair
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
| | - Alison P Rocco
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
| | - Ava C Wilson
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Joe W Chiles
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
| | - Joshua S Richman
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Department of Surgery, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Sarah A Pinson
- Division of Pulmonary, Allergy and Critical Care Medicine, Department of Medicine, School of Medicine, University of Alabama at Birmingham (UAB), Birmingham, AL, USA
| | - Richard A Dennis
- Central Arkansas Veterans Healthcare System (CAVHS), Little Rock, AR, USA
| | - Vivek Jagadale
- Central Arkansas Veterans Healthcare System (CAVHS), Little Rock, AR, USA
| | - Cynthia J Brown
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Department of Medicine, Louisiana State University Health Sciences Center, New Orleans, LA, USA
| | - Saiju Pyarajan
- Center for Data and Computational Sciences (C-DACS), Veterans Affairs Boston Healthcare System (VABHS), Boston, MA, USA
| | - Hemant K Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Marcas M Bamman
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Department of Cell, Developmental, and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Florida Institute for Human & Machine Cognition, Pensacola, FL, USA
| | - Jasvinder A Singh
- Birmingham Veterans Affairs Health Care System (BVAHCS), Birmingham, AL, USA
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Division of Rheumatology and Clinical Immunology, Department of Medicine at the School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
| |
|
94
|
Pham D, Truong B, Tran K, Ni G, Nguyen D, Tran TTH, Tran MH, Nguyen Thuy D, Vo NS, Nguyen Q. Assessing polygenic risk score models for applications in populations with under-represented genomics data: an example of Vietnam. Brief Bioinform 2022; 23:6793778. [PMID: 36326078 PMCID: PMC9677487 DOI: 10.1093/bib/bbac459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 08/24/2022] [Accepted: 09/25/2022] [Indexed: 11/05/2022] Open
Abstract
Most polygenic risk score (PRS) models have been based on data from populations of European origins (accounting for the majority of the large genomics datasets, e.g. >78% in the UK Biobank and >85% in the GTEx project). Although several large-scale Asian biobanks were initiated (e.g. Japanese, Korean, Han Chinese biobanks), most other Asian countries have little or near-zero genomics data. To implement PRS models for under-represented populations, we explored transfer learning approaches, assuming that information from existing large datasets can compensate for the small sample size that can be feasibly obtained in developing countries, like Vietnam. Here, we benchmark 13 common PRS methods in meta-population strategy (combining individual genotype data from multiple populations) and multi-population strategy (combining summary statistics from multiple populations). Our results highlight the complementarity of different populations and show that the choice of methods should depend on the target population. Based on these results, we discuss a set of guidelines to help users select the best method for their datasets. We developed robust and comprehensive software to allow for benchmarking comparisons between methods and proposed a computational framework for improving PRS performance in a dataset with a small sample size. This work is expected to inform the development of genomics applications in under-represented populations. PRSUP framework is available at: https://github.com/BiomedicalMachineLearning/VGP.
Affiliation(s)
- Duy Pham
- Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia
| | - Buu Truong
- UniSA STEM, University of South Australia, Mawson Lakes, 5095, South Australia, Australia
| | - Khai Tran
- Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam
| | - Guiyan Ni
- Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia
| | - Dat Nguyen
- Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam
| | | | | | | | - Nam S Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, 458 Minh Khai, 10000, Hanoi, Vietnam
| | - Quan Nguyen
- Institute for Molecular Bioscience, The University of Queensland, Carmody Rd, 4072, Queensland, Australia
| |
|
95
|
Widen E, Lello L, Raben TG, Tellier LCAM, Hsu SDH. Polygenic Health Index, General Health, and Pleiotropy: Sibling Analysis and Disease Risk Reduction. Sci Rep 2022; 12:18173. [PMID: 36307513 PMCID: PMC9616929 DOI: 10.1038/s41598-022-22637-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Accepted: 10/18/2022] [Indexed: 12/31/2022] Open
Abstract
We construct a polygenic health index as a weighted sum of polygenic risk scores for 20 major disease conditions, including, e.g., coronary artery disease, type 1 and 2 diabetes, and schizophrenia. Individual weights are determined by population-level estimates of impact on life expectancy. We validate this index in odds ratios and selection experiments using unrelated individuals and siblings (pairs and trios) from the UK Biobank. Individuals with higher index scores have decreased disease risk across almost all 20 diseases (no significant risk increases), and longer calculated life expectancy. When estimated Disability Adjusted Life Years (DALYs) are used as the performance metric, the gain from selection among ten individuals (highest index score vs average) is found to be roughly 4 DALYs. We find no statistical evidence for antagonistic trade-offs in risk reduction across these diseases. Correlations between genetic disease risks are found to be mostly positive and generally mild. These results have important implications for public health and also for fundamental issues such as pleiotropy and genetic architecture of human disease conditions.
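The construction described in this abstract, an index formed as a weighted sum of disease-specific polygenic risk scores, can be sketched as follows. This is an illustration under assumptions, not the authors' formula: the function name, the use of standardized scores, the sign convention (negated so that a higher index corresponds to lower weighted risk, matching the abstract's statement that higher index scores go with decreased disease risk), and all numeric values are hypothetical.

```python
import numpy as np


def health_index(prs_z, weights):
    """Combine disease-specific risk scores into a single index.

    prs_z:   array of shape (n_individuals, n_diseases) holding standardized
             polygenic risk scores (higher = more genetic risk).
    weights: array of shape (n_diseases,), non-negative importance of each
             disease (hypothetical stand-ins for life-expectancy impact).
    Returns the negated weighted sum, so higher values mean lower weighted risk.
    """
    prs_z = np.asarray(prs_z, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if prs_z.ndim != 2 or prs_z.shape[1] != weights.shape[0]:
        raise ValueError("prs_z must be (n_individuals, n_diseases)")
    return -(prs_z @ weights)


# Two individuals, two diseases; values are made up for illustration.
toy_prs = np.array([[1.0, 0.0],
                    [-1.0, 0.5]])
toy_weights = np.array([2.0, 1.0])
index = health_index(toy_prs, toy_weights)
print(index)
```

In a selection experiment of the kind the abstract mentions, the individual with the largest index value in a group would be compared with the group average.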
Affiliation(s)
- Erik Widen
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
- Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
| | - Louis Lello
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
- Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
| | - Timothy G Raben
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
| | - Laurent C A M Tellier
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
- Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
| | - Stephen D H Hsu
- Department of Physics and Astronomy, Michigan State University, 567 Wilson Rd, East Lansing, MI, 48824, USA
- Genomic Prediction, Inc., 671 US Highway One, North Brunswick, NJ, 08902, USA
| |
|
96
|
Jacobs BM, Peter M, Giovannoni G, Noyce AJ, Morris HR, Dobson R. Towards a global view of multiple sclerosis genetics. Nat Rev Neurol 2022; 18:613-623. [PMID: 36075979 DOI: 10.1038/s41582-022-00704-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 07/27/2022] [Indexed: 11/09/2022]
Abstract
Multiple sclerosis (MS) is a neuroimmunological disorder of the CNS with a strong heritable component. The genetic architecture of MS susceptibility is well understood in populations of European ancestry. However, the extent to which this architecture explains MS susceptibility in populations of non-European ancestry remains unclear. In this Perspective article, we outline the scientific arguments for studying MS genetics in ancestrally diverse populations. We argue that this approach is likely to yield insights that could benefit individuals with MS from all ancestral groups. We explore the logistical and theoretical challenges that have held back this field to date and conclude that, despite these challenges, inclusion of participants of non-European ancestry in MS genetics studies will ultimately be of value to all patients with MS worldwide.
Affiliation(s)
- Benjamin Meir Jacobs
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University London, London, UK
- Department of Neurology, Royal London Hospital, London, UK
| | - Michelle Peter
- NHS North Thames Genomic Laboratory Hub, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
| | - Gavin Giovannoni
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University London, London, UK
- Department of Neurology, Royal London Hospital, London, UK
- Blizard Institute, Queen Mary University London, London, UK
| | - Alastair J Noyce
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University London, London, UK
- Department of Neurology, Royal London Hospital, London, UK
- Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Huw R Morris
- Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
| | - Ruth Dobson
- Preventive Neurology Unit, Wolfson Institute of Population Health, Queen Mary University London, London, UK
- Department of Neurology, Royal London Hospital, London, UK
| |
|
97
|
Atkinson EG, Bianchi SB, Ye GY, Martínez-Magaña JJ, Tietz GE, Montalvo-Ortiz JL, Giusti-Rodriguez P, Palmer AA, Sanchez-Roige S. Cross-ancestry genomic research: time to close the gap. Neuropsychopharmacology 2022; 47:1737-1738. [PMID: 35739257 PMCID: PMC9372026 DOI: 10.1038/s41386-022-01365-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/29/2022] [Revised: 05/27/2022] [Accepted: 06/10/2022] [Indexed: 12/23/2022]
Affiliation(s)
- Elizabeth G Atkinson
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Sevim B Bianchi
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - Gordon Y Ye
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
| | - José Jaime Martínez-Magaña
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- VA CT Healthcare Center, West Haven, CT, USA
| | - Grace E Tietz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Janitza L Montalvo-Ortiz
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- VA CT Healthcare Center, West Haven, CT, USA
- US Department of Veterans Affairs National Center of Posttraumatic Stress Disorder, Clinical Neurosciences Division, West Haven, CT, USA
| | - Paola Giusti-Rodriguez
- Department of Psychiatry, University of Florida College of Medicine, Gainesville, FL, USA
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
| | - Sandra Sanchez-Roige
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
98
|
Tian P, Chan TH, Wang YF, Yang W, Yin G, Zhang YD. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front Genet 2022; 13:906965. [PMID: 36061179 PMCID: PMC9438789 DOI: 10.3389/fgene.2022.906965] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/29/2022] [Accepted: 06/27/2022] [Indexed: 11/28/2022] Open
Abstract
Polygenic risk scores (PRS) leverage the genetic contribution of an individual’s genotype to a complex trait by estimating disease risk. Traditional PRS prediction methods are predominantly developed for the European population. The accuracy of PRS prediction in non-European populations is diminished due to the much smaller sample sizes of genome-wide association studies (GWAS). In this article, we introduced a novel method to construct PRS for non-European populations, abbreviated as TL-Multi, which uses a transfer learning framework to learn useful knowledge from the European population to correct the bias for non-European populations. We considered non-European GWAS data as the target data and European GWAS data as the informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve the learning accuracy of the target data while preserving the efficiency and accuracy. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict the risk of systemic lupus erythematosus (SLE) in the Asian population and the risk of asthma in the Indian population by borrowing information from the European population. TL-Multi achieved better prediction accuracy than the competing methods, including Lassosum and meta-analysis, in both simulations and real applications.
Affiliation(s)
- Peixin Tian
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Tsai Hor Chan
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yong-Fei Wang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Wanling Yang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yan Dora Zhang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Yan Dora Zhang,
| |
|
99
|
Khan AT, Gogarten SM, McHugh CP, Stilp AM, Sofer T, Bowers ML, Wong Q, Cupples LA, Hidalgo B, Johnson AD, McDonald MLN, McGarvey ST, Taylor MR, Fullerton SM, Conomos MP, Nelson SC. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: Experiences from the NHLBI TOPMed program. CELL GENOMICS 2022; 2:100155. [PMID: 36119389 PMCID: PMC9481067 DOI: 10.1016/j.xgen.2022.100155] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Indexed: 11/26/2022]
Abstract
How race, ethnicity, and ancestry are used in genomic research has wide-ranging implications for how research is translated into clinical care and incorporated into public understanding. Correlation between race and genetic ancestry contributes to unresolved complexity for the scientific community, as illustrated by heterogeneous definitions and applications of these variables. Here, we offer commentary and recommendations on the use of race, ethnicity, and ancestry across the arc of genetic research, including data harmonization, analysis, and reporting. While informed by our experiences as researchers affiliated with the NHLBI Trans-Omics for Precision Medicine (TOPMed) program, these recommendations are applicable to basic and translational genomic research in diverse populations with genome-wide data. Moving forward, considerable collaborative effort will be required to ensure that race, ethnicity, and ancestry are described and used appropriately to generate scientific knowledge that yields broad and equitable benefit.
Affiliation(s)
- Alyna T. Khan
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| | | | - Caitlin P. McHugh
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Adrienne M. Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Tamar Sofer
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
| | - Michael L. Bowers
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Quenna Wong
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - L. Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
| | - Bertha Hidalgo
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Andrew D. Johnson
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Framingham, MA, USA
- The Framingham Heart Study, Framingham, MA, USA
| | - Merry-Lynn N. McDonald
- Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Stephen T. McGarvey
- Department of Epidemiology and International Health Institute, Brown University School of Public Health, Providence, RI, USA
- Department of Anthropology, Brown University, Providence, RI, USA
| | - Matthew R.G. Taylor
- Department of Medicine, Adult Medical Genetics Program, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | | | | | - Sarah C. Nelson
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Institute for Public Health Genetics, University of Washington, Seattle, WA, USA
| |
|
100
|
Huang QQ, Sallah N, Dunca D, Trivedi B, Hunt KA, Hodgson S, Lambert SA, Arciero E, Wright J, Griffiths C, Trembath RC, Hemingway H, Inouye M, Finer S, van Heel DA, Lumbers RT, Martin HC, Kuchenbaecker K. Transferability of genetic loci and polygenic scores for cardiometabolic traits in British Pakistani and Bangladeshi individuals. Nat Commun 2022; 13:4664. [PMID: 35945198 PMCID: PMC9363492 DOI: 10.1038/s41467-022-32095-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/22/2022] [Accepted: 07/15/2022] [Indexed: 12/30/2022] Open
Abstract
Individuals with South Asian ancestry have a higher risk of heart disease than other groups but have been largely excluded from genetic research. Using data from 22,000 British Pakistani and Bangladeshi individuals with linked electronic health records from the Genes & Health cohort, we conducted genome-wide association studies of coronary artery disease and its key risk factors. Using power-adjusted transferability ratios, we found evidence for transferability for the majority of cardiometabolic loci powered to replicate. The performance of polygenic scores was high for lipids and blood pressure, but lower for BMI and coronary artery disease. Adding a polygenic score for coronary artery disease to clinical risk factors showed significant improvement in reclassification. In Mendelian randomisation using transferable loci as instruments, our findings were consistent with results in European-ancestry individuals. Taken together, trait-specific transferability of trait loci between populations is an important consideration with implications for risk prediction and causal inference.
Affiliation(s)
- Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - Neneh Sallah
- Institute of Health Informatics, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
| | - Diana Dunca
- Institute of Health Informatics, University College London, London, UK
- UCL Genetics Institute, University College London, London, UK
| | - Bhavi Trivedi
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Karen A Hunt
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Sam Hodgson
- Primary Care Research Centre, University of Southampton, Southampton, UK
| | - Samuel A Lambert
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
| | - Elena Arciero
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford, UK
| | - Chris Griffiths
- Institute of Population Health Sciences, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Richard C Trembath
- Department of Medical and Molecular Genetics, King's College London, London, UK
| | - Harry Hemingway
- Institute of Health Informatics, University College London, London, UK
- Health Data Research UK, University College London, London, UK
- University College London Hospitals Biomedical Research Centre (UCLH BRC), London, UK
| | - Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Cambridge Centre of Research Excellence, Department of Clinical Medicine, University of Cambridge, Cambridge, UK
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
| | - Sarah Finer
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - David A van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - R Thomas Lumbers
- Institute of Health Informatics, University College London, London, UK
- University College London Hospitals Biomedical Research Centre (UCLH BRC), London, UK
- British Heart Foundation Research Accelerator, University College London, London, UK
| | - Hilary C Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - Karoline Kuchenbaecker
- UCL Genetics Institute, University College London, London, UK
- Division of Psychiatry, University College London, London, UK
| |
|