1
Kunkel D, Sørensen P, Shankar V, Morgante F. Improving polygenic prediction from summary data by learning patterns of effect sharing across multiple phenotypes. PLoS Genet 2025; 21:e1011519. [PMID: 39775068 DOI: 10.1371/journal.pgen.1011519] [Received: 05/06/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025]
Abstract
Polygenic prediction of complex trait phenotypes has become important in human genetics, especially in the context of precision medicine. Recently, mr.mash, a flexible and computationally efficient method that models multiple phenotypes jointly and leverages sharing of effects across such phenotypes to improve prediction accuracy, was introduced. However, a drawback of mr.mash is that it requires individual-level data, which are often not publicly available. In this work, we introduce mr.mash-rss, an extension of the mr.mash model that requires only summary statistics from Genome-Wide Association Studies (GWAS) and linkage disequilibrium (LD) estimates from a reference panel. By using summary data, we achieve the twin goal of increasing the applicability of the mr.mash model to data sets that are not publicly available and making it scalable to biobank-size data. Through simulations, we show that mr.mash-rss is competitive with, and often outperforms, current state-of-the-art methods for single- and multi-phenotype polygenic prediction in a variety of scenarios that differ in the pattern of effect sharing across phenotypes, the number of phenotypes, the number of causal variants, and the genomic heritability. We also present a real data analysis of 16 blood cell phenotypes in the UK Biobank, showing that mr.mash-rss achieves higher prediction accuracy than competing methods for the majority of traits, especially when the data set has smaller sample size.
Affiliation(s)
- Deborah Kunkel
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, South Carolina, United States of America
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Vijay Shankar
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
2
Morrison J. GWASBrewer: An R Package for Simulating Realistic GWAS Summary Statistics. Genet Epidemiol 2025; 49:e22594. [PMID: 39370594 DOI: 10.1002/gepi.22594] [Received: 04/17/2024] [Revised: 07/11/2024] [Accepted: 09/03/2024] [Indexed: 10/08/2024]
Abstract
Many statistical genetics analysis methods make use of GWAS summary statistics. Best statistical practice requires evaluating these methods in realistic simulation experiments. However, simulating summary statistics by first simulating individual genotype and phenotype data is extremely computationally demanding. This high cost may force researchers to conduct overly simplistic simulations that fail to accurately measure method performance. Alternatively, summary statistics can be simulated directly from their theoretical distribution. Although this is a common need among statistical genetics researchers, no software packages exist for comprehensive GWAS summary statistic simulation. We present GWASBrewer, an open source R package for direct simulation of GWAS summary statistics. We show that statistics simulated by GWASBrewer have the same distribution as statistics generated from individual level data, and can be produced at a fraction of the computational expense. Additionally, GWASBrewer can simulate standard error estimates, something that is typically not done when sampling summary statistics directly. GWASBrewer is highly flexible, allowing the user to simulate data for multiple traits connected by causal effects and with complex distributions of effect sizes. We demonstrate example uses of GWASBrewer for evaluating Mendelian randomization, polygenic risk score, and heritability estimation methods.
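Illustrative note: the idea of sampling summary statistics directly from their theoretical distribution, rather than from simulated individuals, can be sketched in a few lines. This is not GWASBrewer's interface. The function name `simulate_sumstats` and all parameters are hypothetical, and the sketch assumes independent variants (no LD), a single trait with unit phenotypic variance, small per-variant effects, and genotypes in Hardy-Weinberg equilibrium.

```python
import math
import random

def simulate_sumstats(true_effects, allele_freqs, n_samples, seed=0):
    """Draw one (beta_hat, se) pair per variant.

    Assumptions (illustrative only): variants are independent (no LD),
    the phenotype has unit variance, and each effect is small enough
    that residual variance is approximately 1.
    """
    rng = random.Random(seed)
    results = []
    for beta, freq in zip(true_effects, allele_freqs):
        # Variance of a 0/1/2 genotype under Hardy-Weinberg equilibrium.
        genotype_var = 2.0 * freq * (1.0 - freq)
        # Approximate standard error of the marginal regression estimate.
        se = 1.0 / math.sqrt(n_samples * genotype_var)
        # The estimate is normally distributed around the true effect.
        beta_hat = rng.gauss(beta, se)
        results.append((beta_hat, se))
    return results

# Three hypothetical variants in a GWAS of 50,000 individuals.
sumstats = simulate_sumstats([0.0, 0.05, -0.02], [0.3, 0.1, 0.5], n_samples=50_000)
for beta_hat, se in sumstats:
    print(f"beta_hat={beta_hat:+.4f}  se={se:.4f}")
```

The cost here scales with the number of variants rather than with variants times individuals, which is the source of the computational saving the abstract describes. Handling LD and multiple causally related traits requires the fuller model that the package implements.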
Affiliation(s)
- Jean Morrison
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
3
Zhao Z, Dorn S, Wu Y, Yang X, Jin J, Lu Q. One score to rule them all: regularized ensemble polygenic risk prediction with GWAS summary statistics. bioRxiv 2024:2024.11.27.625748. [PMID: 39677614 PMCID: PMC11642782 DOI: 10.1101/2024.11.27.625748] [Indexed: 12/17/2024]
Abstract
Ensemble learning has been increasingly popular for boosting the predictive power of polygenic risk scores (PRS), with almost every recent multi-ancestry PRS approach employing ensemble learning as a final step. Existing ensemble approaches rely on individual-level data for model training, which severely limits their real-world applications, especially in non-European populations without sufficient genomic samples. Here, we introduce a statistical framework to construct regularized ensemble PRS, which allows us to combine a large number of candidate PRS models using only summary statistics from genome-wide association studies. We demonstrate its robust and substantial improvement over many existing PRS models in both within- and cross-ancestry applications. We believe this is truly "one score to rule them all" due to its capability to continuously combine newly developed PRS models with existing models to improve prediction performance, which makes it a universal approach that should always be employed in future PRS applications.
Affiliation(s)
- Zijie Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Stephen Dorn
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Yuchang Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Xiaoyu Yang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Jin Jin
- Department of Biostatistics, Epidemiology and Bioinformatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI
- Department of Statistics, University of Wisconsin-Madison, Madison, WI
4
Zhao Z, Gruenloh T, Yan M, Wu Y, Sun Z, Miao J, Wu Y, Song J, Lu Q. Optimizing and benchmarking polygenic risk scores with GWAS summary statistics. Genome Biol 2024; 25:260. [PMID: 39379999 PMCID: PMC11462675 DOI: 10.1186/s13059-024-03400-w] [Received: 06/11/2023] [Accepted: 09/23/2024] [Indexed: 10/10/2024]
Abstract
BACKGROUND: Polygenic risk score (PRS) is a major research topic in human genetics. However, a significant gap exists between PRS methodology and applications in practice due to often unavailable individual-level data for various PRS tasks including model fine-tuning, benchmarking, and ensemble learning.
RESULTS: We introduce an innovative statistical framework to optimize and benchmark PRS models using summary statistics of genome-wide association studies. This framework builds upon our previous work and can fine-tune virtually all existing PRS models while accounting for linkage disequilibrium. In addition, we provide an ensemble learning strategy named PUMAS-ensemble to combine multiple PRS models into an ensemble score without requiring external data for model fitting. Through extensive simulations and analysis of many complex traits in the UK Biobank, we demonstrate that this approach closely approximates gold-standard analytical strategies based on external validation, and substantially outperforms state-of-the-art PRS methods.
CONCLUSIONS: Our method is a powerful and general modeling technique that can continue to combine the best-performing PRS methods out there through ensemble learning and could become an integral component for all future PRS applications.
Affiliation(s)
- Zijie Zhao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Tim Gruenloh
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Meiyi Yan
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA
- Yixuan Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Zhongxuan Sun
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Jiacheng Miao
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Yuchang Wu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
- Jie Song
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA
- Qiongshi Lu
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA.
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, USA.
- Center for Demography of Health and Aging, University of Wisconsin-Madison, Madison, WI, USA.
5
Sharew NT, Clark SR, Schubert KO, Amare AT. Pharmacogenomic scores in psychiatry: systematic review of current evidence. Transl Psychiatry 2024; 14:322. [PMID: 39107294 PMCID: PMC11303815 DOI: 10.1038/s41398-024-02998-6] [Received: 01/05/2024] [Revised: 06/21/2024] [Accepted: 06/27/2024] [Indexed: 08/10/2024]
Abstract
In the past two decades, significant progress has been made in the development of polygenic scores (PGSs). One specific application of PGSs is the development and potential use of pharmacogenomic scores (PGx-scores) to identify patients who can benefit from a specific medication or are likely to experience side effects. This systematic review comprehensively evaluates published PGx-score studies in psychiatry and provides insights into their potential clinical use and avenues for future development. A systematic literature search was conducted across PubMed, EMBASE, and Web of Science databases until 22 August 2023. This review included fifty-three primary studies, of which the majority (69.8%) were conducted using samples of European ancestry. We found that over 90% of PGx-scores in psychiatry have been developed based on psychiatric and medical diagnoses or trait variants, rather than pharmacogenomic variants. Among these PGx-scores, the polygenic score for schizophrenia (PGS_SCZ) has been most extensively studied in relation to its impact on treatment outcomes (32 publications). Twenty (62.5%) of these studies suggest that individuals with higher PGS_SCZ have negative outcomes from psychotropic treatment (poorer treatment response, higher rates of treatment resistance, more antipsychotic-induced side effects, or more psychiatric hospitalizations), while the remaining studies did not find significant associations. Although PGx-scores alone accounted for at best 5.6% of the variance in treatment outcomes (in schizophrenia treatment resistance), together with clinical variables they explained up to 13.7% (in bipolar lithium response), suggesting that clinical translation might be achieved by including PGx-scores in multivariable models. In conclusion, our literature review found that there are still very few studies developing PGx-scores using pharmacogenomic variants. Research with larger and diverse populations is required to develop clinically relevant PGx-scores, using biology-informed and multi-phenotypic polygenic scoring approaches, as well as by integrating clinical variables with these scores to facilitate their translation to psychiatric practice.
Affiliation(s)
- Nigussie T Sharew
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- Asrat Woldeyes Health Science Campus, Debre Berhan University, Debre Berhan, Ethiopia
- Scott R Clark
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- K Oliver Schubert
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia
- Division of Mental Health, Northern Adelaide Local Health Network, SA Health, Adelaide, Australia
- Headspace Adelaide Early Psychosis - Sonder, Adelaide, SA, Australia
- Azmeraw T Amare
- Discipline of Psychiatry, Adelaide Medical School, The University of Adelaide, Adelaide, SA, Australia.
6
Kelemen M, Vigorito E, Fachal L, Anderson CA, Wallace C. shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores. Am J Hum Genet 2024; 111:1006-1017. [PMID: 38703768 PMCID: PMC11179256 DOI: 10.1016/j.ajhg.2024.04.009] [Received: 07/28/2023] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/06/2024]
Abstract
We present shaPRS, a method that leverages widespread pleiotropy between traits or shared genetic effects across ancestries, to improve the accuracy of polygenic scores. The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of polygenic risk scores (PRSs) for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method, and as a result, it can be integrated into existing PRS generation pipelines and continue to be applied as more performant PRS methods are developed over time.
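Illustrative note: the two quantities this abstract relies on, a pooled effect estimate for a SNP and a test of whether the two datasets disagree, can be sketched with a standard fixed-effect inverse-variance meta-analysis and a two-sample heterogeneity z-test. This is a generic textbook calculation, not the shaPRS implementation. The function name `combine_two_studies` is hypothetical, and the sketch assumes the two studies have non-overlapping samples.

```python
import math

def combine_two_studies(beta_a, se_a, beta_b, se_b):
    """Pool one SNP's effect across two studies and test for heterogeneity.

    Assumes independent (non-overlapping) samples; illustrative only.
    Returns (pooled effect, pooled standard error, heterogeneity p-value).
    """
    # Inverse-variance weights: more precise estimates count for more.
    w_a = 1.0 / se_a ** 2
    w_b = 1.0 / se_b ** 2
    beta_meta = (w_a * beta_a + w_b * beta_b) / (w_a + w_b)
    se_meta = math.sqrt(1.0 / (w_a + w_b))
    # z-test on the difference between the two per-study effects.
    z_het = (beta_a - beta_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the standard normal distribution.
    p_het = math.erfc(abs(z_het) / math.sqrt(2.0))
    return beta_meta, se_meta, p_het

# Homogeneous SNP: pooling shrinks the standard error.
print(combine_two_studies(0.10, 0.02, 0.10, 0.02))
# Heterogeneous SNP: opposite signs give a very small heterogeneity p-value.
print(combine_two_studies(0.10, 0.01, -0.10, 0.01))
```

Where the heterogeneity p-value is large, the pooled estimate has a smaller standard error than either study alone. Where it is small, the abstract states that the estimate from the dataset closest to the target population is kept instead. How the method moves between these two cases is not specified here and is left out of the sketch.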
Affiliation(s)
- Martin Kelemen
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK; Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK.
- Elena Vigorito
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Laura Fachal
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Chris Wallace
- Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK; MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
7
Xiang R, Liu Y, Ben-Eghan C, Ritchie S, Lambert SA, Xu Y, Takeuchi F, Inouye M. Genome-wide analyses of variance in blood cell phenotypes provide new insights into complex trait biology and prediction. medRxiv 2024:2024.04.15.24305830. [PMID: 38699308 PMCID: PMC11065006 DOI: 10.1101/2024.04.15.24305830] [Indexed: 05/05/2024]
Abstract
Blood cell phenotypes are routinely tested in healthcare to inform clinical decisions. Genetic variants influencing mean blood cell phenotypes have been used to understand disease aetiology and improve prediction; however, additional information may be captured by genetic effects on observed variance. Here, we mapped variance quantitative trait loci (vQTL), i.e. genetic loci associated with trait variance, for 29 blood cell phenotypes from the UK Biobank (N~408,111). We discovered 176 independent blood cell vQTLs, of which 147 were not found by additive QTL mapping. vQTLs displayed on average 1.8-fold stronger negative selection than additive QTL, highlighting that selection acts to reduce extreme blood cell phenotypes. Variance polygenic scores (vPGSs) were constructed to stratify individuals in the INTERVAL cohort (N~40,466), where genetically less variable individuals (low vPGS) had higher conventional PGS accuracy (by ~19%) than genetically more variable individuals. Genetic prediction of blood cell traits improved by ~10% on average when combining PGS with vPGS. Using Mendelian randomisation and vPGS association analyses, we found that alcohol consumption significantly increased blood cell trait variances, highlighting the utility of blood cell vQTLs and vPGSs to provide novel insight into phenotype aetiology as well as improve prediction.
Affiliation(s)
- Ruidong Xiang
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria 3083, Australia
- Baker Department of Cardiovascular Research, Translation and Implementation, La Trobe University, Melbourne, VIC, 3086, Australia
- Baker Department of Cardiometabolic Health, The University of Melbourne, VIC, 3010, Australia
- Yang Liu
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Chief Ben-Eghan
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Scott Ritchie
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Samuel A. Lambert
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Yu Xu
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
- Fumihiko Takeuchi
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan
- Michael Inouye
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge, UK
- Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
- British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
8
Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, van Heel DA, Wang Y, Martin AR, Lee SH, Natarajan P. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genomics 2024; 4:100523. [PMID: 38508198 PMCID: PMC11019356 DOI: 10.1016/j.xgen.2024.100523] [Received: 08/14/2023] [Revised: 09/15/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Polygenic risk scores (PRSs) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. We propose PRSmix, a framework that leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture for 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% confidence interval [CI], [1.10; 1.3]; p = 9.17 × 10⁻⁵) and 1.19-fold (95% CI, [1.11; 1.27]; p = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI, [1.40; 2.04]; p = 7.58 × 10⁻⁶) and 1.42-fold (95% CI, [1.25; 1.59]; p = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared with previous cross-trait-combination methods that use scores from pre-defined correlated traits, we demonstrated that our method improved prediction accuracy for coronary artery disease up to 3.27-fold (95% CI, [2.1; 4.44]; p value after false discovery rate (FDR) correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
Affiliation(s)
- Buu Truong
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Leland E Hull
- Division of General Internal Medicine, Massachusetts General Hospital, 100 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Yunfeng Ruan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- Whitney Hornsby
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Hilary Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- David A van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Ying Wang
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Alicia R Martin
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA 5000, Australia
- Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA.
9
Norland K, Schaid DJ, Kullo IJ. A linear weighted combination of polygenic scores for a broad range of traits improves prediction of coronary heart disease. Eur J Hum Genet 2024; 32:209-214. [PMID: 37752310 PMCID: PMC10853172 DOI: 10.1038/s41431-023-01463-0] [Received: 03/09/2023] [Revised: 08/07/2023] [Accepted: 09/13/2023] [Indexed: 09/28/2023]
Abstract
Polygenic scores (PGS) for coronary heart disease (CHD) are constructed using GWAS summary statistics for CHD. However, pleiotropy is pervasive in biology and disease-associated variants often share etiologic pathways with multiple traits. Therefore, incorporating GWAS summary statistics of additional traits could improve the performance of PGS for CHD. Using lasso regression models, we developed two multi-PGS for CHD: 1) multiPGS_CHD, utilizing GWAS summary statistics for CHD, its risk factors, and other ASCVD as training data and the UK Biobank for tuning, and 2) extendedPGS_CHD, using existing PGS for a broader range of traits in the PGS Catalog as training data and the Atherosclerosis Risk in Communities Study (ARIC) cohort for tuning. We evaluated the performance of multiPGS_CHD and extendedPGS_CHD in the Mayo Clinic Biobank, an independent cohort of 43,578 adults of European ancestry which included 4,479 CHD cases and 39,099 controls. In the Mayo Clinic Biobank, a 1 SD increase in multiPGS_CHD and extendedPGS_CHD was associated with a 1.66-fold (95% CI: 1.60-1.71) and 1.70-fold (95% CI: 1.64-1.76) increased odds of CHD, respectively, in models that included age, sex, and 10 PCs, whereas an already published PGS for CHD (CHD_PRSCS) increased the odds by 1.50 (95% CI: 1.45-1.56). In the highest deciles of extendedPGS_CHD, multiPGS_CHD, and CHD_PRSCS, 18.4%, 17.5%, and 16.3% of patients had CHD, respectively.
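Illustrative note: a linear weighted combination of polygenic scores amounts to standardizing each component score and summing them with per-score weights. The sketch below shows only that combination step. The weights are hypothetical placeholders (in the study they come from lasso regression, which is not reproduced here), and the names `standardize` and `combine_scores` are assumptions made for illustration.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 across individuals."""
    mu = mean(scores)
    sd = pstdev(scores)
    return [(s - mu) / sd for s in scores]

def combine_scores(component_scores, weights):
    """Weighted sum of standardized component scores, one value per individual.

    component_scores maps a score name to its per-individual values;
    weights maps the same names to illustrative, hypothetical weights.
    """
    standardized = {name: standardize(vals) for name, vals in component_scores.items()}
    n_individuals = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in weights)
        for i in range(n_individuals)
    ]

# Three individuals, two hypothetical component scores.
components = {"chd": [1.0, 2.0, 3.0], "ldl": [3.0, 2.0, 1.0]}
combined = combine_scores(components, {"chd": 0.7, "ldl": 0.3})
print(combined)
```

Standardizing first puts the weights on a common scale, so a component with a larger weight contributes more per standard deviation regardless of its original units.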
Affiliation(s)
- Kristjan Norland
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
- Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
10
Zhai S, Mehrotra DV, Shen J. Applying polygenic risk score methods to pharmacogenomics GWAS: challenges and opportunities. Brief Bioinform 2023; 25:bbad470. [PMID: 38152980 PMCID: PMC10782924 DOI: 10.1093/bib/bbad470] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023]
Abstract
Polygenic risk scores (PRSs) have emerged as promising tools for the prediction of human diseases and complex traits in disease genome-wide association studies (GWAS). Applying PRSs to pharmacogenomics (PGx) studies has begun to show great potential for improving patient stratification and drug response prediction. However, there are unique challenges that arise when applying PRSs to PGx GWAS beyond those typically encountered in disease GWAS (e.g. Eurocentric or trans-ethnic bias). These challenges include: (i) the lack of knowledge about whether PGx or disease GWAS/variants should be used in the base cohort (BC); (ii) the small sample sizes in PGx GWAS with corresponding low power and (iii) the more complex PRS statistical modeling required for handling both prognostic and predictive effects simultaneously. To gain insights in this landscape about the general trends, challenges and possible solutions, we first conduct a systematic review of both PRS applications and PRS method development in PGx GWAS. To further address the challenges, we propose (i) a novel PRS application strategy by leveraging both PGx and disease GWAS summary statistics in the BC for PRS construction and (ii) a new Bayesian method (PRS-PGx-Bayesx) to reduce Eurocentric or cross-population PRS prediction bias. Extensive simulations are conducted to demonstrate their advantages over existing PRS methods applied in PGx GWAS. Our systematic review and methodology research work not only highlights current gaps and key considerations while applying PRS methods to PGx GWAS, but also provides possible solutions for better PGx PRS applications and future research.
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
11
Busby GB, Kulm S, Bolli A, Kintzle J, Domenico PD, Bottà G. Ancestry-specific polygenic risk scores are risk enhancers for clinical cardiovascular disease assessments. Nat Commun 2023; 14:7105. [PMID: 37925478 PMCID: PMC10625612 DOI: 10.1038/s41467-023-42897-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 10/25/2023] [Indexed: 11/06/2023] Open
Abstract
Clinical implementation of new prediction models requires evaluation of their utility in a broad range of intended use populations. Here we develop and validate ancestry-specific Polygenic Risk Scores (PRSs) for Coronary Artery Disease (CAD) using 29,389 individuals from diverse cohorts and genetic ancestry groups. The CAD PRSs outperform published scores with an average Odds Ratio per Standard Deviation of 1.57 (SD = 0.14) and identify between 12% and 24% of individuals with high genetic risk. Using this risk factor to reclassify borderline or intermediate 10-year Atherosclerotic Cardiovascular Disease (ASCVD) risk improves assessments for both CAD (Net Reclassification Improvement (NRI) = 13.14% (95% CI 9.23-17.06%)) and ASCVD (NRI = 10.70 (95% CI 7.35-14.05)) in an independent cohort of 9,691 individuals. Our analyses demonstrate that using PRSs as Risk Enhancers improves ASCVD risk assessments, outlining an approach for guiding ASCVD prevention with genetic information.
Affiliation(s)
- Scott Kulm
- Allelica Inc, 447 Broadway, New York, NY, 10013, USA
- Jen Kintzle
- Allelica Inc, 447 Broadway, New York, NY, 10013, USA
12
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023] Open
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Affiliation(s)
- Chang Xu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Santhi K Ganesh
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
13
Zhang C, Zhang Y, Zhang Y, Zhao H. Benchmarking of local genetic correlation estimation methods using summary statistics from genome-wide association studies. Brief Bioinform 2023; 24:bbad407. [PMID: 37974509 PMCID: PMC10654488 DOI: 10.1093/bib/bbad407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 10/06/2023] [Accepted: 10/24/2023] [Indexed: 11/19/2023] Open
Abstract
Local genetic correlation evaluates the correlation of additive genetic effects between different traits across the same genetic variants at a genomic locus. It has been proven informative for understanding the genetic similarities of complex traits beyond that captured by global genetic correlation calculated across the whole genome. Several summary-statistics-based approaches have been developed for estimating local genetic correlation, including $\rho$-hess, SUPERGNOVA and LAVA. However, there has not been a comprehensive evaluation of these methods to offer practical guidelines on the choices of these methods. In this study, we conduct benchmark comparisons of the performance of these three methods through extensive simulation and real data analyses. We focus on two technical difficulties in estimating local genetic correlation: sample overlaps across traits and local linkage disequilibrium (LD) estimates when only the external reference panels are available. Our simulations suggest the likelihood of incorrectly identifying correlated regions and local correlation estimation accuracy are highly dependent on the estimation of the local LD matrix. These observations are corroborated by real data analyses of 31 complex traits. Overall, our findings illuminate the distinct results yielded by different methods applied in post-genome-wide association studies (post-GWAS) local correlation studies. We underscore the sensitivity of local genetic correlation estimates and inferences to the precision of local LD estimation. These observations accentuate the vital need for ongoing refinement in methodologies.
Affiliation(s)
- Chi Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Yiliang Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Yunxuan Zhang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
14
Tsegaselassie W, Jian Y, Berhanu GG, Tianyuan L, April M, Tali E, Fasil TA, Timothy TA, Jordana C, Marguerite IR, Robert SM, Michael VW, Kristine Y, Myriam F, Donald LJM, Mario S, Daichi S, Yuichiro Y, Paul M, Adam B. Associations of cardiometabolic polygenic risk scores with cardiovascular disease in African Americans. Research Square 2023:rs.3.rs-3228815. [PMID: 37693576 PMCID: PMC10491340 DOI: 10.21203/rs.3.rs-3228815/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
Background: Cardiovascular disease (CVD) is a complex disease, and genetic factors contribute individually or cumulatively to CVD risk. While African American women and men are disproportionately affected by CVD, their lack of representation in genomic investigations may widen disparities in health. We investigated the associations of cardiometabolic polygenic risk scores (PRSs) with CVD risk in African Americans. Methods: We used the Jackson Heart Study, a prospective cohort study of CVD in African American adults, and the predicted atherosclerotic cardiovascular disease (ASCVD) 10-year risk. We included adults aged 40-79 years without a history of coronary heart disease (CHD) or stroke at baseline. We derived genome-wide PRSs for systolic blood pressure (SBP), diastolic blood pressure (DBP), total cholesterol, LDL cholesterol, hemoglobin A1c (HbA1c), triglycerides, and C-reactive protein (CRP) separately for each of the participants, using African-origin UK Biobank participants' genome-wide association summary statistics. We estimated the associations between PRSs and 10-year predicted ASCVD risk adjusting for age, sex, study visit date, and genetic ancestry using linear and logistic regression models. Results: Participants (n=2,077) were 63% female and 66% never-smokers. Their mean (SD) values were 56 (10) years of age, 127.8 (16.3) mmHg SBP, 76.3 (8.7) mmHg DBP, 200.4 (40.2) mg/dL total cholesterol, 51.7 (14.7) mg/dL HDL cholesterol, 127.2 (36.7) mg/dL LDL cholesterol, 6.0 (1.3) mmol/mol HbA1c, 108.9 (81.7) mg/dL triglycerides and 0.53 (1.1) CRP. Their median (interquartile range) predicted 10-year ASCVD risk was 8.0 (4.0-15.0). Participants in the >75th percentile for HbA1c PRS had 1.42 percentage-point greater predicted 10-year ASCVD risk (1.42 [95% CI: 0.58-2.26]) and higher odds of ≥10% predicted 10-year ASCVD risk (OR: 1.46 [95% CI: 1.03-2.07]) compared with those in the <25th percentile for HbA1c PRS. Participants in the >75th percentile for SBP PRS had higher odds of ≥10% predicted 10-year ASCVD risk (OR: 1.52 [95% CI: 1.07-2.15]) compared with those in the <25th percentile for SBP PRS. Conclusion: Among African Americans aged 40-79 years without CHD or stroke, higher PRSs for HbA1c and SBP were associated with CVD risk. PRSs may help stratify individuals based on their clinical risk factors for early CVD prevention and clinical management.
Affiliation(s)
- Lu Tianyuan
- Lady Davis Institute for Medical Research, Jewish General Hospital
- Sims Mario
- University of Mississippi Medical Center
15
Albiñana C, Zhu Z, Schork AJ, Ingason A, Aschard H, Brikell I, Bulik CM, Petersen LV, Agerbo E, Grove J, Nordentoft M, Hougaard DM, Werge T, Børglum AD, Mortensen PB, McGrath JJ, Neale BM, Privé F, Vilhjálmsson BJ. Multi-PGS enhances polygenic prediction by combining 937 polygenic scores. Nat Commun 2023; 14:4702. [PMID: 37543680 PMCID: PMC10404269 DOI: 10.1038/s41467-023-40330-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/11/2023] [Accepted: 07/21/2023] [Indexed: 08/07/2023] Open
Abstract
The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R2 increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.
Affiliation(s)
- Clara Albiñana
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark.
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark.
- Zhihong Zhu
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Andrew J Schork
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- The Translational Genomics Research Institute, Phoenix, AZ, USA
- Andrés Ingason
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université de Paris, 25-28 Rue du Dr Roux, 75015, Paris, France
- Isabell Brikell
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Cynthia M Bulik
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
- Liselotte V Petersen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Jakob Grove
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
- Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark
- Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Copenhagen Research Centre on Mental Health (CORE), University of Copenhagen, Copenhagen, Denmark
- David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, 2300, Copenhagen S, Denmark
- Thomas Werge
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- Lundbeck Foundation Centre for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350, Copenhagen K, Denmark
- Anders D Børglum
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
- Preben Bo Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- John J McGrath
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD, 4076, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
- Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Florian Privé
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Bjarni J Vilhjálmsson
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark.
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark.
- Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark.
- Novo Nordisk Foundation Center for Genomic Mechanisms, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
16
Clark K, Fu W, Liu CL, Ho PC, Wang H, Lee WP, Chou SY, Wang LS, Tzeng JY. The prediction of Alzheimer's disease through multi-trait genetic modeling. Front Aging Neurosci 2023; 15:1168638. [PMID: 37577355 PMCID: PMC10416111 DOI: 10.3389/fnagi.2023.1168638] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/17/2023] [Accepted: 06/26/2023] [Indexed: 08/15/2023] Open
Abstract
To better capture the polygenic architecture of Alzheimer's disease (AD), we developed a joint genetic score, MetaGRS. We incorporated genetic variants for AD and 24 other traits from two independent cohorts, NACC (n = 3,174, training set) and UPitt (n = 2,053, validation set). A one standard deviation increase in the MetaGRS is associated with about a 57% increase in AD risk [hazard ratio (HR) = 1.577, p = 7.17E-56], showing little difference from the HR for the AD GRS alone (HR = 1.579, p = 1.20E-56), suggesting similar utility of both models. We also conducted APOE-stratified analyses to assess the role of the e4 allele in risk prediction. Similar to that of the combined model, our stratified results did not show a considerable improvement of the MetaGRS. Our study showed that the prediction power of the MetaGRS significantly outperformed that of the reference model without any genetic information, but was effectively equivalent to the prediction power of the AD GRS.
Affiliation(s)
- Kaylyn Clark
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wei Fu
- Department of Health Management and Systems Sciences, School of Public Health and Information Sciences, University of Louisville, Louisville, KY, United States
- Chia-Lun Liu
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Pei-Chuan Ho
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, PA, United States
- Hui Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Shin-Yi Chou
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Economics, Lehigh University, Bethlehem, PA, United States
- National Bureau of Economic Research, Cambridge, MA, United States
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Jung-Ying Tzeng
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Statistics, North Carolina State University, Raleigh, NC, United States
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, United States
17
Bahda M, Ricard J, Girard SL, Maziade M, Isabelle M, Bureau A. Multivariate extension of penalized regression on summary statistics to construct polygenic risk scores for correlated traits. HGG Advances 2023; 4:100209. [PMID: 37333772 PMCID: PMC10276147 DOI: 10.1016/j.xhgg.2023.100209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Accepted: 05/17/2023] [Indexed: 06/20/2023] Open
Abstract
Genetic correlations between human traits and disorders such as schizophrenia (SZ) and bipolar disorder (BD) diagnoses are well established. Improved prediction of individual traits has been obtained by combining predictors of multiple genetically correlated traits derived from summary statistics produced by genome-wide association studies, compared with single trait predictors. We extend this idea to penalized regression on summary statistics in Multivariate Lassosum, expressing regression coefficients for the multiple traits on single nucleotide polymorphisms (SNPs) as correlated random effects, similarly to multi-trait summary statistic best linear unbiased predictors (MT-SBLUPs). We also allow the SNP contributions to genetic covariance and heritability to depend on genomic annotations. We conducted simulations with two dichotomous traits having polygenic architecture similar to SZ and BD, using genotypes from 29,330 subjects from the CARTaGENE cohort. Multivariate Lassosum produced polygenic risk scores (PRSs) more strongly correlated with the true genetic risk predictor and had better discrimination power between affected and non-affected subjects than previously published sparse multi-trait (PANPRS) and univariate (Lassosum, sparse LDpred2, and the standard clumping and thresholding) methods in most simulation settings. Application of Multivariate Lassosum to predict SZ, BD, and related psychiatric traits in the Eastern Quebec SZ and BD kindred study revealed associations with every trait stronger than those obtained with univariate sparse PRSs, particularly when heritability and genetic covariance depended on genomic annotations. Multivariate Lassosum thus appears promising to improve prediction of genetically correlated traits with summary statistics for a selected subset of SNPs.
Affiliation(s)
- Meriem Bahda
- Department of Mathematics and Statistic, Laval University, Québec, QC G1V 0A6, Canada
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Jasmin Ricard
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Simon L. Girard
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Fundamental Sciences, University of Quebec in Chicoutimi, Chicoutimi, QC G7H 2B1, Canada
- Michel Maziade
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Psychiatry and Neurosciences, Laval University, Québec, QC G1V 0A6, Canada
- Maripier Isabelle
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Economics, Laval University, Québec, QC G1V 0A6, Canada
- Alexandre Bureau
- CERVO Brain Research Centre, Québec, QC G1E 1T2, Canada
- Department of Social and Preventive Medicine, Laval University, Québec, QC G1V 0A6, Canada
18
Morgante F, Carbonetto P, Wang G, Zou Y, Sarkar A, Stephens M. A flexible empirical Bayes approach to multivariate multiple regression, and its improved accuracy in predicting multi-tissue gene expression from genotypes. PLoS Genet 2023; 19:e1010539. [PMID: 37418505 PMCID: PMC10355440 DOI: 10.1371/journal.pgen.1010539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 06/02/2023] [Indexed: 07/09/2023] Open
Abstract
Predicting phenotypes from genotypes is a fundamental task in quantitative genetics. With technological advances, it is now possible to measure multiple phenotypes in large samples. Multiple phenotypes can share their genetic component; therefore, modeling these phenotypes jointly may improve prediction accuracy by leveraging effects that are shared across phenotypes. However, effects can be shared across phenotypes in a variety of ways, so computationally efficient statistical methods are needed that can accurately and flexibly capture patterns of effect sharing. Here, we describe new Bayesian multivariate, multiple regression methods that, by using flexible priors, are able to model and adapt to different patterns of effect sharing and specificity across phenotypes. Simulation results show that these new methods are fast and improve prediction accuracy compared with existing methods in a wide range of settings where effects are shared. Further, in settings where effects are not shared, our methods still perform competitively with state-of-the-art methods. In real data analyses of expression data in the Genotype Tissue Expression (GTEx) project, our methods improve prediction performance on average for all tissues, with the greatest gains in tissues where effects are strongly shared, and in the tissues with smaller sample sizes. While we use gene expression prediction to illustrate our methods, the methods are generally applicable to any multi-phenotype applications, including prediction of polygenic scores and breeding values. Thus, our methods have the potential to provide improvements across fields and organisms.
Affiliation(s)
- Fabio Morgante
- Center for Human Genetics, Clemson University, Greenwood, South Carolina, United States of America
- Department of Genetics and Biochemistry, Clemson University, Clemson, South Carolina, United States of America
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois, United States of America
- Peter Carbonetto
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Research Computing Center, University of Chicago, Chicago, Illinois, United States of America
- Gao Wang
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Neurology, Columbia University, New York, New York, United States of America
- Gertrude H. Sergievsky Center, Columbia University, New York, New York, United States of America
- Yuxin Zou
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Regeneron Genetics Center, Regeneron Pharmaceuticals Inc., Tarrytown, New York, United States of America
- Abhishek Sarkar
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Matthew Stephens
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
19
Clasen JB, Fikse WF, Su G, Karaman E. Multibreed genomic prediction using summary statistics and a breed-origin-of-alleles approach. Heredity (Edinb) 2023; 131:33-42. [PMID: 37231157 PMCID: PMC10313778 DOI: 10.1038/s41437-023-00619-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Revised: 04/11/2023] [Accepted: 04/26/2023] [Indexed: 05/27/2023] Open
Abstract
Because of an increasing interest in crossbreeding between dairy breeds in dairy cattle herds, farmers are requesting breeding values for crossbred animals. However, genomically enhanced breeding values are difficult to predict in crossbred populations because the genetic make-up of crossbred individuals is unlikely to follow the same pattern as for purebreds. Furthermore, sharing genotype and phenotype information between breed populations is not always possible, which means that genetic merit (GM) for crossbred animals may be predicted without the information needed from some pure breeds, resulting in low prediction accuracy. This simulation study investigated the consequences of using summary statistics from single-breed genomic predictions for some or all pure breeds in two- and three-breed rotational crosses, rather than their raw data. A genomic prediction model taking into account the breed-origin of alleles (BOA) was considered. Because of a high genomic correlation between the breeds simulated (0.62-0.87), the prediction accuracies using the BOA approach were similar to those of a joint model assuming homogeneous SNP effects for these breeds. Having a reference population with summary statistics available from all pure breeds and full phenotype and genotype information from crossbreds yielded almost as high prediction accuracies (0.720-0.768) as having a reference population with full information from all pure breeds and crossbreds (0.753-0.789). Lacking information from the pure breeds yielded much lower prediction accuracies (0.590-0.676). Furthermore, including crossbred animals in a combined reference population also benefitted prediction accuracies in the purebred animals, especially for the smallest breed population.
Affiliation(s)
- J B Clasen
- Department of Animal Breeding and Genetics, Swedish University of Agricultural Sciences, Box 7023, 75007, Uppsala, Sweden.
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark.
- W F Fikse
- Växa Sverige, Swedish University of Agricultural Sciences, Ulls väg 26, 756 51, Uppsala, Sweden
- G Su
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark
- E Karaman
- Center for Quantitative Genetics and Genomics, Aarhus University, C. F. Møllers Allé 8, DK-8000, Aarhus, Denmark
| |
|
20
|
Zhao B, Zou F, Zhu H. Cross-trait prediction accuracy of summary statistics in genome-wide association studies. Biometrics 2023; 79:841-853. [PMID: 35278218 PMCID: PMC9464799 DOI: 10.1111/biom.13661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 02/25/2022] [Indexed: 11/27/2022]
Abstract
In the era of big data, univariate models have been widely used as a workhorse tool for quickly producing marginal estimators, and this is true even in a high-dimensional dense setting, in which many features are "true" but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of the LD pattern on the GWAS marginal estimator and investigate whether additionally accounting for LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators, including the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we uncover that the relative performance of the GWAS marginal and BLUP estimators depends strongly on the ratio of the GWAS sample size to the number of genetic variants. In particular, our findings reveal that the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. On the other hand, the BLUP estimator has substantially better performance than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for LD (such as in the BLUP) is most needed when the GWAS sample size is large. We illustrate the importance of our results using simulated data and real GWAS.
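As a rough illustration of the class of ridge-type estimators this abstract describes, the sketch below contrasts a marginal estimator with a ridge (BLUP-style) estimator on simulated data. The function names, the simulated genotype matrix, and the penalty values are assumptions made for illustration only; none of them are taken from the paper.

```python
import numpy as np

def marginal_estimator(X, y):
    """Per-variant marginal estimate X^T y / n; ignores correlation (LD) between columns."""
    n = X.shape[0]
    return X.T @ y / n

def ridge_estimator(X, y, lam):
    """Ridge-type estimate (X^T X + lam * I)^{-1} X^T y; uses the LD structure in X^T X."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated, standardized "genotypes" and a dense vector of weak effects (all hypothetical).
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta = rng.standard_normal(p) * 0.1
y = X @ beta + rng.standard_normal(n)

b_marg = marginal_estimator(X, y)
b_ridge = ridge_estimator(X, y, lam=float(p))  # assumed penalty, not a tuned value

# With a very large penalty the ridge estimate, rescaled by lam / n,
# approaches the marginal estimate, which is why both sit in one class.
lam_big = 1e8
b_limit = ridge_estimator(X, y, lam_big) * lam_big / n
```

The rescaled large-penalty limit makes the "special case" relationship concrete: as the penalty grows, the LD term is swamped and only the marginal signal remains.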
Affiliation(s)
- Bingxin Zhao
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, U.S.A
| | - Fei Zou
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, U.S.A
| | - Hongtu Zhu
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, U.S.A
| |
|
21
|
Zhai S, Guo B, Wu B, Mehrotra DV, Shen J. Integrating multiple traits for improving polygenic risk prediction in disease and pharmacogenomics GWAS. Brief Bioinform 2023:7169140. [PMID: 37200155 DOI: 10.1093/bib/bbad181] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Revised: 03/30/2023] [Accepted: 04/21/2023] [Indexed: 05/20/2023] Open
Abstract
Polygenic risk scores (PRSs) have recently been developed for predicting complex traits and drug responses. It remains unknown whether multi-trait PRS (mtPRS) methods, by integrating information from multiple genetically correlated traits, can improve prediction accuracy and power for PRS analysis compared with single-trait PRS (stPRS) methods. In this paper, we first review commonly used mtPRS methods and find that they do not directly model the underlying genetic correlations among traits, which have been shown in the literature to be useful in guiding multi-trait association analysis. To overcome this limitation, we propose an mtPRS-PCA method to combine PRSs from multiple traits with weights obtained from performing principal component analysis (PCA) on the genetic correlation matrix. To accommodate various genetic architectures covering different effect directions, signal sparseness and across-trait correlation structures, we further propose an omnibus mtPRS method (mtPRS-O) by combining P values from mtPRS-PCA, mtPRS-ML (mtPRS based on machine learning) and stPRSs using the Cauchy Combination Test. Our extensive simulation studies show that mtPRS-PCA outperforms other mtPRS methods in both disease and pharmacogenomics (PGx) genome-wide association studies (GWAS) contexts when traits are similarly correlated, with dense signal effects and in similar effect directions, and mtPRS-O is consistently superior to most other methods due to its robustness under various genetic architectures. We further apply mtPRS-PCA, mtPRS-O and other methods to PGx GWAS data from a randomized clinical trial in the cardiovascular domain and demonstrate performance improvement of mtPRS-PCA in both prediction accuracy and patient stratification as well as the robustness of mtPRS-O in PRS association testing.
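The two building blocks named in this abstract, PCA-derived weights for combining per-trait PRSs and a Cauchy combination of P values, can be sketched as follows. This is a minimal illustration, not the authors' implementation: using only the first principal component for the weights is an assumption, and all function names are hypothetical. The Cauchy combination follows the standard formula (a weighted sum of tan((0.5 - p)π) mapped back through the Cauchy distribution).

```python
import numpy as np

def pca_weights(genetic_corr):
    """Loadings of the first principal component of a genetic correlation matrix.

    Assumption: only the leading component is used to weight the traits.
    """
    eigvals, eigvecs = np.linalg.eigh(genetic_corr)  # eigenvalues in ascending order
    w = eigvecs[:, -1]                               # eigenvector of the largest eigenvalue
    return w if w.sum() >= 0 else -w                 # resolve the arbitrary sign

def combine_prs(prs_matrix, weights):
    """Weighted sum of per-trait PRS columns (individuals x traits)."""
    return np.asarray(prs_matrix, dtype=float) @ weights

def cauchy_combination(pvalues, weights=None):
    """Combine P values with the Cauchy combination statistic.

    Weights are assumed non-negative and summing to 1; equal weights by default.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.full(p.shape, 1.0 / p.size) if weights is None else np.asarray(weights, dtype=float)
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(t) / np.pi

# Toy example with two equally correlated traits (hypothetical numbers).
corr = np.array([[1.0, 0.5], [0.5, 1.0]])
w = pca_weights(corr)
score = combine_prs([[1.0, 2.0], [0.0, -1.0]], w)
p_omnibus = cauchy_combination([0.01, 0.20, 0.04])
```

One reason the Cauchy combination suits an omnibus test is that it needs only the individual P values, without an explicit model of the dependence between the component tests.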
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Bin Guo
- Data and Genome Science, Merck & Co., Inc., Cambridge, MA 02141, USA
| | - Baolin Wu
- Department of Epidemiology and Biostatistics, University of California Irvine, Irvine, CA 92697, USA
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| |
|
22
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 02/10/2023] [Indexed: 06/06/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts the genetic risk of an individual towards targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect size, making it more precise at individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account the SNPs/alleles that have low effect sizes but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications that investigate the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining higher prediction power of PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that the lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data for clinical evaluation of PRSs of interest and predicting rare diseases.
Affiliation(s)
- Yagoub Adam
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Suraju Sadeeq
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Judit Kumuthini
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Olabode Ajayi
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Gordon Wells
- South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
- Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
| | - Rotimi Solomon
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Olubanke Ogunlana
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Emmanuel Adetiba
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
- HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
| | - Emeka Iweala
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
| | - Benedikt Brors
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
| | - Ezekiel Adebiyi
- Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
- Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
| |
|
23
|
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 02/10/2023] [Indexed: 11/23/2023] Open
|
24
|
Dron JS. The clinical utility of polygenic risk scores for combined hyperlipidemia. Curr Opin Lipidol 2023; 34:44-51. [PMID: 36602940 DOI: 10.1097/mol.0000000000000865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
PURPOSE OF REVIEW: Combined hyperlipidemia is the most common lipid disorder and is strongly polygenic. Given its prevalence and associated risk for atherosclerotic cardiovascular disease, this review describes the potential for utilizing polygenic risk scores for risk prediction and management of combined hyperlipidemia. RECENT FINDINGS: Different diagnostic criteria have led to inconsistent prevalence estimates and missed diagnoses. Given that individuals with combined hyperlipidemia have risk estimates for incident coronary artery disease similar to individuals with familial hypercholesterolemia, early identification and therapeutic management of those affected is crucial. With diagnostic criteria including traits such as apolipoprotein B, low-density lipoprotein cholesterol, and triglyceride, polygenic risk scores for these traits strongly associate with combined hyperlipidemia and could be used in combination for clinical risk prediction models and developing specific treatment plans for patients. SUMMARY: Polygenic risk scores are effective tools in risk prediction of combined hyperlipidemia, can provide insight into disease pathophysiology, and may be useful in managing and guiding treatment plans for patients. However, efforts to ensure equitable polygenic risk score performance across different genetic ancestry groups are necessary before clinical implementation in order to prevent the exacerbation of racial disparities in the clinic.
Affiliation(s)
- Jacqueline S Dron
- Center for Genomic Medicine, Massachusetts General Hospital, Boston
- Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| |
|
25
|
Genetic correlation and gene-based pleiotropy analysis for four major neurodegenerative diseases with summary statistics. Neurobiol Aging 2023; 124:117-128. [PMID: 36740554 DOI: 10.1016/j.neurobiolaging.2022.12.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/10/2021] [Revised: 03/25/2022] [Accepted: 12/27/2022] [Indexed: 01/02/2023]
Abstract
Recent genome-wide association studies suggested shared genetic components between neurodegenerative diseases. However, pleiotropic association patterns among them remain poorly understood. Here we analyzed 4 major neurodegenerative diseases, namely Alzheimer's disease (AD), Parkinson's disease (PD), frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS), and found suggestive positive genetic correlations. We next implemented a gene-centric pleiotropy analysis with a powerful method called PLACO and detected 280 pleiotropic associations (226 unique genes) with these diseases. Functional analyses demonstrated that these genes were enriched in the pancreas, liver, heart, blood, brain, and muscle tissues, and that 42 pleiotropic genes exhibited drug-gene interactions with 341 drugs. Using Mendelian randomization, we discovered that AD and PD can increase the risk of developing ALS, and that AD and ALS can also increase the risk of developing FTD. Overall, this study provides in-depth insights into shared genetic components and causal relationships among the 4 major neurodegenerative diseases, indicating that genetic overlap and causality commonly drive their co-occurrence. It also has important implications for understanding the etiology of neurodegenerative diseases, for drug development, and for identifying therapeutic targets.
|
26
|
Ye H, Xu Z, Bello SF, Zhu Q, Kong S, Zheng M, Fang X, Jia X, Xu H, Zhang X, Nie Q. Haplotype analysis of genomic prediction by incorporating genomic pathway information based on high-density SNP marker in Chinese yellow-feathered chicken. Poult Sci 2023; 102:102549. [PMID: 36907129 PMCID: PMC10024239 DOI: 10.1016/j.psj.2023.102549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2022] [Revised: 01/16/2023] [Accepted: 01/27/2023] [Indexed: 02/09/2023] Open
Abstract
Genomic selection using single nucleotide polymorphism (SNP) markers is now intensively investigated in breeding and has been widely utilized for genetic improvement. Several recent studies have used haplotypes (consisting of multiallelic SNPs) for genomic prediction and revealed their performance advantage. In this study, we comprehensively evaluated the performance of haplotype models for genomic prediction in 15 traits, including 6 growth, 5 carcass, and 4 feeding traits, in a Chinese yellow-feathered chicken population. We adopted 3 methods to define haplotypes from high-density SNP panels, and our strategy included combining Kyoto Encyclopedia of Genes and Genomes pathway information and considering linkage disequilibrium (LD) information. Our results showed a change in prediction accuracy due to haplotypes ranging from -0.04% to 27.16% across all traits, with significant improvements found in 12 traits. The estimates of haplotype epistasis heritability were strongly correlated with the accuracy increase by haplotype models. In addition, incorporating genomic annotation information could further increase the accuracy of the haplotype model, where the further increase in accuracy is significantly related to the increase in relative haplotype epistasis heritability. The genomic prediction using LD information for constructing haplotypes has the best prediction performance among the 4 traits. These results show that haplotype methods were beneficial for genomic prediction, and that the accuracy could be further increased by incorporating genomic annotation information. Moreover, using LD information would potentially improve the performance of genomic prediction.
Affiliation(s)
- Haoqiang Ye
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Zhenqiang Xu
- Wen's Nanfang Poultry Breeding Co. Ltd, Guangdong Province, Yunfu 527400, China
| | - Semiu Folaniyi Bello
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Qianghui Zhu
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China
| | - Shaofen Kong
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Ming Zheng
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Xiang Fang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Xinzheng Jia
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, Foshan University, Foshan, 528225 China
| | - Haiping Xu
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Xiquan Zhang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China
| | - Qinghua Nie
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science, South China Agricultural University, Guangzhou, 510642 China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642 China.
| |
|
27
|
Abstract
Polygenic scores quantify inherited risk by integrating information from many common sites of DNA variation into a single number. Rapid increases in the scale of genetic association studies and new statistical algorithms have enabled development of polygenic scores that meaningfully measure, as early as birth, risk of coronary artery disease. These newer-generation polygenic scores identify up to 8% of the population with triple the normal risk based on genetic variation alone, and these individuals cannot be identified on the basis of family history or clinical risk factors alone. For those identified with increased genetic risk, evidence supports risk reduction with at least two interventions, adherence to a healthy lifestyle and cholesterol-lowering therapies, that can substantially reduce risk. Alongside considerable enthusiasm for the potential of polygenic risk estimation to enable a new era of preventive clinical medicine is recognition of a need for ongoing research into how best to ensure equitable performance across diverse ancestries, how and in whom to assess the scores in clinical practice, as well as randomized trials to confirm clinical utility.
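The idea of collapsing many variants "into a single number" can be sketched as a weighted sum of allele dosages, followed by flagging the top slice of the score distribution. This is a generic illustration, not the scoring method of any particular study: the function names, dosages, weights, and the 8% cutoff used in the example are all hypothetical inputs.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (individuals x variants), one score per individual."""
    dosages = np.asarray(dosages, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return dosages @ weights

def top_fraction_mask(scores, fraction):
    """Flag individuals whose score is at or above the (1 - fraction) quantile."""
    scores = np.asarray(scores, dtype=float)
    cutoff = np.quantile(scores, 1.0 - fraction)
    return scores >= cutoff

# Two individuals, three variants; dosages count copies of the effect allele (0, 1, or 2).
dosages = [[0, 1, 2],
           [2, 0, 1]]
weights = [0.1, -0.2, 0.3]   # hypothetical per-variant effect sizes
scores = polygenic_score(dosages, weights)

# Flag the top 8% of a hypothetical cohort of 100 evenly spaced scores.
cohort_scores = np.arange(100.0)
high_risk = top_fraction_mask(cohort_scores, 0.08)
```

In practice the weights would come from a genetic association study, and the cutoff from a reference distribution rather than from the cohort being scored.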
Affiliation(s)
- Aniruddh P Patel
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Amit V Khera
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA; Verve Therapeutics, Cambridge, Massachusetts, USA
| |
|
28
|
Xia X, Zhang Y, Wei Y, Wang MH. Statistical Methods for Disease Risk Prediction with Genotype Data. Methods Mol Biol 2023; 2629:331-347. [PMID: 36929084 DOI: 10.1007/978-1-0716-2986-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Single-nucleotide polymorphisms (SNPs) are the basic unit for understanding the heritability of complex traits. One attractive application of the susceptible SNPs is to construct prediction models for assessing disease risk. Here, we introduce prediction methods for human traits using SNP data, including the polygenic risk score (PRS), linear mixed models (LMMs), penalized regressions, and methods for controlling population stratification.
Affiliation(s)
- Xiaoxuan Xia
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
| | | | - Yingying Wei
- Department of Statistics, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong
| | - Maggie Haitian Wang
- JC School of Public Health and Primary Care, the Chinese University of Hong Kong (CUHK), Shatin, Hong Kong.
- CUHK Shenzhen Institute, Shenzhen, China.
| |
|
29
|
Elgart M, Goodman MO, Isasi C, Chen H, Morrison AC, de Vries PS, Xu H, Manichaikul AW, Guo X, Franceschini N, Psaty BM, Rich SS, Rotter JI, Lloyd-Jones DM, Fornage M, Correa A, Heard-Costa NL, Vasan RS, Hernandez R, Kaplan RC, Redline S, Sofer T. Correlations between complex human phenotypes vary by genetic background, gender, and environment. Cell Rep Med 2022; 3:100844. [PMID: 36513073 PMCID: PMC9797952 DOI: 10.1016/j.xcrm.2022.100844] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/10/2021] [Revised: 07/11/2022] [Accepted: 11/09/2022] [Indexed: 12/15/2022]
Abstract
We develop a closed-form Haseman-Elston estimator for genetic and environmental correlation coefficients between complex phenotypes, which we term HEc, that is as precise as GCTA yet ∼20× faster. We estimate genetic and environmental correlations between over 7,000 phenotype pairs in subgroups from the Trans-Omics in Precision Medicine (TOPMed) program. We demonstrate substantial differences in both heritabilities and genetic correlations for multiple phenotypes and phenotype pairs between individuals of self-reported Black, Hispanic/Latino, and White backgrounds. We similarly observe differences in many of the genetic and environmental correlations between genders. To estimate the contribution of genetics to the observed phenotypic correlation, we introduce "fractional genetic correlation" as the fraction of phenotypic correlation explained by genetics. Finally, we quantify the enrichment of correlations between phenotypic domains, each of which is comprised of multiple phenotypes. Altogether, we demonstrate that the observed correlations between complex human phenotypes depend on the genetic background of the individuals, their gender, and their environment.
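For orientation, the textbook Haseman-Elston cross-product regression that this kind of estimator builds on can be sketched as below. This is not the paper's HEc estimator: it is the generic no-intercept regression of phenotype cross-products on off-diagonal kinship values, and the function names and toy kinship matrix are hypothetical.

```python
import numpy as np

def standardize(y):
    """Center and scale a phenotype vector to mean 0 and variance 1."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std()

def he_cross_product(K, y1, y2):
    """Slope of the no-intercept regression of y1_i * y2_j on K_ij over pairs i != j."""
    K = np.asarray(K, dtype=float)
    off = ~np.eye(K.shape[0], dtype=bool)   # mask selecting off-diagonal pairs
    products = np.outer(y1, y2)
    return np.sum(K[off] * products[off]) / np.sum(K[off] ** 2)

def he_genetic_correlation(K, y1, y2):
    """Genetic covariance scaled by the two heritability estimates.

    Returns nan if either heritability estimate is negative, which can
    happen in small samples because the estimates are unconstrained.
    """
    z1, z2 = standardize(y1), standardize(y2)
    h2_1 = he_cross_product(K, z1, z1)
    h2_2 = he_cross_product(K, z2, z2)
    cov_g = he_cross_product(K, z1, z2)
    return cov_g / np.sqrt(h2_1 * h2_2)

# Toy kinship matrix and phenotype for three individuals (hypothetical values).
K = np.array([[1.0, 0.5, -0.5],
              [0.5, 1.0, -0.5],
              [-0.5, -0.5, 1.0]])
y = np.array([1.0, 1.0, -2.0])
h2_toy = he_cross_product(K, standardize(y), standardize(y))
rg_same = he_genetic_correlation(K, y, y)
```

Because every quantity is a ratio of sums over pairs, the estimate is available in closed form without iterative likelihood maximization, which is the usual source of the speed advantage over variance-component fitting.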
Affiliation(s)
- Michael Elgart
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
| | - Matthew O Goodman
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Carmen Isasi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA; Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Paul S de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Huichun Xu
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ani W Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA; Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Adolfo Correa
- Department of Population Health Science, University of Mississippi Medical Center, Jackson, MS, USA
| | - Nancy L Heard-Costa
- Boston University and National Heart Lung and Blood Institute's Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA
| | - Ramachandran S Vasan
- Boston University and National Heart Lung and Blood Institute's Framingham Heart Study, Framingham, MA, USA; Preventive Medicine & Epidemiology, and Cardiovascular Medicine, Medicine, Boston University School of Medicine, and Epidemiology, Boston University School of Public Health, Boston, MA, USA
| | - Ryan Hernandez
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
| | - Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA; Fred Hutchinson Cancer Research Center, Division of Public Health Sciences, Seattle, WA, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
30
|
Tielbeek JJ, Uffelmann E, Williams BS, Colodro-Conde L, Gagnon É, Mallard TT, Levitt BE, Jansen PR, Johansson A, Sallis HM, Pistis G, Saunders GRB, Allegrini AG, Rimfeld K, Konte B, Klein M, Hartmann AM, Salvatore JE, Nolte IM, Demontis D, Malmberg ALK, Burt SA, Savage JE, Sugden K, Poulton R, Harris KM, Vrieze S, McGue M, Iacono WG, Mota NR, Mill J, Viana JF, Mitchell BL, Morosoli JJ, Andlauer TFM, Ouellet-Morin I, Tremblay RE, Côté SM, Gouin JP, Brendgen MR, Dionne G, Vitaro F, Lupton MK, Martin NG, Castelao E, Räikkönen K, Eriksson JG, Lahti J, Hartman CA, Oldehinkel AJ, Snieder H, Liu H, Preisig M, Whipp A, Vuoksimaa E, Lu Y, Jern P, Rujescu D, Giegling I, Palviainen T, Kaprio J, Harden KP, Munafò MR, Morneau-Vaillancourt G, Plomin R, Viding E, Boutwell BB, Aliev F, Dick DM, Popma A, Faraone SV, Børglum AD, Medland SE, Franke B, Boivin M, Pingault JB, Glennon JC, Barnes JC, Fisher SE, Moffitt TE, Caspi A, Polderman TJC, Posthuma D. Uncovering the genetic architecture of broad antisocial behavior through a genome-wide association study meta-analysis. Mol Psychiatry 2022; 27:4453-4463. [PMID: 36284158 PMCID: PMC10902879 DOI: 10.1038/s41380-022-01793-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/24/2022] [Revised: 08/03/2022] [Accepted: 09/09/2022] [Indexed: 01/14/2023]
Abstract
Despite the substantial heritability of antisocial behavior (ASB), specific genetic variants robustly associated with the trait have not been identified. The present study by the Broad Antisocial Behavior Consortium (BroadABC) meta-analyzed data from 28 discovery samples (N = 85,359) and five independent replication samples (N = 8058) with genotypic data and broad measures of ASB. We identified the first significant genetic associations with broad ASB, involving common intronic variants in the forkhead box protein P2 (FOXP2) gene (lead SNP rs12536335, p = 6.32 × 10⁻¹⁰). Furthermore, we observed intronic variation in Foxp2 and one of its targets (Cntnap2) distinguishing a mouse model of pathological aggression (BALB/cJ strain) from controls (BALB/cByJ strain). Polygenic risk score (PRS) analyses in independent samples revealed that the genetic risk for ASB was associated with several antisocial outcomes across the lifespan, including diagnosis of conduct disorder, official criminal convictions, and trajectories of antisocial development. We found substantial genetic correlations of ASB with mental health (depression rg = 0.63, insomnia rg = 0.47), physical health (overweight rg = 0.19, waist-to-hip ratio rg = 0.32), smoking (rg = 0.54), cognitive ability (intelligence rg = -0.40), educational attainment (years of schooling rg = -0.46) and reproductive traits (age at first birth rg = -0.58, father's age at death rg = -0.54). Our findings provide a starting point toward identifying critical biosocial risk mechanisms for the development of ASB.
Affiliation(s)
- Jorim J Tielbeek
- Center for Neurogenomics and Cognitive Research, Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands.
| | - Emil Uffelmann
- Center for Neurogenomics and Cognitive Research, Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
| | - Benjamin S Williams
- Department of Psychology and Neuroscience, Trinity College of Arts and Sciences, Duke University, 2020 West Main Street, Durham, NC, 27705, USA
| | - Lucía Colodro-Conde
- Psychiatric Genetics, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Éloi Gagnon
- Research Unit on Children's Psychosocial Maladjustment, École de psychologie, Université Laval, 2523 Allée des Bibliothèques, Quebec City, QC, G1V 0A6, Canada
| | - Travis T Mallard
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Brandt E Levitt
- Carolina Population Center, University of North Carolina at Chapel Hill, 123 Franklin St, Chapel Hill, NC, 27516, USA
| | - Philip R Jansen
- Center for Neurogenomics and Cognitive Research, Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
| | - Ada Johansson
- Department of Psychology, Faculty of Arts, Psychology, and Theology, Åbo Akademi University, Tuomiokirkontori 3, FI-20500, Turku, Finland
| | - Hannah M Sallis
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Road, Bristol, BS8 2BN, UK
| | - Giorgio Pistis
- Center for Psychiatric Epidemiology and Psychopathology, Department of Psychiatry, Lausanne University Hospital and University of Lausanne, Route de Cery 25, CH-1008, Prilly, Vaud, Switzerland
| | - Gretchen R B Saunders
- Department of Psychology, University of Minnesota, 75 E. River Road, Minneapolis, MN, 55455, USA
| | - Andrea G Allegrini
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, DeCrespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - Kaili Rimfeld
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, DeCrespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - Bettina Konte
- Department of Psychiatry and Psychotherapy, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
| | - Marieke Klein
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Geert Groteplein 10, 6500 HB, Nijmegen, The Netherlands
| | - Annette M Hartmann
- Department of Psychiatry and Psychotherapy, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
| | - Jessica E Salvatore
- Department of Psychiatry, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ, USA
| | - Ilja M Nolte
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9700 RB, Groningen, The Netherlands
| | - Ditte Demontis
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, 8000, Aarhus C, Aarhus, Denmark
| | - Anni L K Malmberg
- Department of Psychology and Logopedics, University of Helsinki, Haartmaninkatu 3, 00014, Helsinki, Finland
| | - Jeanne E Savage
- Center for Neurogenomics and Cognitive Research, Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
| | - Karen Sugden
- Department of Psychology and Neuroscience, Trinity College of Arts and Sciences, Duke University, 2020 West Main Street, Durham, NC, 27705, USA
| | - Richie Poulton
- Dunedin Multidisciplinary Health and Development Research Unit, Department of Psychology, Dunedin, New Zealand
| | - Kathleen Mullan Harris
- Department of Sociology, University of North Carolina at Chapel Hill, CB# 3210, 201 Hamilton Hall, Chapel Hill, NC, 27599, USA
| | - Scott Vrieze
- Department of Psychology, University of Minnesota, 75 E. River Road, Minneapolis, MN, 55455, USA
| | - Matt McGue
- Department of Psychology, University of Minnesota, 75 E. River Road, Minneapolis, MN, 55455, USA
| | - William G Iacono
- Department of Psychology, University of Minnesota, 75 E. River Road, Minneapolis, MN, 55455, USA
| | - Nina Roth Mota
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Geert Groteplein 10, 6500 HB, Nijmegen, The Netherlands
| | - Jonathan Mill
- University of Exeter Medical School, University of Exeter, Exeter, UK
| | - Joana F Viana
- The Institute of Metabolism and Systems Research (IMSR), University of Birmingham, Edgbaston, Birmingham, UK
| | - Brittany L Mitchell
- Genetic Epidemiology, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Jose J Morosoli
- Psychiatric Genetics, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Till F M Andlauer
- Department of Neurology, Technical University of Munich, 22 Ismaninger St., 81675, Munich, Germany
| | - Isabelle Ouellet-Morin
- Research Unit on Children's Psychosocial Maladjustment, École de criminologie, University of Montreal, 3150 Rue Jean-Brillant, Montreal, QC, H3T 1N8, Canada
| | - Richard E Tremblay
- Research Unit on Children's Psychosocial Maladjustment, Département de pédiatrie et de psychologie, University of Montreal, 90 Avenue Vincent d'Indy, Montreal, QC, H2V 2S9, Canada
| | - Sylvana M Côté
- Research Unit on Children's Psychosocial Maladjustment, CHU Ste-Justine Research Center and Department of Social and Preventive Medicine, University of Montreal, 3175 Chemin de la Côte Ste-Catherine, Montreal, QC, H3T 1C5, Canada
| | - Jean-Philippe Gouin
- Department of Psychology, Concordia University, 7141 Sherbrooke St. West, Montreal, QC, H4B 1R6, Canada
| | - Mara R Brendgen
- Research Unit on Children's Psychosocial Maladjustment, Département de psychologie, Université du Québec à Montréal, CP 8888 succursale Centre-ville, Montreal, QC, H3C 3P8, Canada
| | - Ginette Dionne
- Research Unit on Children's Psychosocial Maladjustment, École de psychologie, Université Laval, 2523 Allée des Bibliothèques, Quebec City, QC, G1V 0A6, Canada
| | - Frank Vitaro
- Research Unit on Children's Psychosocial Maladjustment, CHU Sainte-Justine Research Center and University of Montreal, 3175 Chemin de la Côte Ste-Catherine, Montreal, QC, H3T 1C5, Canada
| | - Michelle K Lupton
- Genetic Epidemiology, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Nicholas G Martin
- Genetic Epidemiology, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Enrique Castelao
- Center for Psychiatric Epidemiology and Psychopathology, Department of Psychiatry, Lausanne University Hospital and University of Lausanne, Route de Cery 25, CH-1008, Prilly, Vaud, Switzerland
| | - Katri Räikkönen
- Department of Psychology and Logopedics, University of Helsinki, Haartmaninkatu 3, 00014, Helsinki, Finland
| | - Johan G Eriksson
- Department of General Practice and Primary Health Care, University of Helsinki, Tukholmankatu 8 B, Helsinki, Finland
| | - Jari Lahti
- Department of Psychology and Logopedics, University of Helsinki, Haartmaninkatu 3, 00014, Helsinki, Finland
| | - Catharina A Hartman
- Interdisciplinary Center Psychopathology and Emotion Regulation (ICPE), University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9700 RB, Groningen, The Netherlands
| | - Albertine J Oldehinkel
- Interdisciplinary Center Psychopathology and Emotion Regulation (ICPE), University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9700 RB, Groningen, The Netherlands
| | - Harold Snieder
- Department of Epidemiology, University of Groningen, University Medical Center Groningen, Hanzeplein 1, 9700 RB, Groningen, The Netherlands
| | - Hexuan Liu
- School of Criminal Justice, University of Cincinnati, 2840 Bearcat Way, Cincinnati, OH, 45221, USA
| | - Martin Preisig
- Center for Psychiatric Epidemiology and Psychopathology, Department of Psychiatry, Lausanne University Hospital and University of Lausanne, Route de Cery 25, CH-1008, Prilly, Vaud, Switzerland
| | - Alyce Whipp
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, PO Box 4, (Yliopistonkatu 3), 00014, Helsinki, Finland
| | - Eero Vuoksimaa
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, PO Box 4, (Yliopistonkatu 3), 00014, Helsinki, Finland
| | - Yi Lu
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Nobels Väg 12A, 171 77, Stockholm, Sweden
| | - Patrick Jern
- Department of Psychology, Faculty of Arts, Psychology, and Theology, Åbo Akademi University, Tuomiokirkontori 3, FI-20500, Turku, Finland
| | - Dan Rujescu
- Department of Psychiatry and Psychotherapy, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
| | - Ina Giegling
- Department of Psychiatry and Psychotherapy, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
| | - Teemu Palviainen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, PO Box 4, (Yliopistonkatu 3), 00014, Helsinki, Finland
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, PO Box 4, (Yliopistonkatu 3), 00014, Helsinki, Finland
| | - Kathryn Paige Harden
- Department of Psychology and Population Research Center, University of Texas at Austin, 108 E Dean Keeton Stop #A8000, Austin, TX, 78712, USA
| | - Marcus R Munafò
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield Road, Bristol, BS8 2BN, UK
| | - Geneviève Morneau-Vaillancourt
- Research Unit on Children's Psychosocial Maladjustment, École de psychologie, Université Laval, 2523 Allée des Bibliothèques, Quebec City, QC, G1V 0A6, Canada
| | - Robert Plomin
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, DeCrespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - Essi Viding
- Division of Psychology and Language Sciences, University College London, London, UK
| | - Brian B Boutwell
- School of Applied Sciences, University of Mississippi, John D. Bower School of Population Health, University of Mississippi Medical Center, 84 Dormitory Row West, University, MS, 38677, USA
| | - Fazil Aliev
- Department of Psychology, Virginia Commonwealth University, Box 842018, 806W Franklin St, Richmond, VA, 23284, USA
| | - Danielle M Dick
- Department of Psychology, Virginia Commonwealth University, Box 842018, 806W Franklin St, Richmond, VA, 23284, USA
| | - Arne Popma
- Amsterdam UMC, VKC Psyche, Child and Adolescent Psychiatry & Psychosocial Care, Amsterdam, The Netherlands
| | - Stephen V Faraone
- Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
| | - Anders D Børglum
- iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, 8000, Aarhus C, Aarhus, Denmark
| | - Sarah E Medland
- Psychiatric Genetics, Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Herston, Brisbane, QLD, 4006, Australia
| | - Barbara Franke
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Geert Grooteplein 10, 6525 GA, Nijmegen, The Netherlands
| | - Michel Boivin
- Research Unit on Children's Psychosocial Maladjustment, École de psychologie, Université Laval, 2523 Allée des Bibliothèques, Quebec City, QC, G1V 0A6, Canada
| | - Jean-Baptiste Pingault
- Department of Clinical, Educational and Health Psychology, University College London, London, UK
| | - Jeffrey C Glennon
- Conway Institute of Biomolecular and Biomedical Sciences, School of Medicine, University College Dublin, Dublin, Ireland
| | - J C Barnes
- School of Criminal Justice, University of Cincinnati, 2840 Bearcat Way, Cincinnati, OH, 45221, USA
| | - Simon E Fisher
- Language and Genetics Department, Max Planck Institute for Psycholinguistics, Wundtlaan 1, 6525 XD, Nijmegen, The Netherlands
| | - Terrie E Moffitt
- Department of Psychology and Neuroscience, Trinity College of Arts and Sciences, Duke University, 2020 West Main Street, Durham, NC, 27705, USA
| | - Avshalom Caspi
- Department of Psychology and Neuroscience, Trinity College of Arts and Sciences, Duke University, 2020 West Main Street, Durham, NC, 27705, USA
| | - Tinca J C Polderman
- Amsterdam UMC, VKC Psyche, Child and Adolescent Psychiatry & Psychosocial Care, Amsterdam, The Netherlands
| | - Danielle Posthuma
- Center for Neurogenomics and Cognitive Research, Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, De Boelelaan 1105, 1081 HV, Amsterdam, The Netherlands
| |
|
31
|
Allegrini AG, Baldwin JR, Barkhuizen W, Pingault JB. Research Review: A guide to computing and implementing polygenic scores in developmental research. J Child Psychol Psychiatry 2022; 63:1111-1124. [PMID: 35354222 PMCID: PMC10108570 DOI: 10.1111/jcpp.13611] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 11/26/2021] [Revised: 02/28/2022] [Accepted: 03/04/2022] [Indexed: 12/14/2022]
Abstract
The increasing availability of genotype data in longitudinal population- and family-based samples provides opportunities for using polygenic scores (PGS) to study developmental questions in child and adolescent psychology and psychiatry. Here, we aim to provide a comprehensive overview of how PGS can be generated and implemented in developmental psycho(patho)logy, with a focus on longitudinal designs. As such, the paper is organized into three parts: First, we provide a formal definition of polygenic scores and related concepts, focusing on assumptions and limitations. Second, we give a general overview of the methods used to compute polygenic scores, ranging from the classic approach to more advanced methods. We include recommendations and reference resources available to researchers aiming to conduct PGS analyses. Finally, we focus on the practical applications of PGS in the analysis of longitudinal data. We describe how PGS have been used to research developmental outcomes, and how they can be applied to longitudinal data to address developmental questions.
Affiliation(s)
- Andrea G Allegrini
- Division of Psychology and Language Sciences, Department of Clinical, Educational and Health Psychology, University College London, London, UK; Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Jessie R Baldwin
- Division of Psychology and Language Sciences, Department of Clinical, Educational and Health Psychology, University College London, London, UK; Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| | - Wikus Barkhuizen
- Division of Psychology and Language Sciences, Department of Clinical, Educational and Health Psychology, University College London, London, UK
| | - Jean-Baptiste Pingault
- Division of Psychology and Language Sciences, Department of Clinical, Educational and Health Psychology, University College London, London, UK; Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
| |
|
32
|
Guo X, Han J, Song Y, Yin Z, Liu S, Shang X. Using expression quantitative trait loci data and graph-embedded neural networks to uncover genotype–phenotype interactions. Front Genet 2022; 13:921775. [PMID: 36046233 PMCID: PMC9421127 DOI: 10.3389/fgene.2022.921775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2022] [Accepted: 07/04/2022] [Indexed: 11/13/2022] Open
Abstract
Motivation: A central goal of current biology is to establish a complete functional link between the genotype and phenotype, known as the genotype–phenotype map. With the continuous development of high-throughput technology and the decline in sequencing costs, multi-omics analysis has become more widely employed. While this gives us new opportunities to uncover the correlation mechanisms between single-nucleotide polymorphisms (SNPs), genes, and phenotypes, multi-omics still faces certain challenges, specifically: 1) when the sample size is large enough, the number of omics types is often not large enough to meet the requirements of multi-omics analysis; 2) each omics’ internal correlations are often unclear, such as the correlation between genes in genomics; 3) when analyzing a large number of traits (p), the sample size (n) is often much smaller than p (n << p), hindering the application of machine learning methods in the classification of disease outcomes. Results: To solve these issues with multi-omics and build a robust classification model, we propose a graph-embedded deep neural network (G-EDNN) based on expression quantitative trait loci (eQTL) data, which achieves sparse connectivity between network layers to prevent overfitting. The correlation within each omics is also considered such that the model more closely resembles biological reality. To verify the capabilities of this method, we conducted experimental analysis using the GSE28127 and GSE95496 data sets from the Gene Expression Omnibus (GEO) database, tested various neural network architectures, and used prior data for feature selection and graph embedding. Results show that the proposed method could achieve a high classification accuracy and easy-to-interpret feature selection. This method represents an extended application of genotype–phenotype association analysis in deep learning networks.
Affiliation(s)
- Xinpeng Guo
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
- School of Air and Missile Defense, Air Force Engineering University, Xi’an, China
| | - Jinyu Han
- School of Economics and Management, Chang’an University, Xi’an, China
| | - Yafei Song
- School of Air and Missile Defense, Air Force Engineering University, Xi’an, China
| | - Zhilei Yin
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
| | - Shuaichen Liu
- School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an, China
| | - Xuequn Shang
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, China
- *Correspondence: Xuequn Shang,
| |
|
33
|
Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores. Annu Rev Biomed Data Sci 2022; 5:293-320. [PMID: 35576555 PMCID: PMC9828290 DOI: 10.1146/annurev-biodatasci-111721-074830] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 01/11/2023]
Abstract
Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA;
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA;
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA;
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA;
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Alicia R Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA;
- Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| |
|
34
|
Hujoel ML, Loh PR, Neale BM, Price AL. Incorporating family history of disease improves polygenic risk scores in diverse populations. Cell Genom 2022; 2:100152. [PMID: 35935918 PMCID: PMC9351615 DOI: 10.1016/j.xgen.2022.100152] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/07/2021] [Revised: 05/22/2022] [Accepted: 06/09/2022] [Indexed: 01/04/2023]
Abstract
Polygenic risk scores (PRSs) derived from genotype data and family history (FH) of disease provide valuable information for predicting disease risk, but PRSs perform poorly when applied to diverse populations. Here, we explore methods for combining both types of information (PRS-FH) in UK Biobank data. PRSs were trained using all British individuals (n = 409,000), and target samples consisted of unrelated non-British Europeans (n = 42,000), South Asians (n = 7,000), or Africans (n = 7,000). We evaluated PRS, FH, and PRS-FH using liability-scale R², primarily focusing on 3 well-powered diseases (type 2 diabetes, hypertension, and depression). PRS attained average prediction R² values of 5.8%, 4.0%, and 0.53% in non-British Europeans, South Asians, and Africans, confirming poor cross-population transferability. In contrast, PRS-FH attained average prediction R² values of 13%, 12%, and 10%, respectively, representing a large improvement in Europeans and an extremely large improvement in Africans. In conclusion, including family history improves the accuracy of polygenic risk scores, particularly in diverse populations.
Affiliation(s)
- Margaux L.A. Hujoel
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Po-Ru Loh
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Benjamin M. Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Alkes L. Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| |
|
35
|
Khunsriraksakul C, Markus H, Olsen NJ, Carrel L, Jiang B, Liu DJ. Construction and Application of Polygenic Risk Scores in Autoimmune Diseases. Front Immunol 2022; 13:889296. [PMID: 35833142 PMCID: PMC9271862 DOI: 10.3389/fimmu.2022.889296] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/04/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022] Open
Abstract
Genome-wide association studies (GWAS) have identified hundreds of genetic variants associated with autoimmune diseases and provided unique mechanistic insights and informed novel treatments. These individual genetic variants on their own typically confer a small effect of disease risk with limited predictive power; however, when aggregated (e.g., via polygenic risk score method), they could provide meaningful risk predictions for a myriad of diseases. In this review, we describe the recent advances in GWAS for autoimmune diseases and the practical application of this knowledge to predict an individual’s susceptibility/severity for autoimmune diseases such as systemic lupus erythematosus (SLE) via the polygenic risk score method. We provide an overview of methods for deriving different polygenic risk scores and discuss the strategies to integrate additional information from correlated traits and diverse ancestries. We further advocate for the need to integrate clinical features (e.g., anti-nuclear antibody status) with genetic profiling to better identify patients at high risk of disease susceptibility/severity even before clinical signs or symptoms develop. We conclude by discussing future challenges and opportunities of applying polygenic risk score methods in clinical care.
Affiliation(s)
- Chachrit Khunsriraksakul
- Graduate Program in Bioinformatics and Genomics, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Havell Markus
- Graduate Program in Bioinformatics and Genomics, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Nancy J. Olsen
- Department of Medicine, Division of Rheumatology, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Laura Carrel
- Department of Biochemistry and Molecular Biology, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Bibo Jiang
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Dajiang J. Liu
- Institute for Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA, United States
- Department of Public Health Sciences, Pennsylvania State University College of Medicine, Hershey, PA, United States
- *Correspondence: Dajiang J. Liu,
36
Kurniansyah N, Goodman MO, Kelly TN, Elfassy T, Wiggins KL, Bis JC, Guo X, Palmas W, Taylor KD, Lin HJ, Haessler J, Gao Y, Shimbo D, Smith JA, Yu B, Feofanova EV, Smit RAJ, Wang Z, Hwang SJ, Liu S, Wassertheil-Smoller S, Manson JE, Lloyd-Jones DM, Rich SS, Loos RJF, Redline S, Correa A, Kooperberg C, Fornage M, Kaplan RC, Psaty BM, Rotter JI, Arnett DK, Morrison AC, Franceschini N, Levy D, Sofer T. A multi-ethnic polygenic risk score is associated with hypertension prevalence and progression throughout adulthood. Nat Commun 2022; 13:3549. [PMID: 35729114 PMCID: PMC9213527 DOI: 10.1038/s41467-022-31080-2] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 12/14/2021] [Accepted: 05/31/2022] [Indexed: 12/12/2022] Open
Abstract
In a multi-stage analysis of 52,436 individuals aged 17-90 across diverse cohorts and biobanks, we train, test, and evaluate a polygenic risk score (PRS) for hypertension risk and progression. The PRS is trained using genome-wide association studies (GWAS) for systolic, diastolic blood pressure, and hypertension, respectively. For each trait, PRS is selected by optimizing the coefficient of variation (CV) across estimated effect sizes from multiple potential PRS using the same GWAS, after which the 3 trait-specific PRSs are combined via an unweighted sum called "PRSsum", forming the HTN-PRS. The HTN-PRS is associated with both prevalent and incident hypertension at 4-6 years of follow up. This association is further confirmed in age-stratified analysis. In an independent biobank of 40,201 individuals, the HTN-PRS is confirmed to be predictive of increased risk for coronary artery disease, ischemic stroke, type 2 diabetes, and chronic kidney disease.
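The "PRSsum" step described in this abstract (an unweighted sum of trait-specific PRSs) can be illustrated with a minimal sketch. Standardizing each trait-specific PRS before summing is an assumption made here so that the three scores are on a comparable scale; the abstract does not state it. The function names and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Center and scale one trait-specific PRS to mean 0 and SD 1 (assumed step)."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("PRS has zero variance; cannot standardize")
    return [(s - mu) / sd for s in scores]


def prs_sum(trait_scores):
    """Unweighted sum of standardized trait-specific PRSs, one value per individual."""
    standardized = [standardize(scores) for scores in trait_scores.values()]
    return [sum(values) for values in zip(*standardized)]


# Hypothetical trait-specific scores for four individuals (illustrative numbers only).
trait_scores = {
    "SBP": [0.10, 0.40, -0.20, 0.30],
    "DBP": [1.0, 2.0, 0.0, 1.0],
    "HTN": [-0.5, 0.5, -1.5, 0.5],
}
htn_prs = prs_sum(trait_scores)
```

Because every component is centered, the combined score is also centered, and an individual who ranks high on all three trait-specific scores ranks high on the sum.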
Affiliation(s)
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Matthew O Goodman
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Tanika N Kelly
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tali Elfassy
- Department of Medicine, University of Miami Miller School of Medicine, Miami, FL, USA
- Kerri L Wiggins
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Joshua C Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Walter Palmas
- Department of Medicine, Columbia University Medical Center, New York, NY, USA
- Kent D Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Henry J Lin
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Jeffrey Haessler
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Yan Gao
- The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
- Daichi Shimbo
- Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
- Jennifer A Smith
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Elena V Feofanova
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Roelof A J Smit
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Shih-Jen Hwang
- Department of Biostatistics, Boston University, Boston, MA, USA
- Simin Liu
- Center for Global Cardiometabolic Health and Departments of Epidemiology, Medicine, and Surgery, Brown University, Providence, RI, USA
- Sylvia Wassertheil-Smoller
- Department of Epidemiology & Population Health, Department of Pediatrics, Albert Einstein College of Medicine, Bronx, NY, USA
- JoAnn E Manson
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Adolfo Correa
- Departments of Medicine and Pediatrics, University of Mississippi Medical Center, Jackson, MS, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Robert C Kaplan
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Donna K Arnett
- College of Public Health, University of Kentucky, Lexington, KY, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Nora Franceschini
- Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
- Daniel Levy
- The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
37
Jighly A, Benhajali H, Liu Z, Goddard ME. MetaGS: an accurate method to impute and combine SNP effects across populations using summary statistics. Genet Sel Evol 2022; 54:37. [PMID: 35655152 PMCID: PMC9164759 DOI: 10.1186/s12711-022-00725-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2021] [Accepted: 05/02/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Meta-analysis describes a category of statistical methods that aim at combining the results of multiple studies to increase statistical power by exploiting summary statistics. Different industries that use genomic prediction do not share their raw data due to logistic or privacy restrictions, which can limit the size of their reference populations and creates a need for a practical meta-analysis method. RESULTS: We developed a meta-analysis, named MetaGS, that duplicates the results of multi-trait best linear unbiased prediction (mBLUP) analysis without accessing raw data. MetaGS exploits the correlations among different populations to produce more accurate population-specific single nucleotide polymorphism (SNP) effects. The method improves SNP effect estimations for a given population depending on its relations to other populations. MetaGS was tested on milk, fat and protein yield data of Australian Holstein and Jersey cattle and it generated very similar genomic estimated breeding values to those produced using the mBLUP method for all traits in both breeds. One of the major difficulties when combining SNP effects across populations is the use of different variants for the populations, which limits the applications of meta-analysis in practice. We solved this issue by developing a method to impute missing summary statistics without using raw data. Our results showed that imputing summary statistics can be done with high accuracy (r > 0.9) even when more than 70% of the SNPs were missing with a minimal effect on prediction accuracy. CONCLUSIONS: We demonstrated that MetaGS can replace the mBLUP model when raw data cannot be shared, which can lead to more flexible collaborations compared to the single-trait BLUP model.
Affiliation(s)
- Abdulqader Jighly
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Haifa Benhajali
- Department of Animal Breeding and Genetics, Interbull Centre, Swedish University of Agricultural Sciences, Box 7023, 750 07, Uppsala, Sweden
- Zengting Liu
- IT Solutions for Animal Production (vit), Heinrich-Schroeder-Weg 1, 27283, Verden, Germany
- Mike E Goddard
- Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, 3083, Australia
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, 3010, Australia
38
Ballard JL, O'Connor LJ. Shared components of heritability across genetically correlated traits. Am J Hum Genet 2022; 109:989-1006. [PMID: 35477001 PMCID: PMC9247834 DOI: 10.1016/j.ajhg.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 04/01/2022] [Indexed: 11/01/2022] Open
Abstract
Most disease-associated genetic variants are pleiotropic, affecting multiple genetically correlated traits. Their pleiotropic associations can be mechanistically informative: if many variants have similar patterns of association, they may act via similar pleiotropic mechanisms, forming a shared component of heritability. We developed pleiotropic decomposition regression (PDR) to identify shared components and their underlying genetic variants. We validated PDR on simulated data and identified limitations of existing methods in recovering the true components. We applied PDR to three clusters of five to six traits genetically correlated with coronary artery disease (CAD), asthma, and type II diabetes (T2D), producing biologically interpretable components. For CAD, PDR identified components related to BMI, hypertension, and cholesterol, and it clarified the relationship among these highly correlated risk factors. We assigned variants to components, calculated their posterior-mean effect sizes, and performed out-of-sample validation. Our posterior-mean effect sizes pool statistical power across traits and substantially boost the correlation ($r^2$) between true and estimated effect sizes (compared with the original summary statistics) by 94% and 70% for asthma and T2D out of sample, respectively, and by a predicted 300% for CAD.
Affiliation(s)
- Jenna Lee Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luke Jen O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
39
Feldmann MJ, Piepho HP, Knapp SJ. Average semivariance directly yields accurate estimates of the genomic variance in complex trait analyses. G3 Genes|Genomes|Genetics 2022; 12:6571389. [PMID: 35442424 PMCID: PMC9157152 DOI: 10.1093/g3journal/jkac080] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/27/2021] [Accepted: 03/17/2022] [Indexed: 11/23/2022]
Abstract
Many important traits in plants, animals, and microbes are polygenic and challenging to improve through traditional marker-assisted selection. Genomic prediction addresses this by incorporating all genetic data in a mixed model framework. The primary method for predicting breeding values is genomic best linear unbiased prediction, which uses the realized genomic relationship or kinship matrix ($K$) to connect genotype to phenotype. Genomic relationship matrices share information among entries to estimate the observed entries’ genetic values and predict unobserved entries’ genetic values. One of the main parameters of such models is genomic variance ($\sigma_g^2$), or the variance of a trait associated with a genome-wide sample of DNA polymorphisms, and genomic heritability ($h_g^2$); however, the seminal papers introducing different forms of $K$ often do not discuss their effects on the model estimated variance components despite their importance in genetic research and breeding. Here, we discuss the effect of several standard methods for calculating the genomic relationship matrix on estimates of $\sigma_g^2$ and $h_g^2$. With current approaches, we found that the genomic variance tends to be either overestimated or underestimated depending on the scaling and centering applied to the marker matrix ($Z$), the value of the average diagonal element of $K$, and the assortment of alleles and heterozygosity ($H$) in the observed population. Using the average semivariance, we propose a new matrix, $K_{\mathrm{ASV}}$, that directly yields accurate estimates of $\sigma_g^2$ and $h_g^2$ in the observed population and produces best linear unbiased predictors equivalent to routine methods in plants and animals.
Affiliation(s)
- Mitchell J Feldmann
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Hans-Peter Piepho
- Biostatistics Unit, Institute of Crop Science, University of Hohenheim, 70593 Stuttgart, Germany
- Steven J Knapp
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
40
Xiao J, Cai M, Hu X, Wan X, Chen G, Yang C. XPXP: improving polygenic prediction by cross-population and cross-phenotype analysis. Bioinformatics 2022; 38:1947-1955. [PMID: 35040939 DOI: 10.1093/bioinformatics/btac029] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/20/2021] [Revised: 11/16/2021] [Accepted: 01/12/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: With increasing sample sizes in genome-wide association studies (GWASs), polygenic risk scores (PRSs) have shown great potential in personalized medicine for disease risk prediction, prevention and treatment. However, a PRS constructed using European samples becomes less accurate when it is applied to individuals from non-European populations. It is an urgent task to improve the accuracy of PRSs in under-represented populations, such as African populations and East Asian populations. RESULTS: In this article, we propose a cross-population and cross-phenotype (XPXP) method for construction of PRSs in under-represented populations. XPXP can construct accurate PRSs by leveraging biobank-scale datasets in European populations and multiple GWASs of genetically correlated phenotypes. XPXP also allows the incorporation of population-specific and phenotype-specific effects, and thus further improves the accuracy of PRSs. Through comprehensive simulation studies and real data analysis, we demonstrated that XPXP outperformed existing PRS approaches. We showed that the height PRSs constructed by XPXP achieved 9% and 18% improvement over the runner-up method in terms of predicted $R^2$ in East Asian and African populations, respectively. We also showed that XPXP substantially improved the stratification ability in identifying individuals at high genetic risk of type 2 diabetes. AVAILABILITY AND IMPLEMENTATION: The XPXP software and all analysis code are available at github.com/YangLabHKUST/XPXP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jiashun Xiao
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Mingxuan Cai
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xianghong Hu
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
- Xiang Wan
- Shenzhen Research Institute of Big Data, Shenzhen 518172, China
- Pazhou Lab, Guangzhou 510330, China
- Gang Chen
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Can Yang
- Guangzhou HKUST Fok Ying Tung Research Institute, Guangzhou 511458, China
- Department of Mathematics, The Hong Kong University of Science and Technology, Hong Kong SAR, China
41
Zhu X, Zhu L, Wang H, Cooper RS, Chakravarti A. Genome-wide pleiotropy analysis identifies novel blood pressure variants and improves its polygenic risk scores. Genet Epidemiol 2022; 46:105-121. [PMID: 34989438 PMCID: PMC8863647 DOI: 10.1002/gepi.22440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2021] [Accepted: 12/07/2021] [Indexed: 01/21/2023]
Abstract
Systolic and diastolic blood pressure (S/DBP) are highly correlated modifiable risk factors for cardiovascular disease (CVD). We report here a bidirectional Mendelian Randomization (MR) and horizontal pleiotropy analysis of S/DBP summary statistics from the UK Biobank (UKB)-International Consortium for Blood Pressure (ICBP) (UKB-ICBP) BP genome-wide association study and construct a composite genetic risk score (GRS) by including pleiotropic variants. The composite GRS captures greater (1.11- to 3.26-fold) heritability for BP traits and increases (1.09- and 2.01-fold) Nagelkerke's $R^2$ for hypertension and CVD. We replicated 118 novel BP horizontal pleiotropic variants including 18 novel BP loci using summary statistics from the Million Veteran Program (MVP) study. An additional 219 novel BP signals and 40 novel loci were identified after a meta-analysis of the UKB-ICBP and MVP summary statistics but without further independent replication. Our study provides further insight into BP regulation and provides a novel way to construct a GRS by including pleiotropic variants for other complex diseases.
Affiliation(s)
- Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
- Luke Zhu
- Department of Medicine, Center for Human Genetics & Genomics, New York University Langone Health, New York, New York, USA
- Heming Wang
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Richard S. Cooper
- Department of Public Health Sciences, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, USA
- Aravinda Chakravarti
- Department of Medicine, Center for Human Genetics & Genomics, New York University Langone Health, New York, New York, USA
42
Yang S, Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief Bioinform 2022; 23:6534383. [PMID: 35193147 DOI: 10.1093/bib/bbac039] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/02/2021] [Revised: 12/29/2021] [Accepted: 01/26/2022] [Indexed: 01/02/2023] Open
Abstract
Polygenic scores (PGS) are important tools for carrying out genetic prediction of common diseases and disease related complex traits, facilitating the development of precision medicine. Unfortunately, despite the critical importance of PGS and the vast number of PGS methods recently developed, few comprehensive comparison studies have been performed to evaluate the effectiveness of PGS methods. To fill this critical knowledge gap, we performed a comprehensive comparison study on 12 different PGS methods through internal evaluations on 25 quantitative and 25 binary traits within the UK Biobank with sample sizes ranging from 147 408 to 336 573, and through external evaluations via 25 cross-study and 112 cross-ancestry analyses on summary statistics from multiple genome-wide association studies with sample sizes ranging from 1415 to 329 345. We evaluate the prediction accuracy, computational scalability, as well as robustness and transferability of different PGS methods across datasets and/or genetic ancestries, providing important guidelines for practitioners in choosing PGS methods. Besides method comparison, we present a simple aggregation strategy that combines multiple PGS from different methods to take advantage of their distinct benefits to achieve stable and superior prediction performance. To facilitate future applications of PGS, we also develop a PGS webserver (http://www.pgs-server.com/) that allows users to upload summary statistics and choose different PGS methods to fit the data directly. We hope that our results, method and webserver will facilitate the routine application of PGS across different research areas.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
43
Chung W. Statistical models and computational tools for predicting complex traits and diseases. Genomics Inform 2022; 19:e36. [PMID: 35012283 PMCID: PMC8752975 DOI: 10.5808/gi.21053] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/13/2021] [Accepted: 11/01/2021] [Indexed: 12/30/2022] Open
Abstract
Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. The genetic variants from genome-wide association studies (GWAS), including variants well below GWAS significance, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic predictions based on genetic variants across the whole genome. Various statistical methodologies and diverse computational tools have been introduced and developed to compute the polygenic risk score (PRS) more accurately. However, many researchers utilize PRS tools without a thorough understanding of the underlying model and how to specify the parameters for the best performance. It is advantageous to study the statistical models implemented in computational tools for PRS estimation and the formulas of parameters to be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.
Affiliation(s)
- Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
44
Kasyanov E, Rakitko A, Rukavishnikov G, Golimbet V, Shmukler A, Iliinsky V, Neznanov N, Kibitov A, Mazo G. Contemporary GWAS studies of depression: the critical role of phenotyping. Zh Nevrol Psikhiatr Im S S Korsakova 2022; 122:50-61. [DOI: 10.17116/jnevro202212201150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
45
Al‐Soufi L, Martorell L, Moltó M, González‐Peñas J, García‐Portilla MP, Arrojo M, Rivero O, Gutiérrez‐Zotes A, Nácher J, Muntané G, Paz E, Páramo M, Bobes J, Arango C, Sanjuan J, Vilella E, Costas J. A polygenic approach to the association between smoking and schizophrenia. Addict Biol 2022; 27:e13104. [PMID: 34779080 DOI: 10.1111/adb.13104] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/23/2021] [Revised: 08/18/2021] [Accepted: 09/20/2021] [Indexed: 11/30/2022]
Abstract
Smoking prevalence in schizophrenia is considerably larger than in the general population, playing an important role in early mortality. We compared the polygenic contribution to smoking in schizophrenic patients and controls to assess whether genetic factors may explain the different prevalence. Polygenic risk scores (PRSs) for smoking initiation and four genetically correlated traits were calculated in 1108 schizophrenic patients (64.4% smokers) and 1584 controls (31.1% smokers). PRSs for smoking initiation, educational attainment, body mass index and age at first birth were associated with smoking in patients and controls, explaining a similar percentage of variance in both groups. Attention-deficit hyperactivity disorder (ADHD) PRS was associated with smoking only in schizophrenia. This association remained significant after adjustment for psychiatric cross-disorder PRS. A PRS combining all the traits was more explanatory than smoking initiation PRS alone, indicating that genetic susceptibility to the other traits plays an additional role in smoking behaviour. Smoking initiation PRS was also associated with schizophrenia in the whole sample, but the significance was lost after adjustment for smoking status. This same pattern was observed in the analysis of specific SNPs at the CHRNA5-CHRNA3-CHRNB4 cluster associated with both traits. Overall, the results indicate that the same genetic factors are involved in smoking susceptibility in schizophrenia and in the general population and are compatible with smoking acting, directly or indirectly, as a risk factor for schizophrenia that contributes to the high prevalence of smoking in these patients. The contrasting results for ADHD PRS may be related to higher ADHD symptomatology in schizophrenic patients.
Affiliation(s)
- Laila Al‐Soufi
- Psychiatric Genetics Group, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela (USC), Santiago de Compostela, Spain
- Lourdes Martorell
- Hospital Universitari Institut Pere Mata (HUIPM); Institut d'Investigació Sanitària Pere Virgili (IISPV); Universitat Rovira i Virgili (URV), Reus, Spain
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- M. Dolores Moltó
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- INCLIVA Biomedical Research Institute, Fundación Investigación Hospital Clínico de Valencia, Valencia, Spain
- Department of Genetics, Universitat de València, Valencia, Spain
- Javier González‐Peñas
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, School of Medicine, Universidad Complutense de Madrid, Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- Ma Paz García‐Portilla
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Department of Psychiatry, Universidad de Oviedo; Instituto de Investigación Sanitaria del Principado de Asturias (ISPA); Instituto Universitario de Neurociencias del Principado de Asturias (INEUROPA); Servicio de Salud del Principado de Asturias (SESPA), Oviedo, Spain
- Manuel Arrojo
- Psychiatric Genetics Group, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Servizo de Psiquiatría, Complexo Hospitalario Universitario de Santiago de Compostela, Servizo Galego de Saúde (SERGAS), Santiago de Compostela, Spain
- Olga Rivero
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- INCLIVA Biomedical Research Institute, Fundación Investigación Hospital Clínico de Valencia, Valencia, Spain
- Department of Genetics, Universitat de València, Valencia, Spain
- Alfonso Gutiérrez‐Zotes
- Hospital Universitari Institut Pere Mata (HUIPM); Institut d'Investigació Sanitària Pere Virgili (IISPV); Universitat Rovira i Virgili (URV), Reus, Spain
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Juan Nácher
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- INCLIVA Biomedical Research Institute, Fundación Investigación Hospital Clínico de Valencia, Valencia, Spain
- Department of Cell Biology, Interdisciplinary Research Structure for Biotechnology and Biomedicine (BIOTECMED), Universitat de València, Valencia, Spain
- Gerard Muntané
- Hospital Universitari Institut Pere Mata (HUIPM); Institut d'Investigació Sanitària Pere Virgili (IISPV); Universitat Rovira i Virgili (URV), Reus, Spain
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Eduardo Paz
- Psychiatric Genetics Group, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Servizo de Psiquiatría, Complexo Hospitalario Universitario de Santiago de Compostela, Servizo Galego de Saúde (SERGAS), Santiago de Compostela, Spain
- Mario Páramo
- Psychiatric Genetics Group, Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS), Santiago de Compostela, Spain
- Servizo de Psiquiatría, Complexo Hospitalario Universitario de Santiago de Compostela, Servizo Galego de Saúde (SERGAS), Santiago de Compostela, Spain
- Julio Bobes
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Department of Psychiatry, Universidad de Oviedo; Instituto de Investigación Sanitaria del Principado de Asturias (ISPA); Instituto Universitario de Neurociencias del Principado de Asturias (INEUROPA); Servicio de Salud del Principado de Asturias (SESPA), Oviedo, Spain
- Celso Arango
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- Department of Child and Adolescent Psychiatry, Institute of Psychiatry and Mental Health, Hospital General Universitario Gregorio Marañón, School of Medicine, Universidad Complutense de Madrid, Instituto de Investigación Sanitaria Gregorio Marañón (IiSGM), Madrid, Spain
- Julio Sanjuan
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
- INCLIVA Biomedical Research Institute, Fundación Investigación Hospital Clínico de Valencia, Valencia, Spain
- Department of Psychiatric, School of Medicine, Universitat de València, Valencia, Spain
- Elisabet Vilella
- Hospital Universitari Institut Pere Mata (HUIPM); Institut d'Investigació Sanitària Pere Virgili (IISPV); Universitat Rovira i Virgili (URV), Reus, Spain
- Spanish Mental Health Research Network (CIBERSAM), Madrid, Spain
| | - Javier Costas
- Psychiatric Genetics Group Instituto de Investigación Sanitaria de Santiago de Compostela (IDIS) Santiago de Compostela Spain
- Servizo Galego de Saúde (SERGAS) Complexo Hospitalario Universitario de Santiago de Compostela (CHUS) Santiago de Compostela Spain
| |
|
46
|
Shin J, Zhou X, Tan JTM, Hyppönen E, Benyamin B, Lee SH. Lifestyle Modifies the Diabetes-Related Metabolic Risk, Conditional on Individual Genetic Differences. Front Genet 2022; 13:759309. [PMID: 35356427 PMCID: PMC8959634 DOI: 10.3389/fgene.2022.759309] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/17/2021] [Accepted: 01/10/2022] [Indexed: 12/26/2022] Open
Abstract
Metabolic syndrome is a group of heritable metabolic traits that are highly associated with type 2 diabetes (T2DM). Classical interventions for T2DM include individual self-management of environmental risk factors, such as improving diet quality, increasing physical activity, and reducing smoking and alcohol consumption, which decreases the risk of developing metabolic syndrome. However, it is poorly understood how the phenotypes of diabetes-related metabolic traits change with respect to lifestyle modifications at the individual level. In the analysis, we used 12 diabetes-related metabolic traits and eight lifestyle covariates from the UK Biobank comprising 288,837 white British participants genotyped for 1,133,273 genome-wide single nucleotide polymorphisms. We found 16 GxE interactions. Modulation of genetic effects by physical activity was seen for four traits (glucose, HbA1c, C-reactive protein, systolic blood pressure), and by alcohol and smoking for three traits each (BMI, glucose, and waist-hip ratio for alcohol; BMI and diastolic and systolic blood pressure for smoking). We also found a number of significant phenotypic modulations by the lifestyle covariates, which were not attributed to the genetic effects in the model. Overall, modulation in the metabolic risk in response to the level of lifestyle covariates was clearly observed, and its direction and magnitude varied depending on individual differences. We also showed that the metabolic risk inferred by our model was notably higher in T2DM prospective cases than controls. Our findings highlight the importance of individual genetic differences in the prevention and management of diabetes and suggest that the one-size-fits-all approach may not benefit all.
Affiliation(s)
- Jisu Shin
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, Australia; UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; National Cancer Center, Goyang-si, South Korea
| | - Xuan Zhou
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, Australia; UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia
| | - Joanne T M Tan
- Vascular Research Centre, Heart and Vascular Health Program, Lifelong Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia; Adelaide Medical School, University of Adelaide, Adelaide, SA, Australia
| | - Elina Hyppönen
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, Australia; UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
| | - Beben Benyamin
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, Australia; UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
| | - S Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, Australia; UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, Australia; South Australian Health and Medical Research Institute, Adelaide, SA, Australia
| |
|
47
|
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
Affiliation(s)
- Ying Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
| |
|
48
|
Márquez-Luna C, Gazal S, Loh PR, Kim SS, Furlotte N, Auton A, Price AL. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat Commun 2021; 12:6052. [PMID: 34663819 PMCID: PMC8523709 DOI: 10.1038/s41467-021-25171-9] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/26/2019] [Accepted: 07/16/2021] [Indexed: 12/23/2022] Open
Abstract
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373 K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107 K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
Affiliation(s)
- Carla Márquez-Luna
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
| | - Steven Gazal
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Charles R. Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Po-Ru Loh
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Samuel S Kim
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | | | | | - Alkes L Price
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
| |
|
49
|
Rohde PD, Nyegaard M, Kjolby M, Sørensen P. Multi-Trait Genomic Risk Stratification for Type 2 Diabetes. Front Med (Lausanne) 2021; 8:711208. [PMID: 34568370 PMCID: PMC8455930 DOI: 10.3389/fmed.2021.711208] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/18/2021] [Accepted: 08/05/2021] [Indexed: 01/14/2023] Open
Abstract
The prevalence of type 2 diabetes mellitus (T2DM) is continuously rising, with more cases every year. T2DM is a chronic disease with many severe comorbidities and therefore remains a burden for patients and society. Disease prevention, early diagnosis, and stratified treatment are important elements in slowing down the increase in diabetes prevalence. T2DM has a substantial genetic component with an estimated heritability of 40-70%, and more than 500 genetic loci have been associated with T2DM. Because of the intrinsic genetic basis of T2DM, one tool for risk assessment is genome-wide genetic risk scores (GRS). Current GRS only account for a small proportion of the T2DM risk; thus, better methods are warranted for more accurate risk assessment. T2DM is correlated with several other diseases and complex traits, and incorporating this information by adjusting the effect sizes of the included markers could improve risk prediction. The aim of this study was to develop multi-trait (MT)-GRS leveraging correlated information. We used phenotype and genotype information from the UK Biobank, and summary statistics from two independent T2DM studies. Marker effects for T2DM and seven correlated traits, namely, height, body mass index, pulse rate, diastolic and systolic blood pressure, smoking status, and information on current medication use, were estimated (i.e., by logistic and linear regression) within the UK Biobank. These summary statistics, together with the two independent training summary statistics, were incorporated into the MT-GRS prediction in different combinations. The prediction accuracy of the MT-GRS was improved by 12.5% compared to the single-trait GRS. Testing the MT-GRS strategy in two independent T2DM studies increased accuracy by 50-94%. Finally, combining the seven information traits with the two independent T2DM studies further increased the prediction accuracy by 34%. Across comparisons, body mass index and current medication use were the two traits that displayed the largest weights in construction of the MT-GRS. These results explicitly demonstrate the added benefit of leveraging correlated information when constructing genetic scores. In conclusion, constructing GRS based not only on the disease itself but also on genomic information from other correlated traits is strongly advisable for obtaining improved individual risk stratification.
Affiliation(s)
- Palle Duun Rohde
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark; Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | - Mette Nyegaard
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark; Department of Biomedicine, Aarhus University, Aarhus, Denmark
| | - Mads Kjolby
- Department of Biomedicine, Aarhus University, Aarhus, Denmark; Department of Population Health and Genomics, University of Dundee, Dundee, United Kingdom; Department of Clinical Pharmacology, Aarhus University Hospital, Aarhus, Denmark; Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
| | - Peter Sørensen
- Centre for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
| |
|
50
|
Wang T, Lu H, Zeng P. Identifying pleiotropic genes for complex phenotypes with summary statistics from a perspective of composite null hypothesis testing. Brief Bioinform 2021; 23:6375058. [PMID: 34571531 DOI: 10.1093/bib/bbab389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/05/2021] [Revised: 08/06/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022] Open
Abstract
Pleiotropy has important implications for the genetic connections among complex phenotypes and facilitates our understanding of disease etiology. Genome-wide association studies provide an unprecedented opportunity to detect pleiotropic associations; however, efficient pleiotropy test methods are still lacking. Here we consider pleiotropy identification from the methodological perspective of high-dimensional composite null hypothesis testing and propose a powerful gene-based method called MAIUP. MAIUP is constructed based on the traditional intersection-union test with two sets of independent P-values as input and follows a novel idea that was originally proposed under the high-dimensional mediation analysis framework. The key improvement of MAIUP is that it takes the composite null nature of the pleiotropy test into account by fitting a three-component mixture null distribution, which can ultimately generate well-calibrated P-values for effective control of the family-wise error rate and false discovery rate. Another attractive advantage of MAIUP is its ability to effectively address the issue of overlapping subjects commonly encountered in association studies. Simulation studies demonstrate that, compared with other methods, only MAIUP can maintain correct type I error control and has higher power across a wide range of scenarios. We apply MAIUP to detect shared associated genes among 14 psychiatric disorders with summary statistics and discover many new pleiotropic genes that would otherwise not be identified without accounting for the issue of composite null hypothesis testing. Functional and enrichment analyses offer additional evidence supporting the validity of these identified pleiotropic genes associated with psychiatric disorders. Overall, MAIUP represents an efficient method for pleiotropy identification.
Affiliation(s)
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| |
|