1
Kelemen M, Vigorito E, Fachal L, Anderson CA, Wallace C. shaPRS: Leveraging shared genetic effects across traits or ancestries improves accuracy of polygenic scores. Am J Hum Genet 2024; 111:1006-1017. [PMID: 38703768 PMCID: PMC11179256 DOI: 10.1016/j.ajhg.2024.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/06/2024]
Abstract
We present shaPRS, a method that leverages widespread pleiotropy between traits or shared genetic effects across ancestries, to improve the accuracy of polygenic scores. The method uses genome-wide summary statistics from two diseases or ancestries to improve the genetic effect estimate and standard error at SNPs where there is homogeneity of effect between the two datasets. When there is significant evidence of heterogeneity, the genetic effect from the disease or population closest to the target population is maintained. We show via simulation and a series of real-world examples that shaPRS substantially enhances the accuracy of polygenic risk scores (PRSs) for complex diseases and greatly improves PRS performance across ancestries. shaPRS is a PRS pre-processing method that is agnostic to the actual PRS generation method, and as a result, it can be integrated into existing PRS generation pipelines and continue to be applied as more performant PRS methods are developed over time.
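The per-SNP decision described in this abstract can be illustrated with a minimal sketch. It is not the shaPRS implementation: the abstract does not state how heterogeneity is tested or how estimates are combined, so the sketch assumes two independent samples, a two-study Cochran's Q test, a fixed-effect inverse-variance meta-analysis, and a hard significance cut-off. The function name `blend_effect` and the threshold `alpha` are hypothetical.

```python
import math


def blend_effect(beta_target, se_target, beta_other, se_other, alpha=0.01):
    """Return (beta, se, p_het) for one SNP.

    Assumption-laden illustration: if the two datasets disagree
    significantly, keep the target estimate; otherwise pool them.
    """
    # Cochran's Q for two independent estimates has 1 degree of freedom.
    q = (beta_target - beta_other) ** 2 / (se_target ** 2 + se_other ** 2)
    # Upper tail of a chi-square(1) variable: P(X > q) = erfc(sqrt(q / 2)).
    p_het = math.erfc(math.sqrt(q / 2.0))

    if p_het < alpha:
        # Evidence of heterogeneity: the target dataset's effect is kept as-is.
        return beta_target, se_target, p_het

    # Homogeneous effect: inverse-variance weighted (fixed-effect) pooling.
    w_target = 1.0 / se_target ** 2
    w_other = 1.0 / se_other ** 2
    beta_pooled = (w_target * beta_target + w_other * beta_other) / (w_target + w_other)
    se_pooled = math.sqrt(1.0 / (w_target + w_other))
    return beta_pooled, se_pooled, p_het


# Similar effects are pooled (smaller standard error); opposite effects are not.
print(blend_effect(0.10, 0.02, 0.12, 0.02))
print(blend_effect(0.10, 0.02, -0.10, 0.02))
```

A hard threshold is used here only because it is the simplest rule consistent with the abstract's wording; a continuous weighting between the two outcomes would be an equally plausible reading.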
Affiliation(s)
- Martin Kelemen
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK; Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK.
- Elena Vigorito
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
- Laura Fachal
- Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
- Chris Wallace
- Cambridge Institute of Therapeutic Immunology & Infectious Disease, University of Cambridge, Cambridge, UK; MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
2
Capalbo A, de Wert G, Mertes H, Klausner L, Coonen E, Spinella F, Van de Velde H, Viville S, Sermon K, Vermeulen N, Lencz T, Carmi S. Screening embryos for polygenic disease risk: a review of epidemiological, clinical, and ethical considerations. Hum Reprod Update 2024:dmae012. [PMID: 38805697 DOI: 10.1093/humupd/dmae012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 03/25/2024] [Indexed: 05/30/2024]
Abstract
BACKGROUND: The genetic composition of embryos generated by in vitro fertilization (IVF) can be examined with preimplantation genetic testing (PGT). Until recently, PGT was limited to detecting single-gene, high-risk pathogenic variants, large structural variants, and aneuploidy. Recent advances have made genome-wide genotyping of IVF embryos feasible and affordable, raising the possibility of screening embryos for their risk of polygenic diseases such as breast cancer, hypertension, diabetes, or schizophrenia. Despite a heated debate around this new technology, called polygenic embryo screening (PES; also PGT-P), it is already available to IVF patients in some countries. Several articles have studied epidemiological, clinical, and ethical perspectives on PES; however, a comprehensive, principled review of this emerging field is missing.
OBJECTIVE AND RATIONALE: This review has four main goals. First, given the interdisciplinary nature of PES studies, we aim to provide a self-contained educational background about PES to reproductive specialists interested in the subject. Second, we provide a comprehensive and critical review of arguments for and against the introduction of PES, crystallizing and prioritizing the key issues. We also cover the attitudes of IVF patients, clinicians, and the public towards PES. Third, we distinguish between possible future groups of PES patients, highlighting the benefits and harms pertaining to each group. Finally, our review, which is supported by ESHRE, is intended to aid healthcare professionals and policymakers in decision-making regarding whether to introduce PES in the clinic, and if so, how, and to whom.
SEARCH METHODS: We searched for PubMed-indexed articles published between 1/1/2003 and 1/3/2024 using the terms 'polygenic embryo screening', 'polygenic preimplantation', and 'PGT-P'. We limited the review to primary research papers in English whose main focus was PES for medical conditions. We also included papers that did not appear in the search but were deemed relevant.
OUTCOMES: The main theoretical benefit of PES is a reduction in lifetime polygenic disease risk for children born after screening. The magnitude of the risk reduction has been predicted based on statistical modelling, simulations, and sibling pair analyses. Results based on all methods suggest that under the best-case scenario, large relative risk reductions are possible for one or more diseases. However, as these models abstract several practical limitations, the realized benefits may be smaller, particularly due to a limited number of embryos and unclear future accuracy of the risk estimates. PES may negatively impact patients and their future children, as well as society. The main personal harms are an unindicated IVF treatment, a possible reduction in IVF success rates, and patient confusion, incomplete counselling, and choice overload. The main possible societal harms include discarded embryos, an increasing demand for 'designer babies', overemphasis of the genetic determinants of disease, unequal access, and lower utility in people of non-European ancestries. Benefits and harms will vary across the main potential patient groups, comprising patients already requiring IVF, fertile people with a history of a severe polygenic disease, and fertile healthy people. In the United States, the attitudes of IVF patients and the public towards PES seem positive, while healthcare professionals are cautious, sceptical about clinical utility, and concerned about patient counselling.
WIDER IMPLICATIONS: The theoretical potential of PES to reduce risk across multiple polygenic diseases requires further research into its benefits and harms. Given the large number of practical limitations and possible harms, particularly unnecessary IVF treatments and discarded viable embryos, PES should be offered only within a research context before further clarity is achieved regarding its balance of benefits and harms. The gap in attitudes between healthcare professionals and the public needs to be narrowed by expanding public and patient education and providing resources for informative and unbiased genetic counselling.
Affiliation(s)
- Antonio Capalbo
- Juno Genetics, Department of Reproductive Genetics, Rome, Italy
- Center for Advanced Studies and Technology (CAST), Department of Medical Genetics, "G. d'Annunzio" University of Chieti-Pescara, Chieti, Italy
- Guido de Wert
- Department of Health, Ethics & Society, CAPHRI-School for Public Health and Primary Care and GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Heidi Mertes
- Department of Philosophy and Moral Sciences, Ghent University, Ghent, Belgium
- Department of Public Health and Primary Care, Ghent University, Ghent, Belgium
- Liraz Klausner
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Edith Coonen
- Departments of Clinical Genetics and Reproductive Medicine, Maastricht University Medical Centre, Maastricht, The Netherlands
- School for Oncology and Developmental Biology, GROW, Maastricht University, Maastricht, The Netherlands
- Francesca Spinella
- Eurofins GENOMA Group Srl, Molecular Genetics Laboratories, Department of Scientific Communication, Rome, Italy
- Hilde Van de Velde
- Research Group Genetics Reproduction and Development (GRAD), Vrije Universiteit Brussel, Brussel, Belgium
- Brussels IVF, UZ Brussel, Brussel, Belgium
- Stephane Viville
- Laboratoire de Génétique Médicale LGM, Institut de Génétique Médicale d'Alsace IGMA, INSERM UMR 1112, Université de Strasbourg, France
- Laboratoire de Diagnostic Génétique, Unité de Génétique de l'infertilité (UF3472), Hôpitaux Universitaires de Strasbourg, Strasbourg, France
- Karen Sermon
- Research Group Genetics Reproduction and Development (GRAD), Vrije Universiteit Brussel, Brussel, Belgium
- Todd Lencz
- Institute of Behavioral Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA
- Departments of Psychiatry and Molecular Medicine, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY 11549, USA
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
3
Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, van Heel DA, Wang Y, Martin AR, Lee SH, Natarajan P. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genomics 2024; 4:100523. [PMID: 38508198 PMCID: PMC11019356 DOI: 10.1016/j.xgen.2024.100523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/15/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Polygenic risk scores (PRSs) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. We propose PRSmix, a framework that leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture for 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% confidence interval [CI], [1.10; 1.3]; p = 9.17 × 10⁻⁵) and 1.19-fold (95% CI, [1.11; 1.27]; p = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI, [1.40; 2.04]; p = 7.58 × 10⁻⁶) and 1.42-fold (95% CI, [1.25; 1.59]; p = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to previous cross-trait-combination methods that use scores from pre-defined correlated traits, our method improved prediction accuracy for coronary artery disease up to 3.27-fold (95% CI, [2.1; 4.44]; p value after false discovery rate (FDR) correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
Affiliation(s)
- Buu Truong
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Leland E Hull
- Division of General Internal Medicine, Massachusetts General Hospital, 100 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Yunfeng Ruan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- Whitney Hornsby
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Hilary Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- David A van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Ying Wang
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Alicia R Martin
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA 5000, Australia
- Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA.
4
Norland K, Schaid DJ, Kullo IJ. A linear weighted combination of polygenic scores for a broad range of traits improves prediction of coronary heart disease. Eur J Hum Genet 2024; 32:209-214. [PMID: 37752310 PMCID: PMC10853172 DOI: 10.1038/s41431-023-01463-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/09/2023] [Revised: 08/07/2023] [Accepted: 09/13/2023] [Indexed: 09/28/2023]
Abstract
Polygenic scores (PGS) for coronary heart disease (CHD) are constructed using GWAS summary statistics for CHD. However, pleiotropy is pervasive in biology and disease-associated variants often share etiologic pathways with multiple traits. Therefore, incorporating GWAS summary statistics of additional traits could improve the performance of PGS for CHD. Using lasso regression models, we developed two multi-PGS for CHD: 1) multiPGS_CHD, utilizing GWAS summary statistics for CHD, its risk factors, and other ASCVD as training data and the UK Biobank for tuning, and 2) extendedPGS_CHD, using existing PGS for a broader range of traits in the PGS Catalog as training data and the Atherosclerosis Risk in Communities Study (ARIC) cohort for tuning. We evaluated the performance of multiPGS_CHD and extendedPGS_CHD in the Mayo Clinic Biobank, an independent cohort of 43,578 adults of European ancestry which included 4,479 CHD cases and 39,099 controls. In the Mayo Clinic Biobank, a 1 SD increase in multiPGS_CHD and extendedPGS_CHD was associated with a 1.66-fold (95% CI: 1.60-1.71) and 1.70-fold (95% CI: 1.64-1.76) increased odds of CHD, respectively, in models that included age, sex, and 10 PCs, whereas an already published PGS for CHD (CHD_PRSCS) increased the odds by 1.50 (95% CI: 1.45-1.56). In the highest deciles of extendedPGS_CHD, multiPGS_CHD, and CHD_PRSCS, 18.4%, 17.5%, and 16.3% of patients had CHD, respectively.
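The idea of combining several polygenic scores with lasso-selected weights can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a squared-error lasso on a simulated continuous outcome (a binary CHD outcome would more plausibly call for a penalized logistic model), fits it with plain coordinate descent, and every name (`lasso_weights`, `pgs_target`, `pgs_risk_factor`, `pgs_unrelated`) and the penalty value are hypothetical.

```python
import random


def standardize(values):
    """Scale a column to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_weights(columns, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + lam*||w||_1.

    Assumes every column is standardized and y is centered, so each
    coordinate update reduces to one soft-thresholding step.
    """
    n = len(y)
    weights = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes its own fit.
            rho = sum(c * (r + weights[j] * c) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, lam)
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_w
    return weights


# Toy data (entirely simulated): two informative scores and one pure-noise score.
random.seed(0)
n = 2000
pgs_target = [random.gauss(0, 1) for _ in range(n)]
pgs_risk_factor = [random.gauss(0, 1) for _ in range(n)]
pgs_unrelated = [random.gauss(0, 1) for _ in range(n)]
outcome = [0.6 * a + 0.3 * b + random.gauss(0, 1)
           for a, b in zip(pgs_target, pgs_risk_factor)]

columns = [standardize(c) for c in (pgs_target, pgs_risk_factor, pgs_unrelated)]
mean_y = sum(outcome) / n
centered = [v - mean_y for v in outcome]

weights = lasso_weights(columns, centered, lam=0.1)
# The combined score is the weighted sum of the standardized input scores.
multi_pgs = [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(n)]
print(weights)
```

The L1 penalty is what lets a combination like this take many candidate scores as input: scores that add nothing beyond the others receive a weight of exactly zero, while informative scores keep shrunken but non-zero weights.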
Affiliation(s)
- Kristjan Norland
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
- Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA.
- Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA.
5
Lee A, Seo J, Park S, Cho Y, Kim G, Li J, Liang L, Park T, Chung W. Type 2 diabetes and its genetic susceptibility are associated with increased severity and mortality of COVID-19 in UK Biobank. Commun Biol 2024; 7:122. [PMID: 38267566 PMCID: PMC10808197 DOI: 10.1038/s42003-024-05799-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 01/09/2024] [Indexed: 01/26/2024]
Abstract
Type 2 diabetes (T2D) is known as one of the important risk factors for the severity and mortality of COVID-19. Here, we evaluate the impact of T2D and its genetic susceptibility on the severity and mortality of COVID-19, using 459,119 individuals in UK Biobank. Utilizing the polygenic risk scores (PRS) for T2D, we identified a significant association between T2D or T2D PRS, and COVID-19 severity. We further discovered the efficacy of vaccination and the pivotal role of T2D-related genetics in the pathogenesis of severe COVID-19. Moreover, we found that individuals with T2D or those in the high T2D PRS group had a significantly increased mortality rate. We also observed that the mortality rate for SARS-CoV-2-infected patients was approximately 2 to 7 times higher than for those not infected, depending on the time of infection. These findings emphasize the potential of T2D PRS in estimating the severity and mortality of COVID-19.
Affiliation(s)
- Aeyeon Lee
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Jieun Seo
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Seunghwan Park
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Institute of Genetic Epidemiology, Basgenbio Co. Ltd., Seoul, 04167, Korea
- Youngkwang Cho
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Gaeun Kim
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Jun Li
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
- Liming Liang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826, Korea.
- Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea.
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
6
Dahl A, Thompson M, An U, Krebs M, Appadurai V, Border R, Bacanu SA, Werge T, Flint J, Schork AJ, Sankararaman S, Kendler KS, Cai N. Phenotype integration improves power and preserves specificity in biobank-based genetic studies of major depressive disorder. Nat Genet 2023; 55:2082-2093. [PMID: 37985818 PMCID: PMC10703686 DOI: 10.1038/s41588-023-01559-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/18/2023] [Indexed: 11/22/2023]
Abstract
Biobanks often contain several phenotypes relevant to diseases such as major depressive disorder (MDD), with partly distinct genetic architectures. Researchers face complex tradeoffs between shallow (large sample size, low specificity/sensitivity) and deep (small sample size, high specificity/sensitivity) phenotypes, and the optimal choices are often unclear. Here we propose to integrate these phenotypes to combine the benefits of each. We use phenotype imputation to integrate information across hundreds of MDD-relevant phenotypes, which significantly increases genome-wide association study (GWAS) power and polygenic risk score (PRS) prediction accuracy of the deepest available MDD phenotype in UK Biobank, LifetimeMDD. We demonstrate that imputation preserves specificity in its genetic architecture using a novel PRS-based pleiotropy metric. We further find that integration via summary statistics also enhances GWAS power and PRS predictions, but can introduce nonspecific genetic effects depending on input. Our work provides a simple and scalable approach to improve genetic studies in large biobanks by integrating shallow and deep phenotypes.
Affiliation(s)
- Andrew Dahl
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
- Michael Thompson
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Ulzee An
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Morten Krebs
- Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark
- Vivek Appadurai
- Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark
- Richard Border
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Silviu-Alin Bacanu
- Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Thomas Werge
- Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark
- Lundbeck Foundation GeoGenetics Centre, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Jonathan Flint
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Andrew J Schork
- Institute of Biological Psychiatry, Mental Health Center-Sct Hans, Copenhagen University Hospital-Mental Health Services CPH, Copenhagen, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Sciences, Copenhagen University, Copenhagen, Denmark
- Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Kenneth S Kendler
- Virginia Institute for Psychiatric and Behavioral Genetics and Department of Psychiatry, Virginia Commonwealth University, Richmond, VA, USA
- Na Cai
- Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg, Germany.
- Computational Health Centre, Helmholtz Zentrum München, Neuherberg, Germany.
- School of Medicine, Technical University of Munich, Munich, Germany.
7
Zhai S, Mehrotra DV, Shen J. Applying polygenic risk score methods to pharmacogenomics GWAS: challenges and opportunities. Brief Bioinform 2023; 25:bbad470. [PMID: 38152980 PMCID: PMC10782924 DOI: 10.1093/bib/bbad470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023]
Abstract
Polygenic risk scores (PRSs) have emerged as promising tools for the prediction of human diseases and complex traits in disease genome-wide association studies (GWAS). Applying PRSs to pharmacogenomics (PGx) studies has begun to show great potential for improving patient stratification and drug response prediction. However, there are unique challenges that arise when applying PRSs to PGx GWAS beyond those typically encountered in disease GWAS (e.g. Eurocentric or trans-ethnic bias). These challenges include: (i) the lack of knowledge about whether PGx or disease GWAS/variants should be used in the base cohort (BC); (ii) the small sample sizes in PGx GWAS with corresponding low power and (iii) the more complex PRS statistical modeling required for handling both prognostic and predictive effects simultaneously. To gain insights in this landscape about the general trends, challenges and possible solutions, we first conduct a systematic review of both PRS applications and PRS method development in PGx GWAS. To further address the challenges, we propose (i) a novel PRS application strategy by leveraging both PGx and disease GWAS summary statistics in the BC for PRS construction and (ii) a new Bayesian method (PRS-PGx-Bayesx) to reduce Eurocentric or cross-population PRS prediction bias. Extensive simulations are conducted to demonstrate their advantages over existing PRS methods applied in PGx GWAS. Our systematic review and methodology research work not only highlights current gaps and key considerations while applying PRS methods to PGx GWAS, but also provides possible solutions for better PGx PRS applications and future research.
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
8
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023]
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Affiliation(s)
- Chang Xu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Santhi K Ganesh
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
9
Albiñana C, Zhu Z, Schork AJ, Ingason A, Aschard H, Brikell I, Bulik CM, Petersen LV, Agerbo E, Grove J, Nordentoft M, Hougaard DM, Werge T, Børglum AD, Mortensen PB, McGrath JJ, Neale BM, Privé F, Vilhjálmsson BJ. Multi-PGS enhances polygenic prediction by combining 937 polygenic scores. Nat Commun 2023; 14:4702. [PMID: 37543680 PMCID: PMC10404269 DOI: 10.1038/s41467-023-40330-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/11/2023] [Accepted: 07/21/2023] [Indexed: 08/07/2023]
Abstract
The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction R² increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.
Affiliation(s)
- Clara Albiñana
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark.
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark.
- Zhihong Zhu
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Andrew J Schork
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- The Translational Genomics Research Institute, Phoenix, AZ, USA
- Andrés Ingason
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- Hugues Aschard
- Department of Computational Biology, Institut Pasteur, Université de Paris, 25-28 Rue du Dr Roux, 75015, Paris, France
- Isabell Brikell
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Cynthia M Bulik
- Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
- Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27514, USA
| | - Liselotte V Petersen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
| | - Esben Agerbo
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
| | - Jakob Grove
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
- Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark
| | - Merete Nordentoft
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Copenhagen Research Centre on Mental Health (CORE), University of Copenhagen, Copenhagen, Denmark
| | - David M Hougaard
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Center for Neonatal Screening, Department for Congenital Disorders, Statens Serum Institut, 2300, Copenhagen S, Denmark
| | - Thomas Werge
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, 2100, Denmark
- Lundbeck Foundation Centre for GeoGenetics, GLOBE Institute, University of Copenhagen, 1350, Copenhagen K, Denmark
| | - Anders D Børglum
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- Department of Biomedicine and Center for Integrative Sequencing, iSEQ, Aarhus University, 8000, Aarhus C, Denmark
- Center for Genomics and Personalized Medicine, Aarhus University, 8000, Aarhus C, Denmark
| | - Preben Bo Mortensen
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
| | - John J McGrath
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Brisbane, QLD, 4076, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, QLD, 4072, Australia
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Florian Privé
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark
| | - Bjarni J Vilhjálmsson
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, 8210, Aarhus V, Denmark.
- National Centre for Register-Based Research, Aarhus University, 8210, Aarhus V, Denmark.
- Bioinformatics Research Centre, Aarhus University, 8000, Aarhus C, Denmark.
- Novo Nordisk Foundation Center for Genomic Mechanisms, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
10
|
Zhai S, Guo B, Wu B, Mehrotra DV, Shen J. Integrating multiple traits for improving polygenic risk prediction in disease and pharmacogenomics GWAS. Brief Bioinform 2023:7169140. [PMID: 37200155 DOI: 10.1093/bib/bbad181] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2023] [Revised: 03/30/2023] [Accepted: 04/21/2023] [Indexed: 05/20/2023] Open
Abstract
Polygenic risk score (PRS) has been recently developed for predicting complex traits and drug responses. It remains unknown whether multi-trait PRS (mtPRS) methods, by integrating information from multiple genetically correlated traits, can improve prediction accuracy and power for PRS analysis compared with single-trait PRS (stPRS) methods. In this paper, we first review commonly used mtPRS methods and find that they do not directly model the underlying genetic correlations among traits, which has been shown to be useful in guiding multi-trait association analysis in the literature. To overcome this limitation, we propose a mtPRS-PCA method to combine PRSs from multiple traits with weights obtained from performing principal component analysis (PCA) on the genetic correlation matrix. To accommodate various genetic architectures covering different effect directions, signal sparseness and across-trait correlation structures, we further propose an omnibus mtPRS method (mtPRS-O) by combining P values from mtPRS-PCA, mtPRS-ML (mtPRS based on machine learning) and stPRSs using Cauchy Combination Test. Our extensive simulation studies show that mtPRS-PCA outperforms other mtPRS methods in both disease and pharmacogenomics (PGx) genome-wide association studies (GWAS) contexts when traits are similarly correlated, with dense signal effects and in similar effect directions, and mtPRS-O is consistently superior to most other methods due to its robustness under various genetic architectures. We further apply mtPRS-PCA, mtPRS-O and other methods to PGx GWAS data from a randomized clinical trial in the cardiovascular domain and demonstrate performance improvement of mtPRS-PCA in both prediction accuracy and patient stratification as well as the robustness of mtPRS-O in PRS association test.
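The two ingredients named in this abstract, PCA-derived weights and the Cauchy Combination Test, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the weights are the loadings of the first principal component of the genetic correlation matrix (the abstract does not say which components are used), and the function names `pca_weights`, `combine_prs` and `cauchy_combine` are hypothetical. The Cauchy combination follows the standard form T = Σ wᵢ·tan((0.5 − pᵢ)π), with combined P value 0.5 − arctan(T)/π.

```python
import math


def pca_weights(rg, iters=200):
    """Leading eigenvector of a symmetric correlation matrix (power iteration).

    Assumption: first-PC loadings serve as the trait weights.
    """
    n = len(rg)
    # Slightly asymmetric start so the vector is unlikely to be orthogonal
    # to the leading eigenvector (not guaranteed for every matrix).
    v = [1.0 + i / n for i in range(n)]
    for _ in range(iters):
        w = [sum(rg[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Eigenvector sign is arbitrary; fix it so the weights sum to >= 0.
    if sum(v) < 0:
        v = [-x for x in v]
    return v


def combine_prs(prs_by_trait, weights):
    """Weighted sum of per-trait scores; one inner list per trait."""
    return [
        sum(w * s for w, s in zip(weights, scores))
        for scores in zip(*prs_by_trait)
    ]


def cauchy_combine(pvalues, weights=None):
    """Cauchy Combination Test with non-negative weights summing to 1."""
    k = len(pvalues)
    weights = weights or [1.0 / k] * k
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(t) / math.pi
```

With a toy two-trait genetic correlation matrix `[[1.0, 0.5], [0.5, 1.0]]`, `pca_weights` gives equal weights to both traits, and `cauchy_combine` returns the common value when all input P values are identical.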
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Bin Guo
- Data and Genome Science, Merck & Co., Inc., Cambridge, MA 02141, USA
| | - Baolin Wu
- Department of Epidemiology and Biostatistics, University of California Irvine, Irvine, CA 92697, USA
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| |
|
11
|
Xiao X, Wu Q. The clinical utility of the BMD-related comprehensive genome-wide polygenic score in identifying individuals with a high risk of osteoporotic fractures. Osteoporos Int 2023; 34:681-692. [PMID: 36622390 PMCID: PMC11225087 DOI: 10.1007/s00198-022-06654-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Accepted: 12/20/2022] [Indexed: 01/10/2023]
Abstract
The potential of bone mineral density (BMD)-related genome-wide polygenic score (PGS) in identifying individuals with a high risk of fractures remains unclear. This study suggests that an efficient PGS enables the identification of strata with up to a 1.5-fold difference in fracture incidence. Incorporating PGS into clinical diagnosis is anticipated to increase the population-level screening benefits. PURPOSE: This study sought to construct genome-wide polygenic scores for femoral neck and total body BMD and to estimate their potential in identifying individuals with a high risk of osteoporotic fractures. METHODS: Genome-wide polygenic scores were developed and validated for femoral neck and total body BMD. We externally tested the PGSs, both by themselves and in combination with available clinical risk factors, in 455,663 European ancestry individuals from the UK Biobank. The predictive accuracy of the developed genome-wide PGS was also compared with previously published restricted PGS employed in fracture risk assessment. RESULTS: For each unit decrease in PGSs, the genome-wide PGSs were associated with up to 1.17-fold increased fracture risk. Out of four studied PGSs, [Formula: see text] (HR: 1.03; 95% CI 1.01-1.05, p = 0.001) had the weakest and the [Formula: see text] (HR: 1.17; 95% CI 1.15-1.19, p < 0.0001) had the strongest association with an incident fracture. In the reclassification analysis, compared to the FRAX base model, the models with [Formula: see text], [Formula: see text], [Formula: see text], and [Formula: see text] improved the reclassification of fracture by 1.2% (95% CI, 1.0 to 1.3%), 0.2% (95% CI, 0.1 to 0.3%), 1.4% (95% CI, 1.3 to 1.5%), and 2.2% (95% CI, 2.1 to 2.4%), respectively. CONCLUSIONS: Our findings suggested that an efficient PGS estimate enables the identification of strata with up to a 1.7-fold difference in fracture incidence. Incorporating PGS information into clinical diagnosis is anticipated to increase the benefits of screening programs at the population level.
Affiliation(s)
- Xiangxue Xiao
- Nevada Institute of Personalized Medicine, College of Science, University of Nevada, Las Vegas, NV, USA
- Department of Epidemiology and Biostatistics, School of Public Health, University of Nevada Las Vegas, Las Vegas, NV, USA
| | - Qing Wu
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, USA.
| |
|
12
|
Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, van Heel DA, Wang Y, Martin AR, Lee SH, Natarajan P. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. medRxiv 2023:2023.02.21.23286110. [PMID: 36865265 PMCID: PMC9980241 DOI: 10.1101/2023.02.21.23286110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
Abstract
Polygenic risk scores (PRS) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. Validation and transferability of existing PRS across independent datasets and diverse ancestries are limited, which hinders the practical utility and exacerbates health disparities. We propose PRSmix, a framework that evaluates and leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture. We applied PRSmix to 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% CI: [1.10; 1.3]; P-value = 9.17 × 10⁻⁵) and 1.19-fold (95% CI: [1.11; 1.27]; P-value = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI: [1.40; 2.04]; P-value = 7.58 × 10⁻⁶) and 1.42-fold (95% CI: [1.25; 1.59]; P-value = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to the previously established cross-trait-combination method with scores from pre-defined correlated traits, we demonstrated that our method can improve prediction accuracy for coronary artery disease up to 3.27-fold (95% CI: [2.1; 4.44]; P-value after FDR correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
Affiliation(s)
- Buu Truong
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Leland E. Hull
- Division of General Internal Medicine, 100 Cambridge Street, Massachusetts General Hospital, Boston, MA, 02114
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115
| | - Yunfeng Ruan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - Whitney Hornsby
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
| | - Hilary Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
| | - David A. van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
| | - Ying Wang
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Alicia R. Martin
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - S. Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA, 5000, Australia
| | - Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114
- Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115
| |
|
13
|
Abstract
Polygenic scores quantify inherited risk by integrating information from many common sites of DNA variation into a single number. Rapid increases in the scale of genetic association studies and new statistical algorithms have enabled development of polygenic scores that meaningfully measure-as early as birth-risk of coronary artery disease. These newer-generation polygenic scores identify up to 8% of the population with triple the normal risk based on genetic variation alone, and these individuals cannot be identified on the basis of family history or clinical risk factors alone. For those identified with increased genetic risk, evidence supports risk reduction with at least two interventions, adherence to a healthy lifestyle and cholesterol-lowering therapies, that can substantially reduce risk. Alongside considerable enthusiasm for the potential of polygenic risk estimation to enable a new era of preventive clinical medicine is recognition of a need for ongoing research into how best to ensure equitable performance across diverse ancestries, how and in whom to assess the scores in clinical practice, as well as randomized trials to confirm clinical utility.
Affiliation(s)
- Aniruddh P Patel
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Amit V Khera
- Division of Cardiology and Center for Genomic Medicine, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA; Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA; Verve Therapeutics, Cambridge, Massachusetts, USA
| |
|
14
|
Lv H, Li J, Gao K, Zeng L, Xue R, Liu X, Zhou C, Yue W, Yu H. Identification of genetic loci that overlap between schizophrenia and metabolic syndrome. Psychiatry Res 2022; 318:114947. [PMID: 36399892 DOI: 10.1016/j.psychres.2022.114947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2022] [Revised: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 11/11/2022]
Abstract
Patients with schizophrenia (SCZ) frequently exhibit an elevated risk of metabolic syndrome (MetS), which may lead to a worse clinical outcome. Even though these two phenotypes are genetically linked, little is known about their shared genetic determinants. Here, we investigated whether SCZ and MetS share genetic risk factors. To examine the genetic overlap between the two disorders, we applied a comprehensive genetic overlap analysis by integrating GWAS data for SCZ (n = 320,404) and MetS (n = 291,107) at the genome, genetic variants, and gene levels. At the genome level, we observed polygenic overlap between SCZ and MetS by utilizing LDSC (rg = -0.09, P = 1 × 10⁻⁴) and GNOVA (rho = -0.04, P = 1.39 × 10⁻⁸) analysis. At the SNP level, we performed conjunctional FDR (conjFDR) analysis to identify genetic variants simultaneously associated with two phenotypes. Based on conjFDR < 0.05, we identified 22 loci shared between SCZ and MetS. At the gene level, we further demonstrated that SCZ- and MetS-inferred gene expression overlapped across 49 GTEx tissues and highlighted the PCCB and KCTD13 genes as putative mediators of the genetic association. Overall, these findings shed novel light on the association between SCZ and MetS, and potentially enhance our knowledge of the high comorbidity and genetic processes that overlap between the two disorders.
Affiliation(s)
- Honggang Lv
- Department of Psychiatry, Jining Medical University, Jining, Shandong 272067, China
| | - Juan Li
- Department of Psychiatry, Jining Medical University, Jining, Shandong 272067, China
| | - Kai Gao
- National Clinical Research Center for Mental Disorders & Key Laboratory of Mental Health, Ministry of Health (Peking University), Peking University Sixth Hospital (Institute of Mental Health), Beijing 100191, China
| | - Lingsi Zeng
- Department of Psychiatry, Jining Medical University, Jining, Shandong 272067, China
| | - Ranran Xue
- Department of Psychiatry, Shandong Daizhuang Hospital, Jining, Shandong 272051, China
| | - Xia Liu
- Department of Psychiatry, Shandong Daizhuang Hospital, Jining, Shandong 272051, China
| | - Cong Zhou
- Department of Psychiatry, Jining Medical University, Jining, Shandong 272067, China
| | - Weihua Yue
- National Clinical Research Center for Mental Disorders & Key Laboratory of Mental Health, Ministry of Health (Peking University), Peking University Sixth Hospital (Institute of Mental Health), Beijing 100191, China; PKU-IDG/McGovern Institute for Brain Research, Peking University, Beijing 100871, China.
| | - Hao Yu
- Department of Psychiatry, Jining Medical University, Jining, Shandong 272067, China.
| |
|
15
|
Tian P, Chan TH, Wang YF, Yang W, Yin G, Zhang YD. Multiethnic polygenic risk prediction in diverse populations through transfer learning. Front Genet 2022; 13:906965. [PMID: 36061179 PMCID: PMC9438789 DOI: 10.3389/fgene.2022.906965] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/29/2022] [Accepted: 06/27/2022] [Indexed: 11/28/2022] Open
Abstract
Polygenic risk scores (PRS) leverage the genetic contribution of an individual’s genotype to a complex trait by estimating disease risk. Traditional PRS prediction methods are predominantly for the European population. The accuracy of PRS prediction in non-European populations is diminished due to much smaller sample size of genome-wide association studies (GWAS). In this article, we introduced a novel method to construct PRS for non-European populations, abbreviated as TL-Multi, by conducting a transfer learning framework to learn useful knowledge from the European population to correct the bias for non-European populations. We considered non-European GWAS data as the target data and European GWAS data as the informative auxiliary data. TL-Multi borrows useful information from the auxiliary data to improve the learning accuracy of the target data while preserving the efficiency and accuracy. To demonstrate the practical applicability of the proposed method, we applied TL-Multi to predict the risk of systemic lupus erythematosus (SLE) in the Asian population and the risk of asthma in the Indian population by borrowing information from the European population. TL-Multi achieved better prediction accuracy than the competing methods, including Lassosum and meta-analysis in both simulations and real applications.
Affiliation(s)
- Peixin Tian
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Tsai Hor Chan
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yong-Fei Wang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Wanling Yang
- Department of Paediatrics and Adolescent Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Guosheng Yin
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
| | - Yan Dora Zhang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Yan Dora Zhang
| |
|
16
|
Wang Y, Tsuo K, Kanai M, Neale BM, Martin AR. Challenges and Opportunities for Developing More Generalizable Polygenic Risk Scores. Annu Rev Biomed Data Sci 2022; 5:293-320. [PMID: 35576555 PMCID: PMC9828290 DOI: 10.1146/annurev-biodatasci-111721-074830] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Indexed: 01/11/2023]
Abstract
Polygenic risk scores (PRS) estimate an individual's genetic likelihood of complex traits and diseases by aggregating information across multiple genetic variants identified from genome-wide association studies. PRS can predict a broad spectrum of diseases and have therefore been widely used in research settings. Some work has investigated their potential applications as biomarkers in preventative medicine, but significant work is still needed to definitively establish and communicate absolute risk to patients for genetic and modifiable risk factors across demographic groups. However, the biggest limitation of PRS currently is that they show poor generalizability across diverse ancestries and cohorts. Major efforts are underway through methodological development and data generation initiatives to improve their generalizability. This review aims to comprehensively discuss current progress on the development of PRS, the factors that affect their generalizability, and promising areas for improving their accuracy, portability, and implementation.
Affiliation(s)
- Ying Wang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Kristin Tsuo
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; Biological and Biomedical Sciences, Harvard Medical School, Boston, Massachusetts, USA
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA; Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Benjamin M. Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Alicia R. Martin
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA; Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| |
|
17
|
Ballard JL, O'Connor LJ. Shared components of heritability across genetically correlated traits. Am J Hum Genet 2022; 109:989-1006. [PMID: 35477001 PMCID: PMC9247834 DOI: 10.1016/j.ajhg.2022.04.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2021] [Accepted: 04/01/2022] [Indexed: 11/01/2022] Open
Abstract
Most disease-associated genetic variants are pleiotropic, affecting multiple genetically correlated traits. Their pleiotropic associations can be mechanistically informative: if many variants have similar patterns of association, they may act via similar pleiotropic mechanisms, forming a shared component of heritability. We developed pleiotropic decomposition regression (PDR) to identify shared components and their underlying genetic variants. We validated PDR on simulated data and identified limitations of existing methods in recovering the true components. We applied PDR to three clusters of five to six traits genetically correlated with coronary artery disease (CAD), asthma, and type II diabetes (T2D), producing biologically interpretable components. For CAD, PDR identified components related to BMI, hypertension, and cholesterol, and it clarified the relationship among these highly correlated risk factors. We assigned variants to components, calculated their posterior-mean effect sizes, and performed out-of-sample validation. Our posterior-mean effect sizes pool statistical power across traits and substantially boost the correlation (r²) between true and estimated effect sizes (compared with the original summary statistics) by 94% and 70% for asthma and T2D out of sample, respectively, and by a predicted 300% for CAD.
Affiliation(s)
- Jenna Lee Ballard
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Luke Jen O'Connor
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
18
|
Chung W, Hwang H, Park T. Bayesian analysis of longitudinal traits in the Korea Association Resource (KARE) cohort. Genomics Inform 2022; 20:e16. [PMID: 35794696 DOI: 10.5808/gi.22022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 05/14/2022] [Indexed: 11/20/2022] Open
Abstract
Various methodologies for the genetic analysis of longitudinal data have been proposed and applied to data from large-scale genome-wide association studies (GWAS) to identify single nucleotide polymorphisms (SNPs) associated with traits of interest and to detect SNP-time interactions. We recently proposed a grid-based Bayesian mixed model for longitudinal genetic data and showed that our Bayesian method increased the statistical power compared to the corresponding univariate method and well detected SNP-time interactions. In this paper, we further analyze longitudinal obesity-related traits such as body mass index, hip circumference, waist circumference, and waist-hip ratio from Korea Association Resource data to evaluate the proposed Bayesian method. We first conducted GWAS analyses of cross-sectional traits and combined the results of GWAS analyses through a meta-analysis based on a trajectory model and a random-effects model. We then applied our Bayesian method to a subset of SNPs selected by meta-analysis to further discover SNPs associated with traits of interest and SNP-time interactions. The proposed Bayesian method identified several novel SNPs associated with longitudinal obesity-related traits, and almost 25% of the identified SNPs had significant p-values for SNP-time interactions.
Affiliation(s)
- Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Hyunji Hwang
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
| |
|
19
|
Weissbrod O, Kanai M, Shi H, Gazal S, Peyrot WJ, Khera AV, Okada Y, Martin AR, Finucane HK, Price AL. Leveraging fine-mapping and multipopulation training data to improve cross-population polygenic risk scores. Nat Genet 2022; 54:450-458. [PMID: 35393596 PMCID: PMC9009299 DOI: 10.1038/s41588-022-01036-9] [Citation(s) in RCA: 108] [Impact Index Per Article: 54.0] [Received: 01/10/2021] [Accepted: 02/25/2022] [Indexed: 01/25/2023]
Abstract
Polygenic risk scores suffer reduced accuracy in non-European populations, exacerbating health disparities. We propose PolyPred, a method that improves cross-population polygenic risk scores by combining two predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing linkage disequilibrium differences, and BOLT-LMM, a published predictor. When a large training sample is available in the non-European target population, we propose PolyPred+, which further incorporates the non-European training data. We applied PolyPred to 49 diseases/traits in four UK Biobank populations using UK Biobank British training data, and observed relative improvements versus BOLT-LMM ranging from +7% in south Asians to +32% in Africans, consistent with simulations. We applied PolyPred+ to 23 diseases/traits in UK Biobank east Asians using both UK Biobank British and Biobank Japan training data, and observed improvements of +24% versus BOLT-LMM and +12% versus PolyPred. Summary statistics-based analogs of PolyPred and PolyPred+ attained similar improvements.
Affiliation(s)
- Omer Weissbrod
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA.
| | - Masahiro Kanai
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Huwenbo Shi
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- OMNI Bioinformatics, San Francisco, CA, USA
| | - Steven Gazal
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Wouter J Peyrot
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA
- Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
| | - Amit V Khera
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Verve Therapeutics, Cambridge, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
| | | | - Hilary K Finucane
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Alkes L Price
- Epidemiology Department, Harvard School of Public Health, Boston, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
20
|
Chung W, Cho Y. Bayesian mixed models for longitudinal genetic data: theory, concepts, and simulation studies. Genomics Inform 2022; 20:e8. [PMID: 35399007 PMCID: PMC9001998 DOI: 10.5808/gi.21080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2021] [Accepted: 03/03/2022] [Indexed: 01/02/2023] Open
Abstract
Despite the success of recent genome-wide association studies investigating longitudinal traits, a large fraction of overall heritability remains unexplained. This suggests that some of the missing heritability may be accounted for by gene-gene and gene-time/environment interactions. In this paper, we develop a Bayesian variable selection method for longitudinal genetic data based on mixed models. The method jointly models the main effects and interactions of all candidate genetic variants and non-genetic factors and has higher statistical power than previous approaches. To account for the within-subject dependence structure, we propose a grid-based approach that models only one fixed-dimensional covariance matrix, which is thus applicable to data where subjects have different numbers of time points. We provide the theoretical basis of our Bayesian method and then illustrate its performance using data from the 1000 Genomes Project with various simulation settings. Several simulation studies show that our multivariate method increases the statistical power compared to the corresponding univariate method and can detect gene-time/environment interactions well. We further evaluate our method with different numbers of individuals, variants, and causal variants, as well as different trait heritabilities, and conclude that our method performs reasonably well across these simulation settings.
Affiliation(s)
- Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Youngkwang Cho
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea
|
21
|
Tanigawa Y, Qian J, Venkataraman G, Justesen JM, Li R, Tibshirani R, Hastie T, Rivas MA. Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet 2022; 18:e1010105. [PMID: 35324888 PMCID: PMC8946745 DOI: 10.1371/journal.pgen.1010105] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 09/08/2021] [Accepted: 02/15/2022] [Indexed: 01/05/2023] Open
Abstract
We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits; ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
Affiliation(s)
- Yosuke Tanigawa
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| | - Junyang Qian
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Guhan Venkataraman
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
| | - Johanne Marie Justesen
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
| | - Ruilin Li
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California, United States of America
| | - Robert Tibshirani
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Trevor Hastie
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
- Department of Statistics, Stanford University, Stanford, California, United States of America
| | - Manuel A. Rivas
- Department of Biomedical Data Science, Stanford University, Stanford, California, United States of America
|
22
|
Grid-based Gaussian process models for longitudinal genetic data. Communications for Statistical Applications and Methods 2022. [DOI: 10.29220/csam.2022.29.1.065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
|
23
|
Grid-based Gaussian process models for longitudinal genetic data. Communications for Statistical Applications and Methods 2022. [DOI: 10.29220/csam.2022.29.1.745] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
24
|
Chung W. Statistical models and computational tools for predicting complex traits and diseases. Genomics Inform 2022; 19:e36. [PMID: 35012283 PMCID: PMC8752975 DOI: 10.5808/gi.21053] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/13/2021] [Accepted: 11/01/2021] [Indexed: 12/30/2022] Open
Abstract
Predicting individual traits and diseases from genetic variants is critical to fulfilling the promise of personalized medicine. The genetic variants from genome-wide association studies (GWAS), including variants well below GWAS significance, can be aggregated into highly significant predictions across a wide range of complex traits and diseases. The recent arrival of large-sample public biobanks enables highly accurate polygenic predictions based on genetic variants across the whole genome. Various statistical methodologies and diverse computational tools have been introduced and developed to compute the polygenic risk score (PRS) more accurately. However, many researchers utilize PRS tools without a thorough understanding of the underlying model and how to specify the parameters for the best performance. It is advantageous to study the statistical models implemented in computational tools for PRS estimation and the formulas of the parameters to be specified. Here, we review a variety of recent statistical methodologies and computational tools for PRS computation.
Affiliation(s)
- Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul 06978, Korea; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
|
25
|
Raben TG, Lello L, Widen E, Hsu SDH. From Genotype to Phenotype: Polygenic Prediction of Complex Human Traits. Methods Mol Biol 2022; 2467:421-446. [PMID: 35451785 DOI: 10.1007/978-1-0716-2205-6_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Decoding the genome confers the capability to predict characteristics of the organism (phenotype) from DNA (genotype). We describe the present status and future prospects of genomic prediction of complex traits in humans. Some highly heritable complex phenotypes such as height and other quantitative traits can already be predicted with reasonable accuracy from DNA alone. For many diseases, including important common conditions such as coronary artery disease, breast cancer, type I and II diabetes, individuals with outlier polygenic scores (e.g., top few percent) have been shown to have 5 or even 10 times higher risk than average. Several psychiatric conditions such as schizophrenia and autism also fall into this category. We discuss related topics such as the genetic architecture of complex traits, sibling validation of polygenic scores, and applications to adult health, in vitro fertilization (embryo selection), and genetic engineering.
Affiliation(s)
| | - Louis Lello
- Michigan State University, East Lansing, MI, USA
- Genomic Prediction, North Brunswick, NJ, USA
| | - Erik Widen
- Michigan State University, East Lansing, MI, USA
| | - Stephen D H Hsu
- Michigan State University, East Lansing, MI, USA.
- Genomic Prediction, North Brunswick, NJ, USA.
|
26
|
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
Affiliation(s)
- Ying Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
|
27
|
The distribution of common-variant effect sizes. Nat Genet 2021; 53:1243-1249. [PMID: 34326547 DOI: 10.1038/s41588-021-00901-3] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 09/16/2020] [Accepted: 06/23/2021] [Indexed: 01/08/2023]
Abstract
The genetic effect-size distribution of a disease describes the number of risk variants, the range of their effect sizes and sample sizes that will be required to discover them. Accurate estimation has been a challenge. Here I propose Fourier Mixture Regression (FMR), validating that it accurately estimates real and simulated effect-size distributions. Applied to summary statistics for ten diseases (average [Formula: see text]), FMR estimates that 100,000-1,000,000 cases will be required for genome-wide significant SNPs to explain 50% of SNP heritability. In such large studies, genome-wide significance becomes increasingly conservative, and less stringent thresholds achieve high true positive rates if confounding is controlled. Across traits, polygenicity varies, but the range of their effect sizes is similar. Compared with effect sizes in the top 10% of heritability, including most discovered thus far, those in the bottom 10-50% are orders of magnitude smaller and more numerous, spanning a large fraction of the genome.
|
28
|
Shan N, Xie Y, Song S, Jiang W, Wang Z, Hou L. A novel transcriptional risk score for risk prediction of complex human diseases. Genet Epidemiol 2021; 45:811-820. [PMID: 34245595 DOI: 10.1002/gepi.22424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/09/2020] [Revised: 06/08/2021] [Accepted: 06/24/2021] [Indexed: 11/06/2022]
Abstract
Recently, the polygenic risk score (PRS) has been successfully used in the risk prediction of complex human diseases. Many studies have incorporated internal information, such as effect size distribution, or external information, such as linkage disequilibrium, functional annotation, and pleiotropy among multiple diseases, to optimize the performance of PRS. To leverage multiomics datasets, we developed a novel flexible transcriptional risk score (TRS), in which messenger RNA expression levels were imputed and weighted for risk prediction. In simulation studies, we demonstrated that single-tissue TRS has greater prediction power than LDpred, especially when there is a large effect of gene expression on the phenotype. Multitissue TRS improves prediction accuracy when there are multiple tissues with independent contributions to disease risk. We applied our method to complex traits, including Crohn's disease and type 2 diabetes, among others. The single-tissue TRS method outperformed LDpred and AnnoPred across the tested traits. The performance of multitissue TRS is trait-dependent. Moreover, our method can easily incorporate information from epigenomic and proteomic data as reference datasets become available.
Affiliation(s)
- Nayang Shan
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Yuhan Xie
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
| | - Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
|
29
|
Wang SB, Coppersmith DDL, Kleiman EM, Bentley KH, Millner AJ, Fortgang R, Mair P, Dempsey W, Huffman JC, Nock MK. A Pilot Study Using Frequent Inpatient Assessments of Suicidal Thinking to Predict Short-Term Postdischarge Suicidal Behavior. JAMA Netw Open 2021; 4:e210591. [PMID: 33687442 PMCID: PMC7944382 DOI: 10.1001/jamanetworkopen.2021.0591] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 12/13/2022] Open
Abstract
IMPORTANCE The weeks following discharge from psychiatric hospitalization are the highest-risk period for suicide attempts. Real-time monitoring of suicidal thoughts via smartphone prompts may be more indicative of short-term risk than a single, cross-sectional assessment. OBJECTIVE To test whether modeling dynamic changes in real-time suicidal thoughts during psychiatric hospitalization can improve predictions of postdischarge suicide attempts vs using only baseline (ie, admission) data or using the mean level of real-time suicidal thoughts during hospitalization. DESIGN, SETTING, AND PARTICIPANTS In this prognostic study, 83 adults recruited from the inpatient psychiatric unit at Massachusetts General Hospital completed ecological momentary assessment surveys of suicidal thinking 4 to 6 times per day during hospitalization as well as brief follow-up surveys assessing suicide attempts at 2 and 4 weeks after discharge. Participants completed at least 3 real-time monitoring surveys. Inclusion criteria included hospitalization for suicidal thoughts and/or behaviors and English fluency. Data were collected from January 2016 to December 2018 and analyzed from January to December 2020. MAIN OUTCOMES AND MEASURES The primary outcome was suicide attempt in the month after discharge. RESULTS Of 83 participants (mean [SD] age, 38.4 [13.6] years; 43 [51.8%] male participants; 69 [83.1%] White individuals), 9 (10.8%) made a suicide attempt in the month after discharge. Mean cross-validated AUC for elastic net models revealed predictive accuracy was fair for the model using baseline data (area under the curve [AUC], 0.71; first to third quartile, 0.55-0.88), good for the model using the mean level of real-time suicidal thoughts during hospitalization (AUC, 0.81; first to third quartile, 0.67-0.91), and best for the model using dynamic changes in real-time suicidal thoughts during hospitalization (AUC, 0.89; first to third quartile, 0.81-0.97); this pattern of results held for other classification metrics (eg, accuracy, positive predictive value, Brier score) and when using different cross-validation procedures. Features assessing rapid fluctuations in suicidal thinking emerged as the strongest predictors of posthospital suicide attempts. A final set of models incorporating percentage missingness further improved both the mean (mean AUC, 0.93; first to third quartile, 0.90-1.00) and dynamic feature (mean AUC, 0.93; first to third quartile, 0.88-1.00) models. CONCLUSIONS AND RELEVANCE In this study, collecting real-time data about suicidal thinking during the course of hospitalization significantly improved short-term prediction of posthospitalization suicide attempts. Models including dynamic changes in suicidal thinking over time yielded the best prediction; features that captured rapid changes in suicidal thoughts were particularly strong predictors. Survey noncompletion also emerged as an important predictor of posthospitalization suicide attempts.
Affiliation(s)
- Shirley B. Wang
- Department of Psychology, Harvard University, Cambridge, Massachusetts
| | | | - Evan M. Kleiman
- Department of Psychology, Rutgers University, New Brunswick, New Jersey
| | - Kate H. Bentley
- Department of Psychology, Harvard University, Cambridge, Massachusetts
- Department of Psychiatry, Massachusetts General Hospital, Boston
| | - Alexander J. Millner
- Department of Psychology, Harvard University, Cambridge, Massachusetts
- Mental Health Research, Franciscan Children’s, Brighton, Massachusetts
| | - Rebecca Fortgang
- Department of Psychology, Harvard University, Cambridge, Massachusetts
| | - Patrick Mair
- Department of Psychology, Harvard University, Cambridge, Massachusetts
| | - Walter Dempsey
- Department of Biostatistics, University of Michigan, Ann Arbor
| | - Jeff C. Huffman
- Department of Psychiatry, Massachusetts General Hospital, Boston
| | - Matthew K. Nock
- Department of Psychology, Harvard University, Cambridge, Massachusetts
- Department of Psychiatry, Massachusetts General Hospital, Boston
- Mental Health Research, Franciscan Children’s, Brighton, Massachusetts
|
30
|
Shi H, Gazal S, Kanai M, Koch EM, Schoech AP, Siewert KM, Kim SS, Luo Y, Amariuta T, Huang H, Okada Y, Raychaudhuri S, Sunyaev SR, Price AL. Population-specific causal disease effect sizes in functionally important regions impacted by selection. Nat Commun 2021; 12:1098. [PMID: 33597505 PMCID: PMC7889654 DOI: 10.1038/s41467-021-21286-1] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 11/15/2019] [Accepted: 01/15/2021] [Indexed: 01/31/2023] Open
Abstract
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Affiliation(s)
- Huwenbo Shi
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Steven Gazal
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Evan M Koch
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Armin P Schoech
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Katherine M Siewert
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Samuel S Kim
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Yang Luo
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tiffany Amariuta
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Graduate School of Arts and Sciences, Harvard University, Cambridge, MA, USA
| | - Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Yukinori Okada
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, Japan
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK
| | - Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Alkes L Price
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
|
31
|
Zhu Z, Hasegawa K, Camargo CA, Liang L. Investigating asthma heterogeneity through shared and distinct genetics: Insights from genome-wide cross-trait analysis. J Allergy Clin Immunol 2020; 147:796-807. [PMID: 32693092 PMCID: PMC7368660 DOI: 10.1016/j.jaci.2020.07.004] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 03/28/2020] [Revised: 07/03/2020] [Accepted: 07/09/2020] [Indexed: 12/17/2022]
Abstract
Asthma is a heterogeneous respiratory disease reflecting distinct pathobiologic mechanisms. These mechanisms are based, at least partly, on different genetic factors shared by many other conditions, such as allergic diseases and obesity. Investigating the shared genetic effects enables better understanding of the mechanisms of phenotypic correlations and is less subject to confounding by environmental factors. The increasing availability of large-scale genome-wide association studies (GWASs) for asthma has enabled researchers to examine the genetic contributions to the epidemiologic associations between asthma subtypes and those between coexisting diseases and/or traits and asthma. Studies have found not only shared but also distinct genetic components between asthma subtypes, indicating that the heterogeneity is related to distinct genetics. This review summarizes a recently compiled analytic approach, genome-wide cross-trait analysis, for determining shared and distinct genetic architecture. Genome-wide cross-trait analysis comprises several analytic components: genetic correlation, cross-trait meta-analysis, Mendelian randomization, polygenic risk score, and functional analysis. In this article, we discuss in detail the scientific goals that can be achieved by these analyses, their advantages, and their limitations. We also make recommendations for future directions: (1) ethnicity-specific asthma GWASs and (2) application of cross-trait methods to multiomics data to dissect the heritability found in GWASs. Finally, these analytic approaches are also applicable to complex and heterogeneous traits beyond asthma.
Affiliation(s)
- Zhaozhong Zhu
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Mass; Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Mass; Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, Mass
| | - Kohei Hasegawa
- Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Mass
| | - Carlos A Camargo
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Mass; Department of Emergency Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Mass
| | - Liming Liang
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Mass; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Mass.
|
32
|
Miao H, Cao G, Wu XQ, Chen YY, Chen DQ, Chen L, Vaziri ND, Feng YL, Su W, Gao Y, Zhuang S, Yu XY, Zhang L, Guo Y, Zhao YY. Identification of endogenous 1-aminopyrene as a novel mediator of progressive chronic kidney disease via aryl hydrocarbon receptor activation. Br J Pharmacol 2020; 177:3415-3435. [PMID: 32219844 DOI: 10.1111/bph.15062] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 12/14/2019] [Revised: 02/28/2020] [Accepted: 03/21/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND AND PURPOSE Increasing evidence has indicated that the high risk of cardiovascular disease in chronic kidney disease (CKD) patients cannot be sufficiently explained by classic risk factors. EXPERIMENTAL APPROACH Based on the least absolute shrinkage and selection operator method, we identified significantly altered renal tissue metabolites during progressive CKD in a 5/6 nephrectomized rat model and in CKD patients. KEY RESULTS Six aryl-containing metabolites (ACMs) were significantly increased from Week 1 to Week 20. They were associated with the activation of aryl hydrocarbon receptor (AhR) and its target genes including CYP1A1, CYP1A2 and CYP1B1, which were further validated by molecular docking. Our study further demonstrated that AhR signalling could be activated by ACM in patients with idiopathic membranous nephropathy, diabetic nephropathy and IgA nephropathy. Most importantly, 1-aminopyrene (AP) showed strong positive and negative correlation with serum creatinine and creatinine clearance, respectively. AP significantly up-regulated the mRNA expressions of AhR and its three target genes in both mice and NRK-52E cells, while this effect was partially weakened in AhR small hairpin RNA-treated mice and NRK-52E cells. Furthermore, dietary flavonoid supplementation ameliorated CKD and renal fibrosis through partially inhibiting the AhR activity via lowering the ACM levels. The antagonistic effect of flavonoids on AhR was deeply influenced by the number and location of hydroxyl and glycosyl groups. CONCLUSION AND IMPLICATIONS We uncovered that endogenous AP is a novel mediator of CKD progression via AhR activation; thus, AhR might serve as a promising target for CKD treatment.
Affiliation(s)
- Hua Miao
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Gang Cao
- School of Pharmacy, Zhejiang Chinese Medical University, Hangzhou, Zhejiang, China
- Xia-Qing Wu
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Yuan-Yuan Chen
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Dan-Qian Chen
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Lin Chen
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Nosratola D Vaziri
- Division of Nephrology and Hypertension, School of Medicine, University of California Irvine, Irvine, California, USA
- Ya-Long Feng
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
- Wei Su
- Department of Nephrology, Baoji Central Hospital, Baoji, Shaanxi, China
- Yi Gao
- Department of Nephrology, The Affiliated Hospital of Northwest University, Xi'an, Shaanxi, China
- Shougang Zhuang
- Department of Medicine, Rhode Island Hospital and Alpert Medical School, Brown University, Providence, Rhode Island, USA
- Xiao-Yong Yu
- Department of Nephrology, Shaanxi Traditional Chinese Medicine Hospital, Xi'an, Shaanxi, China
- Li Zhang
- Department of Nephrology, Xi'an No. 4 Hospital, Xi'an, Shaanxi, China
- Yan Guo
- Department of Internal Medicine, University of New Mexico, Albuquerque, New Mexico, USA
- Ying-Yong Zhao
- Faculty of Life Science & Medicine, Northwest University, Xi'an, Shaanxi, China
33
Privé F, Vilhjálmsson BJ, Aschard H, Blum MGB. Making the Most of Clumping and Thresholding for Polygenic Scores. Am J Hum Genet 2019; 105:1213-1221. [PMID: 31761295 PMCID: PMC6904799 DOI: 10.1016/j.ajhg.2019.11.001] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 07/09/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022] Open
Abstract
Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
Affiliation(s)
- Florian Privé
- Laboratoire TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, La Tronche, France; Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.
- Bjarni J Vilhjálmsson
- Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Hugues Aschard
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France
- Michael G B Blum
- Laboratoire TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, La Tronche, France.
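
For readers unfamiliar with the clumping and thresholding (C+T) procedure named in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes a precomputed pairwise LD matrix and covers only two familiar settings, a p value threshold and an r² threshold; the function name `clump_and_threshold`, its default values, and the toy data are hypothetical, and the genomic window usually applied during clumping is ignored.

```python
import numpy as np

def clump_and_threshold(pvals, ld_r2, p_threshold=5e-8, r2_threshold=0.2):
    """Return indices of SNPs kept by a simple greedy C+T pass (illustrative only)."""
    pvals = np.asarray(pvals, dtype=float)
    ld_r2 = np.asarray(ld_r2, dtype=float)
    kept = []
    # Visit SNPs from most to least significant.
    for j in np.argsort(pvals, kind="stable"):
        if pvals[j] > p_threshold:
            break  # every remaining SNP is even less significant
        # Keep SNP j only if it is not in high LD with any index SNP kept so far.
        if all(ld_r2[j, k] < r2_threshold for k in kept):
            kept.append(int(j))
    return kept

# Hypothetical toy data: SNP 0 and SNP 3 are in high LD, SNP 2 is not significant.
pvals = [1e-9, 1e-8, 0.5, 1e-10]
ld_r2 = np.array([
    [1.0, 0.0, 0.0, 0.9],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.9, 0.0, 0.0, 1.0],
])
kept = clump_and_threshold(pvals, ld_r2)
```

Running the same function over a grid of `p_threshold` and `r2_threshold` values would yield one candidate SNP set per grid point, which is the kind of tuning the abstract describes.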
34
Karavani E, Zuk O, Zeevi D, Barzilai N, Stefanis NC, Hatzimanolis A, Smyrnis N, Avramopoulos D, Kruglyak L, Atzmon G, Lam M, Lencz T, Carmi S. Screening Human Embryos for Polygenic Traits Has Limited Utility. Cell 2019; 179:1424-1435.e8. [PMID: 31761530 PMCID: PMC6957074 DOI: 10.1016/j.cell.2019.10.033] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 05/21/2019] [Revised: 09/11/2019] [Accepted: 10/25/2019] [Indexed: 12/19/2022]
Abstract
The increasing proportion of variance in human complex traits explained by polygenic scores, along with progress in preimplantation genetic diagnosis, suggests the possibility of screening embryos for traits such as height or cognitive ability. However, the expected outcomes of embryo screening are unclear, which undermines discussion of associated ethical concerns. Here, we use theory, simulations, and real data to evaluate the potential gain of embryo screening, defined as the difference in trait value between the top-scoring embryo and the average embryo. The gain increases very slowly with the number of embryos but more rapidly with the variance explained by the score. Given current technology, the average gain due to screening would be ≈2.5 cm for height and ≈2.5 IQ points for cognitive ability. These mean values are accompanied by wide prediction intervals, and indeed, in large nuclear families, the majority of children top-scoring for height are not the tallest.
Affiliation(s)
- Ehud Karavani
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel
- Or Zuk
- Department of Statistics, The Hebrew University of Jerusalem, Jerusalem 9190501, Israel
- Danny Zeevi
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Nir Barzilai
- Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Genetics, Institute for Aging Research, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Nikos C Stefanis
- Department of Psychiatry, National and Kapodistrian University of Athens Medical School, Eginition Hospital, 115 28 Athens, Greece; University Mental Health Research Institute, 115 27 Athens, Greece; Neurobiology Research Institute, Theodor-Theohari Cozzika Foundation, 115 21 Athens, Greece
- Alex Hatzimanolis
- Department of Psychiatry, National and Kapodistrian University of Athens Medical School, Eginition Hospital, 115 28 Athens, Greece; Neurobiology Research Institute, Theodor-Theohari Cozzika Foundation, 115 21 Athens, Greece
- Nikolaos Smyrnis
- Department of Psychiatry, National and Kapodistrian University of Athens Medical School, Eginition Hospital, 115 28 Athens, Greece; University Mental Health Research Institute, 115 27 Athens, Greece
- Dimitrios Avramopoulos
- Department of Psychiatry, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA; Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA; Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA 90095, USA; Howard Hughes Medical Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Gil Atzmon
- Department of Medicine, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Genetics, Institute for Aging Research, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Department of Biology, Faculty of Natural Sciences, University of Haifa, Haifa 3498838, Israel
- Max Lam
- Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY 11004, USA; Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY 11030, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Todd Lencz
- Division of Psychiatry Research, Zucker Hillside Hospital, Glen Oaks, NY 11004, USA; Institute of Behavioral Science, Feinstein Institutes of Medical Research, Manhasset, NY 11030, USA; Department of Psychiatry, Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY 11549, USA.
- Shai Carmi
- Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel.
35
Janssens ACJW. Validity of polygenic risk scores: are we measuring what we think we are? Hum Mol Genet 2019; 28:R143-R150. [PMID: 31504522 PMCID: PMC7013150 DOI: 10.1093/hmg/ddz205] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 08/05/2019] [Revised: 08/14/2019] [Accepted: 08/14/2019] [Indexed: 12/16/2022] Open
Abstract
Polygenic risk scores (PRSs) have become the standard for quantifying genetic liability in the prediction of disease risks. PRSs are generally constructed as weighted sum scores of risk alleles using effect sizes from genome-wide association studies as their weights. The construction of PRSs is being improved with more appropriate selection of independent single-nucleotide polymorphisms (SNPs) and optimized estimation of their weights but is rarely reflected upon from a theoretical perspective, focusing on the validity of the risk score. Borrowing from psychometrics, this paper discusses the validity of PRSs and introduces the three main types of validity that are considered in the evaluation of tests and measurements: construct, content, and criterion validity. This introduction is followed by a discussion of three topics that challenge the validity of PRS, namely, their claimed independence of clinical risk factors, the consequences of relaxing SNP inclusion thresholds and the selection of SNP weights. This discussion of the validity of PRS reminds us that we need to keep questioning if weighted sums of risk alleles are measuring what we think they are in the various scenarios in which PRSs are used and that we need to keep exploring alternative modeling strategies that might better reflect the underlying biological pathways.
Affiliation(s)
- A Cecile J W Janssens
- Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA, USA
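
The abstract above describes a PRS as a weighted sum of risk alleles with GWAS effect sizes as weights. A minimal sketch of that calculation follows; the function name `polygenic_score`, the toy genotype matrix, and the weights are hypothetical values chosen for illustration, not data from the paper.

```python
import numpy as np

def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts, one score per individual."""
    dosages = np.asarray(dosages, dtype=float)   # shape: (individuals, SNPs)
    weights = np.asarray(weights, dtype=float)   # shape: (SNPs,)
    if dosages.ndim != 2 or dosages.shape[1] != weights.shape[0]:
        raise ValueError("exactly one weight per SNP column is required")
    return dosages @ weights

# Hypothetical toy data: 3 individuals x 4 SNPs, risk-allele counts in {0, 1, 2}.
dosages = np.array([
    [0, 1, 2, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
])
# Illustrative per-allele effect sizes (for example, log odds ratios from a GWAS).
weights = np.array([0.10, -0.05, 0.20, 0.30])
scores = polygenic_score(dosages, weights)
```

The choices the abstract questions, namely which SNPs enter the sum and how their weights are selected, correspond to the columns of `dosages` and the entries of `weights` in this sketch.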
36
Gene-diet interactions associated with complex trait variation in an advanced intercross outbred mouse line. Nat Commun 2019; 10:4097. [PMID: 31506438 PMCID: PMC6736984 DOI: 10.1038/s41467-019-11952-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 02/21/2019] [Accepted: 08/13/2019] [Indexed: 12/12/2022] Open
Abstract
Phenotypic variation of quantitative traits is orchestrated by a complex interplay between the environment (e.g. diet) and genetics. However, the impact of gene-environment interactions on phenotypic traits mostly remains elusive. To address this, we feed 1154 mice of an autoimmunity-prone intercross line (AIL) three different diets. We find that diet substantially contributes to the variability of complex traits and unmasks additional genetic susceptibility quantitative trait loci (QTL). By performing whole-genome sequencing of the AIL founder strains, we resolve these QTLs to few or single candidate genes. To address whether diet can also modulate genetic predisposition towards a given trait, we set NZM2410/J mice on similar dietary regimens as AIL mice. Our data suggest that diet modifies genetic susceptibility to lupus and shifts intestinal bacterial and fungal community composition, which precedes clinical disease manifestation. Collectively, our study underlines the importance of including environmental factors in genetic association studies. Complex traits associate with genetic variation and environment and their interaction. Here, the authors study the influence of different diets on trait variability in 1154 outbred mice from an advanced intercross line and find gene-diet interactions associated with spontaneous autoimmunity development in these animals.
37
Evidence for Recent Polygenic Selection on Educational Attainment and Intelligence Inferred from GWAS Hits: A Replication of Previous Findings Using Recent Data. PSYCH 2019. [DOI: 10.3390/psych1010005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 02/03/2023] Open
Abstract
Genetic variants identified by three large genome-wide association studies (GWAS) of educational attainment (EA) were used to test a polygenic selection model. Weighted and unweighted polygenic scores (PGS) were calculated and compared across populations using data from the 1000 Genomes (n = 26), HGDP-CEPH (n = 52) and gnomAD (n = 8) datasets. The PGS from the largest EA GWAS was highly correlated to two previously published PGSs (r = 0.96–0.97, N = 26). These factors are both highly predictive of average population IQ (r = 0.9, N = 23) and Learning index (r = 0.8, N = 22) and are robust to tests of spatial autocorrelation. Monte Carlo simulations yielded highly significant p values. In the gnomAD samples, the correlation between PGS and IQ was almost perfect (r = 0.98, N = 8), and ANOVA showed significant population differences in allele frequencies with positive effect. Socioeconomic variables slightly improved the prediction accuracy of the model (from 78–80% to 85–89%), but the PGS explained twice as much of the variance in IQ compared to socioeconomic variables. In both 1000 Genomes and gnomAD, there was a weak trend for lower GWAS significance SNPs to be less predictive of population IQ. Additionally, a subset of SNPs were found in the HGDP-CEPH sample (N = 127). The analysis of this sample yielded a positive correlation with latitude and a low negative correlation with distance from East Africa. This study provides robust results after accounting for spatial autocorrelation with Fst distances and random noise via an empirical Monte Carlo simulation using null SNPs.
38
Zhu Z, Lin Y, Li X, Driver JA, Liang L. Shared genetic architecture between metabolic traits and Alzheimer's disease: a large-scale genome-wide cross-trait analysis. Hum Genet 2019; 138:271-285. [PMID: 30805717 PMCID: PMC7193309 DOI: 10.1007/s00439-019-01988-9] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/02/2018] [Accepted: 02/20/2019] [Indexed: 02/06/2023]
Abstract
A growing number of studies clearly demonstrate a substantial link between metabolic dysfunction and the risk of Alzheimer's disease (AD), especially glucose-related dysfunction; one hypothesis for this comorbidity is the presence of a common genetic etiology. We conducted a large-scale cross-trait GWAS to investigate the genetic overlap between AD and ten metabolic traits. Among all the metabolic traits, fasting glucose, fasting insulin and HDL were found to be genetically associated with AD. Local genetic covariance analysis found that 19q13 region had strong local genetic correlation between AD and T2D (P = 6.78 × 10⁻²²), LDL (P = 1.74 × 10⁻²⁵³) and HDL (P = 7.94 × 10⁻¹⁸). Cross-trait meta-analysis identified 4 loci that were associated with AD and fasting glucose, 3 loci that were associated with AD and fasting insulin, and 20 loci that were associated with AD and HDL (Pmeta < 1.6 × 10⁻⁸, single trait P < 0.05). Functional analysis revealed that the shared genes are enriched in amyloid metabolic process, lipoprotein remodeling and other related biological pathways; also in pancreas, liver, blood and other tissues. Our work identifies common genetic architectures shared between AD and fasting glucose, fasting insulin and HDL, and sheds light on molecular mechanisms underlying the association between metabolic dysregulation and AD.
Affiliation(s)
- Zhaozhong Zhu
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yifei Lin
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jane A Driver
- Geriatric Research Education and Clinical Center and Massachusetts Veterans Epidemiology Research and Information Center, VA Medical Center, Boston, MA, USA
- Division of Aging, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Liming Liang
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.