1
Gaggero A, Ajnakina O, Zucchelli E, Hackett RA. The effect of heavy smoking on retirement risk: A mendelian randomisation analysis. Addict Behav 2024; 157:108078. [PMID: 38889551 DOI: 10.1016/j.addbeh.2024.108078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/30/2024] [Accepted: 06/01/2024] [Indexed: 06/20/2024]
Abstract
BACKGROUND AND AIMS: The extent to which heavy smoking and retirement risk are causally related remains to be determined. To overcome the endogeneity of heavy smoking behaviour, we employed a novel approach by exploiting the genetic predisposition to heavy smoking, as measured with a polygenic risk score (PGS), in a Mendelian Randomisation approach. METHODS: 8164 participants (mean age 68.86 years) from the English Longitudinal Study of Ageing had complete data on smoking behaviour, employment and a heavy smoking PGS. Heavy smoking was indexed as smoking at least 20 cigarettes a day. A time-to-event Mendelian Randomisation (MR) analysis, using a complementary log-log (cloglog) link function, was employed to model the retirement risk. RESULTS: Our results show that being a heavy smoker significantly increases the risk of retirement (β = 1.324, standard error = 0.622, p < 0.05). Results were robust to a battery of checks and a placebo analysis considering the never-smokers. CONCLUSIONS: Overall, our findings support a causal pathway from heavy smoking to earlier retirement.
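The complementary log-log link named in the abstract can be illustrated with a minimal Python sketch. It only shows how a coefficient on the cloglog scale shifts a per-interval event probability; it is not the authors' Mendelian Randomisation estimator, and the 5% baseline hazard is a hypothetical value chosen for illustration, not a figure from the paper. Reading the exponentiated coefficient as a hazard ratio assumes the usual grouped proportional-hazards interpretation of a cloglog model.

```python
import math


def cloglog(p: float) -> float:
    """Complementary log-log link: maps a probability in (0, 1) to the real line."""
    return math.log(-math.log(1.0 - p))


def inv_cloglog(eta: float) -> float:
    """Inverse link: maps a linear predictor to a per-interval event probability."""
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical baseline: 5% chance of retiring in a given interval (not from the paper).
baseline_hazard = 0.05
beta = 1.324  # coefficient reported in the abstract

# Shift the linear predictor by beta and map back to a probability.
eta0 = cloglog(baseline_hazard)
shifted_hazard = inv_cloglog(eta0 + beta)

# Under a grouped proportional-hazards reading, exp(beta) acts as a hazard ratio.
hazard_ratio = math.exp(beta)

print(f"baseline={baseline_hazard:.3f} shifted={shifted_hazard:.3f} HR={hazard_ratio:.2f}")
```

A useful check on the link is that the survival probabilities satisfy 1 - shifted = (1 - baseline) ** hazard_ratio, which is what makes the cloglog form a natural discrete-time counterpart of a proportional-hazards model.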
Affiliation(s)
- Alessio Gaggero
  - Department of Quantitative Methods for Economics and Business, Universidad de Granada (UGR), Spain.
- Olesya Ajnakina
  - Department of Biostatistics & Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Department of Behavioural Science and Health, Institute of Epidemiology and Health Care, University College London, London, UK.
- Eugenio Zucchelli
  - Department of Economic Analysis: Economic Theory and Economic History, Universidad Autónoma de Madrid (UAM), Spain; Division of Health Research, Faculty of Health & Medicine, Lancaster University, Lancaster, UK; Institute of Labor Economics (IZA), Bonn, Germany.
- Ruth A Hackett
  - Health Psychology Section, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK.
2
Sun TH, Wang CC, Liu TY, Lo SC, Huang YX, Chien SY, Chu YD, Tsai FJ, Hsu KC. Utility of polygenic scores across diverse diseases in a hospital cohort for predictive modeling. Nat Commun 2024; 15:3168. [PMID: 38609356 PMCID: PMC11014845 DOI: 10.1038/s41467-024-47472-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 03/29/2024] [Indexed: 04/14/2024] Open
Abstract
Polygenic scores estimate genetic susceptibility to diseases. We systematically calculated polygenic scores across 457 phenotypes using genotyping array data from China Medical University Hospital. Logistic regression models assessed polygenic scores' ability to predict disease traits. The polygenic score model with the highest accuracy, based on maximal area under the receiver operating characteristic curve (AUC), is provided on the GeneAnaBase website of the hospital. Our findings indicate 49 phenotypes with AUC greater than 0.6, predominantly linked to endocrine and metabolic diseases. Notably, hyperplasia of the prostate exhibited the highest disease prediction ability (P value = 1.01 × 10⁻¹⁹, AUC = 0.874), highlighting the potential of these polygenic scores in preventive medicine and diagnosis. This study offers a comprehensive evaluation of polygenic scores performance across diverse human traits, identifying promising applications for precision medicine and personalized healthcare, thereby inspiring further research and development in this field.
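As a reading aid for the AUC figures quoted in this abstract, the following minimal Python sketch computes the area under the ROC curve from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are hypothetical and are not taken from the study, which reports AUCs from fitted logistic regression models.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example (hypothetical polygenic scores): three of four pairs are ordered correctly.
toy_auc = auc_from_scores([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
print(toy_auc)
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, which is why the abstract treats 0.6 as a threshold of interest.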
Affiliation(s)
- Ting-Hsuan Sun
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Chia-Chun Wang
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Ting-Yuan Liu
  - Million-person Precision Medicine Initiative, Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan
- Shih-Chang Lo
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Yi-Xuan Huang
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Shang-Yu Chien
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Yu-De Chu
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan
- Fuu-Jen Tsai
  - Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan.
  - School of Chinese Medicine, China Medical University, Taichung, 40402, Taiwan.
  - Division of Pediatric Genetics, Children's Hospital of China Medical University, Taichung, 40447, Taiwan.
  - Department of Biotechnology and Bioinformatics, Asia University, Taichung, 41354, Taiwan.
- Kai-Cheng Hsu
  - Artificial Intelligence Center, China Medical University Hospital, Taichung, 40447, Taiwan.
  - Department of Neurology, China Medical University Hospital, Taichung, 40447, Taiwan.
  - Department of Medicine, China Medical University, Taichung, 40402, Taiwan.
3
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Accepted: 02/10/2023] [Indexed: 06/06/2023] Open
Abstract
Polygenic Risk Score (PRS) analysis is a method that predicts an individual's genetic risk for targeted traits. Even when there are no significant markers, it gives evidence of a genetic effect beyond the results of Genome-Wide Association Studies (GWAS). Moreover, it selects single nucleotide polymorphisms (SNPs) that contribute to the disease with low effect size, making it more precise at individual-level risk prediction. PRS analysis addresses the shortfall of GWAS by taking into account the SNPs/alleles that have low effect size but play an indispensable role in the observed phenotypic/trait variance. PRS analysis has applications that investigate the genetic basis of several traits, including rare diseases. However, the accuracy of PRS analysis depends on the genomic data of the underlying population. For instance, several studies show that obtaining higher prediction power of PRS analysis is challenging for non-Europeans. In this manuscript, we review the conventional PRS methods and their application to sub-Saharan African communities. We conclude that lack of sufficient GWAS data and tools is the limiting factor in applying PRS analysis to sub-Saharan populations. We recommend developing Africa-specific PRS methods and tools for estimating and analyzing African population data for clinical evaluation of PRSs of interest and predicting rare diseases.
Affiliation(s)
- Yagoub Adam
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
- Suraju Sadeeq
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
- Judit Kumuthini
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Olabode Ajayi
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Gordon Wells
  - South African National Bioinformatics Institute, Life Sciences Building, University of Western Cape, Cape Town, South Africa
  - Centre for Proteomic and Genomic Research, Cape Town, Western Cape, South Africa
- Rotimi Solomon
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Olubanke Ogunlana
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Emmanuel Adetiba
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Electrical & Information Engineering (EIE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
- Emeka Iweala
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept of Biochemistry, Covenant University, Ota, Ogun State, 112212, Nigeria
- Benedikt Brors
  - Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
  - German Cancer Consortium (DKTK), Heidelberg, Germany
- Ezekiel Adebiyi
  - Covenant University Bioinformatics Research (CUBRe), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Covenant Applied Informatics and Communication Africa Centre of Excellence (CApIC-ACE), Covenant University, Ota, Ogun State, 112212, Nigeria
  - Dept Computer & Information Sciences, Covenant University, Ota, Ogun State, 112212, Nigeria
  - Applied Bioinformatics Division, German Cancer Research Center (DKFZ), Heidelberg, 69120, Germany
4
Adam Y, Sadeeq S, Kumuthini J, Ajayi O, Wells G, Solomon R, Ogunlana O, Adetiba E, Iweala E, Brors B, Adebiyi E. Polygenic Risk Score in African populations: progress and challenges. F1000Res 2023; 11:175. [PMID: 37273966 PMCID: PMC10233318 DOI: 10.12688/f1000research.76218.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/10/2023] [Indexed: 11/23/2023] Open
5
Wang C, Zhang J, Veldsman WP, Zhou X, Zhang L. A comprehensive investigation of statistical and machine learning approaches for predicting complex human diseases on genomic variants. Brief Bioinform 2023; 24:6965909. [PMID: 36585786 DOI: 10.1093/bib/bbac552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Revised: 11/04/2022] [Accepted: 11/14/2022] [Indexed: 01/01/2023] Open
Abstract
Quantifying an individual's risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. Although several studies have been performed to benchmark the PRS calculation tools and assess their potential to guide future clinical applications, some issues remain to be further investigated, such as lacking (i) various simulated data with different genetic effects; (ii) evaluation of machine learning models and (iii) evaluation on multiple ancestries studies. In this study, we systematically validated and compared 13 statistical methods, 5 machine learning models and 2 ensemble models using simulated data with additive and genetic interaction models, 22 common diseases with internal training sets, 4 common diseases with external summary statistics and 3 common diseases for trans-ancestry studies in UK Biobank. The statistical methods were better in simulated data from additive models and machine learning models have edges for data that include genetic interactions. Ensemble models are generally the best choice by integrating various statistical methods. LDpred2 outperformed the other standalone tools, whereas PRS-CS, lassosum and DBSLMM showed comparable performance. We also identified that disease heritability strongly affected the predictive performance of all methods. Both the number and effect sizes of risk SNPs are important; and sample size strongly influences the performance of all methods. For the trans-ancestry studies, we found that the performance of most methods became worse when training and testing sets were from different populations.
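The aggregation of risk alleles described in this abstract is commonly written as a weighted sum of allele dosages. The Python sketch below shows that basic additive form only; the function name, dosages and effect-size weights are hypothetical, and the sketch does not reproduce any of the benchmarked tools (LDpred2, PRS-CS, lassosum, DBSLMM), which differ mainly in how they derive the weights.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over SNPs of (risk-allele dosage x per-allele effect size)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    for d in dosages:
        if not 0.0 <= d <= 2.0:
            raise ValueError("allele dosages are expected in the range [0, 2]")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual genotyped at three SNPs (0, 1 or 2 copies of the risk allele)
# with hypothetical per-allele effect sizes, e.g. log odds ratios from a GWAS.
example_dosages = [0, 1, 2]
example_weights = [0.10, -0.05, 0.20]
example_score = polygenic_score(example_dosages, example_weights)
print(round(example_score, 4))
```

In practice the raw score is usually standardised against a reference population before it is used as a predictor.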
Affiliation(s)
- Chonghao Wang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
- Jing Zhang
  - Eye Institute and Department of Ophthalmology, NHC Key Laboratory of Myopia (Fudan University), Eye & ENT Hospital, Fudan University, Shanghai, China
- Xin Zhou
  - Department of Biomedical Engineering, Vanderbilt University, Vanderbilt Place Nashville, 37235, TN, USA
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China
  - Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
6
Uddin MJ, Hjorthøj C, Ahammed T, Nordentoft M, Ekstrøm CT. The use of polygenic risk scores as a covariate in psychological studies. Methods in Psychology 2022. [DOI: 10.1016/j.metip.2022.100099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022] Open
7
Wu T, Liu Z, Mak TSH, Sham PC. Polygenic power calculator: Statistical power and polygenic prediction accuracy of genome-wide association studies of complex traits. Front Genet 2022; 13:989639. [PMID: 36299579 PMCID: PMC9589038 DOI: 10.3389/fgene.2022.989639] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/08/2022] [Accepted: 09/02/2022] [Indexed: 11/13/2022] Open
Abstract
Power calculation is a necessary step when planning genome-wide association studies (GWAS) to ensure meaningful findings. Statistical power of GWAS depends on the genetic architecture of phenotype, sample size, and study design. While several computer programs have been developed to perform power calculation for single SNP association testing, it might be more appropriate for GWAS power calculation to address the probability of detecting any number of associated SNPs. In this paper, we derive the statistical power distribution across causal SNPs under the assumption of a point-normal effect size distribution. We demonstrate how key outcome indices of GWAS are related to the genetic architecture (heritability and polygenicity) of the phenotype through the power distribution. We also provide a fast, flexible and interactive power calculation tool which generates predictions for key GWAS outcomes including the number of independent significant SNPs, the phenotypic variance explained by these SNPs, and the predictive accuracy of resulting polygenic scores. These results could also be used to explore the future behaviour of GWAS as sample sizes increase further. Moreover, we present results from simulation studies to validate our derivation and evaluate the agreement between our predictions and reported GWAS results.
Affiliation(s)
- Tian Wu
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Zipeng Liu
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
- Timothy Shin Heng Mak
  - Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - Fano Labs, Hong Kong, Hong Kong SAR, China
- Pak Chung Sham
  - Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - Centre for PanorOmic Sciences, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR, China
  - *Correspondence: Pak Chung Sham,
8
Coon H, Shabalin A, Bakian AV, DiBlasi E, Monson ET, Kirby A, Chen D, Fraser A, Yu Z, Staley M, Callor WB, Christensen ED, Crowell SE, Gray D, Crockett DK, Li QS, Keeshin B, Docherty AR. Extended familial risk of suicide death is associated with younger age at death and elevated polygenic risk of suicide. Am J Med Genet B Neuropsychiatr Genet 2022; 189:60-73. [PMID: 35212135 PMCID: PMC9149029 DOI: 10.1002/ajmg.b.32890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2021] [Revised: 11/19/2021] [Accepted: 01/31/2022] [Indexed: 12/12/2022]
Abstract
Suicide accounts for >800,000 deaths annually worldwide; prevention is an urgent public health issue. Identification of risk factors remains challenging due to complexity and heterogeneity. The study of suicide deaths with increased extended familial risk provides an avenue to reduce etiological heterogeneity and explore traits associated with increased genetic liability. Using extensive genealogical records, we identified high-risk families where distant relatedness of suicides implicates genetic risk. We compared phenotypic and polygenic risk score (PRS) data between suicides in high-risk extended families (high familial risk (HFR), n = 1,634), suicides linked to genealogical data not in any high-risk families (low familial risk (LFR), n = 147), and suicides not linked to genealogical data with unknown familial risk (UFR, n = 1,865). HFR suicides were associated with lower age at death (mean = 39.34 years), more suicide attempts, and more PTSD and trauma diagnoses. For PRS tests, we included only suicides with >90% European ancestry and adjusted for residual ancestry effects. HFR suicides showed markedly higher PRS of suicide death (calculated using cross-validation), supporting specific elevation of genetic risk of suicide in this subgroup, and also showed increased PRS of PTSD, suicide attempt, and risk taking. LFR suicides were substantially older at death (mean = 49.10 years), had fewer psychiatric diagnoses of depression and pain, and significantly lower PRS of depression. Results suggest extended familiality and trauma/PTSD may provide specificity in identifying individuals at genetic risk for suicide death, especially among younger ages, and that LFR of suicide warrants further study regarding the contribution of demographic and medical risks.
Affiliation(s)
- Hilary Coon
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Andrey Shabalin
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Amanda V. Bakian
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Emily DiBlasi
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Eric T. Monson
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Anne Kirby
  - Department of Occupational Therapy, University of Utah, Salt Lake City, Utah, USA
- Danli Chen
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Alison Fraser
  - Pedigree & Population Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Zhe Yu
  - Pedigree & Population Resource, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah, USA
- Michael Staley
  - Utah State Office of the Medical Examiner, Utah Department of Health, Salt Lake City, Utah, USA
- Erik D. Christensen
  - Utah State Office of the Medical Examiner, Utah Department of Health, Salt Lake City, Utah, USA
- Douglas Gray
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
- Qingqin S. Li
  - Neuroscience Therapeutic Area, Janssen Research & Development LLC, Titusville, Utah, USA
- Brooks Keeshin
  - Department of Pediatrics, University of Utah, Salt Lake City, Utah, USA
  - Primary Children's Hospital Center for Safe and Healthy Families, Salt Lake City, Utah, USA
- Anna R. Docherty
  - Department of Psychiatry & Huntsman Mental Health Institute, University of Utah, Salt Lake City, Utah, USA
9
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
Affiliation(s)
- Ying Ma
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
10
Fisch GS. Associating complex traits with genetic variants: polygenic risk scores, pleiotropy and endophenotypes. Genetica 2021; 150:183-197. [PMID: 34677750 DOI: 10.1007/s10709-021-00138-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 10/07/2021] [Indexed: 11/29/2022]
Abstract
Genotype-phenotype causal modeling has evolved significantly since Johannsen's and Wright's original designs were published. The development of genomewide assays to interrogate and detect possible causal variants associated with complex traits has expanded the scope of genotype-phenotype research considerably. Clusters of causal variants discovered by genomewide assays and associated with complex traits have been used to develop polygenic risk scores to predict clinical diagnoses of multidimensional human disorders. However, genomewide investigations have met with many challenges to their research designs and statistical complexities which have hindered the reliability and validity of their predictions. Findings linked to differences in heritability estimates between causal clusters and complex traits among unrelated individuals remain a research area of some controversy. Causal models developed from case-control studies as opposed to experiments, as well as other issues concerning the genotype-phenotype causal model and the extent to which various forms of pleiotropy and the concept of the endophenotype add to its complexity, will be reviewed.
Affiliation(s)
- Gene S Fisch
  - Paul H. Chook Dept. of CIS & Statistics, CUNY/Baruch College, New York, NY, USA.
11
Analysis of genetic differences between psychiatric disorders: exploring pathways and cell types/tissues involved and ability to differentiate the disorders by polygenic scores. Transl Psychiatry 2021; 11:426. [PMID: 34389699 PMCID: PMC8363629 DOI: 10.1038/s41398-021-01545-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2020] [Revised: 07/13/2021] [Accepted: 08/02/2021] [Indexed: 02/07/2023] Open
Abstract
Although displaying genetic correlations, psychiatric disorders are clinically defined as categorical entities as they each have distinguishing clinical features and may involve different treatments. Identifying differential genetic variations between these disorders may reveal how the disorders differ biologically and help to guide more personalized treatment. Here we presented a statistical framework and comprehensive analysis to identify genetic markers differentially associated with various psychiatric disorders/traits based on GWAS summary statistics, covering 18 psychiatric traits/disorders and 26 comparisons. We also conducted comprehensive analysis to unravel the genes, pathways and SNP functional categories involved, and the cell types and tissues implicated. We also assessed how well one could distinguish between psychiatric disorders by polygenic risk scores (PRS). SNP-based heritabilities (h²snp) were significantly larger than zero for most comparisons. Based on current GWAS data, PRS have mostly modest power to distinguish between psychiatric disorders. For example, we estimated that AUC for distinguishing schizophrenia from major depressive disorder (MDD), bipolar disorder (BPD) from MDD and schizophrenia from BPD were 0.694, 0.602 and 0.618, respectively, while the maximum AUC (based on h²snp) were 0.763, 0.749 and 0.726, respectively. We also uncovered differences in each pair of studied traits in terms of their differences in genetic correlation with comorbid traits. For example, clinically defined MDD appeared to be more strongly genetically correlated with other psychiatric disorders and heart disease, when compared to non-clinically defined depression in UK Biobank. Our findings highlight genetic differences between psychiatric disorders and the mechanisms involved. PRS may help differential diagnosis of selected psychiatric disorders in the future with larger GWAS samples.
12
Zhou G, Zhao H. A fast and robust Bayesian nonparametric method for prediction of complex traits using summary statistics. PLoS Genet 2021; 17:e1009697. [PMID: 34310601 PMCID: PMC8341714 DOI: 10.1371/journal.pgen.1009697] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/17/2020] [Revised: 08/05/2021] [Accepted: 07/05/2021] [Indexed: 12/27/2022]
Abstract
Genetic prediction of complex traits has great promise for disease prevention, monitoring, and treatment. The development of accurate risk prediction models is hindered by the wide diversity of genetic architecture across different traits, limited access to individual-level data for training and parameter tuning, and the demand for computational resources. To overcome the limitations of most existing methods, which make explicit assumptions about the underlying genetic architecture and need a separate validation data set for parameter tuning, we develop a summary statistics-based nonparametric method that does not rely on validation datasets to tune parameters. In our implementation, we refine the commonly used likelihood assumption to deal with the discrepancy between summary statistics and the external reference panel. We also leverage the block structure of the reference linkage disequilibrium matrix to implement a parallel algorithm. Through simulations and applications to twelve traits, we show that our method is adaptive to different genetic architectures, statistically robust, and computationally efficient. Our method is available at https://github.com/eldronzhou/SDPR.
Affiliation(s)
- Geyu Zhou
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
13
Shan N, Xie Y, Song S, Jiang W, Wang Z, Hou L. A novel transcriptional risk score for risk prediction of complex human diseases. Genet Epidemiol 2021; 45:811-820. [PMID: 34245595 DOI: 10.1002/gepi.22424] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/09/2020] [Revised: 06/08/2021] [Accepted: 06/24/2021] [Indexed: 11/06/2022]
Abstract
Recently, the polygenic risk score (PRS) has been successfully used in the risk prediction of complex human diseases. Many studies have incorporated internal information, such as effect size distribution, or external information, such as linkage disequilibrium, functional annotation, and pleiotropy among multiple diseases, to optimize the performance of PRS. To leverage multiomics datasets, we developed a novel flexible transcriptional risk score (TRS), in which messenger RNA expression levels were imputed and weighted for risk prediction. In simulation studies, we demonstrated that single-tissue TRS has greater prediction power than LDpred, especially when there is a large effect of gene expression on the phenotype. Multitissue TRS improves prediction accuracy when there are multiple tissues with independent contributions to disease risk. We applied our method to complex traits, including Crohn's disease and type 2 diabetes. The single-tissue TRS method outperformed LDpred and AnnoPred across the tested traits. The performance of multitissue TRS is trait-dependent. Moreover, our method can easily incorporate information from epigenomic and proteomic data upon the availability of reference datasets.
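The phrase "expression levels were imputed and weighted" can be sketched in two steps: predict each gene's expression from genotype dosages using per-SNP weights, then combine the predicted expression values with per-gene weights. This is only a minimal sketch under that reading of the abstract; the function names, SNP and gene identifiers, and all weights below are hypothetical, and the published TRS method may estimate and combine these quantities differently.

```python
def impute_expression(genotypes, eqtl_weights):
    """Predict expression per gene as a weighted sum of SNP dosages.

    genotypes: dict mapping SNP id -> allele dosage (0, 1 or 2)
    eqtl_weights: dict mapping gene -> {SNP id -> weight} (hypothetical values)
    """
    return {
        gene: sum(w * genotypes.get(snp, 0.0) for snp, w in weights.items())
        for gene, weights in eqtl_weights.items()
    }


def transcriptional_risk_score(expression, gene_weights):
    """Weighted sum of imputed expression levels over the genes that were imputed."""
    return sum(w * expression[g] for g, w in gene_weights.items() if g in expression)


# Made-up inputs for a single individual.
genotypes = {"rs1": 2, "rs2": 1, "rs3": 0}
eqtl_weights = {
    "GENE_A": {"rs1": 0.5, "rs2": -0.2},
    "GENE_B": {"rs3": 0.3, "rs2": 0.4},
}
gene_weights = {"GENE_A": 1.5, "GENE_B": -0.5}

expression = impute_expression(genotypes, eqtl_weights)
trs = transcriptional_risk_score(expression, gene_weights)
print(expression, trs)
```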
Affiliation(s)
- Nayang Shan
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
- Yuhan Xie
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Shuang Song
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China
- Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Lin Hou
- Center for Statistical Science, Department of Industrial Engineering, Tsinghua University, Beijing, China; MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
14
Tang H, He Z. Advances and challenges in quantitative delineation of the genetic architecture of complex traits. Quantitative Biology 2021; 9:168-184. [PMID: 35492964 PMCID: PMC9053444 DOI: 10.15302/j-qb-021-0249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. RESULTS This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. CONCLUSION GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights on the genetic architecture underlying complex phenotypes.
Affiliation(s)
- Hua Tang
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Zihuai He
- Department of Neurology and Neurological Sciences, Stanford University, Stanford, CA 94305, USA
- Quantitative Sciences Unit, Department of Medicine, Stanford University, Stanford, CA 94305, USA
15
Murray AN, Chandler HL, Lancaster TM. Multimodal hippocampal and amygdala subfield volumetry in polygenic risk for Alzheimer's disease. Neurobiol Aging 2020; 98:33-41. [PMID: 33227567 PMCID: PMC7886309 DOI: 10.1016/j.neurobiolaging.2020.08.022] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/20/2020] [Revised: 07/28/2020] [Accepted: 08/02/2020] [Indexed: 11/29/2022]
Abstract
Preclinical models of Alzheimer's disease (AD) suggest that volumetric reductions in medial temporal lobe (MTL) structures manifest before clinical onset. AD polygenic risk scores (PRSs) are further linked to reduced MTL volumes (the hippocampus/amygdala); however, the relationship between the PRS and specific subregions remains unclear. We determine the relationship between the AD-PRSs and MTL subregions in a large sample of young participants (N = 730, aged 22–35 years) using a multimodal (T1w/T2w) approach. We first demonstrate that the PRSs for the hippocampus/amygdala predict their respective volumes and specific hippocampal subregions (pFDR < 0.05). We further observe negative relationships between the AD-PRSs and whole hippocampal/amygdala volumes. Critically, we demonstrate novel associations between the AD-PRSs and specific hippocampal subfields such as CA1 (β = −0.096, pFDR = 0.045) and the fissure (β = −0.101, pFDR = 0.041). We provide evidence that the AD-PRS is linked to specific MTL subfields decades before AD onset. This may help inform preclinical models of AD risk, providing additional specificity for intervention and further insight into mechanisms by which common AD variants confer susceptibility. Polygenic risk for Alzheimer's disease (AD-PRS) explains a significant proportion of AD. The AD-PRS is also linked to hippocampus and amygdala volume. The AD-PRS is negatively associated with specific hippocampal subfields. Polygenic AD models help us understand genetic contributions to medial temporal lobe nuclei.
Affiliation(s)
- Amy N Murray
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, United Kingdom
- Hannah L Chandler
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, United Kingdom
- Thomas M Lancaster
- Cardiff University Brain Research Imaging Centre (CUBRIC), School of Psychology, Cardiff University, Cardiff, United Kingdom; Dementia Research Institute at Cardiff University, School of Medicine, Cardiff University, Cardiff, United Kingdom; School of Psychology, Bath University, Bath, United Kingdom.
16
Babb de Villiers C, Kroese M, Moorthie S. Understanding polygenic models, their development and the potential application of polygenic scores in healthcare. J Med Genet 2020; 57:725-732. [PMID: 32376789 PMCID: PMC7591711 DOI: 10.1136/jmedgenet-2019-106763] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/09/2019] [Revised: 03/09/2020] [Accepted: 03/28/2020] [Indexed: 02/06/2023]
Abstract
The use of genomic information to better understand and prevent common complex diseases has been an ongoing goal of genetic research. Over the past few years, research in this area has proliferated with several proposed methods of generating polygenic scores. This has been driven by the availability of larger data sets, primarily from genome-wide association studies and concomitant developments in statistical methodologies. Here we provide an overview of the methodological aspects of polygenic model construction. In addition, we consider the state of the field and implications for potential applications of polygenic scores for risk estimation within healthcare.
Affiliation(s)
- Mark Kroese
- PHG Foundation, University of Cambridge, Cambridge, Cambridgeshire, UK
- Sowmiya Moorthie
- PHG Foundation, University of Cambridge, Cambridge, Cambridgeshire, UK
17
Deutsch AR, Selya AS. Stability in effects of different smoking-related polygenic risk scores over age and smoking phenotypes. Drug Alcohol Depend 2020; 214:108154. [PMID: 32645681 PMCID: PMC7423706 DOI: 10.1016/j.drugalcdep.2020.108154] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Revised: 06/17/2020] [Accepted: 06/24/2020] [Indexed: 10/23/2022]
Abstract
PURPOSE Polygenic risk scores (PRSs) for smoking behavior largely fail to consider the demonstrated developmental change in genetic influence over age and stage of smoking behaviors. Additionally, few studies have examined how stage-specific smoking PRSs (e.g. for initiation vs. smoking heaviness) generalize to other stages of risk. The current study examines the stability of PRS effects over age, and how specifically calibrated PRSs associate with other smoking phenotypes. METHODS Participants were 7228 individuals from the National Longitudinal Study of Adolescent to Adult Health for whom PRSs had been calculated for two smoking phenotypes: Centers for Disease Control and Prevention (CDC) smoking initiation status and cigarettes per day (CPD). Four time-varying effects models estimated associations between both PRSs and four smoking phenotypes (CDC status, cigarettes/day on smoking days, any past-30 day smoking, and past-30 day daily smoking) over adolescence and young adulthood. FINDINGS The time-varying effects models demonstrated that both PRSs were significantly associated with all four phenotypes across age. PRS effects were similar, in both odds ratios and the overlap of 95% confidence intervals. There were increases in PRS associations with quantity of smoking over age, and a decrease in PRS effects over age for the CDC smoking status phenotype over early to late adolescence. CONCLUSIONS Smoking PRSs can be robust predictors of smoking behavior over age. However, the lack of differentiation between specific PRSs and multiple smoking phenotypes, as well as the added contribution of both PRSs to explaining genetic variance, indicates a need to reconceptualize phenotypic measurement used to calibrate smoking PRSs.
Affiliation(s)
- Arielle R. Deutsch
- Sanford Research, Behavioral Sciences, University of South Dakota School of Medicine, Pediatrics
- Arielle S. Selya
- Sanford Research, Behavioral Sciences, University of South Dakota School of Medicine, Pediatrics
18
Abstract
Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.
Affiliation(s)
- Ning Sun
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520, USA
19
Yanes T, McInerney-Leo AM, Law MH, Cummings S. The emerging field of polygenic risk scores and perspective for use in clinical care. Hum Mol Genet 2020; 29:R165-R176. [DOI: 10.1093/hmg/ddaa136] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 06/02/2020] [Revised: 06/30/2020] [Accepted: 07/01/2020] [Indexed: 02/06/2023]
Abstract
Genetic testing is used widely for diagnostic, carrier and predictive testing in monogenic diseases. Until recently, there were no genetic testing options available for multifactorial complex diseases like heart disease, diabetes and cancer. Genome-wide association studies (GWAS) have been invaluable in identifying single-nucleotide polymorphisms (SNPs) associated with increased or decreased risk for hundreds of complex disorders. For a given disease, SNPs can be combined to generate a cumulative estimation of risk known as a polygenic risk score (PRS). After years of research, PRSs are increasingly used in clinical settings. In this article, we will review the literature on how both genome-wide and restricted PRSs are developed and the relative merit of each. The validation and evaluation of PRSs will also be discussed, including the recognition that PRS validity is intrinsically linked to the methodological and analytical approach of the foundation GWAS together with the ethnic characteristics of that cohort. Specifically, population differences may affect imputation accuracy, risk magnitude and direction. Even as PRSs are being introduced into clinical practice, there is a push to combine them with clinical and demographic risk factors to develop a holistic disease risk. The existing evidence regarding the clinical utility of PRSs is considered across four different domains: informing population screening programs, guiding therapeutic interventions, refining risk for families at high risk, and facilitating diagnosis and predicting prognostic outcomes. The evidence for clinical utility in relation to five well-studied disorders is summarized. The potential ethical, legal and social implications are also highlighted.
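The statement that SNPs "can be combined to generate a cumulative estimation of risk" is commonly realised as a weighted sum: each SNP's risk-allele dosage is multiplied by its GWAS effect size and the products are added. The sketch below illustrates that standard additive form only; the SNP identifiers, effect sizes and dosages are invented for illustration, and real pipelines add steps (variant selection, linkage disequilibrium handling, standardisation) that are not shown.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Additive PRS: sum over SNPs of effect size * risk-allele dosage.

    dosages: dict mapping SNP id -> allele count (0, 1 or 2) for one person
    effect_sizes: dict mapping SNP id -> per-allele effect (e.g. log odds ratio)
    SNPs missing from the genotype data are skipped.
    """
    return sum(
        beta * dosages[snp]
        for snp, beta in effect_sizes.items()
        if snp in dosages
    )


# Hypothetical effect sizes and one individual's genotypes.
effect_sizes = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

prs = polygenic_risk_score(dosages, effect_sizes)
print(f"PRS = {prs:.2f}")
```

In this form a "restricted" score would simply use a short list of strongly associated SNPs in `effect_sizes`, while a genome-wide score would use a much larger set.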
Affiliation(s)
- Tatiane Yanes
- Dermatology Research Centre, The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD 4102, Australia
- Aideen M McInerney-Leo
- Dermatology Research Centre, The University of Queensland Diamantina Institute, The University of Queensland, Brisbane, QLD 4102, Australia
- Matthew H Law
- Statistical Genetics Lab, QIMR Berghofer Medical Research Institute, Herston QLD 4006, Australia
- Faculty of Health, School of Biomedical Sciences, and Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove QLD 4059, Australia
20
Chun S, Imakaev M, Hui D, Patsopoulos NA, Neale BM, Kathiresan S, Stitziel NO, Sunyaev SR. Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics. Am J Hum Genet 2020; 107:46-59. [PMID: 32470373 PMCID: PMC7332650 DOI: 10.1016/j.ajhg.2020.05.004] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 06/15/2019] [Accepted: 05/01/2020] [Indexed: 02/07/2023]
Abstract
In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.
Affiliation(s)
- Sung Chun
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Maxim Imakaev
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Daniel Hui
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
- Nikolaos A Patsopoulos
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham & Women's Hospital, Boston, MA 02115, USA
- Benjamin M Neale
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA
- Sekar Kathiresan
- Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
- Nathan O Stitziel
- Cardiovascular Division, Department of Medicine, Washington University School of Medicine, Saint Louis, MO 63110, USA; Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA; McDonnell Genome Institute, Washington University School of Medicine, Saint Louis, MO 63110, USA.
- Shamil R Sunyaev
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA.
21
Abstract
The term axial spondyloarthritis (axSpA) encompasses a heterogeneous group of diseases that have variable presentations, extra-articular manifestations and clinical outcomes, and that will respond differently to treatments. The prototypical type of axSpA, ankylosing spondylitis, is thought to be caused by interaction between the genetically primed host immune system and gut microbiota. Currently used biomarkers such as HLA-B27 status, C-reactive protein and erythrocyte sedimentation rate have, at best, moderate diagnostic and predictive value. Improved biomarkers are needed for axSpA to assist with early diagnosis and to better predict treatment responses and long-term outcomes. Advances in a range of 'omics' technologies and statistical approaches, including genomics approaches (such as polygenic risk scores), microbiome profiling and, potentially, transcriptomic, proteomic and metabolomic profiling, are making it possible for more informative biomarker sets to be developed for use in such clinical applications. Future developments in this field will probably involve combinations of biomarkers that require novel statistical approaches to analyse and to produce easy to interpret metrics for clinical application. Large publicly available datasets from well-characterized case-cohort studies that use extensive biological sampling, particularly focusing on early disease and responses to medications, are required to establish successful biomarker discovery and validation programmes.
22
Yang S, Zhou X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. Am J Hum Genet 2020; 106:679-693. [PMID: 32330416 PMCID: PMC7212266 DOI: 10.1016/j.ajhg.2020.03.013] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/12/2019] [Accepted: 03/30/2020] [Indexed: 01/24/2023]
Abstract
Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
23
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
- Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
- Chad Brummett
- University of Michigan, Department of Anesthesiology
- Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
24
Abstract
PURPOSE OF REVIEW Large genome-wide association studies (GWAS) have identified variants accounting for a substantial portion of the heritable risk for coronary artery disease (CAD). These studies have catalyzed drug discovery and generated the possibility of improved risk prediction and stratification. Here, we review the current state of the art in polygenic risk scores (PRSs) and look to the future, as these scores move towards clinical application. RECENT FINDINGS Over the last decade, multilocus PRSs for CAD have expanded to include millions of variants and demonstrated strong association with CAD outcomes, even when adjusted for traditional risk factors. Recently, PRSs have shown better prediction of CAD outcomes than any single traditional risk factor alone. Advances in statistical methods used to generate PRSs have improved their predictive ability and transferability between populations with varied ancestries. Initial clinical studies have also demonstrated the potential of genetic information to impact shared decision-making between patients and providers, leading to improved outcomes. SUMMARY PRSs can improve risk stratification for CAD, especially in white/European populations, and have the potential to alter routine clinical care. However, unlocking this potential will require additional research in PRSs in nonwhite populations and substantial investment in clinical implementation studies.
25
Song S, Jiang W, Hou L, Zhao H. Leveraging effect size distributions to improve polygenic risk scores derived from summary statistics of genome-wide association studies. PLoS Comput Biol 2020; 16:e1007565. [PMID: 32045423 PMCID: PMC7039528 DOI: 10.1371/journal.pcbi.1007565] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 05/17/2019] [Revised: 02/24/2020] [Accepted: 11/25/2019] [Indexed: 12/29/2022]
Abstract
Genetic risk prediction is an important problem in human genetics, and accurate prediction can facilitate disease prevention and treatment. The polygenic risk score (PRS) has become widely used due to its simplicity and effectiveness; the standard method needs only summary statistics from genome-wide association studies. Recently, several methods have been proposed to improve the standard PRS by utilizing external information, such as linkage disequilibrium and functional annotations. In this paper, we introduce EB-PRS, a novel method that leverages information on effect sizes across all the markers to improve prediction accuracy. Unlike most existing genetic risk prediction methods, our method requires neither parameter tuning nor external information. Real data applications on six diseases, including asthma, breast cancer, celiac disease, Crohn's disease, Parkinson's disease and type 2 diabetes, show that EB-PRS achieved 307.1%, 42.8%, 25.5%, 3.1%, 74.3% and 49.6% relative improvements in terms of predictive r2 over the standard PRS method with optimally tuned parameters. In addition, compared to LDpred, which makes use of LD information, EB-PRS achieved 37.9%, 33.6%, 8.6%, 36.2%, 40.6% and 10.8% relative improvements. We note that our method is not the first to leverage effect size distributions. Here we first justify our method by presenting its theoretical optimality over existing methods in this class, and we substantiate our theoretical result with extensive simulation results. The R package EBPRS that implements our method is available on CRAN.
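To make the "relative improvement in predictive r2" figures concrete, the sketch below assumes that predictive r2 is the squared Pearson correlation between a score and the phenotype, and that relative improvement is the percentage change of the new method's r2 against a reference method's r2. Both definitions are assumptions adopted for illustration and the abstract does not spell them out; the function names and the numbers are hypothetical.

```python
def predictive_r2(scores, phenotype):
    """Squared Pearson correlation between predicted scores and observed phenotype."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(phenotype) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, phenotype))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in phenotype)
    return cov * cov / (var_s * var_y)


def relative_improvement(r2_new, r2_ref):
    """Percentage gain of r2_new over r2_ref."""
    return 100.0 * (r2_new - r2_ref) / r2_ref


# Made-up r2 values: a reference method at 0.04 and a new method at 0.05.
gain = relative_improvement(0.05, 0.04)
print(f"relative improvement = {gain:.1f}%")
```

Under this reading, a large percentage such as 307.1% can arise when the reference r2 is small, so the relative figure is best read alongside the absolute r2 values.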
Affiliation(s)
- Shuang Song
- Center for Statistical Science, Tsinghua University, Beijing, China
- Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Wei Jiang
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
| | - Lin Hou
- Center for Statistical Science, Tsinghua University, Beijing, China
- Department of Industrial Engineering, Tsinghua University, Beijing, China
| | - Hongyu Zhao
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut, United States of America
|
26
|
Yin L, Chau CKL, Sham PC, So HC. Integrating Clinical Data and Imputed Transcriptome from GWAS to Uncover Complex Disease Subtypes: Applications in Psychiatry and Cardiology. Am J Hum Genet 2019; 105:1193-1212. [PMID: 31785786 PMCID: PMC6904812 DOI: 10.1016/j.ajhg.2019.10.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 05/03/2019] [Accepted: 10/22/2019] [Indexed: 12/19/2022]
Abstract
Classifying subjects into clinically and biologically homogeneous subgroups will facilitate the understanding of disease pathophysiology and development of targeted prevention and intervention strategies. Traditionally, disease subtyping is based on clinical characteristics alone, but subtypes identified by such an approach may not conform exactly to the underlying biological mechanisms. Very few studies have integrated genomic profiles (e.g., those from GWASs) with clinical symptoms for disease subtyping. Here we proposed an analytic framework capable of finding complex disease subgroups by leveraging both GWAS-predicted gene expression levels and clinical data in a multi-view bicluster analysis. This approach connects SNPs to genes via their effects on expression, so the analysis is more biologically relevant and interpretable than a pure SNP-based analysis. Transcriptomes of different tissues can also be readily modeled. We also proposed various evaluation metrics for assessing clustering performance. Our framework was able to subtype schizophrenia subjects into diverse subgroups with different prognosis and treatment response. We also applied the framework to the Northern Finland Birth Cohort (NFBC) 1966 dataset and identified high and low cardiometabolic risk subgroups in a gender-stratified analysis. The prediction strength by cross-validation was generally greater than 80%, suggesting good stability of the clustering model. Our results suggest a more data-driven and biologically informed approach to defining metabolic syndrome and subtyping psychiatric disorders. Moreover, we found that the genes "blindly" selected by the algorithm are significantly enriched for known susceptibility genes discovered in GWASs of schizophrenia or cardiovascular diseases. The proposed framework opens up an approach to subject stratification.
Affiliation(s)
- Liangying Yin
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Carlos K L Chau
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Pak-Chung Sham
- Centre for Genomic Sciences, University of Hong Kong, Hong Kong SAR, China; Department of Psychiatry, University of Hong Kong, Hong Kong SAR, China; State Key Laboratory for Cognitive and Brain Sciences, University of Hong Kong, Hong Kong SAR, China
| | - Hon-Cheong So
- School of Biomedical Sciences, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong SAR, China; KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong SAR, China; Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Hong Kong SAR, China; Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen 518000, China.
|
27
|
Toulopoulou T, Zhang X, Cherny S, Dickinson D, Berman KF, Straub RE, Sham P, Weinberger DR. Polygenic risk score increases schizophrenia liability through cognition-relevant pathways. Brain 2019; 142:471-485. [PMID: 30535067 DOI: 10.1093/brain/awy279] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 06/13/2018] [Accepted: 09/19/2018] [Indexed: 02/02/2023]
Abstract
Cognitive deficit is thought to represent, at least in part, genetic mechanisms of risk for schizophrenia, with recent evidence from statistical modelling of twin data suggesting direct causality from the former to the latter. However, earlier evidence was based on inferences from twin not molecular genetic data and it is unclear how much genetic influence 'passes through' cognition on the way to diagnosis. Thus, we included direct measurements of genetic risk (e.g. schizophrenia polygenic risk scores) in causation models to assess the extent to which cognitive deficit mediates some of the effect of polygenic risk scores on the disorder. Causal models of family data tested relationships among key variables and allowed parsing of genetic variance components. Polygenic risk scores were calculated from summary statistics from the current largest genome-wide association study of schizophrenia and were represented as a latent trait. Cognition was also modelled as a latent trait. Participants were 1313 members of 1078 families: 416 patients with schizophrenia, 290 unaffected siblings, and 607 controls. Modelling supported earlier findings that cognitive deficit has a putatively causal role in schizophrenia. In total, polygenic risk score explained 8.07% [confidence interval (CI) 5.45-10.74%] of schizophrenia risk in our sample. Of this, more than a third (2.71%, CI 2.41-3.85%) of the polygenic risk score influence was mediated through cognition paths, exceeding the direct influence of polygenic risk score on schizophrenia risk (1.43%, CI 0.46-3.08%). The remainder of the polygenic risk score influence (3.93%, CI 2.37-4.48%) reflected reciprocal causation between schizophrenia liability and cognition (e.g. mutual influences in a cyclical manner). Analysis of genetic variance components of schizophrenia liability indicated that 26.87% (CI 21.45-32.57%) was associated with cognition-related pathways not captured by polygenic risk score. 
The remaining variance in schizophrenia was through pathways other than cognition-related and polygenic risk score. Although our results are based on inference through statistical modelling and do not provide an absolute proof of causality, we find that cognition pathways mediate a significant part of the influence of cumulative genetic risk on schizophrenia. We estimate from our model that 33.51% (CI 27.34-43.82%) of overall genetic risk is mediated through influences on cognition, but this requires further studies and analyses as the genetics of schizophrenia becomes better characterized.
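The point estimates quoted in this abstract can be checked with simple arithmetic: the three reported paths (cognition-mediated, direct, reciprocal) add up to the reported total polygenic risk score effect, and the cognition-mediated path is a little over a third of it. The short sketch below only re-uses the numbers printed in the abstract; the variable names are illustrative, and the resulting shares are descriptive ratios of point estimates, not the model-based 33.51% figure for overall genetic risk.

```python
# Point estimates quoted in the abstract (percent of schizophrenia risk).
total_prs_effect = 8.07
paths = {
    "mediated through cognition": 2.71,
    "direct": 1.43,
    "reciprocal causation": 3.93,
}

# The three paths should add up to the reported total.
path_sum = sum(paths.values())

# Share of the total PRS effect carried by each path.
shares = {name: value / total_prs_effect for name, value in paths.items()}

print(f"sum of paths: {path_sum:.2f}% (reported total: {total_prs_effect:.2f}%)")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the PRS effect")
```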
Affiliation(s)
- Timothea Toulopoulou
- Department of Psychology, Bilkent University, Bilkent, Ankara, Turkey; The State Key Laboratory of Brain and Cognitive Sciences, the University of Hong Kong, Hong Kong SAR, China; Department of Psychology, the University of Hong Kong, Hong Kong SAR, China; Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology and Neuroscience at King's College London, London, UK
| | - Xiaowei Zhang
- Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
| | - Stacey Cherny
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology and Neuroscience at King's College London, London, UK; Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
| | - Dwight Dickinson
- Clinical and Translational Neuroscience Branch, National Institute of Mental Health, USA
| | - Karen F Berman
- Clinical and Translational Neuroscience Branch, National Institute of Mental Health, USA
| | - Richard E Straub
- Lieber Institute for Brain Development, Johns Hopkins University, USA
| | - Pak Sham
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry Psychology and Neuroscience at King's College London, London, UK; Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
| | - Daniel R Weinberger
- Lieber Institute for Brain Development, Johns Hopkins University, USA; Departments of Psychiatry, Neurology, Neuroscience, The McKusick Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Johns Hopkins University, USA
|
28
|
Fritsche LG, Beesley LJ, VandeHaar P, Peng RB, Salvatore M, Zawistowski M, Gagliano Taliun SA, Das S, LeFaive J, Kaleba EO, Klumpner TT, Moser SE, Blanc VM, Brummett CM, Kheterpal S, Abecasis GR, Gruber SB, Mukherjee B. Exploring various polygenic risk scores for skin cancer in the phenomes of the Michigan genomics initiative and the UK Biobank with a visual catalog: PRSWeb. PLoS Genet 2019; 15:e1008202. [PMID: 31194742 PMCID: PMC6592565 DOI: 10.1371/journal.pgen.1008202] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 09/17/2018] [Revised: 06/25/2019] [Accepted: 05/17/2019] [Indexed: 01/08/2023]
Abstract
Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.
Affiliation(s)
- Lars G. Fritsche
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Lauren J. Beesley
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Peter VandeHaar
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Robert B. Peng
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Maxwell Salvatore
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Matthew Zawistowski
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Sarah A. Gagliano Taliun
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Sayantan Das
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Jonathon LeFaive
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Erin O. Kaleba
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Thomas T. Klumpner
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Stephanie E. Moser
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Victoria M. Blanc
- Central Biorepository, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
| | - Chad M. Brummett
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Sachin Kheterpal
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan, United States of America
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Gonçalo R. Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | - Stephen B. Gruber
- USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
| | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Epidemiology, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- University of Michigan Rogel Cancer Center, University of Michigan, Ann Arbor, Michigan, United States of America
|
29
|
Chasioti D, Yan J, Nho K, Saykin AJ. Progress in Polygenic Composite Scores in Alzheimer's and Other Complex Diseases. Trends Genet 2019; 35:371-382. [PMID: 30922659 PMCID: PMC6475476 DOI: 10.1016/j.tig.2019.02.005] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 11/26/2018] [Revised: 02/12/2019] [Accepted: 02/22/2019] [Indexed: 11/25/2022]
Abstract
Advances in high-throughput genotyping and next-generation sequencing (NGS), coupled with larger sample sizes, bring the realization of precision medicine closer than ever. Polygenic approaches incorporating the aggregate influence of multiple genetic variants can contribute to a better understanding of the genetic architecture of many complex diseases and facilitate patient stratification. This review addresses polygenic concepts, methodological developments, hypotheses, and key issues in study design. Polygenic risk scores (PRSs) have been applied to many complex diseases and here we focus on Alzheimer's disease (AD) as a primary exemplar. This review was designed to serve as a starting point for investigators wishing to use PRSs in their research and those interested in enhancing clinical study designs through enrichment strategies.
Affiliation(s)
- Danai Chasioti
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Jingwen Yan
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Kwangsik Nho
- Department of BioHealth Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA; Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
| | - Andrew J Saykin
- Indiana Alzheimer Disease Center, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
|
30
|
Johnson EC, Tillman R, Aliev F, Meyers JL, Salvatore JE, Anokhin AP, Dick DM, Edenberg HJ, Kramer JR, Kuperman S, McCutcheon VV, Nurnberger JI, Porjesz B, Schuckit MA, Tischfield J, Bucholz KK, Agrawal A. Exploring the relationship between polygenic risk for cannabis use, peer cannabis use and the longitudinal course of cannabis involvement. Addiction 2019; 114:687-697. [PMID: 30474892 PMCID: PMC6411425 DOI: 10.1111/add.14512] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/22/2018] [Revised: 08/13/2018] [Accepted: 11/16/2018] [Indexed: 01/20/2023]
Abstract
BACKGROUND AND AIMS Few studies have explored how polygenic propensity to cannabis use unfolds across development, and no studies have yet examined this question in the context of environmental contributions such as peer cannabis use. Outlining the factors that contribute to progression from cannabis initiation to problem use over time may ultimately provide insights into mechanisms for targeted interventions. We sought to examine the relationships between polygenic liability for cannabis use, cannabis use trajectories from ages 12-30 years and perceived peer cannabis use at ages 12-17 years. DESIGN Mixed-effect logistic and linear regressions were used to examine associations between polygenic risk scores, cannabis use trajectory membership and perceived peer cannabis use. SETTING United States. PARTICIPANTS From the Collaborative Study on the Genetics of Alcoholism (COGA) study, a cohort of 1167 individuals aged 12-26 years at their baseline (i.e. first) interview. MEASUREMENTS Key measurements included life-time cannabis use (yes/no), frequency of past 12-month cannabis use, maximum life-time frequency of cannabis use, cannabis use disorder (using DSM-5 criteria) and perceived peer cannabis use. Polygenic risk scores (PRS) were created using summary statistics from a large (n = 162 082) genome-wide association study (GWAS) of cannabis use. FINDINGS Three trajectories reflecting no/low (n = 844), moderate (n = 137) and high (n = 186) use were identified. PRS were significantly associated with trajectory membership [P = 0.002-0.006, maximum conditional R2 = 1.4%, odds ratios (ORs) = 1.40-1.49]. Individuals who reported that most/all of their best friends used cannabis had significantly higher PRS than those who reported that none of their friends were users [OR = 1.35, 95% confidence interval (CI) = 1.04, 1.75, P = 0.023]. Perceived peer use itself explained up to 11.3% of the variance in trajectory class membership (OR = 1.50-4.65). 
When peer cannabis use and the cannabis use PRS were entered into the model simultaneously, both the PRS and peer use continued to be significantly associated with class membership (P < 0.01). CONCLUSIONS Genetic propensity to cannabis use derived from heterogeneous samples appears to correlate with longitudinal increases in cannabis use frequency in young adults.
Affiliation(s)
- Emma C Johnson
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - Rebecca Tillman
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - Fazil Aliev
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
- Department of Actuarial and Risk Management, Faculty of Business, Karabuk University, Turkey
| | - Jacquelyn L Meyers
- Department of Psychiatry, SUNY Downstate Medical Center, Brooklyn, NY, USA
| | - Jessica E Salvatore
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
| | - Andrey P Anokhin
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - Danielle M Dick
- Department of Psychology, Virginia Commonwealth University, Richmond, VA, USA
- Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, VA, USA
| | - Howard J Edenberg
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, USA
| | - John R Kramer
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Samuel Kuperman
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Vivia V McCutcheon
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - John I Nurnberger
- Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Bernice Porjesz
- Department of Psychiatry, SUNY Downstate Medical Center, Brooklyn, NY, USA
| | - Marc A Schuckit
- Department of Psychiatry, University of California San Diego Medical School, San Diego, CA, USA
| | - Jay Tischfield
- Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Kathleen K Bucholz
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
| | - Arpana Agrawal
- Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA
|
31
|
Telenti A, Lippert C, Chang PC, DePristo M. Deep learning of genomic variation and regulatory network data. Hum Mol Genet 2018; 27:R63-R71. [PMID: 29648622 PMCID: PMC6499235 DOI: 10.1093/hmg/ddy115] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 01/09/2018] [Revised: 03/26/2018] [Accepted: 03/27/2018] [Indexed: 02/07/2023]
Abstract
The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.
Affiliation(s)
- Amalio Telenti
- Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA
|