101
|
Mews MA, Naj AC, Griswold AJ, Below JE, Bush WS. Brain and Blood Transcriptome-Wide Association Studies Identify Five Novel Genes Associated with Alzheimer's Disease. medRxiv 2024:2024.04.17.24305737. [PMID: 38699333 PMCID: PMC11065015 DOI: 10.1101/2024.04.17.24305737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2024]
Abstract
INTRODUCTION: Transcriptome-wide Association Studies (TWAS) extend genome-wide association studies (GWAS) by integrating genetically-regulated gene expression models. We performed the most powerful AD-TWAS to date, using summary statistics from cis-eQTL meta-analyses and the largest clinically-adjudicated Alzheimer's Disease (AD) GWAS. METHODS: We implemented the OTTERS TWAS pipeline, leveraging cis-eQTL data from cortical brain tissue (MetaBrain; N=2,683) and blood (eQTLGen; N=31,684) to predict gene expression, then applied these models to AD-GWAS data (Cases=21,982; Controls=44,944). RESULTS: We identified and validated five novel gene associations in cortical brain tissue (PRKAG1, C3orf62, LYSMD4, ZNF439, SLC11A2) and six genes proximal to known AD-related GWAS loci (Blood: MYBPC3; Brain: MTCH2, CYB561, MADD, PSMA5, ANXA11). Further, using causal eQTL fine-mapping, we generated sparse models that retained the strength of the AD-TWAS association for MTCH2, MADD, ZNF439, CYB561, and MYBPC3. DISCUSSION: Our comprehensive AD-TWAS discovered new gene associations and provided insights into the functional relevance of previously associated variants.
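As an illustration of how a summary-statistic TWAS combines eQTL-derived expression weights with GWAS results, the sketch below computes a gene-level Z-score in the widely used form Z = w'z / sqrt(w'Rw), where w are per-SNP expression weights, z are GWAS Z-scores, and R is the SNP correlation (LD) matrix. Whether OTTERS uses exactly this statistic is an assumption here; the function name `twas_z` and the toy numbers are hypothetical.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS Z-score from summary statistics.

    weights: per-SNP expression weights (assumed to come from an eQTL model)
    gwas_z:  per-SNP GWAS Z-scores, in the same SNP order
    ld:      SNP-by-SNP correlation (LD) matrix as a list of lists
    """
    n = len(weights)
    # Numerator: weighted sum of GWAS Z-scores (w'z).
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Denominator: variance of the predicted expression (w'Rw).
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("w'Rw must be positive; check the weights and LD matrix")
    return numerator / math.sqrt(variance)


# Toy example with three SNPs (illustrative values, not data from the study).
toy_weights = [0.5, -0.2, 0.1]
toy_gwas_z = [3.0, -1.0, 0.5]
toy_ld = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.1],
    [0.0, 0.1, 1.0],
]
toy_result = twas_z(toy_weights, toy_gwas_z, toy_ld)
```

Sparse models of the kind mentioned in the abstract would correspond to weight vectors with few non-zero entries, which shrinks both sums without changing the form of the statistic.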
102
|
Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. medRxiv 2024:2023.09.22.23295969. [PMID: 37790574 PMCID: PMC10543037 DOI: 10.1101/2023.09.22.23295969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
The role of gene-environment (GxE) interaction in disease and complex trait architectures is widely hypothesized, but currently unknown. Here, we apply three statistical approaches to quantify and distinguish three different types of GxE interaction for a given trait and E variable. First, we detect locus-specific GxE interaction by testing for genetic correlation r_g < 1 across E bins. Second, we detect genome-wide effects of the E variable on genetic variance by leveraging polygenic risk scores (PRS) to test for significant PRSxE in a regression of phenotypes on PRS, E, and PRSxE, together with differences in SNP-heritability across E bins. Third, we detect genome-wide proportional amplification of genetic and environmental effects as a function of the E variable by testing for significant PRSxE with no differences in SNP-heritability across E bins. Simulations show that these approaches achieve high sensitivity and specificity in distinguishing these three GxE scenarios. We applied our framework to 33 UK Biobank traits (25 quantitative traits and 8 diseases; average N = 325K) and 10 E variables spanning lifestyle, diet, and other environmental exposures. First, we identified 19 trait-E pairs with r_g significantly < 1 (FDR<5%) (average r_g = 0.95); for example, white blood cell count had r_g = 0.95 (s.e. 0.01) between smokers and non-smokers. Second, we identified 28 trait-E pairs with significant PRSxE and significant SNP-heritability differences across E bins; for example, BMI had a significant PRSxE for physical activity (P=4.6e-5) with 5% larger SNP-heritability in the largest versus smallest quintiles of physical activity (P=7e-4). Third, we identified 15 trait-E pairs with significant PRSxE with no SNP-heritability differences across E bins; for example, waist-hip ratio adjusted for BMI had a significant PRSxE effect for time spent watching television (P=5e-3) with no SNP-heritability differences. Across the three scenarios, 8 of the trait-E pairs involved disease traits, whose interpretation is complicated by scale effects. Analyses using biological sex as the E variable produced additional significant findings in each of the three scenarios. Overall, we infer a significant contribution of GxE and GxSex effects to complex trait and disease variance.
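The three-way distinction described in this abstract can be read as a small decision rule over three test outcomes. The sketch below encodes only that logic; the function name `classify_gxe`, the boolean inputs, and the scenario labels are hypothetical, and the statistical tests that would produce these flags are not shown. How to treat SNP-heritability differences without a significant PRSxE is not stated in the abstract, so the sketch assigns no scenario in that case.

```python
def classify_gxe(rg_below_one, prsxe_significant, h2_differs):
    """Map three test outcomes for one trait-E pair to GxE scenario labels.

    rg_below_one:      genetic correlation across E bins is significantly < 1
    prsxe_significant: the PRSxE term is significant in the regression of
                       phenotype on PRS, E, and PRSxE
    h2_differs:        SNP-heritability differs significantly across E bins
    """
    scenarios = []
    # Scenario 1: locus-specific GxE, detected through r_g < 1.
    if rg_below_one:
        scenarios.append("locus-specific GxE")
    # Scenarios 2 and 3 both require a significant PRSxE term and are
    # separated by whether SNP-heritability also differs across E bins.
    if prsxe_significant and h2_differs:
        scenarios.append("E affects genetic variance")
    elif prsxe_significant:
        scenarios.append("proportional amplification")
    return scenarios
```

Returning a list rather than a single label reflects an assumption that the first scenario can co-occur with either of the other two for the same trait-E pair.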
Affiliation(s)
- Arun Durvasula
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Genetics, Harvard Medical School, Cambridge, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L Price
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
103
|
Øvretveit K, Ingeström EML, Spitieris M, Tragante V, Wade KH, Thomas LF, Wolford BN, Wisløff U, Gudbjartsson DF, Holm H, Stefansson K, Brumpton BM, Hveem K. Polygenic risk scores associate with blood pressure traits across the lifespan. Eur J Prev Cardiol 2024; 31:644-654. [PMID: 38007706 PMCID: PMC11025038 DOI: 10.1093/eurjpc/zwad365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 10/18/2023] [Accepted: 11/02/2023] [Indexed: 11/28/2023]
Abstract
AIMS: Hypertension is a major modifiable cause of morbidity and mortality that affects over 1 billion people worldwide. Blood pressure (BP) traits have a strong genetic component that can be quantified with polygenic risk scores (PRSs). To date, the performance of BP PRSs has mainly been assessed in adults, and less is known about polygenic hypertension risk in childhood. METHODS AND RESULTS: Multiple PRSs for systolic BP (SBP), diastolic BP (DBP), and pulse pressure were developed using either genome-wide significant weights, pruning and thresholding, or Bayesian regression. Among 87 total PRSs, the top performer for each trait was applied in independent cohorts of children and adults to assess genotype-phenotype associations and disease risk across the lifespan. Differences between those with low (1st decile), average (2nd-9th decile), and high (10th decile) PRS emerge in the first years of life and are maintained throughout adulthood. These diverging BP trajectories also seem to affect cardiovascular and renal disease risk, with increased risk observed among those in the top decile and reduced risk among those in the bottom decile of the polygenic risk distribution compared with the rest of the population. CONCLUSION: Genetic risk factors are associated with BP traits across the lifespan, beginning in the first years of life. Given the importance of exposure time in disease pathogenesis and the early rise in BP levels among those genetically susceptible, PRSs may help identify high-risk individuals prior to hypertension onset, facilitate primordial prevention, and reduce the burden of this public health challenge.
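The low, average, and high PRS groups described above can be sketched as a rank-based decile split. This is a minimal illustration, assuming ties are broken by input order and that deciles are formed within a single cohort; the function name `prs_groups` and the toy scores are hypothetical and do not reproduce the study's pipeline.

```python
def prs_groups(scores):
    """Label each score 'low' (1st decile), 'average' (2nd-9th), or 'high' (10th)."""
    n = len(scores)
    # Indices of individuals ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [""] * n
    for rank, idx in enumerate(order):
        # Map the 0-based rank to a decile number from 1 to 10.
        decile = rank * 10 // n + 1
        if decile == 1:
            labels[idx] = "low"
        elif decile == 10:
            labels[idx] = "high"
        else:
            labels[idx] = "average"
    return labels


# Toy cohort of 20 distinct scores (illustrative values only).
toy_scores = [((i * 7) % 20) / 10.0 for i in range(20)]
toy_labels = prs_groups(toy_scores)
```

With 20 individuals, two fall in each tail group and 16 in the average group, which mirrors the 10/80/10 split implied by the decile definitions.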
Affiliation(s)
- Karsten Øvretveit
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- Emma M L Ingeström
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Michail Spitieris
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- Department of Mathematical Sciences, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Kaitlin H Wade
- MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 1TH, UK
- Population Health Science, Bristol Medical School, Bristol BS8 1TH, UK
- Avon Longitudinal Study of Parents and Children, Bristol BS8 1TH, UK
- Laurent F Thomas
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Brooke N Wolford
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- Ulrik Wisløff
- Department of Circulation and Medical Imaging, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Daniel F Gudbjartsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
- Hilma Holm
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Kari Stefansson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, Reykjavik, Iceland
- Ben M Brumpton
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- HUNT Research Centre, Department of Public Health and Nursing, Norwegian University of Science and Technology, Levanger, Norway
- Kristian Hveem
- K.G. Jebsen Centre for Genetic Epidemiology, Faculty of Medicine and Health Sciences, Department of Public Health and Nursing, Norwegian University of Science and Technology (NTNU), Postboks 8905, N-7491 Trondheim, Norway
- Department of Innovation and Research, St. Olavs Hospital, Trondheim, Norway
104
|
Willems YE, Raffington L, Ligthart L, Pool R, Hottenga JJ, Finkenauer C, Bartels M. No gene by stressful life events interaction on individual differences in adults' self-control. Front Psychiatry 2024; 15:1388264. [PMID: 38693999 PMCID: PMC11061522 DOI: 10.3389/fpsyt.2024.1388264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Accepted: 04/03/2024] [Indexed: 05/03/2024] Open
Abstract
Background: Difficulty with self-control, or the ability to alter impulses and behavior in a goal-directed way, predicts interpersonal conflict, lower socioeconomic attainments, and more adverse health outcomes. Etiological understanding of, and intervention for, low self-control is therefore a public health goal. A prominent developmental theory proposes that individuals with high genetic propensity for low self-control who are also exposed to stressful environments may be most at risk of low levels of self-control. Here we examine whether polygenic measures associated with behaviors marked by low self-control interact with stressful life events in predicting self-control. Methods: We leveraged molecular data from a large population-based Dutch sample (N = 7,090, M_age = 41.2) to test for effects of genetics (i.e., polygenic scores for ADHD and aggression), stressful life events (e.g., traffic accident, violent assault, financial problems), and a gene-by-stress interaction on self-control (measured with the ASEBA Self-Control Scale). Results: Both genetics (β = .03 to .04, p < .001) and stressful life events (β = .11 to .14, p < .001) were associated with individual differences in self-control. We find no evidence of a gene-by-stressful life events interaction on individual differences in adults' self-control. Conclusion: Our findings are consistent with the notion that genetic influences and stressful life events exert largely independent effects on adult self-control. However, the small effect sizes of polygenic scores increase the likelihood of null results. Genetically-informed longitudinal research in large samples can further inform the etiology of individual differences in self-control from early childhood into later adulthood and its downstream implications for public health.
Affiliation(s)
- Yayouk Eva Willems
- Max Planck Institute for Human Development, Max Planck Research Group Biosocial – Biology, Social Disparities, and Development, Berlin, Germany
- Laurel Raffington
- Max Planck Institute for Human Development, Max Planck Research Group Biosocial – Biology, Social Disparities, and Development, Berlin, Germany
- Lannie Ligthart
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Rene Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Jouke Jan Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Catrin Finkenauer
- Department of Interdisciplinary Social Science, Universiteit Utrecht, Utrecht, Netherlands
- Meike Bartels
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, Netherlands
- Amsterdam Public Health Research Institute, Amsterdam University Medical Centres, Amsterdam, Netherlands
105
|
Alireza Z, Maleeha M, Kaikkonen M, Fortino V. Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection. J Transl Med 2024; 22:356. [PMID: 38627847 PMCID: PMC11020205 DOI: 10.1186/s12967-024-05090-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024] Open
Abstract
Machine learning (ML) methods are increasingly becoming crucial in genome-wide association studies for identifying key genetic variants or SNPs that statistical methods might overlook. Statistical methods predominantly identify SNPs with notable effect sizes by conducting association tests on individual genetic variants, one at a time, to determine their relationship with the target phenotype. These genetic variants are then used to create polygenic risk scores (PRSs), estimating an individual's genetic risk for complex diseases like cancer or cardiovascular disorders. Unlike traditional methods, ML algorithms can identify groups of low-risk genetic variants that improve prediction accuracy when combined in a mathematical model. However, the application of ML strategies requires addressing the feature selection challenge to prevent overfitting. Moreover, ensuring the ML model depends on a concise set of genomic variants enhances its clinical applicability, where testing is feasible for only a limited number of SNPs. In this study, we introduce a robust pipeline that applies ML algorithms in combination with feature selection (ML-FS algorithms), aimed at identifying the most significant genomic variants associated with the coronary artery disease (CAD) phenotype. The proposed computational approach was tested on individuals from the UK Biobank, differentiating between CAD and non-CAD individuals within this extensive cohort, and benchmarked against standard PRS-based methodologies like LDpred2 and Lassosum. Our strategy incorporates cross-validation to ensure a more robust evaluation of genomic variant-based prediction models. This method is commonly applied in machine learning strategies but has often been neglected in previous studies assessing the predictive performance of polygenic risk scores. 
Our results demonstrate that the ML-FS algorithm can identify panels with as few as 50 genetic markers that can achieve approximately 80% accuracy when used in combination with known risk factors. The modest increase in accuracy over PRS performances is noteworthy, especially considering that PRS models incorporate a substantially larger number of genetic variants. This extensive variant selection can pose practical challenges in clinical settings. Additionally, the proposed approach revealed novel CAD-genetic variant associations.
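The abstract stresses that feature selection must be evaluated under cross-validation. One way to read this is that the variant panel should be re-selected inside each training fold so that held-out samples never influence which SNPs are chosen. The sketch below illustrates that design with a simple univariate correlation filter, which is a stand-in and not the paper's ML-FS algorithm; all function names and the toy genotype matrix are hypothetical.

```python
import math


def kfold(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n samples."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def select_features(X, y, rows, top_k):
    """Rank features using only the given rows and return the top_k indices."""
    labels = [y[i] for i in rows]
    scored = []
    for j in range(len(X[0])):
        column = [X[i][j] for i in rows]
        scored.append((abs_corr(column, labels), j))
    # Highest correlation first; ties broken by feature index.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scored[:top_k]]


def cv_selected_panels(X, y, k, top_k):
    """Re-run selection on each training fold, never touching its test fold."""
    return [select_features(X, y, train, top_k) for train, _ in kfold(len(y), k)]


# Toy data: 12 samples, 3 SNP-like features coded 0/1/2 (illustrative only).
toy_y = [i % 2 for i in range(12)]
toy_X = [[2 * toy_y[i], 1, i % 3] for i in range(12)]
toy_panels = cv_selected_panels(toy_X, toy_y, k=3, top_k=1)
```

Comparing the panels selected across folds also gives a rough view of selection stability, which matters when the goal is a small marker panel.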
Affiliation(s)
- Z Alireza
- Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland
- M Maleeha
- Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland
- M Kaikkonen
- A.I.Virtanen Institute, University of Eastern Finland, 70210, Kuopio, Finland
- V Fortino
- Institute of Biomedicine, University of Eastern Finland, 70210, Kuopio, Finland.
106
|
Zhang J, Zhan J, Jin J, Ma C, Zhao R, O'Connell J, Jiang Y, Koelsch BL, Zhang H, Chatterjee N. An ensemble penalized regression method for multi-ancestry polygenic risk prediction. Nat Commun 2024; 15:3238. [PMID: 38622117 PMCID: PMC11271575 DOI: 10.1038/s41467-024-47357-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 03/28/2024] [Indexed: 04/17/2024] Open
Abstract
Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
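To make the combination of L1 and L2 penalties concrete, the sketch below shows the textbook single-coefficient update for a generic elastic-net objective, 0.5*b^2 - rho*b + lam1*|b| + 0.5*lam2*b^2, whose minimizer is the soft-thresholded value divided by (1 + lam2). This is a generic illustration only: PROSPER's actual objective, including how penalties are specified across populations and how the ensemble step is fit, is not given in the abstract and is not reproduced here. The function names and parameter names are hypothetical.

```python
def soft_threshold(x, lam):
    """Shrink x toward zero by lam and return 0.0 if |x| <= lam (lasso step)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def elastic_net_update(rho, lam1, lam2):
    """Minimize 0.5*b**2 - rho*b + lam1*abs(b) + 0.5*lam2*b**2 over b.

    rho:  the unpenalized solution for this coefficient (for a standardized
          SNP this would be its marginal association with the residual)
    lam1: L1 (lasso) penalty weight, which sets small effects exactly to zero
    lam2: L2 (ridge) penalty weight, which shrinks surviving effects
    """
    return soft_threshold(rho, lam1) / (1.0 + lam2)
```

For example, `elastic_net_update(1.0, 0.25, 1.0)` first shrinks 1.0 to 0.75 through the lasso step and then halves it through the ridge step, while any `rho` with absolute value at most 0.25 is set to zero.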
Affiliation(s)
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Jin Jin
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Ma
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA.
107
|
Zhang T, Zhou G, Klei L, Liu P, Chouldechova A, Zhao H, Roeder K, G'Sell M, Devlin B. Evaluating and improving health equity and fairness of polygenic scores. HGG Adv 2024; 5:100280. [PMID: 38402414 PMCID: PMC10937319 DOI: 10.1016/j.xhgg.2024.100280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 02/14/2024] [Accepted: 02/14/2024] [Indexed: 02/26/2024] Open
Abstract
Polygenic scores (PGSs) are quantitative metrics for predicting phenotypic values, such as human height or disease status. Some PGS methods require only summary statistics of a relevant genome-wide association study (GWAS) for their score. One such method is Lassosum, which inherits the model selection advantages of Lasso to select a meaningful subset of the GWAS single-nucleotide polymorphisms as predictors from their association statistics. However, even efficient scores like Lassosum, when derived from European-based GWASs, are poor predictors of phenotype for subjects of non-European ancestry; that is, they have limited portability to other ancestries. To increase the portability of Lassosum, when GWAS information and estimates of linkage disequilibrium are available for both ancestries, we propose Joint-Lassosum (JLS). In the simulation settings we explore, JLS provides more accurate PGSs compared to other methods, especially when measured in terms of fairness. In analyses of UK Biobank data, JLS was computationally more efficient but slightly less accurate than a Bayesian comparator, SDPRX. Like all PGS methods, JLS requires selection of predictors, which are determined by data-driven tuning parameters. We describe a new approach to selecting tuning parameters and note its relevance for model selection for any PGS. We also draw connections to the literature on algorithmic fairness and discuss how JLS can help mitigate fairness-related harms that might result from the use of PGSs in clinical settings. While no PGS method is likely to be universally portable, due to the diversity of human populations and unequal information content of GWASs for different ancestries, JLS is an effective approach for enhancing portability and reducing predictive bias.
Affiliation(s)
- Tianyu Zhang
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
- Geyu Zhou
- Department of Biostatistics, Yale University, New Haven, CT 06511, USA
- Lambertus Klei
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Peng Liu
- Merck Research Laboratories, Merck & Co., Inc., Rahway, NJ 07065, USA
- Alexandra Chouldechova
- Microsoft Research NYC, New York, NY 10012, USA; Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT 06511, USA
- Kathryn Roeder
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Max G'Sell
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA 15213, USA.
108
|
Zhang J, Zhan J, Jin J, Ma C, Zhao R, O’Connell J, Jiang Y, 23andMe Research Team, Koelsch BL, Zhang H, Chatterjee N. An Ensemble Penalized Regression Method for Multi-ancestry Polygenic Risk Prediction. bioRxiv 2024:2023.03.15.532652. [PMID: 36993331 PMCID: PMC10055041 DOI: 10.1101/2023.03.15.532652] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/30/2023]
Abstract
Great efforts are being made to develop advanced polygenic risk scores (PRS) to improve the prediction of complex traits and diseases. However, most existing PRS are primarily trained on European ancestry populations, limiting their transferability to non-European populations. In this article, we propose a novel method for generating multi-ancestry Polygenic Risk scOres based on enSemble of PEnalized Regression models (PROSPER). PROSPER integrates genome-wide association studies (GWAS) summary statistics from diverse populations to develop ancestry-specific PRS with improved predictive power for minority populations. The method uses a combination of L1 (lasso) and L2 (ridge) penalty functions, a parsimonious specification of the penalty parameters across populations, and an ensemble step to combine PRS generated across different penalty parameters. We evaluate the performance of PROSPER and other existing methods on large-scale simulated and real datasets, including those from 23andMe Inc., the Global Lipids Genetics Consortium, and All of Us. Results show that PROSPER can substantially improve multi-ancestry polygenic prediction compared to alternative methods across a wide variety of genetic architectures. In real data analyses, for example, PROSPER increased out-of-sample prediction R² for continuous traits by an average of 70% compared to a state-of-the-art Bayesian method (PRS-CSx) in the African ancestry population. Further, PROSPER is computationally highly scalable for the analysis of large SNP contents and many diverse populations.
Affiliation(s)
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jin Jin
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Cheng Ma
- Department of Statistics, University of Michigan, Ann Arbor, MI, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
109
|
Jin J, Zhan J, Zhang J, Zhao R, O'Connell J, Jiang Y, Buyske S, Gignoux C, Haiman C, Kenny EE, Kooperberg C, North K, Koelsch BL, Wojcik G, Zhang H, Chatterjee N. MUSSEL: Enhanced Bayesian polygenic risk prediction leveraging information across multiple ancestry groups. Cell Genom 2024; 4:100539. [PMID: 38604127 PMCID: PMC11019365 DOI: 10.1016/j.xgen.2024.100539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 09/07/2023] [Accepted: 03/14/2024] [Indexed: 04/13/2024]
Abstract
Polygenic risk scores (PRSs) are now showing promising predictive performance on a wide variety of complex traits and diseases, but there exists a substantial performance gap across populations. We propose MUSSEL, a method for ancestry-specific polygenic prediction that borrows information in summary statistics from genome-wide association studies (GWASs) across multiple ancestry groups via Bayesian hierarchical modeling and ensemble learning. In our simulation studies and data analyses across four distinct studies, totaling 5.7 million participants with a substantial ancestral diversity, MUSSEL shows promising performance compared to alternatives. For example, MUSSEL has an average gain in prediction R² across 11 continuous traits of 40.2% and 49.3% compared to PRS-CSx and CT-SLEB, respectively, in the African ancestry population. The best-performing method, however, varies by GWAS sample size, target ancestry, trait architecture, and linkage disequilibrium reference samples; thus, ultimately a combination of methods may be needed to generate the most robust PRSs across diverse populations.
Affiliation(s)
- Jin Jin
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19103, USA.
- Jingning Zhang
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Ruzhang Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Steven Buyske
- Department of Statistics, Rutgers University, New Brunswick, NJ 08854, USA
- Christopher Gignoux
- Colorado Center for Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Christopher Haiman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA
- Eimear E Kenny
- Icahn Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Kari North
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27516, USA
- Genevieve Wojcik
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA
- Haoyu Zhang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20892, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA; Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD 21205, USA.
110
|
Truong B, Hull LE, Ruan Y, Huang QQ, Hornsby W, Martin H, van Heel DA, Wang Y, Martin AR, Lee SH, Natarajan P. Integrative polygenic risk score improves the prediction accuracy of complex traits and diseases. Cell Genom 2024; 4:100523. [PMID: 38508198 PMCID: PMC11019356 DOI: 10.1016/j.xgen.2024.100523] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 08/14/2023] [Revised: 09/15/2023] [Accepted: 02/20/2024] [Indexed: 03/22/2024]
Abstract
Polygenic risk scores (PRSs) are an emerging tool to predict the clinical phenotypes and outcomes of individuals. We propose PRSmix, a framework that leverages the PRS corpus of a target trait to improve prediction accuracy, and PRSmix+, which incorporates genetically correlated traits to better capture the human genetic architecture for 47 and 32 diseases/traits in European and South Asian ancestries, respectively. PRSmix demonstrated a mean prediction accuracy improvement of 1.20-fold (95% confidence interval [CI], [1.10; 1.3]; p = 9.17 × 10⁻⁵) and 1.19-fold (95% CI, [1.11; 1.27]; p = 1.92 × 10⁻⁶), and PRSmix+ improved the prediction accuracy by 1.72-fold (95% CI, [1.40; 2.04]; p = 7.58 × 10⁻⁶) and 1.42-fold (95% CI, [1.25; 1.59]; p = 8.01 × 10⁻⁷) in European and South Asian ancestries, respectively. Compared to previous cross-trait-combination methods with scores from pre-defined correlated traits, we demonstrated that our method improved prediction accuracy for coronary artery disease up to 3.27-fold (95% CI, [2.1; 4.44]; p value after false discovery rate (FDR) correction = 2.6 × 10⁻⁴). Our method provides a comprehensive framework to benchmark and leverage the combined power of PRS for maximal performance in a desired target population.
Affiliation(s)
- Buu Truong
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Leland E Hull
- Division of General Internal Medicine, Massachusetts General Hospital, 100 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA
- Yunfeng Ruan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Qin Qin Huang
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- Whitney Hornsby
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Hilary Martin
- Department of Human Genetics, Wellcome Sanger Institute, Cambridge, UK
- David A van Heel
- Blizard Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK
- Ying Wang
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Alicia R Martin
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- S Hong Lee
- Australian Centre for Precision Health, University of South Australia Cancer Research Institute, University of South Australia, Adelaide, SA 5000, Australia
- Pradeep Natarajan
- Program in Medical and Population Genetics and the Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA; Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA; Department of Medicine, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA.
111
Schwarzerova J, Hurta M, Barton V, Lexa M, Walther D, Provaznik V, Weckwerth W. A perspective on genetic and polygenic risk scores-advances and limitations and overview of associated tools. Brief Bioinform 2024; 25:bbae240. [PMID: 38770718 PMCID: PMC11106636 DOI: 10.1093/bib/bbae240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 04/14/2024] [Accepted: 05/03/2024] [Indexed: 05/22/2024]
Abstract
Polygenic Risk Scores are used to evaluate an individual's vulnerability to developing specific diseases or conditions based on their genetic composition, by taking into account numerous genetic variations. This article provides an overview of the concept of Polygenic Risk Scores (PRS). We elucidate the historical advancements of PRS, their advantages and shortcomings in comparison with other predictive methods, and discuss their conceptual limitations in light of the complexity of biological systems. Furthermore, we provide a survey of published tools for computing PRS and associated resources. The various tools and software packages are categorized based on their technical utility for users or prospective developers. Understanding the array of available tools and their limitations is crucial for accurately assessing and predicting disease risks, facilitating early interventions, and guiding personalized healthcare decisions. Additionally, we identify potential new avenues for future bioinformatic analyses and advancements related to PRS.
Affiliation(s)
- Jana Schwarzerova
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 10, Brno 61600, Czechia
- Molecular Systems Biology (MOSYS), Department of Functional and Evolutionary Ecology, University of Vienna, Vienna 1010, Austria
- Martin Hurta
- Department of Computer Systems, Faculty of Information Technology, Brno University of Technology, Brno 612 00, Czechia
- Vojtech Barton
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 10, Brno 61600, Czechia
- RECETOX, Faculty of Science, Masaryk University, Kotlarska 2, Brno 62500, Czech Republic
- Matej Lexa
- Faculty of Informatics, Masaryk University, Botanicka 68a, Brno 60200, Czech Republic
- Dirk Walther
- Max-Planck-Institute of Molecular Plant Physiology, Potsdam 14476, Germany
- Valentine Provaznik
- Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 10, Brno 61600, Czechia
- Department of Physiology, Faculty of Medicine, Masaryk University, Brno 62500, Czech Republic
- Wolfram Weckwerth
- Molecular Systems Biology (MOSYS), Department of Functional and Evolutionary Ecology, University of Vienna, Vienna 1010, Austria
- Vienna Metabolomics Center (VIME), University of Vienna, Vienna 1010, Austria
112
Sunde HF, Eftedal NH, Cheesman R, Corfield EC, Kleppesto TH, Seierstad AC, Ystrom E, Eilertsen EM, Torvik FA. Genetic similarity between relatives provides evidence on the presence and history of assortative mating. Nat Commun 2024; 15:2641. [PMID: 38531929 PMCID: PMC10966108 DOI: 10.1038/s41467-024-46939-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 03/13/2024] [Indexed: 03/28/2024]
Abstract
Assortative mating - the non-random mating of individuals with similar traits - is known to increase trait-specific genetic variance and genetic similarity between relatives. However, empirical evidence is limited for many traits, and the implications hinge on whether assortative mating has started recently or many generations ago. Here we show theoretically and empirically that genetic similarity between relatives can provide evidence on the presence and history of assortative mating. First, we employed path analysis to understand how assortative mating affects genetic similarity between family members across generations, finding that similarity between distant relatives is more affected than close relatives. Next, we correlated polygenic indices of 47,135 co-parents from the Norwegian Mother, Father, and Child Cohort Study (MoBa) and found genetic evidence of assortative mating in nine out of sixteen examined traits. The same traits showed elevated similarity between relatives, especially distant relatives. Six of the nine traits, including educational attainment, showed greater genetic variance among offspring, which is inconsistent with stable assortative mating over many generations. These results suggest an ongoing increase in familial similarity for these traits. The implications of this research extend to genetic methodology and the understanding of social and economic disparities.
Affiliation(s)
- Hans Fredrik Sunde
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway.
- Department of Psychology, University of Oslo, Oslo, Norway.
- Rosa Cheesman
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Elizabeth C Corfield
- Nic Waals Institute, Lovisenberg Diakonale Hospital, Oslo, Norway
- PsychGen Centre for Genetic Epidemiology and Mental Health, Norwegian Institute of Public Health, Oslo, Norway
- Thomas H Kleppesto
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Eivind Ystrom
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- PsychGen Centre for Genetic Epidemiology and Mental Health, Norwegian Institute of Public Health, Oslo, Norway
- Espen Moen Eilertsen
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Fartein Ask Torvik
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
- PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
113
Song M, Kwak SH, Kim J. Risk prediction and interaction analysis using polygenic risk score of type 2 diabetes in a Korean population. Sci Rep 2024; 14:6790. [PMID: 38514700 PMCID: PMC10957984 DOI: 10.1038/s41598-024-55945-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2023] [Accepted: 02/29/2024] [Indexed: 03/23/2024]
Abstract
Joint modelling of genetic and environmental risk factors can provide important information to predict the risk of type 2 diabetes (T2D). Therefore, to predict the genetic risk of T2D, we constructed a polygenic risk score (PRS) using genotype data of one Korean cohort, KARE (745 cases and 2549 controls), and the genome-wide association study summary statistics of Biobank Japan. We evaluated the performance of PRS in an independent Korean cohort, HEXA (5684 cases and 35,703 controls). Individuals with T2D had a significantly higher mean PRS than controls (0.492 vs. -0.078, p ≈ 0). PRS predicted the risk of T2D with an AUC of 0.658 (95% CI 0.651-0.666). We also evaluated interaction between PRS and waist circumference (WC) in the HEXA cohort. PRS exhibited a significant sub-multiplicative interaction with WC (OR_interaction 0.991, 95% CI 0.987-0.995, p_interaction = 4.93 × 10⁻⁶) in T2D. The effect of WC on T2D decreased as PRS increased. The sex-specific analyses produced similar interaction results, revealing a decreased WC effect on T2D as the PRS increased. In conclusion, the risk of WC for T2D may differ depending on PRS and those with a high PRS might develop T2D with a lower WC threshold. Our findings are expected to improve risk prediction for T2D and facilitate the identification of individuals at an increased risk of T2D.
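One way to read the reported interaction estimate, assuming it comes from a logistic model with a product term (the abstract does not give the exact specification, so the model form and the symbols below are illustrative):

```latex
\begin{align*}
\operatorname{logit} P(\mathrm{T2D}=1)
  &= \beta_0 + \beta_1\,\mathrm{PRS} + \beta_2\,\mathrm{WC}
     + \beta_3\,(\mathrm{PRS}\times\mathrm{WC}),\\
\mathrm{OR}_{\text{interaction}}
  &= e^{\beta_3} = 0.991
  \;\Longrightarrow\; \beta_3 = \ln 0.991 \approx -0.0090,\\
\mathrm{OR}_{\mathrm{WC}\,\mid\,\mathrm{PRS}=g}
  &= e^{\beta_2 + \beta_3 g} = e^{\beta_2}\cdot 0.991^{\,g}.
\end{align*}
```

Because 0.991 is below 1, the per-unit odds ratio for WC shrinks as the PRS rises, which matches the sub-multiplicative pattern described. The units in which PRS and WC were scaled are not stated in the abstract.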
Affiliation(s)
- Minsun Song
- Department of Statistics & Research Institute of Natural Sciences, Sookmyung Women's University, Seoul, 04310, Korea
- Soo Heon Kwak
- Department of Internal Medicine, Seoul National University Hospital, Seoul, 03080, Korea.
- Jihyun Kim
- Department of Statistics, Sookmyung Women's University, Seoul, 04310, Korea
114
Platt DE, Guzmán-Sáenz A, Bose A, Saha S, Utro F, Parida L. AI-enabled evaluation of genome-wide association relevance and polygenic risk score prediction in Alzheimer's disease. iScience 2024; 27:109209. [PMID: 38439972 PMCID: PMC10910245 DOI: 10.1016/j.isci.2024.109209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/05/2023] [Accepted: 02/07/2024] [Indexed: 03/06/2024]
Abstract
GWAS focuses on significance, losing false positives; machine learning probes sub-significant features relying on predictivity. Yet, these are far from orthogonal. We sought to explore how these inform each other in sub-genome-wide significant situations to define relevance for predictive features. We introduce the SVM-based RubricOE that selects heavily cross-validated feature sets, and LDpred2 PRS as a strong contrast to SVM, to explore significance and predictivity. Our Alzheimer's test case notoriously lacks strong genetic signals except for a few very strong phenotype-SNP associations, which suits the problem we are exploring. We found that the most significant SNPs among ML and PRS-selected SNPs captured most of the predictivity, while weaker associations tend also to contribute weakly to predictivity. SNPs with weak associations tend not to contribute to predictivity, but deletion of these features does not injure it. Significance provides a ranking that helps identify weakly predictive features.
Affiliation(s)
- Daniel E. Platt
- IBM T. J. Watson Research Center, Yorktown Heights, New York, NY, USA
- Aldo Guzmán-Sáenz
- IBM T. J. Watson Research Center, Yorktown Heights, New York, NY, USA
- Aritra Bose
- IBM T. J. Watson Research Center, Yorktown Heights, New York, NY, USA
- Filippo Utro
- IBM T. J. Watson Research Center, Yorktown Heights, New York, NY, USA
- Laxmi Parida
- IBM T. J. Watson Research Center, Yorktown Heights, New York, NY, USA
115
Mooney MA, Hermosillo RJM, Feczko E, Miranda-Dominguez O, Moore LA, Perrone A, Byington N, Grimsrud G, Rueter A, Nousen E, Antovich D, Feldstein Ewing SW, Nagel BJ, Nigg JT, Fair DA. Cumulative Effects of Resting-State Connectivity Across All Brain Networks Significantly Correlate with Attention-Deficit Hyperactivity Disorder Symptoms. J Neurosci 2024; 44:e1202232023. [PMID: 38286629 PMCID: PMC10919250 DOI: 10.1523/jneurosci.1202-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 11/30/2023] [Accepted: 12/18/2023] [Indexed: 01/31/2024]
Abstract
Identification of replicable neuroimaging correlates of attention-deficit hyperactivity disorder (ADHD) has been hindered by small sample sizes, small effects, and heterogeneity of methods. Given evidence that ADHD is associated with alterations in widely distributed brain networks and the small effects of individual brain features, a whole-brain perspective focusing on cumulative effects is warranted. The use of large, multisite samples is crucial for improving reproducibility and clinical utility of brain-wide MRI association studies. To address this, a polyneuro risk score (PNRS) representing cumulative, brain-wide, ADHD-associated resting-state functional connectivity was constructed and validated using data from the Adolescent Brain Cognitive Development (ABCD, N = 5,543, 51.5% female) study, and was further tested in the independent Oregon-ADHD-1000 case-control cohort (N = 553, 37.4% female). The ADHD PNRS was significantly associated with ADHD symptoms in both cohorts after accounting for relevant covariates (p < 0.001). The most predictive PNRS involved all brain networks, though the strongest effects were concentrated among the default mode and cingulo-opercular networks. In the longitudinal Oregon-ADHD-1000, non-ADHD youth had significantly lower PNRS (Cohen's d = -0.318, robust p = 5.5 × 10⁻⁴) than those with persistent ADHD (age 7-19). The PNRS, however, did not mediate polygenic risk for ADHD. Brain-wide connectivity was robustly associated with ADHD symptoms in two independent cohorts, providing further evidence of widespread dysconnectivity in ADHD. Evaluation in enriched samples demonstrates the promise of the PNRS approach for improving reproducibility in neuroimaging studies and unraveling the complex relationships between brain connectivity and behavioral disorders.
Affiliation(s)
- Michael A Mooney
- Division of Bioinformatics and Computational Biology, Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon 97239
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon 97239
- Center for Mental Health Innovation, Oregon Health & Science University, Portland, Oregon 97239
- Robert J M Hermosillo
- Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota 55454
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Eric Feczko
- Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota 55454
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Oscar Miranda-Dominguez
- Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota 55454
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455
- Lucille A Moore
- Department of Neurology, Oregon Health & Science University, Portland, Oregon 97239
- Anders Perrone
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Nora Byington
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Gracie Grimsrud
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Amanda Rueter
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Elizabeth Nousen
- Center for Mental Health Innovation, Oregon Health & Science University, Portland, Oregon 97239
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, Oregon 97239
- Dylan Antovich
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, Oregon 97239
- Bonnie J Nagel
- Center for Mental Health Innovation, Oregon Health & Science University, Portland, Oregon 97239
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, Oregon 97239
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, Oregon 97239
- Joel T Nigg
- Center for Mental Health Innovation, Oregon Health & Science University, Portland, Oregon 97239
- Division of Psychology, Department of Psychiatry, Oregon Health & Science University, Portland, Oregon 97239
- Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, Oregon 97239
- Damien A Fair
- Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota 55454
- Masonic Institute for the Developing Brain, University of Minnesota, Minneapolis, Minnesota 55414
- Institute of Child Development, College of Education and Human Development, University of Minnesota, Minneapolis, Minnesota 55455
116
Zhao T, Wang F, Mott R, Dekkers J, Cheng H. Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality. Genetics 2024; 226:iyad210. [PMID: 38085098 PMCID: PMC11090459 DOI: 10.1093/genetics/iyad210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 11/13/2023] [Indexed: 03/08/2024]
Abstract
To adhere to and capitalize on the benefits of the FAIR (findable, accessible, interoperable, and reusable) principles in agricultural genome-to-phenome studies, it is crucial to address privacy and intellectual property issues that prevent sharing and reuse of data in research and industry. Direct sharing of genotype and phenotype data is often prohibited due to intellectual property and privacy concerns. Thus, there is a pressing need for encryption methods that obscure confidential aspects of the data, without affecting the outcomes of certain statistical analyses. A homomorphic encryption method for genotypes and phenotypes (HEGP) has been proposed for single-marker regression in genome-wide association studies (GWAS) using linear mixed models with Gaussian errors. This methodology permits frequentist likelihood-based parameter estimation and inference. In this paper, we extend HEGP to broader applications in genome-to-phenome analyses. We show that HEGP is suited to commonly used linear mixed models for genetic analyses of quantitative traits including genomic best linear unbiased prediction (GBLUP) and ridge-regression best linear unbiased prediction (RR-BLUP), as well as Bayesian variable selection methods (e.g. those in Bayesian Alphabet), for genetic parameter estimation, genomic prediction, and GWAS. By advancing the capabilities of HEGP, we offer researchers and industry professionals a secure and efficient approach for collaborative genomic analyses while preserving data confidentiality.
Affiliation(s)
- Tianjing Zhao
- Department of Animal Science, University of California, Davis, CA 95616, USA
- Department of Animal Science, University of Nebraska-Lincoln, Lincoln, NE 68583, USA
- Fangyi Wang
- Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Richard Mott
- Genetics Institute, University College London, London, WC1E 6BT, UK
- Jack Dekkers
- Department of Animal Science, Iowa State University, Ames, IA 50011, USA
- Hao Cheng
- Department of Animal Science, University of California, Davis, CA 95616, USA
117
Maity S, Dutta D, Terhorst J, Sun Y, Banerjee M. A linear adjustment-based approach to posterior drift in transfer learning. Biometrika 2024; 111:31-50. [PMID: 38948430 PMCID: PMC11212525 DOI: 10.1093/biomet/asad029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2022] [Indexed: 07/02/2024]
Abstract
We present new models and methods for the posterior drift problem where the regression function in the target domain is modelled as a linear adjustment, on an appropriate scale, of that in the source domain, and study the theoretical properties of our proposed estimators in the binary classification problem. The core idea of our model inherits the simplicity and the usefulness of generalized linear models and accelerated failure time models from the classical statistics literature. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted for transfer learning problems in various domains including epidemiology, genetics and biomedicine. As concrete applications, we illustrate the power of our approach (i) through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data, and (ii) in overcoming a spurious correlation present in the source domain of the Waterbirds dataset.
Affiliation(s)
- Subha Maity
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, U.S.A.
- Diptavo Dutta
- Integrative Tumor Epidemiology Branch, Division of Cancer Epidemiology & Genetics, National Cancer Institute, 9609 Medical Center Drive, Bethesda, Maryland 20892, U.S.A
- Moulinath Banerjee
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, Michigan 48109, U.S.A
118
Madrid-Valero JJ, Barclay NL, Gregory AM. The interaction between polygenic risk and environmental influences: A direct test of the 3P model of insomnia in adolescents. J Child Psychol Psychiatry 2024; 65:308-315. [PMID: 37792459 PMCID: PMC10922170 DOI: 10.1111/jcpp.13895] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/07/2023] [Indexed: 10/05/2023]
Abstract
BACKGROUND Stress is a universal phenomenon and one of the most common precipitants of insomnia. However, not everyone develops insomnia after experiencing a stressful life event. This study aims to test aspects of Spielman's '3P model of insomnia' (during adolescence) by exploring the extent to which: (a) insomnia symptoms are predicted by polygenic scores (PGS); (b) life events predict insomnia symptoms; (c) the interaction between PGS and life events contributes to the prediction of insomnia symptoms; (d) gene-environment interaction effects remain after controlling for sex. METHODS The sample comprised 4,629 twins aged 16 from the Twin Early Development Study who reported on their insomnia symptoms and life events. PGS for insomnia were calculated. In order to test the main hypothesis of this study (a significant interaction between PGS and negative life events), we fitted a series of mixed effect regressions. RESULTS The best fit was provided by the model including sex, PGS for insomnia, negative life events, and their interactions (AIC = 26,158.7). Our results show that the association between insomnia symptoms and negative life events is stronger for those with a higher genetic risk for insomnia. CONCLUSIONS This work sheds light on the complex relationship between genetic and environmental factors implicated in insomnia. This study has tested for the first time the interaction between genetic predisposition (PGS) for insomnia and environmental stressors (negative life events) in adolescents. This work represents a direct test of components of Spielman's 3P model for insomnia, which is supported by our results.
Affiliation(s)
- Juan J Madrid-Valero
- Department of Health Psychology, Faculty of Health Sciences, University of Alicante, Alicante, Spain
- Department of Human Anatomy and Psychobiology, University of Murcia, Murcia, Spain
- Nicola L Barclay
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Alice M Gregory
- Department of Psychology, Goldsmiths, University of London, London, United Kingdom
119
Garg E, Arguello-Pascualli P, Vishnyakova O, Halevy AR, Yoo S, Brooks JD, Bull SB, Gagnon F, Greenwood CMT, Hung RJ, Lawless JF, Lerner-Ellis J, Dennis JK, Abraham RJS, Garant JM, Thiruvahindrapuram B, Jones SJM, CGEn HostSeq Initiative, Strug LJ, Paterson AD, Sun L, Elliott LT. Canadian COVID-19 host genetics cohort replicates known severity associations. PLoS Genet 2024; 20:e1011192. [PMID: 38517939 PMCID: PMC10990181 DOI: 10.1371/journal.pgen.1011192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 04/03/2024] [Accepted: 02/22/2024] [Indexed: 03/24/2024]
Abstract
The HostSeq initiative recruited 10,059 Canadians infected with SARS-CoV-2 between March 2020 and March 2023, obtained clinical information on their disease experience and whole genome sequenced (WGS) their DNA. We analyzed the WGS data for genetic contributors to severe COVID-19 (considering 3,499 hospitalized cases and 4,975 non-hospitalized after quality control). We investigated the evidence for replication of loci reported by the International Host Genetics Initiative (HGI); analyzed the X chromosome; conducted rare variant gene-based analysis and polygenic risk score testing. Population stratification was adjusted for using meta-analysis across ancestry groups. We replicated two loci identified by the HGI for COVID-19 severity: the LZTFL1/SLC6A20 locus on chromosome 3 and the FOXP4 locus on chromosome 6 (the latter with a variant significant at P < 5E-8). We found novel significant associations with MRAS and WDR89 in gene-based analyses, and constructed a polygenic risk score that explained 1.01% of the variance in severe COVID-19. This study provides independent evidence confirming the robustness of previously identified COVID-19 severity loci by the HGI and identifies novel genes for further investigation.
Affiliation(s)
- Elika Garg
- Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Paola Arguello-Pascualli
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Olga Vishnyakova
- Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Anat R. Halevy
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Samantha Yoo
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
- Jennifer D. Brooks
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Shelley B. Bull
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- France Gagnon
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Celia M. T. Greenwood
- Gerald Bronfman Department of Oncology, Department of Epidemiology, Biostatistics and Occupational Health, Department of Human Genetics, McGill University, Montreal, Quebec, Canada
- Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Rayjean J. Hung
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Jerald F. Lawless
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
- Jordan Lerner-Ellis
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Mount Sinai Hospital, Toronto, Ontario, Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Jessica K. Dennis
- BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Rohan J. S. Abraham
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Jean-Michel Garant
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Steven J. M. Jones
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Canada’s Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
- Lisa J. Strug
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Andrew D. Paterson
- Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Lei Sun
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
| | - Lloyd T. Elliott
- Department of Statistics and Actuarial Science, Simon Fraser University, Vancouver, British Columbia, Canada
|
120
|
Singh M, Verhulst B, Vinh P, Zhou Y(D), Castro-de-Araujo LFS, Hottenga JJ, Pool R, de Geus EJC, Vink JM, Boomsma DI, Maes HHM, Dolan CV, Neale MC. Using Instrumental Variables to Measure Causation over Time in Cross-Lagged Panel Models. MULTIVARIATE BEHAVIORAL RESEARCH 2024; 59:342-370. [PMID: 38358370 PMCID: PMC11014768 DOI: 10.1080/00273171.2023.2283634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2024]
Abstract
Cross-lagged panel models (CLPMs) are commonly used to estimate causal influences between two variables with repeated assessments. The lagged effects in a CLPM depend on the time interval between assessments, eventually becoming undetectable at longer intervals. To address this limitation, we incorporate instrumental variables (IVs) into the CLPM with two study waves and two variables. Doing so enables estimation of both the lagged (i.e., "distal") effects and the bidirectional cross-sectional (i.e., "proximal") effects at each wave. The distal effects reflect Granger-causal influences across time, which decay with increasing time intervals. The proximal effects capture causal influences that accrue over time and can help infer causality when the distal effects become undetectable at longer intervals. Significant proximal effects, with a negligible distal effect, would imply that the time interval is too long to estimate a lagged effect at that time interval using the standard CLPM. Through simulations and an empirical application, we demonstrate the impact of time intervals on causal inference in the CLPM and present modeling strategies to detect causal influences regardless of the time interval in a study. Furthermore, to motivate empirical applications of the proposed model, we highlight the utility and limitations of using genetic variables as IVs in large-scale panel studies.
Affiliation(s)
- Madhurbain Singh
- Department of Human and Molecular Genetics, Virginia Commonwealth University
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
- Department of Biological Psychology, Vrije Universiteit Amsterdam
| | - Brad Verhulst
- Department of Psychiatry and Behavioral Sciences, Texas A&M University
| | - Philip Vinh
- Department of Human and Molecular Genetics, Virginia Commonwealth University
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
| | - Yi (Daniel) Zhou
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
- Department of Psychiatry, Virginia Commonwealth University
| | | | - Jouke-Jan Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Amsterdam Public Health Research Institute
| | - René Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Amsterdam Public Health Research Institute
| | - Eco J. C. de Geus
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Amsterdam Public Health Research Institute
| | | | - Dorret I. Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Amsterdam Public Health Research Institute
| | - Hermine H. M. Maes
- Department of Human and Molecular Genetics, Virginia Commonwealth University
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
| | - Conor V. Dolan
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Amsterdam Public Health Research Institute
| | - Michael C. Neale
- Department of Human and Molecular Genetics, Virginia Commonwealth University
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University
- Department of Biological Psychology, Vrije Universiteit Amsterdam
- Department of Psychiatry, Virginia Commonwealth University
|
121
|
Peyrot WJ, Panagiotaropoulou G, Olde Loohuis LM, Adams MJ, Awasthi S, Ge T, McIntosh AM, Mitchell BL, Mullins N, O'Connell KS, Penninx BWJH, Posthuma D, Ripke S, Ruderfer DM, Uffelmann E, Vilhjalmsson BJ, Zhu Z, Smoller JW, Price AL. Distinguishing different psychiatric disorders using DDx-PRS. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.02.24302228. [PMID: 38352307 PMCID: PMC10862992 DOI: 10.1101/2024.02.02.24302228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/24/2024]
Abstract
Despite great progress on methods for case-control polygenic prediction (e.g. schizophrenia vs. control), there remains an unmet need for a method that genetically distinguishes clinically related disorders (e.g. schizophrenia (SCZ) vs. bipolar disorder (BIP) vs. depression (MDD) vs. control); such a method could have important clinical value, especially at disorder onset when differential diagnosis can be challenging. Here, we introduce a method, Differential Diagnosis-Polygenic Risk Score (DDx-PRS), that jointly estimates posterior probabilities of each possible diagnostic category (e.g. SCZ=50%, BIP=25%, MDD=15%, control=10%) by modeling variance/covariance structure across disorders, leveraging case-control polygenic risk scores (PRS) for each disorder (computed using existing methods) and prior clinical probabilities for each diagnostic category. DDx-PRS uses only summary-level training data and does not use tuning data, facilitating implementation in clinical settings. In simulations, DDx-PRS was well-calibrated (whereas a simpler approach that analyzes each disorder marginally was poorly calibrated), and effective in distinguishing each diagnostic category vs. the rest. We then applied DDx-PRS to Psychiatric Genomics Consortium SCZ/BIP/MDD/control data, including summary-level training data from 3 case-control GWAS (N = 41,917-173,140 cases; total N = 1,048,683) and held-out test data from different cohorts with equal numbers of each diagnostic category (total N = 11,460). DDx-PRS was well-calibrated and well-powered relative to these training sample sizes, attaining AUCs of 0.66 for SCZ vs. rest, 0.64 for BIP vs. rest, 0.59 for MDD vs. rest, and 0.68 for control vs. rest. DDx-PRS produced comparable results to methods that leverage tuning data, confirming that DDx-PRS is an effective method. True diagnosis probabilities in top deciles of predicted diagnosis probabilities were considerably larger than prior baseline probabilities, particularly in projections to larger training sample sizes, implying considerable potential for clinical utility under certain circumstances. In conclusion, DDx-PRS is an effective method for distinguishing clinically related disorders.
|
122
|
Quinn TP, Hess JL, Marshe VS, Barnett MM, Hauschild AC, Maciukiewicz M, Elsheikh SSM, Men X, Schwarz E, Trakadis YJ, Breen MS, Barnett EJ, Zhang-James Y, Ahsen ME, Cao H, Chen J, Hou J, Salekin A, Lin PI, Nicodemus KK, Meyer-Lindenberg A, Bichindaritz I, Faraone SV, Cairns MJ, Pandey G, Müller DJ, Glatt SJ. A primer on the use of machine learning to distil knowledge from data in biological psychiatry. Mol Psychiatry 2024; 29:387-401. [PMID: 38177352 PMCID: PMC11228968 DOI: 10.1038/s41380-023-02334-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/21/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Affiliation(s)
- Thomas P Quinn
- Applied Artificial Intelligence Institute (A2I2), Burwood, VIC, 3125, Australia
| | - Jonathan L Hess
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Victoria S Marshe
- Institute of Medical Science, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Michelle M Barnett
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Anne-Christin Hauschild
- Department of Medical Informatics, Medical University Center Göttingen, Göttingen, Lower Saxony, 37075, Germany
| | - Malgorzata Maciukiewicz
- Hospital Zurich, University of Zurich, Zurich, 8091, Switzerland
- Department of Rheumatology and Immunology, University Hospital Bern, Bern, 3010, Switzerland
- Department for Biomedical Research (DBMR), University of Bern, Bern, 3010, Switzerland
| | - Samar S M Elsheikh
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
| | - Xiaoyu Men
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON, M5S 1A1, Canada
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Yannis J Trakadis
- Department of Human Genetics, McGill University Health Centre, Montreal, QC, H4A 3J1, Canada
| | - Michael S Breen
- Psychiatry, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Eric J Barnett
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Yanli Zhang-James
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Mehmet Eren Ahsen
- Department of Business Administration, Gies College of Business, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Department of Biomedical and Translational Sciences, Carle-Illinois School of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
| | - Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Junfang Chen
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Jiahui Hou
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Asif Salekin
- Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY, 13244, USA
| | - Ping-I Lin
- Discipline of Psychiatry and Mental Health, University of New South Wales, Sydney, NSW, 2052, Australia
- Mental Health Research Unit, South Western Sydney Local Health District, Liverpool, NSW, 2170, Australia
| | | | - Andreas Meyer-Lindenberg
- Clinical Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Mannheim, Baden-Württemberg, J5 68159, Germany
| | - Isabelle Bichindaritz
- Biomedical and Health Informatics/Computer Science Department, State University of New York at Oswego, Oswego, NY, 13126, USA
- Intelligent Bio Systems Lab, State University of New York at Oswego, Oswego, NY, 13126, USA
| | - Stephen V Faraone
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA
| | - Murray J Cairns
- School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, NSW, 2308, Australia
- Precision Medicine Research Program, Hunter Medical Research Institute, Newcastle, NSW, 2308, Australia
| | - Gaurav Pandey
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Daniel J Müller
- Pharmacogenetics Research Clinic, Campbell Family Mental Health Research Institute, Centre for Addiction and Mental Health, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5S 1A1, Canada
- Department of Psychiatry, Psychosomatics and Psychotherapy, Center of Mental Health, University Hospital of Würzburg, Würzburg, 97080, Germany
| | - Stephen J Glatt
- Department of Psychiatry and Behavioral Sciences, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Neuroscience and Physiology, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
- Department of Public Health and Preventive Medicine, Norton College of Medicine at SUNY Upstate Medical University, Syracuse, NY, 13210, USA.
|
123
|
Lee A, Seo J, Park S, Cho Y, Kim G, Li J, Liang L, Park T, Chung W. Type 2 diabetes and its genetic susceptibility are associated with increased severity and mortality of COVID-19 in UK Biobank. Commun Biol 2024; 7:122. [PMID: 38267566 PMCID: PMC10808197 DOI: 10.1038/s42003-024-05799-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/25/2023] [Accepted: 01/09/2024] [Indexed: 01/26/2024]
Abstract
Type 2 diabetes (T2D) is known as one of the important risk factors for the severity and mortality of COVID-19. Here, we evaluate the impact of T2D and its genetic susceptibility on the severity and mortality of COVID-19, using 459,119 individuals in UK Biobank. Utilizing the polygenic risk scores (PRS) for T2D, we identified a significant association between T2D or T2D PRS, and COVID-19 severity. We further discovered the efficacy of vaccination and the pivotal role of T2D-related genetics in the pathogenesis of severe COVID-19. Moreover, we found that individuals with T2D or those in the high T2D PRS group had a significantly increased mortality rate. We also observed that the mortality rate for SARS-CoV-2-infected patients was approximately 2 to 7 times higher than for those not infected, depending on the time of infection. These findings emphasize the potential of T2D PRS in estimating the severity and mortality of COVID-19.
Affiliation(s)
- Aeyeon Lee
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
| | - Jieun Seo
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
| | - Seunghwan Park
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
- Institute of Genetic Epidemiology, Basgenbio Co. Ltd., Seoul, 04167, Korea
| | - Youngkwang Cho
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
| | - Gaeun Kim
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea
| | - Jun Li
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Liming Liang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826, Korea.
| | - Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, 06978, Korea.
- Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA.
|
124
|
Brīvība M, Atava I, Pečulis R, Elbere I, Ansone L, Rozenberga M, Silamiķelis I, Kloviņš J. Evaluating the Efficacy of Type 2 Diabetes Polygenic Risk Scores in an Independent European Population. Int J Mol Sci 2024; 25:1151. [PMID: 38256224 PMCID: PMC10817091 DOI: 10.3390/ijms25021151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 01/04/2024] [Accepted: 01/09/2024] [Indexed: 01/24/2024]
Abstract
Numerous type 2 diabetes (T2D) polygenic risk scores (PGSs) have been developed to predict individuals' predisposition to the disease. An independent assessment and verification of the best-performing PGS are warranted to allow for a rapid application of developed models. To date, only 3% of T2D PGSs have been evaluated. In this study, we assessed all (n = 102) presently published T2D PGSs in an independent cohort of 3718 individuals, which has not been included in the construction or fine-tuning of any T2D PGS so far. We further chose the best-performing PGS, assessed its performance across major population principal component analysis (PCA) clusters, and compared it with a newly developed population-specific T2D PGS. Our findings revealed that 88% of the published PGSs were significantly associated with T2D; however, their performance was lower than what had been previously reported. We found that PGS performance improved over the years (p-value = 8.01 × 10⁻⁴), with PGS002771 currently showing the best discriminatory power (area under the receiver operating characteristic curve (AUROC) = 0.669) and PGS003443 exhibiting the strongest association (odds ratio (OR) = 1.899). Further investigation revealed no difference in PGS performance across major population PCA clusters or when compared with the newly developed population-specific PGS. Our findings revealed a positive trend in T2D PGS performance, consistently identifying high-T2D-risk individuals in an independent European population.
Affiliation(s)
- Monta Brīvība
- Latvian Biomedical Research and Study Centre, LV-1067 Riga, Latvia; (I.A.); (I.E.); (L.A.); (J.K.)
|
125
|
Zhu Y, Meng Y, Zhang Y, Karlsson IK, Hägg S, Zhan Y. Genetically determined telomere length and its association with chronic obstructive pulmonary disease and interstitial lung disease in biobank Japan: A Mendelian randomization study. Heliyon 2024; 10:e23415. [PMID: 38163245 PMCID: PMC10757031 DOI: 10.1016/j.heliyon.2023.e23415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 12/01/2023] [Accepted: 12/04/2023] [Indexed: 01/03/2024]
Abstract
Importance: Chronic obstructive pulmonary disease (COPD) and interstitial lung disease (ILD) have been linked to shorter telomere length (TL). While understanding this association has critical clinical implications for respiratory diseases, previous studies exploring these associations were conducted in European populations. The present study aims to investigate this relationship in an Asian population. Objective: To examine the causal relationship between leukocyte TL and COPD and ILD in an Asian population. Design, Setting, and Participants: We used a two-sample Mendelian randomization (MR) design based on genome-wide association study summary statistics to investigate the association between leukocyte TL, genetically predicted by nine single-nucleotide polymorphisms, and the risk of COPD and ILD. Participants were Japanese individuals enrolled in the Biobank Japan Project, including 3315 COPD patients and 806 ILD patients. Exposure: Leukocyte TL was genetically predicted by nine single-nucleotide polymorphisms. Results: The inverse-variance weighted estimates showed a significant inverse association between leukocyte TL and both COPD (odds ratio [OR] = 0.78; 95% confidence interval [CI]: 0.64, 0.95; P = 0.01) and ILD (OR = 0.29; 95% CI: 0.14, 0.61; P = 0.001). All sensitivity analyses yielded consistent results. The MR-Egger regression intercept test showed no evidence of horizontal pleiotropy (P for intercept: COPD, 0.56; ILD, 0.70). Conclusion and Relevance: Our findings suggest that leukocyte telomere shortening may causally increase the risk of COPD and ILD. These results highlight the potential importance of TL for these respiratory diseases.
Affiliation(s)
- Yanan Zhu
- Department of Epidemiology, School of Public Health (Shenzhen), Sun Yat-Sen University, Shenzhen, China
| | - Yaxian Meng
- Department of Epidemiology, School of Public Health (Shenzhen), Sun Yat-Sen University, Shenzhen, China
| | - Yasi Zhang
- Department of Epidemiology, School of Public Health (Shenzhen), Sun Yat-Sen University, Shenzhen, China
| | - Ida K. Karlsson
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Sara Hägg
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Yiqiang Zhan
- Department of Epidemiology, School of Public Health (Shenzhen), Sun Yat-Sen University, Shenzhen, China
- Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
|
126
|
Ye Y, Hu J, Pang F, Cui C, Zhao H. Genomic risk prediction of cardiovascular diseases among type 2 diabetes patients in the UK Biobank. FRONTIERS IN BIOINFORMATICS 2024; 3:1320748. [PMID: 38239805 PMCID: PMC10794561 DOI: 10.3389/fbinf.2023.1320748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2023] [Accepted: 12/11/2023] [Indexed: 01/22/2024]
Abstract
Background: Polygenic risk scores (PRS) have proved useful in predicting the risk of cardiovascular diseases (CVD) based on the genotypes of an individual, but most analyses have focused on disease onset in the general population. The usefulness of PRS to predict CVD risk among type 2 diabetes (T2D) patients remains unclear. Methods: We built a meta-PRSCVD upon the candidate PRSs developed from state-of-the-art PRS methods for three CVD subtypes of significant importance: coronary artery disease (CAD), ischemic stroke (IS), and heart failure (HF). To evaluate the prediction performance of the meta-PRSCVD, we restricted our analysis to 21,092 white British T2D patients in the UK Biobank, of whom 4,015 had CVD events. Results: The meta-PRSCVD was significantly associated with CVD risk, with a hazard ratio per standard deviation increase of 1.28 (95% CI: 1.23-1.33). The meta-PRSCVD alone predicted the CVD incidence with an area under the receiver operating characteristic curve (AUC) of 0.57 (95% CI: 0.54-0.59). When restricted to the early-onset patients (onset age ≤ 55), the AUC increased to 0.61 (95% CI: 0.56-0.67). Conclusion: Our results highlight the potential role of genomic screening for secondary prevention of CVD among T2D patients, especially among early-onset patients.
Affiliation(s)
- Yixuan Ye
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
| | - Jiaqi Hu
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, CT, United States
| | - Fuyuan Pang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
- Department of Biostatistics, Shanghai Jiao Tong University, Shanghai, China
| | - Can Cui
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT, United States
| | - Hongyu Zhao
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, United States
|
127
|
Jiang W, Chen L, Girgenti MJ, Zhao H. Tuning parameters for polygenic risk score methods using GWAS summary statistics from training data. Nat Commun 2024; 15:24. [PMID: 38169469 PMCID: PMC10762162 DOI: 10.1038/s41467-023-44009-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 11/27/2023] [Indexed: 01/05/2024]
Abstract
Various polygenic risk score (PRS) methods have been proposed to combine the estimated effects of single nucleotide polymorphisms (SNPs) to predict genetic risks for common diseases, using data collected from genome-wide association studies (GWAS). Some methods require an external individual-level GWAS dataset for parameter tuning, posing privacy and security concerns. Holding out part of the data for parameter tuning can also reduce model prediction accuracy. In this article, we propose PRStuning, a method that tunes parameters for different PRS methods using GWAS summary statistics from the training data. PRStuning predicts the PRS performance with different parameters, and then selects the best-performing parameters. Because directly using training data effects tends to overestimate the performance in the testing data, we adopt an empirical Bayes approach to shrink the predicted performance in accordance with the genetic architecture of the disease. Extensive simulations and real data applications demonstrate PRStuning's accuracy across PRS methods and parameters.
Affiliation(s)
- Wei Jiang
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Ling Chen
- Department of Statistics, Columbia University, New York, NY, USA
| | | | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA.
|
128
|
Cui R, Elzur RA, Kanai M, Ulirsch JC, Weissbrod O, Daly MJ, Neale BM, Fan Z, Finucane HK. Improving fine-mapping by modeling infinitesimal effects. Nat Genet 2024; 56:162-169. [PMID: 38036779 PMCID: PMC11056999 DOI: 10.1038/s41588-023-01597-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/24/2022] [Accepted: 10/26/2023] [Indexed: 12/02/2023]
Abstract
Fine-mapping aims to identify causal genetic variants for phenotypes. Bayesian fine-mapping algorithms (for example, SuSiE, FINEMAP, ABF and COJO-ABF) are widely used, but assessing posterior probability calibration remains challenging in real data, where model misspecification probably exists, and true causal variants are unknown. We introduce replication failure rate (RFR), a metric to assess fine-mapping consistency by downsampling. SuSiE, FINEMAP and COJO-ABF show high RFR, indicating potential overconfidence in their output. Simulations reveal that nonsparse genetic architecture can lead to miscalibration, while imputation noise, nonuniform distribution of causal variants and quality control filters have minimal impact. Here we present SuSiE-inf and FINEMAP-inf, fine-mapping methods modeling infinitesimal effects alongside fewer larger causal effects. Our methods show improved calibration, RFR and functional enrichment, competitive recall and computational efficiency. Notably, using our methods' posterior effect sizes substantially increases polygenic risk score accuracy over SuSiE and FINEMAP. Our work improves causal variant identification for complex traits, a fundamental goal of human genetics.
Affiliation(s)
- Ran Cui
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| | - Roy A Elzur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Masahiro Kanai
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
| | - Jacob C Ulirsch
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Mark J Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
| | - Benjamin M Neale
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Zhou Fan
- Department of Statistics and Data Science, Yale University, New Haven, CT, USA.
| | - Hilary K Finucane
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
129
|
Kachuri L, Chatterjee N, Hirbo J, Schaid DJ, Martin I, Kullo IJ, Kenny EE, Pasaniuc B, Witte JS, Ge T. Principles and methods for transferring polygenic risk scores across global populations. Nat Rev Genet 2024; 25:8-25. [PMID: 37620596 PMCID: PMC10961971 DOI: 10.1038/s41576-023-00637-2] [Citation(s) in RCA: 103] [Impact Index Per Article: 103.0] [Accepted: 07/11/2023] [Indexed: 08/26/2023]
Abstract
Polygenic risk scores (PRSs) summarize the genetic predisposition of a complex human trait or disease and may become a valuable tool for advancing precision medicine. However, PRSs that are developed in populations of predominantly European genetic ancestries can increase health disparities due to poor predictive performance in individuals of diverse and complex genetic ancestries. We describe genetic and modifiable risk factors that limit the transferability of PRSs across populations and review the strengths and weaknesses of existing PRS construction methods for diverse ancestries. Developing PRSs that benefit global populations in research and clinical settings provides an opportunity for innovation and is essential for health equity.
Affiliation(s)
- Linda Kachuri
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
| | - Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
| | - Jibril Hirbo
- Department of Medicine Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Daniel J Schaid
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
| | - Iman Martin
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, MD, USA
| | - Iftikhar J Kullo
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Bogdan Pasaniuc
- Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Stanford, CA, USA.
- Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
- Department of Genetics, Stanford University, Stanford, CA, USA.
| | - Tian Ge
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
| |
|
130
|
Lo YC, Chan TF, Jeon S, Maskarinec G, Taparra K, Nakatsuka N, Yu M, Chen CY, Lin YF, Wilkens LR, Le Marchand L, Haiman CA, Chiang CWK. The accuracy of polygenic score models for anthropometric traits and Type II Diabetes in the Native Hawaiian Population. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.25.23300499. [PMID: 38234828 PMCID: PMC10793530 DOI: 10.1101/2023.12.25.23300499] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Polygenic scores (PGS) are promising in stratifying individuals based on the genetic susceptibility to complex diseases or traits. However, PGS models, typically trained in European- or East Asian-ancestry populations, tend to perform poorly in other ethnic minority populations, and their accuracies have not been evaluated for Native Hawaiians. Using body mass index, height, and type-2 diabetes as examples of highly polygenic traits, we evaluated the prediction accuracies of PGS models in a large Native Hawaiian sample from the Multiethnic Cohort with up to 5,300 individuals. We evaluated both publicly available PGS models and genome-wide PGS models trained in this study using the largest available GWAS. We found evidence of lowered prediction accuracies for the PGS models in some cases, particularly for height. We also found that using the Native Hawaiian samples as an optimization cohort during training did not consistently improve PGS performance. Moreover, even the best performing PGS models among Native Hawaiians would have lowered prediction accuracy among the subset of individuals most enriched with Polynesian ancestry. Our findings indicate that factors such as admixture histories, sample size and diversity in GWAS can influence PGS performance for complex traits among Native Hawaiian samples. This study provides an initial survey of PGS performance among Native Hawaiians and exposes the current gaps and challenges associated with improving polygenic prediction models for underrepresented minority populations.
Affiliation(s)
- Ying-Chu Lo
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Tsz Fung Chan
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Soyoung Jeon
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
| | - Gertraud Maskarinec
- Epidemiology Program, University of Hawai'i Cancer Center, University of Hawai'i, Manoa, Honolulu, HI, USA
| | - Kekoa Taparra
- Stanford Health Care, Department of Radiation Oncology, Palo Alto, CA, USA
| | | | - Mingrui Yu
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
| | - Chia-Yen Chen
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Biogen, Cambridge, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Yen-Feng Lin
- Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli, Taiwan
- Department of Public Health & Medical Humanities, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Institute of Behavioral Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
| | - Lynne R Wilkens
- Epidemiology Program, University of Hawai'i Cancer Center, University of Hawai'i, Manoa, Honolulu, HI, USA
| | - Loic Le Marchand
- Epidemiology Program, University of Hawai'i Cancer Center, University of Hawai'i, Manoa, Honolulu, HI, USA
| | - Christopher A Haiman
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cancer Epidemiology Program, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Cancer Epidemiology Program, Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| |
|
131
|
Huang X, Zhu TN, Liu YC, Qi GA, Zhang JN, Chen GB. Efficient estimation for large-scale linkage disequilibrium patterns of the human genome. eLife 2023; 12:RP90636. [PMID: 38149842 PMCID: PMC10752592 DOI: 10.7554/elife.90636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/28/2023] Open
Abstract
In this study, we proposed an efficient algorithm (X-LD) for estimating linkage disequilibrium (LD) patterns for a genomic grid, which can be of inter-chromosomal scale or of small segments. Compared with conventional methods, the proposed method was significantly faster, with computational cost dropping from O(nm^2) to O(n^2 m), where n is the sample size and m the number of SNPs, and consequently we were able to explore unknown LD features of the human genome in depth and to reveal long-anticipated ones. Having applied the algorithm to the 1000 Genomes Project (1KG), we found that (1) extended LD, driven by population structure, existed universally, and the strength of inter-chromosomal LD was about 10% of the respective intra-chromosomal LD in relatively homogeneous cohorts, such as FIN, rising to nearly 56% in admixed cohorts, such as ASW. (2) After splitting each chromosome into as many as half a million or more grids, we found that the LD of the HLA region was nearly 42-fold higher than that of chromosome 6 in CEU and 11.58-fold higher in ASW; on chromosome 11, the LD of the centromere was nearly 94.05-fold higher than that of chromosome 11 in YRI and 42.73-fold higher in ASW. (3) We uncovered the long-anticipated inversely proportional linear relationship between the length of a chromosome and the strength of chromosomal LD, with an average Pearson's correlation over 0.80 across the 26 1KG cohorts. However, this linear norm has so far been perturbed by chromosome 11, given its more completely sequenced centromere region. Uniquely, chromosome 8 of ASW deviated more from the linear norm than any other autosome. The proposed algorithm has been implemented in C++ (called X-LD), is available at https://github.com/gc5k/gear2, and can be applied to explore LD features in any sequenced population.
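The abstract states the cost reduction but not how it arises. One standard identity that produces this kind of saving is sketched below, as an assumption rather than a description of the X-LD code: for a standardized genotype matrix X (n samples by m SNPs), the mean squared LD correlation over all SNP pairs (self-pairs included) equals the sum of squared entries of the n-by-n relationship matrix XX'/m divided by n^2, so the m-by-m LD matrix never has to be formed. All function names here are hypothetical.

```python
import random

def standardize(col):
    # Centre and scale one SNP column to mean 0, variance 1 (population variance).
    n = len(col)
    mean = sum(col) / n
    sd = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
    return [(x - mean) / sd for x in col]

def simulate_genotypes(n, m, seed=0):
    # Toy data: m independent SNP columns of 0/1/2 allele counts for n samples.
    rng = random.Random(seed)
    cols = []
    while len(cols) < m:
        freq = rng.uniform(0.1, 0.9)
        col = [(rng.random() < freq) + (rng.random() < freq) for _ in range(n)]
        if len(set(col)) > 1:  # skip monomorphic SNPs, which cannot be standardized
            cols.append(standardize(col))
    return cols

def mean_r2_via_ld(cols):
    # Direct route: every SNP pair, O(n * m^2) multiplications.
    n, m = len(cols[0]), len(cols)
    total = 0.0
    for a in cols:
        for b in cols:
            r = sum(x * y for x, y in zip(a, b)) / n
            total += r * r
    return total / (m * m)

def mean_r2_via_grm(cols):
    # Sample-side route: every sample pair, O(n^2 * m) multiplications.
    n, m = len(cols[0]), len(cols)
    rows = list(zip(*cols))  # one tuple of m standardized genotypes per sample
    total = 0.0
    for u in rows:
        for v in rows:
            k = sum(x * y for x, y in zip(u, v)) / m
            total += k * k
    return total / (n * n)

cols = simulate_genotypes(n=20, m=60)
direct = mean_r2_via_ld(cols)
via_grm = mean_r2_via_grm(cols)
print(direct, via_grm)
```

The two routes agree up to floating-point error because both reduce to the trace of (X'X)^2 divided by n^2 m^2; the sample-side route is cheaper whenever n is much smaller than m.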
Affiliation(s)
- Xin Huang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Center for General Practice Medicine, Department of General Practice Medicine, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, China
- Center for Reproductive Medicine, Department of Genetic and Genomic Medicine, and Clinical Research Institute, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Zhejiang, China
| | - Tian-Neng Zhu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
| | - Ying-Chao Liu
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
| | - Guo-An Qi
- Institute of Bioinformatics, Zhejiang University, Hangzhou, China
- Hainan Institute of Zhejiang University, Hainan, China
| | | | - Guo-Bo Chen
- Center for General Practice Medicine, Department of General Practice Medicine, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Hangzhou, China
- Center for Reproductive Medicine, Department of Genetic and Genomic Medicine, and Clinical Research Institute, Zhejiang Provincial People’s Hospital, People’s Hospital of Hangzhou Medical College, Zhejiang, China
- Key Laboratory of Endocrine Gland Diseases of Zhejiang Province, Hangzhou, China
| |
|
132
|
Drouard G, Hagenbeek FA, Whipp AM, Pool R, Hottenga JJ, Jansen R, Hubers N, Afonin A, Willemsen G, de Geus EJC, Ripatti S, Pirinen M, Kanninen KM, Boomsma DI, van Dongen J, Kaprio J. Longitudinal multi-omics study reveals common etiology underlying association between plasma proteome and BMI trajectories in adolescent and young adult twins. BMC Med 2023; 21:508. [PMID: 38129841 PMCID: PMC10740308 DOI: 10.1186/s12916-023-03198-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/03/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND The influence of genetics and environment on the association of the plasma proteome with body mass index (BMI) and changes in BMI remains underexplored, and the links to other omics in these associations remain to be investigated. We characterized protein-BMI trajectory associations in adolescents and adults and how these connect to other omics layers. METHODS Our study included two cohorts of longitudinally followed twins: FinnTwin12 (N = 651) and the Netherlands Twin Register (NTR) (N = 665). Follow-up comprised 4 BMI measurements over approximately 6 (NTR: 23-27 years old) to 10 years (FinnTwin12: 12-22 years old), with omics data collected at the last BMI measurement. BMI changes were calculated in latent growth curve models. Mixed-effects models were used to quantify the associations between the abundance of 439 plasma proteins with BMI at blood sampling and changes in BMI. In FinnTwin12, the sources of genetic and environmental variation underlying the protein abundances were quantified by twin models, as were the associations of proteins with BMI and BMI changes. In NTR, we investigated the association of gene expression of genes encoding proteins identified in FinnTwin12 with BMI and changes in BMI. We linked identified proteins and their coding genes to plasma metabolites and polygenic risk scores (PRS) applying mixed-effects models and correlation networks. RESULTS We identified 66 and 14 proteins associated with BMI at blood sampling and changes in BMI, respectively. The average heritability of these proteins was 35%. Of the 66 BMI-protein associations, 43 and 12 showed genetic and environmental correlations, respectively, including 8 proteins showing both. Similarly, we observed 7 and 3 genetic and environmental correlations between changes in BMI and protein abundance, respectively. S100A8 gene expression was associated with BMI at blood sampling, and the PRG4 and CFI genes were associated with BMI changes. Proteins showed strong connections with metabolites and PRSs, but we observed no multi-omics connections among gene expression and other omics layers. CONCLUSIONS Associations between the proteome and BMI trajectories are characterized by shared genetic, environmental, and metabolic etiologies. We observed few gene-protein pairs associated with BMI or changes in BMI at the proteome and transcriptome levels.
Affiliation(s)
- Gabin Drouard
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
| | - Fiona A Hagenbeek
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Alyce M Whipp
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
| | - René Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Jouke Jan Hottenga
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Rick Jansen
- Department of Psychiatry, Amsterdam UMC Location Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Amsterdam Neuroscience, Mood, Anxiety, Psychosis, Sleep & Stress Program, Amsterdam, The Netherlands
| | - Nikki Hubers
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Amsterdam Reproduction & Development (AR&D) Research Institute, Amsterdam, The Netherlands
| | - Aleksei Afonin
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
| | - Gonneke Willemsen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Eco J C de Geus
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
| | - Samuli Ripatti
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Matti Pirinen
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland
- Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Katja M Kanninen
- A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
| | - Dorret I Boomsma
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Amsterdam Reproduction & Development (AR&D) Research Institute, Amsterdam, The Netherlands
| | - Jenny van Dongen
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Amsterdam Reproduction & Development (AR&D) Research Institute, Amsterdam, The Netherlands
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.
| |
|
133
|
Habets PC, Thomas RM, Milaneschi Y, Jansen R, Pool R, Peyrot WJ, Penninx BWJH, Meijer OC, van Wingen GA, Vinkers CH. Multimodal Data Integration Advances Longitudinal Prediction of the Naturalistic Course of Depression and Reveals a Multimodal Signature of Remission During 2-Year Follow-up. Biol Psychiatry 2023; 94:948-958. [PMID: 37330166 DOI: 10.1016/j.biopsych.2023.05.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Revised: 05/11/2023] [Accepted: 05/30/2023] [Indexed: 06/19/2023]
Abstract
BACKGROUND The ability to predict the disease course of individuals with major depressive disorder (MDD) is essential for optimal treatment planning. Here, we used a data-driven machine learning approach to assess the predictive value of different sets of biological data (whole-blood proteomics, lipid metabolomics, transcriptomics, genetics), both separately and added to clinical baseline variables, for the longitudinal prediction of 2-year remission status in MDD at the individual-subject level. METHODS Prediction models were trained and cross-validated in a sample of 643 patients with current MDD (2-year remission n = 325) and subsequently tested for performance in 161 individuals with MDD (2-year remission n = 82). RESULTS Proteomics data showed the best unimodal data predictions (area under the receiver operating characteristic curve = 0.68). Adding proteomic to clinical data at baseline significantly improved 2-year MDD remission predictions (area under the receiver operating characteristic curve = 0.63 vs. 0.78, p = .013), while the addition of other omics data to clinical data did not yield significantly improved model performance. Feature importance and enrichment analysis revealed that proteomic analytes were involved in inflammatory response and lipid metabolism, with fibrinogen levels showing the highest variable importance, followed by symptom severity. Machine learning models outperformed psychiatrists' ability to predict 2-year remission status (balanced accuracy = 71% vs. 55%). CONCLUSIONS This study showed the added predictive value of combining proteomic data, but not other omics data, with clinical data for the prediction of 2-year remission status in MDD. Our results reveal a novel multimodal signature of 2-year MDD remission status that shows clinical potential for individual MDD disease course predictions from baseline measurements.
Affiliation(s)
- Philippe C Habets
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Internal Medicine, section Endocrinology, Leiden University Medical Center, Leiden, the Netherlands.
| | - Rajat M Thomas
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| | - Yuri Milaneschi
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| | - Rick Jansen
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| | - Rene Pool
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Wouter J Peyrot
- Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Complex Traits Genetics, Center for Neurogenomics and Cognitive Research, Amsterdam Neuroscience, Vrije Universiteit, Amsterdam, the Netherlands
| | - Brenda W J H Penninx
- Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| | - Onno C Meijer
- Department of Internal Medicine, section Endocrinology, Leiden University Medical Center, Leiden, the Netherlands
| | - Guido A van Wingen
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| | - Christiaan H Vinkers
- Department of Anatomy & Neurosciences, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands; Department of Psychiatry, Amsterdam Neuroscience, Amsterdam University Medical Center, Vrije Universiteit, Amsterdam, the Netherlands
| |
|
134
|
Zhang J, Zhou W, Yu H, Wang T, Wang X, Liu L, Wen Y. Prediction of Parkinson's Disease Using Machine Learning Methods. Biomolecules 2023; 13:1761. [PMID: 38136632 PMCID: PMC10741603 DOI: 10.3390/biom13121761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 11/29/2023] [Accepted: 12/06/2023] [Indexed: 12/24/2023] Open
Abstract
The detection of Parkinson's disease (PD) in its early stages is of great importance for its treatment and management, but consensus is lacking on what information is necessary and what models should be used to best predict PD risk. In our study, we first grouped PD-associated factors based on their cost and accessibility, and then gradually incorporated them into risk predictions, which were built using eight commonly used machine learning models to allow for comprehensive assessment. Finally, the Shapley Additive Explanations (SHAP) method was used to investigate the contributions of each factor. We found that models built with demographic variables, hospital admission examinations, clinical assessment, and polygenic risk score achieved the best prediction performance, and the inclusion of invasive biomarkers could not further enhance its accuracy. Among the eight machine learning models considered, penalized logistic regression and XGBoost were the most accurate algorithms for assessing PD risk, with penalized logistic regression achieving an area under the curve of 0.94 and a Brier score of 0.08. Olfactory function and polygenic risk scores were the most important predictors for PD risk. Our research has offered a practical framework for PD risk assessment, where necessary information and efficient machine learning tools were highlighted.
Affiliation(s)
- Jiayu Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China; (J.Z.); (W.Z.); (H.Y.); (T.W.)
| | - Wenchao Zhou
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China; (J.Z.); (W.Z.); (H.Y.); (T.W.)
| | - Hongmei Yu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China; (J.Z.); (W.Z.); (H.Y.); (T.W.)
| | - Tong Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China; (J.Z.); (W.Z.); (H.Y.); (T.W.)
| | - Xiaqiong Wang
- Department of Epidemiology and Biostatistics, Southeast University, 87 Ding Jiaqiao Road, Nanjing 210009, China;
| | - Long Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, No. 56 Xinjian South Road, Yingze District, Taiyuan 030001, China; (J.Z.); (W.Z.); (H.Y.); (T.W.)
| | - Yalu Wen
- Department of Statistics, University of Auckland, 38 Princes Street, Auckland Central, Auckland 1010, New Zealand
| |
|
135
|
Li H, Mazumder R, Lin X. Accurate and efficient estimation of local heritability using summary statistics and the linkage disequilibrium matrix. Nat Commun 2023; 14:7954. [PMID: 38040712 PMCID: PMC10692177 DOI: 10.1038/s41467-023-43565-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 11/14/2023] [Indexed: 12/03/2023] Open
Abstract
Existing SNP-heritability estimators that leverage summary statistics from genome-wide association studies (GWAS) are much less efficient (i.e., have larger standard errors) than the restricted maximum likelihood (REML) estimators which require access to individual-level data. We introduce a new method for local heritability estimation-Heritability Estimation with high Efficiency using LD and association Summary Statistics (HEELS)-that significantly improves the statistical efficiency of summary-statistics-based heritability estimator and attains comparable statistical efficiency as REML (with a relative statistical efficiency >92%). Moreover, we propose representing the empirical LD matrix as the sum of a low-rank matrix and a banded matrix. We show that this way of modeling the LD can not only reduce the storage and memory cost, but also improve the computational efficiency of heritability estimation. We demonstrate the statistical efficiency of HEELS and the advantages of our proposed LD approximation strategies both in simulations and through empirical analyses of the UK Biobank data.
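The idea of writing an LD matrix as a banded matrix plus a low-rank matrix can be illustrated on toy data. The sketch below is not the HEELS implementation and makes several assumptions: the LD matrix is synthetic, the low-rank part is a single rank-one term found by power iteration on the out-of-band residual, and every function name is hypothetical.

```python
import random

def toy_ld(m):
    # Synthetic LD-like matrix: local decay plus a weak shared long-range component.
    sign = [1.0 if i % 2 == 0 else -1.0 for i in range(m)]
    return [[0.8 * 0.5 ** abs(i - j) + 0.2 * sign[i] * sign[j] for j in range(m)]
            for i in range(m)]

def banded_part(R, w):
    # Keep entries within w of the diagonal, zero elsewhere.
    m = len(R)
    return [[R[i][j] if abs(i - j) <= w else 0.0 for j in range(m)] for i in range(m)]

def frob_sq(A):
    # Squared Frobenius norm.
    return sum(x * x for row in A for x in row)

def leading_term(E, iters=200, seed=0):
    # Power iteration on symmetric E; returns a unit vector v and its Rayleigh quotient.
    m = len(E)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):
        u = [sum(E[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in u) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in u]
    Ev = [sum(E[i][j] * v[j] for j in range(m)) for i in range(m)]
    lam = sum(a * b for a, b in zip(v, Ev))
    return lam, v

def approx_matvec(R, w, lam, v, x):
    # (band + lam * v v') x using only in-band entries: O(m * w) work instead of O(m^2).
    m = len(R)
    dot = sum(a * b for a, b in zip(v, x))
    return [sum(R[i][j] * x[j] for j in range(max(0, i - w), min(m, i + w + 1)))
            + lam * v[i] * dot for i in range(m)]

m, w = 30, 3
R = toy_ld(m)
B = banded_part(R, w)
E = [[R[i][j] - B[i][j] for j in range(m)] for i in range(m)]  # out-of-band residual
lam, v = leading_term(E)
err_band = frob_sq(E)
err_both = frob_sq([[E[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)])
print(err_band, err_both)
```

In this toy setting the approximation stores roughly m(2w+1) band entries plus m+1 numbers for the rank-one term instead of m^2 entries, and matrix-vector products touch only the band. A practical method would choose the bandwidth and rank from the data.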
Affiliation(s)
- Hui Li
- Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, USA
| | - Rahul Mazumder
- Massachusetts Institute of Technology, Operations Research and Statistics group, Cambridge, MA, USA
| | - Xihong Lin
- Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, USA.
- Harvard University, Department of Statistics, Cambridge, MA, USA.
| |
|
136
|
Lin J, Mars N, Fu Y, Ripatti P, Kiiskinen T, FinnGen study, Tukiainen T, Ripatti S, Pirinen M. Integration of Biomarker Polygenic Risk Score Improves Prediction of Coronary Heart Disease. JACC Basic Transl Sci 2023; 8:1489-1499. [PMID: 38205343 PMCID: PMC10774750 DOI: 10.1016/j.jacbts.2023.07.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/09/2023] [Revised: 07/06/2023] [Accepted: 07/10/2023] [Indexed: 01/12/2024]
Abstract
There are several established biomarkers for coronary heart disease (CHD), including blood pressure, cholesterol, and lipoproteins. It is of high interest to determine how a combined polygenic risk score (PRS) of CHD-associated biomarkers (BioPRS) can further improve genetic prediction of CHD. We developed CHDBioPRS, combining BioPRS with PRS of CHD in the UK Biobank and tested it on FinnGen. We found that BioPRS was clearly predictive of CHD and that CHDBioPRS improved the standard CHD PRS. The largest effect was observed with early onset cases in FinnGen, with HRs above 2 per standard deviation of CHDBioPRS.
Affiliation(s)
- Jake Lin
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Health Sciences, Faculty of Social Sciences, Tampere University, Tampere, Finland
| | - Nina Mars
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - Yu Fu
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - Pietari Ripatti
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - Tuomo Kiiskinen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - FinnGen study
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Health Sciences, Faculty of Social Sciences, Tampere University, Tampere, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Broad Institute of Massachusetts Institute of Technology, Harvard University, Cambridge, Massachusetts, USA
- Massachusetts General Hospital, Cambridge, Massachusetts, USA
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Taru Tukiainen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - Samuli Ripatti
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Broad Institute of Massachusetts Institute of Technology, Harvard University, Cambridge, Massachusetts, USA
- Massachusetts General Hospital, Cambridge, Massachusetts, USA
| | - Matti Pirinen
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| |
Collapse
|
137
|
von Hinke S, Sørensen EN. The long-term effects of early-life pollution exposure: Evidence from the London smog. Journal of Health Economics 2023; 92:102827. [PMID: 37866291 DOI: 10.1016/j.jhealeco.2023.102827] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/03/2023] [Revised: 09/29/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023]
Abstract
This paper uses a large UK cohort to investigate the impact of early-life pollution exposure on individuals' human capital and health outcomes in older age. We compare individuals who were exposed to the London smog in December 1952 whilst in utero or in infancy to those born after the smog and those born at the same time but in unaffected areas. We find that those exposed to the smog have substantially lower fluid intelligence and worse respiratory health, with some evidence of a reduction in years of schooling.
Affiliation(s)
- Stephanie von Hinke
  - School of Economics, University of Bristol, United Kingdom; Institute for Fiscal Studies, United Kingdom.
138
Khanna NN, Singh M, Maindarkar M, Kumar A, Johri AM, Mentella L, Laird JR, Paraskevas KI, Ruzsa Z, Singh N, Kalra MK, Fernandes JFE, Chaturvedi S, Nicolaides A, Rathore V, Singh I, Teji JS, Al-Maini M, Isenovic ER, Viswanathan V, Khanna P, Fouda MM, Saba L, Suri JS. Polygenic Risk Score for Cardiovascular Diseases in Artificial Intelligence Paradigm: A Review. J Korean Med Sci 2023; 38:e395. [PMID: 38013648 PMCID: PMC10681845 DOI: 10.3346/jkms.2023.38.e395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/15/2023] [Indexed: 11/29/2023] Open
Abstract
Cardiovascular disease (CVD)-related mortality and morbidity heavily strain society. The relationship between external risk factors and our genetics has not been well established. It is widely acknowledged that environmental influence and individual behaviours play a significant role in CVD vulnerability, leading to the development of polygenic risk scores (PRS). We employed the PRISMA search method to locate pertinent research and literature to extensively review artificial intelligence (AI)-based PRS models for CVD risk prediction. Furthermore, we analyzed and compared conventional vs. AI-based solutions for PRS. We summarized the recent advances in our understanding of the use of AI-based PRS for risk prediction of CVD. Our study proposes three hypotheses: i) Multiple genetic variations and risk factors can be incorporated into AI-based PRS to improve the accuracy of CVD risk prediction. ii) AI-based PRS for CVD circumvents the drawbacks of conventional PRS calculators by incorporating a larger variety of genetic and non-genetic components, allowing for more precise and individualised risk estimations. iii) Using AI approaches, it is possible to significantly reduce the dimensionality of huge genomic datasets, resulting in more accurate and effective disease risk prediction models. Our study highlighted that the AI-PRS model outperformed traditional PRS calculators in predicting CVD risk. Furthermore, using AI-based methods to calculate PRS may increase the precision of risk predictions for CVD and have significant ramifications for individualized prevention and treatment plans.
Affiliation(s)
- Narendra N Khanna
  - Department of Cardiology, Indraprastha APOLLO Hospitals, New Delhi, India
  - Asia Pacific Vascular Society, New Delhi, India
- Manasvi Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - Bennett University, Greater Noida, India
- Mahesh Maindarkar
  - Asia Pacific Vascular Society, New Delhi, India
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - School of Bioengineering Sciences and Research, Maharashtra Institute of Technology's Art, Design and Technology University, Pune, India
- Amer M Johri
  - Department of Medicine, Division of Cardiology, Queen's University, Kingston, Canada
- Laura Mentella
  - Department of Medicine, Division of Cardiology, University of Toronto, Toronto, Canada
- John R Laird
  - Heart and Vascular Institute, Adventist Health St. Helena, St. Helena, CA, USA
- Zoltan Ruzsa
  - Invasive Cardiology Division, University of Szeged, Szeged, Hungary
- Narpinder Singh
  - Department of Food Science and Technology, Graphic Era Deemed to be University, Dehradun, Uttarakhand, India
- Seemant Chaturvedi
  - Department of Neurology & Stroke Program, University of Maryland, Baltimore, MD, USA
- Andrew Nicolaides
  - Vascular Screening and Diagnostic Centre and University of Nicosia Medical School, Cyprus
- Vijay Rathore
  - Nephrology Department, Kaiser Permanente, Sacramento, CA, USA
- Inder Singh
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
- Jagjit S Teji
  - Ann and Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, USA
- Mostafa Al-Maini
  - Allergy, Clinical Immunology and Rheumatology Institute, Toronto, ON, Canada
- Esma R Isenovic
  - Department of Radiobiology and Molecular Genetics, National Institute of The Republic of Serbia, University of Belgrade, Beograd, Serbia
- Puneet Khanna
  - Department of Anaesthesiology, AIIMS, New Delhi, India
- Mostafa M Fouda
  - Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID, USA
- Luca Saba
  - Department of Radiology, Azienda Ospedaliero Universitaria, Cagliari, Italy
- Jasjit S Suri
  - Asia Pacific Vascular Society, New Delhi, India
  - Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA, USA
  - Department of Computer Engineering, Graphic Era Deemed to be University, Dehradun, India.
139
Zhai S, Mehrotra DV, Shen J. Applying polygenic risk score methods to pharmacogenomics GWAS: challenges and opportunities. Brief Bioinform 2023; 25:bbad470. [PMID: 38152980 PMCID: PMC10782924 DOI: 10.1093/bib/bbad470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/20/2023] [Accepted: 11/28/2023] [Indexed: 12/29/2023] Open
Abstract
Polygenic risk scores (PRSs) have emerged as promising tools for the prediction of human diseases and complex traits in disease genome-wide association studies (GWAS). Applying PRSs to pharmacogenomics (PGx) studies has begun to show great potential for improving patient stratification and drug response prediction. However, there are unique challenges that arise when applying PRSs to PGx GWAS beyond those typically encountered in disease GWAS (e.g. Eurocentric or trans-ethnic bias). These challenges include: (i) the lack of knowledge about whether PGx or disease GWAS/variants should be used in the base cohort (BC); (ii) the small sample sizes in PGx GWAS with corresponding low power and (iii) the more complex PRS statistical modeling required for handling both prognostic and predictive effects simultaneously. To gain insights in this landscape about the general trends, challenges and possible solutions, we first conduct a systematic review of both PRS applications and PRS method development in PGx GWAS. To further address the challenges, we propose (i) a novel PRS application strategy by leveraging both PGx and disease GWAS summary statistics in the BC for PRS construction and (ii) a new Bayesian method (PRS-PGx-Bayesx) to reduce Eurocentric or cross-population PRS prediction bias. Extensive simulations are conducted to demonstrate their advantages over existing PRS methods applied in PGx GWAS. Our systematic review and methodology research work not only highlights current gaps and key considerations while applying PRS methods to PGx GWAS, but also provides possible solutions for better PGx PRS applications and future research.
Affiliation(s)
- Song Zhai
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Devan V Mehrotra
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
  - Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
140
Kim DJ, Kang JH, Kim JW, Cheon MJ, Kim SB, Lee YK, Lee BC. Evaluation of optimal methods and ancestries for calculating polygenic risk scores in East Asian population. Sci Rep 2023; 13:19195. [PMID: 37932343 PMCID: PMC10628155 DOI: 10.1038/s41598-023-45859-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 10/25/2023] [Indexed: 11/08/2023] Open
Abstract
Polygenic risk scores (PRSs) have been studied for predicting human diseases, and various methods for PRS calculation have been developed. Most PRS studies to date have focused on European ancestry, and the performance of PRS has not been sufficiently assessed in East Asia. Herein, we evaluated the predictive performance of PRSs for East Asian populations under various conditions. Simulation studies using data from the Korean cohort, Health Examinees (HEXA), demonstrated that SBayesRC and PRS-CS outperformed other PRS methods (lassosum, LDpred-funct, and PRSice) in high fixed heritability (0.3 and 0.7). In addition, we generated PRSs using real-world data from HEXA for ten diseases: asthma, breast cancer, cataract, coronary artery disease, gastric cancer, glaucoma, hyperthyroidism, hypothyroidism, osteoporosis, and type 2 diabetes (T2D). We utilized the five previous PRS methods and genome-wide association study (GWAS) data from two biobank-scale datasets [European (UK Biobank) and East Asian (BioBank Japan) ancestry]. Additionally, we employed PRS-CSx, a PRS method that combines GWAS data from both ancestries, to generate a total of 110 PRSs for ten diseases. Similar to the simulation results, SBayesRC showed better predictive performance for disease risk than the other methods. Furthermore, the East Asian GWAS data outperformed those from European ancestry for breast cancer, cataract, gastric cancer, and T2D, but neither of the two GWAS ancestries showed a significant advantage in PRS performance for the remaining six diseases. Based on simulation data and real data studies, it is expected that SBayesRC will offer superior performance for East Asian populations, and PRSs generated using GWAS from non-East Asian populations may also yield good results.
141
Tang D, Freudenberg J, Dahl A. Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits. Am J Hum Genet 2023; 110:1875-1887. [PMID: 37922884 PMCID: PMC10645564 DOI: 10.1016/j.ajhg.2023.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/05/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023] Open
Abstract
Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.
Affiliation(s)
- David Tang
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA.
- Jerome Freudenberg
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Andy Dahl
  - Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
142
Rohde PD, Fourie Sørensen I, Sørensen P. Expanded utility of the R package, qgg, with applications within genomic medicine. Bioinformatics 2023; 39:btad656. [PMID: 37882742 PMCID: PMC10627350 DOI: 10.1093/bioinformatics/btad656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 09/17/2023] [Accepted: 10/24/2023] [Indexed: 10/27/2023] Open
Abstract
SUMMARY Here, we present an expanded utility of the R package qgg for genetic analyses of complex traits and diseases. One of the major updates of the package is that it now includes Bayesian linear regression modeling procedures, which provide a unified framework for mapping of genetic variants, estimation of heritability and genomic prediction from either individual-level data or genome-wide association study summary data. With this release, the qgg package now provides a wealth of the commonly used methods in analysis of complex traits and diseases, without the need to switch between software and data formats. AVAILABILITY AND IMPLEMENTATION The methodologies are implemented in the publicly available R software package, qgg, using fast and memory-efficient algorithms in C++. The package is available on CRAN or as a developer version at our GitHub page (https://github.com/psoerensen/qgg). Notes on the implemented statistical genetic models, tutorials and example scripts are available at our GitHub page https://psoerensen.github.io/qgg/.
Affiliation(s)
- Palle Duun Rohde
  - Genomic Medicine, Department of Health Science and Technology, Aalborg University, 9260 Gistrup, Denmark
- Izel Fourie Sørensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus, Denmark
- Peter Sørensen
  - Center for Quantitative Genetics and Genomics, Aarhus University, 8000 Aarhus, Denmark
143
Schmengler H, Oldehinkel AJ, Vollebergh WAM, Pasman JA, Hartman CA, Stevens GWJM, Nolte IM, Peeters M. Disentangling the interplay between genes, cognitive skills, and educational level in adolescent and young adult smoking - The TRAILS study. Soc Sci Med 2023; 336:116254. [PMID: 37751630 DOI: 10.1016/j.socscimed.2023.116254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 08/17/2023] [Accepted: 09/15/2023] [Indexed: 09/28/2023]
Abstract
Recent studies suggest that smoking and lower educational attainment may have genetic influences in common. However, little is known about the mechanisms through which genetics contributes to educational inequalities in adolescent and young adult smoking. Common genetic liabilities may underlie cognitive skills associated with both smoking and education, such as IQ and effortful control, in line with indirect health-related selection explanations. Additionally, by affecting cognitive skills, genes may predict educational trajectories and thereby adolescents' social context, which may be associated with smoking, consistent with social causation explanations. Using data from the Dutch TRAILS Study (N = 1581), we estimated the extent to which polygenic scores (PGSs) for ever smoking regularly (PGSSMOK) and years of education (PGSEDU) predict IQ and effortful control, measured around age 11, and whether these cognitive skills then act as shared predictors of smoking and educational level around age 16, 19, 22, and 26. Second, we assessed if educational level mediated associations between PGSs and smoking. Both PGSs were associated with lower effortful control, and PGSEDU also with lower IQ. Lower IQ and effortful control, in turn, predicted having a lower educational level. However, neither of these cognitive skills was directly associated with smoking behaviour after controlling for covariates and PGSs. This suggests that IQ and effortful control are not shared predictors of smoking and education (i.e., no indirect health-related selection related to cognitive skills). Instead, PGSSMOK and PGSEDU, partly through their associations with lower cognitive skills, predicted selection into a lower educational track, which in turn was associated with more smoking, in line with social causation explanations. Our findings suggest that educational differences in the social context contribute to associations between genetic liabilities and educational inequalities in smoking.
Affiliation(s)
- Heiko Schmengler
  - Department of Interdisciplinary Social Science, Utrecht University, the Netherlands.
- Albertine J Oldehinkel
  - Interdisciplinary Center Psychopathology and Emotion Regulation, Department of Psychiatry, University Medical Center of Groningen, University of Groningen, the Netherlands
- Wilma A M Vollebergh
  - Department of Interdisciplinary Social Science, Utrecht University, the Netherlands
- Joëlle A Pasman
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Sweden
- Catharina A Hartman
  - Interdisciplinary Center Psychopathology and Emotion Regulation, Department of Psychiatry, University Medical Center of Groningen, University of Groningen, the Netherlands
- Ilja M Nolte
  - Department of Epidemiology, University Medical Center of Groningen, University of Groningen, the Netherlands
- Margot Peeters
  - Department of Interdisciplinary Social Science, Utrecht University, the Netherlands
144
Jeng XJ, Hu Y, Venkat V, Lu TP, Tzeng JY. Transfer learning with false negative control improves polygenic risk prediction. PLoS Genet 2023; 19:e1010597. [PMID: 38011285 PMCID: PMC10723713 DOI: 10.1371/journal.pgen.1010597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2023] [Revised: 12/15/2023] [Accepted: 11/09/2023] [Indexed: 11/29/2023] Open
Abstract
Polygenic risk score (PRS) is a quantity that aggregates the effects of variants across the genome and estimates an individual's genetic predisposition for a given trait. PRS analysis typically contains two input data sets: base data for effect size estimation and target data for individual-level prediction. Given the availability of large-scale base data, it becomes more common that the ancestral background of base and target data do not perfectly match. In this paper, we treat the GWAS summary information obtained in the base data as knowledge learned from a pre-trained model, and adopt a transfer learning framework to effectively leverage the knowledge learned from the base data that may or may not have similar ancestral background as the target samples to build prediction models for target individuals. Our proposed transfer learning framework consists of two main steps: (1) conducting false negative control (FNC) marginal screening to extract useful knowledge from the base data; and (2) performing joint model training to integrate the knowledge extracted from base data with the target training data for accurate trans-data prediction. This new approach can significantly enhance the computational and statistical efficiency of joint-model training, alleviate over-fitting, and facilitate more accurate trans-data prediction when heterogeneity level between target and base data sets is small or high.
Affiliation(s)
- Xinge Jessie Jeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Yifei Hu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Vaishnavi Venkat
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Tzu-Pin Lu
  - Institute of Health Data Analytics and Statistics, National Taiwan University, Taipei, Taiwan
  - Department of Public Health, National Taiwan University, Taipei, Taiwan
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Institute of Health Data Analytics and Statistics, National Taiwan University, Taipei, Taiwan
145
Abstract
Polygenic risk scores (PRS) estimate genetic susceptibility of an individual to disease and have the potential of providing utility in multiple clinical contexts. However, their performance, computation, and reporting in diverse populations remain challenging. Here, we present a pragmatic approach to optimize a PRS for a population of interest that leverages publicly available data and methods and consists of seven steps that are easily implemented without the requirement of expertise in complex genetics: step 1, selecting source genome-wide association studies (GWAS) and imputation; step 2, selecting methods to compute polygenic score; step 3, adjusting scores using principal components of genetic ancestry; step 4, selecting the best performing score; step 5, defining percentiles of a population distribution; step 6, validating performance of the optimized polygenic score; and step 7, implementing the optimized polygenic score in clinical practice. © 2023 Wiley Periodicals LLC.
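Steps 3 and 5 of the approach summarized above can be illustrated with a small sketch. The abstract does not say how the adjustment or the percentiles are computed, so everything here is an assumption made for illustration: the adjustment is taken to be ordinary least-squares residualization of the raw score on an intercept plus ancestry principal components, the percentile is the share of reference scores at or below a given value, and the function names `residualize` and `percentile` and the toy numbers are hypothetical.

```python
from bisect import bisect_right


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residualize(scores, pcs):
    """Step 3 (assumed form): regress raw scores on an intercept plus PCs, return residuals.

    Assumes the PCs are not collinear, so the normal equations are solvable.
    """
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    beta = solve(xtx, xty)
    return [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(design, scores)]


def percentile(value, reference):
    """Step 5 (assumed form): percent of reference scores less than or equal to value."""
    ordered = sorted(reference)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


# Toy data: five individuals, one ancestry PC each (hypothetical values).
raw = [1.0, 2.1, 2.9, 4.2, 5.0]
pcs = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
adjusted = residualize(raw, pcs)
rank = percentile(adjusted[3], adjusted)
```

In practice the regression would usually be fitted in a reference population and then applied to new individuals; the sketch fits and applies it on the same toy sample only to stay short.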
Affiliation(s)
- Aniruddh P Patel
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts
- Akl C Fahed
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, Massachusetts
146
Malanchini M, Allegrini AG, Nivard MG, Biroli P, Rimfeld K, Cheesman R, von Stumm S, Demange PA, van Bergen E, Grotzinger AD, Raffington L, De la Fuente J, Pingault JB, Harden KP, Tucker-Drob EM, Plomin R. Genetic contributions of noncognitive skills to academic development. bioRxiv 2023:2023.04.03.535380. [PMID: 37066409 PMCID: PMC10103958 DOI: 10.1101/2023.04.03.535380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Noncognitive skills, such as motivation and self-regulation, are partly heritable and predict academic achievement beyond cognitive skills. However, how the relationship between noncognitive skills and academic achievement changes over development is unclear. The current study examined how cognitive and noncognitive skills contribute to academic achievement from ages 7 to 16 in a sample of over 10,000 children from England and Wales. Noncognitive skills were increasingly predictive of academic achievement across development. Twin and polygenic score analyses found that the contribution of noncognitive genetics to academic achievement became stronger over the school years. Results from within-family analyses indicated that associations with noncognitive genetics could not simply be attributed to confounding by environmental differences between nuclear families and are consistent with a possible role for evocative/active gene-environment correlations. By studying genetic effects through a developmental lens, we provide novel insights into the role of noncognitive skills in academic development.
Affiliation(s)
- Margherita Malanchini
  - School of Biological and Behavioural Sciences, Queen Mary University of London, United Kingdom
  - Social, Genetic and Developmental Psychiatry Centre, King’s College London, United Kingdom
- Andrea G. Allegrini
  - Social, Genetic and Developmental Psychiatry Centre, King’s College London, United Kingdom
  - Department of Clinical, Educational and Health Psychology, University College London, United Kingdom
- Michel G. Nivard
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
- Pietro Biroli
  - Department of Economics, Universita’ di Bologna, Bologna, Italy
- Kaili Rimfeld
  - Social, Genetic and Developmental Psychiatry Centre, King’s College London, United Kingdom
  - Royal Holloway University of London, United Kingdom
- Rosa Cheesman
  - PROMENTA Research Center, Department of Psychology, University of Oslo, Oslo, Norway
- Perline A. Demange
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - Research Institute LEARN!, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - Amsterdam Public Health Research Institute, Mental Health, Amsterdam, the Netherlands
- Elsje van Bergen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - Research Institute LEARN!, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands
  - Amsterdam Public Health Research Institute, Mental Health, Amsterdam, the Netherlands
- Andrew D. Grotzinger
  - Institute for Behavioral Genetics, University of Colorado Boulder, United States
- Laurel Raffington
  - Max Planck Research Group Biosocial – Biology, Social Disparities, and Development; Max Planck Institute for Human Development, Berlin, Germany
- Jean-Baptiste Pingault
  - Department of Clinical, Educational and Health Psychology, University College London, United Kingdom
- K. Paige Harden
  - Department of Psychology, The University of Texas at Austin, United States
- Robert Plomin
  - Social, Genetic and Developmental Psychiatry Centre, King’s College London, United Kingdom
147
Yang S, Sun D, Sun Z, Yu C, Guo Y, Si J, Sun D, Pang Y, Pei P, Yang L, Millwood IY, Walters RG, Chen Y, Du H, Pang Z, Schmidt D, Stevens R, Clarke R, Chen J, Chen Z, Lv J, Li L. Minimal improvement in coronary artery disease risk prediction in Chinese population using polygenic risk scores: evidence from the China Kadoorie Biobank. Chin Med J (Engl) 2023; 136:2476-2483. [PMID: 37200020 PMCID: PMC10586831 DOI: 10.1097/cm9.0000000000002694] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/03/2022] [Indexed: 05/19/2023] Open
Abstract
BACKGROUND Several studies have reported that polygenic risk scores (PRSs) can enhance risk prediction of coronary artery disease (CAD) in European populations. However, research on this topic is far from sufficient in non-European countries, including China. We aimed to evaluate the potential of PRS for predicting CAD for primary prevention in the Chinese population. METHODS Participants with genome-wide genotypic data from the China Kadoorie Biobank were divided into training (n = 28,490) and testing sets (n = 72,150). Ten previously developed PRSs were evaluated, and new ones were developed using clumping and thresholding or LDpred method. The PRS showing the strongest association with CAD in the training set was selected to further evaluate its effects on improving the traditional CAD risk-prediction model in the testing set. Genetic risk was computed by summing the product of the weights and allele dosages across genome-wide single-nucleotide polymorphisms. Prediction of the 10-year first CAD events was assessed using hazard ratios (HRs) and measures of model discrimination, calibration, and net reclassification improvement (NRI). Hard CAD (nonfatal I21-I23 and fatal I20-I25) and soft CAD (all fatal or nonfatal I20-I25) were analyzed separately. RESULTS In the testing set, 1214 hard and 7201 soft CAD cases were documented during a mean follow-up of 11.2 years. The HR per standard deviation of the optimal PRS was 1.26 (95% CI: 1.19-1.33) for hard CAD. Based on a traditional CAD risk prediction model containing only non-laboratory-based information, the addition of PRS for hard CAD increased Harrell's C index by 0.001 (-0.001 to 0.003) in women and 0.003 (0.001 to 0.005) in men. Among the different high-risk thresholds ranging from 1% to 10%, the highest categorical NRI was 3.2% (95% CI: 0.4-6.0%) at a high-risk threshold of 10.0% in women. The association of the PRS with soft CAD was much weaker than with hard CAD, leading to minimal or no improvement in the soft CAD model. CONCLUSIONS In this Chinese population sample, the current PRSs minimally changed risk discrimination and offered little improvement in risk stratification for soft CAD. Therefore, this may not be suitable for promoting genetic screening in the general Chinese population to improve CAD risk prediction.
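The score definition given in the methods above (the sum, over single-nucleotide polymorphisms, of weight times allele dosage) can be written as a minimal sketch. The function name `polygenic_risk_score`, the SNP identifiers, and the numbers are hypothetical; skipping SNPs that are missing from an individual's genotype data is an assumption of this sketch, not something the abstract specifies.

```python
from typing import Mapping


def polygenic_risk_score(weights: Mapping[str, float],
                         dosages: Mapping[str, float]) -> float:
    """Sum weight * dosage over SNPs present in both inputs.

    weights: per-SNP effect weights (e.g. log hazard or log odds ratios).
    dosages: per-SNP effect-allele dosages for one individual (0.0 to 2.0).
    SNPs without a dosage are skipped (an assumption of this sketch).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Toy example with three hypothetical SNPs.
weights = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
dosages = {"rs_a": 2.0, "rs_b": 1.0, "rs_c": 0.0}
score = polygenic_risk_score(weights, dosages)  # 0.10*2 - 0.05*1 + 0.20*0
```

Scores computed this way are typically standardized in the study population before effect sizes are reported per standard deviation, as in the hazard ratio quoted in the abstract.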
Collapse
Affiliation(s)
- Songchun Yang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Department of Dermatology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- Dong Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Zhijia Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Canqing Yu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing 100191, China
- Yu Guo
- Fuwai Hospital Chinese Academy of Medical Sciences, Beijing 100730, China
- Jiahui Si
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Dianjianyi Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing 100191, China
- Yuanjie Pang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Pei Pei
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing 100191, China
- Ling Yang
- Medical Research Council Population Health Research Unit at the University of Oxford, Oxford OX3 7LF, UK
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Iona Y. Millwood
- Medical Research Council Population Health Research Unit at the University of Oxford, Oxford OX3 7LF, UK
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Robin G. Walters
- Medical Research Council Population Health Research Unit at the University of Oxford, Oxford OX3 7LF, UK
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Yiping Chen
- Medical Research Council Population Health Research Unit at the University of Oxford, Oxford OX3 7LF, UK
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Huaidong Du
- Medical Research Council Population Health Research Unit at the University of Oxford, Oxford OX3 7LF, UK
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Zengchang Pang
- Qingdao Center of Disease Control and Prevention, Qingdao, Shandong 266033, China
- Dan Schmidt
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Rebecca Stevens
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Robert Clarke
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Junshi Chen
- China National Center for Food Safety Risk Assessment, Beijing 100738, China
- Zhengming Chen
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK
- Jun Lv
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing 100191, China
- Liming Li
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing 100191, China
|
148
|
Liu R, Li D, Haritunians T, Ruan Y, Daly MJ, Huang H, McGovern DP. Profiling the inflammatory bowel diseases using genetics, serum biomarkers, and smoking information. iScience 2023; 26:108053. [PMID: 37841595 PMCID: PMC10568094 DOI: 10.1016/j.isci.2023.108053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/21/2023] [Revised: 07/28/2023] [Accepted: 09/22/2023] [Indexed: 10/17/2023] Open
Abstract
Crohn's disease (CD) and ulcerative colitis (UC) are two etiologically related yet distinctive subtypes of the inflammatory bowel diseases (IBD). In a subset of patients, differentiating CD from UC can be challenging using conventional clinical approaches. We designed and evaluated a novel molecular-based prediction model aggregating genetics, serum biomarkers, and tobacco smoking information to assist the diagnosis of CD and UC in over 30,000 samples. A joint model combining genetics, serum biomarkers, and smoking explains 46% (42-50%, 95% CI) of phenotypic variation. Despite modest overlaps with serum biomarkers, genetics makes unique contributions to distinguishing IBD subtypes. Smoking status explains only 1% (0-6%, 95% CI) of the phenotypic variance, suggesting it may not be an effective biomarker. This study reveals that molecular-based models combining genetics, serum biomarkers, and smoking information could complement current diagnostic strategies and help classify patients based on biologic state rather than imperfect clinical parameters.
Affiliation(s)
- Ruize Liu
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Dalin Li
- F. Widjaja Family Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Talin Haritunians
- F. Widjaja Family Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
- Yunfeng Ruan
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Mark J. Daly
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Hailiang Huang
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Dermot P.B. McGovern
- F. Widjaja Family Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA
|
149
|
Ward J, Lyall LM, Cullen B, Strawbridge RJ, Zhu X, Stanciu I, Aman A, Niedzwiedz CL, Anderson J, Bailey MES, Lyall DM, Pell JP. Consistent effects of the genetics of happiness across the lifespan and ancestries in multiple cohorts. Sci Rep 2023; 13:17262. [PMID: 37828061 PMCID: PMC10570373 DOI: 10.1038/s41598-023-43193-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 09/20/2023] [Indexed: 10/14/2023] Open
Abstract
Happiness is a fundamental human affective trait, but its biological basis is not well understood. Using a novel approach, we construct LDpred-inf polygenic scores of a general happiness measure in 2 cohorts, the Adolescent Brain Cognitive Development (ABCD) cohort (N = 15,924, age range 9.23-11.8 years) and the Add Health cohort (N = 9129, age range 24.5-34.7), to determine associations with several well-being and happiness measures. Additionally, we investigated associations between genetic scores for happiness and brain structure in ABCD (N = 9626, age range 8.9-11) and UK Biobank (N = 16,957, age range 45-83). We detected significant (p.FDR < 0.05) associations between higher genetic scores and several well-being measures (best r2 = 0.019) in children of multiple ancestries in ABCD and small yet significant correlations with a happiness measure in European participants in Add Health (r2 = 0.004). Additionally, we show significant associations between lower genetic scores for happiness and smaller structural brain phenotypes in a white British subsample of UK Biobank and a white subsample of ABCD. We demonstrate that the genetic basis for general happiness level appears to have a consistent effect on happiness and wellbeing measures throughout the lifespan, across multiple ancestral backgrounds, and multiple brain structures.
Affiliation(s)
- Joey Ward
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Laura M Lyall
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Breda Cullen
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Rona J Strawbridge
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Cardiovascular Medicine Unit, Department of Medicine Solna, Karolinska Institutet, Stockholm, Sweden
- Health Data Research UK, Glasgow, UK
- Xingxing Zhu
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Ioana Stanciu
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Alisha Aman
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Claire L Niedzwiedz
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Jana Anderson
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Mark E S Bailey
- School of Life Sciences, University of Glasgow, Glasgow, Scotland, UK
- Donald M Lyall
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
- Jill P Pell
- School of Health and Wellbeing, University of Glasgow, 1 Lilybank Gardens, Glasgow, G12 8RZ, UK
|
150
|
Jeon S, Lo YC, Morimoto LM, Metayer C, Ma X, Wiemels JL, de Smith AJ, Chiang CWK. Evaluating genomic polygenic risk scores for childhood acute lymphoblastic leukemia in Latinos. HGG ADVANCES 2023; 4:100239. [PMID: 37710962 PMCID: PMC10550840 DOI: 10.1016/j.xhgg.2023.100239] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/30/2023] [Revised: 09/08/2023] [Accepted: 09/08/2023] [Indexed: 09/16/2023] Open
Abstract
The utility of polygenic risk score (PRS) models has not been comprehensively evaluated for childhood acute lymphoblastic leukemia (ALL), the most common type of cancer in children. Previous PRS models for ALL were based on significant loci observed in genome-wide association studies (GWASs), even though genomic PRS models have been shown to improve prediction performance for a number of complex diseases. In the United States, Latino (LAT) children have the highest risk of ALL, but the transferability of PRS models to LAT children has not been studied. In this study, we constructed and evaluated genomic PRS models based on either non-Latino White (NLW) GWAS or a multi-ancestry GWAS. We found that the best PRS models performed similarly between held-out NLW and LAT samples (PseudoR2 = 0.086 ± 0.023 in NLW vs. 0.060 ± 0.020 in LAT), and can be improved for LAT if we performed GWAS in LAT-only (PseudoR2 = 0.116 ± 0.026) or multi-ancestry samples (PseudoR2 = 0.131 ± 0.025). However, the best genomic models currently do not have better prediction accuracy than a conventional model using all known ALL-associated loci in the literature (PseudoR2 = 0.166 ± 0.025), which includes loci from GWAS populations that we could not access to train genomic PRS models. Our results suggest that larger and more inclusive GWASs may be needed for genomic PRS to be useful for ALL. Moreover, the comparable performance between populations may suggest a more oligogenic architecture for ALL, where some large effect loci may be shared between populations. Future PRS models that move away from the infinite causal loci assumption may further improve PRS for ALL.
Affiliation(s)
- Soyoung Jeon
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Ying Chu Lo
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Libby M Morimoto
- Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
- Catherine Metayer
- Division of Epidemiology and Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, CA, USA
- Xiaomei Ma
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, CT, USA
- Joseph L Wiemels
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Adam J de Smith
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
|