1. Klinkhammer H, Staerk C, Maj C, Krawitz PM, Mayr A. Genetic Prediction Modeling in Large Cohort Studies via Boosting Targeted Loss Functions. Stat Med 2024. [PMID: 39440393 DOI: 10.1002/sim.10249] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Revised: 09/11/2024] [Accepted: 10/01/2024] [Indexed: 10/25/2024]
Abstract
Polygenic risk scores (PRS) aim to predict a trait from genetic information, relying on common genetic variants with low to medium effect sizes. As genotype data are high-dimensional in nature, it is crucial to develop methods that can be applied to large-scale data (large $n$ and large $p$). Many PRS tools aggregate univariate summary statistics from genome-wide association studies into a single score. Recent advancements allow simultaneous modeling of variant effects from individual-level genotype data. In this context, we introduced snpboost, an algorithm that applies statistical boosting on individual-level genotype data to estimate PRS via multivariable regression models. By processing variants iteratively in batches, snpboost can deal with large-scale cohort data. Having solved the technical obstacles due to data dimensionality, the methodological scope can now be broadened, focusing on key objectives for the clinical application of PRS. Similar to most methods in this context, snpboost has, so far, been restricted to quantitative and binary traits. Now, we incorporate more advanced alternatives, targeted to the particular aim and outcome. Adapting the loss function extends the snpboost framework to further data situations such as time-to-event and count data. Furthermore, alternative loss functions for continuous outcomes allow us to focus not only on the mean of the conditional distribution but also on other aspects that may be more helpful in the risk stratification of individual patients and can quantify prediction uncertainty, for example, median or quantile regression. This work enhances PRS fitting across multiple model classes previously unfeasible for this data type.
Affiliation(s)
- Hannah Klinkhammer
- Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany
- Institute of Genomic Statistics and Bioinformatics, Medical Faculty, University of Bonn, Bonn, Germany
- Christian Staerk
- Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Carlo Maj
- Institute of Genomic Statistics and Bioinformatics, Medical Faculty, University of Bonn, Bonn, Germany
- Center for Human Genetics, Philipps-University Marburg, Marburg, Germany
- Peter M Krawitz
- Institute of Genomic Statistics and Bioinformatics, Medical Faculty, University of Bonn, Bonn, Germany
- Andreas Mayr
- Institute of Medical Biometry, Informatics and Epidemiology, Medical Faculty, University of Bonn, Bonn, Germany

2. Yang L, Sadler MC, Altman RB. Genetic association studies using disease liabilities from deep neural networks. medRxiv: The Preprint Server for Health Sciences 2024:2023.01.18.23284383. [PMID: 36712099 PMCID: PMC9882423 DOI: 10.1101/2023.01.18.23284383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
The case-control study is a widely used method for investigating the genetic underpinnings of binary traits. However, long-term, prospective cohort studies often grapple with absent or evolving health-related outcomes. Here, we propose two methods, liability and meta, for conducting genome-wide association study (GWAS) that leverage disease liabilities calculated from deep patient phenotyping. Analyzing 38 common traits in ~300,000 UK Biobank participants, we identified an increased number of loci compared to the conventional case-control approach, with high replication rates in larger external GWAS. Further analyses confirmed the disease-specificity of the genetic architecture with the meta method demonstrating higher robustness when phenotypes were imputed with low accuracy. Additionally, polygenic risk scores based on disease liabilities more effectively predicted newly diagnosed cases in the 2022 dataset, which were controls in the earlier 2019 dataset. Our findings demonstrate that integrating high-dimensional phenotypic data into deep neural networks enhances genetic association studies while capturing disease-relevant genetic architecture.
Affiliation(s)
- Lu Yang
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA
- Marie C. Sadler
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- University Center for Primary Care and Public Health, Lausanne, 1010, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
- Russ B. Altman
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University, Stanford, CA, 94305, USA
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Department of Computer Science, Stanford University, Stanford, CA, 94305, USA

3. Pedersen EM, Agerbo E, Plana-Ripoll O, Steinbach J, Krebs MD, Hougaard DM, Werge T, Nordentoft M, Børglum AD, Musliner KL, Ganna A, Schork AJ, Mortensen PB, McGrath JJ, Privé F, Vilhjálmsson BJ. ADuLT: An efficient and robust time-to-event GWAS. Nat Commun 2023; 14:5553. [PMID: 37689771 PMCID: PMC10492844 DOI: 10.1038/s41467-023-41210-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/31/2022] [Accepted: 08/28/2023] [Indexed: 09/11/2023]
Abstract
Proportional hazards models have been proposed to analyse time-to-event phenotypes in genome-wide association studies (GWAS). However, little is known about the ability of proportional hazards models to identify genetic associations under different generative models and when ascertainment is present. Here we propose the age-dependent liability threshold (ADuLT) model as an alternative to a Cox regression based GWAS, here represented by SPACox. We compare ADuLT, SPACox, and standard case-control GWAS in simulations under two generative models and with varying degrees of ascertainment as well as in the iPSYCH cohort. We find Cox regression GWAS to be underpowered when cases are strongly ascertained (cases are oversampled by a factor 5), regardless of the generative model used. ADuLT is robust to ascertainment in all simulated scenarios. Then, we analyse four psychiatric disorders in iPSYCH, ADHD, Autism, Depression, and Schizophrenia, with a strong case-ascertainment. Across these psychiatric disorders, ADuLT identifies 20 independent genome-wide significant associations, case-control GWAS finds 17, and SPACox finds 8, which is consistent with simulation results. As more genetic data are being linked to electronic health records, robust GWAS methods that can make use of age-of-onset information will help increase power in analyses for common health outcomes.
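As we understand it, Cox-based GWAS tools such as SPACox test each variant with a score statistic computed from a Cox model fitted without that variant. The sketch below computes that statistic for one variant without covariates, once in risk-set form and once as a sum of genotype-weighted martingale residuals; the two agree when event times are untied. SPACox additionally adjusts for covariates and calibrates the statistic with a saddlepoint approximation, neither of which is attempted here. ADuLT is not illustrated, and the function names are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def cox_score_test(time, event, g):
    """Score test of H0: beta = 0 for one covariate g in a Cox model with no
    other covariates. Assumes no tied event times."""
    order = np.argsort(time)
    event = np.asarray(event, dtype=float)[order]
    g = np.asarray(g, dtype=float)[order]
    n = len(g)
    at_risk = n - np.arange(n)                # risk-set size at each sorted time
    s1 = np.cumsum(g[::-1])[::-1]             # sum of g over each risk set
    s2 = np.cumsum((g ** 2)[::-1])[::-1]      # sum of g^2 over each risk set
    mean_r = s1 / at_risk
    var_r = s2 / at_risk - mean_r ** 2
    u = float(np.sum(event * (g - mean_r)))   # score: observed minus risk-set expected
    v = float(np.sum(event * var_r))          # information under H0
    z = u / sqrt(v)
    return u, v, z, erfc(abs(z) / sqrt(2))    # two-sided normal p-value


def martingale_score(time, event, g):
    """The same score written as sum_i g_i * M_i, where M_i = event_i minus the
    Nelson-Aalen cumulative hazard at t_i (null-model martingale residuals)."""
    order = np.argsort(time)
    event = np.asarray(event, dtype=float)[order]
    g = np.asarray(g, dtype=float)[order]
    at_risk = len(g) - np.arange(len(g))
    cumhaz = np.cumsum(event / at_risk)
    return float(np.sum(g * (event - cumhaz)))
```

The residual form is what makes genome-wide scans cheap: the null model is fitted once, and each variant then costs only a weighted sum over individuals.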
Affiliation(s)
- Emil M Pedersen
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
- Esben Agerbo
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Centre for Integrated Register-based Research at Aarhus University, Aarhus, Denmark
- Oleguer Plana-Ripoll
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Department of Clinical Epidemiology, Aarhus University and Aarhus University Hospital, Aarhus, Denmark
- Jette Steinbach
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Morten D Krebs
- Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- David M Hougaard
- Department for Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
- Thomas Werge
- Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- Department of Clinical Sciences, Copenhagen University, Copenhagen, Denmark
- Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Science, Copenhagen University, Copenhagen, Denmark
- Merete Nordentoft
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- CORE- Copenhagen Centre for Research in Mental Health, Mental Health Center-Copenhagen, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- Anders D Børglum
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Department of Biomedicine and iSEQ Centre, Aarhus University, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, CGPM, Aarhus University, Aarhus, Denmark
- Katherine L Musliner
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Department of Affective Disorders, Aarhus University Hospital-Psychiatry, Aarhus, Denmark
- Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
- Andrea Ganna
- Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland
- Andrew J Schork
- Institute of Biological Psychiatry, Mental Health Center - Sct Hans, Copenhagen University Hospital - Mental Health Services CPH, Copenhagen, Denmark
- Section for Geogenetics, GLOBE Institute, Faculty of Health and Medical Science, Copenhagen University, Copenhagen, Denmark
- Neurogenomics Division, The Translational Genomics Research Institute (TGEN), Phoenix, AZ, USA
- Preben B Mortensen
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- John J McGrath
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Queensland Brain Institute, University of Queensland, St Lucia, QLD, Australia
- Queensland Centre for Mental Health Research, The Park Centre for Mental Health, Wacol, QLD, Australia
- Florian Privé
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Bjarni J Vilhjálmsson
- National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.
- Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark.
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark.
- Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, the Broad Institute of MIT and Harvard, Massachusetts, USA.

4. Ojavee SE, Kousathanas A, Trejo Banos D, Orliac EJ, Patxot M, Läll K, Mägi R, Fischer K, Kutalik Z, Robinson MR. Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis. Nat Commun 2021; 12:2337. [PMID: 33879782 PMCID: PMC8058085 DOI: 10.1038/s41467-021-22538-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/03/2020] [Accepted: 03/17/2021] [Indexed: 01/18/2023]
Abstract
While recent advancements in computation and modelling have improved the analysis of complex traits, our understanding of the genetic basis of the time at symptom onset remains limited. Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a sampling scheme that facilitates biobank-scale time-to-event analyses. We show in extensive simulation work the benefits BayesW provides in terms of number of discoveries, model performance and genomic prediction. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and for the genetic basis of onset reflecting the underlying genetic liability to disease. Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches.
Affiliation(s)
- Sven E Ojavee
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Daniel Trejo Banos
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Etienne J Orliac
- Scientific Computing and Research Support Unit, University of Lausanne, Lausanne, Switzerland
- Marion Patxot
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Kristi Läll
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Reedik Mägi
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Krista Fischer
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
- Zoltan Kutalik
- University Center for Primary Care and Public Health, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland

5. Annor FB, Bayakly RA, Morrison RA, Bryan MJ, Gilbert LK, Ivey-Stephenson AZ, Holland KM, Simon TR. Suicide Among Persons With Dementia, Georgia, 2013 to 2016. J Geriatr Psychiatry Neurol 2019; 32:31-39. [PMID: 30477384 PMCID: PMC6690600 DOI: 10.1177/0891988718814363] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/21/2022]
Abstract
INTRODUCTION Findings from studies examining the relationship between dementia and suicide have been inconsistent. This study examined the characteristics, precipitants, and risk factors for suicide among persons with dementia. METHODS Data from the Georgia Alzheimer's Disease and Related Dementia registry were linked with 2013 to 2016 data from Georgia Vital Records and Georgia Violent Death Reporting System. Descriptive statistics were calculated and logistic regression was used to examine risk factors for suicide. RESULTS Ninety-one Georgia residents with dementia who died by suicide were identified. Among decedents with known circumstances, common precipitants included depressed mood (38.7%) and physical health problems (72.6%). Suicide rate among persons with dementia was 9.3 per 100 000 person-years overall and substantially higher among those diagnosed in the past 12 months (424.5/100 000 person-years). Being male, dementia diagnosis before age 65, and a recent diagnosis of dementia independently predicted suicide, but not depression or cardiovascular diseases. CONCLUSION Prevention strategies that identify at-risk individuals, provide support, and ensure continuity of care for persons diagnosed with dementia may help reduce suicide in this population.
Affiliation(s)
- Francis B. Annor
- Epidemic Intelligence Service, Centers for Disease Control and Prevention, Atlanta, Georgia
- Division of Violence Prevention, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia
- Leah K. Gilbert
- Division of Violence Prevention, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia
- Asha Z. Ivey-Stephenson
- Division of Violence Prevention, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia
- Kristin M. Holland
- Division of Violence Prevention, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia
- Thomas R. Simon
- Division of Violence Prevention, National Center for Injury Prevention and Control, Centers for Disease Control and Prevention, Atlanta, Georgia

6. Syed H, Jorgensen AL, Morris AP. SurvivalGWAS_SV: software for the analysis of genome-wide association studies of imputed genotypes with "time-to-event" outcomes. BMC Bioinformatics 2017; 18:265. [PMID: 28525968 PMCID: PMC5438515 DOI: 10.1186/s12859-017-1683-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/10/2017] [Accepted: 05/11/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Analysis of genome-wide association studies (GWAS) with "time to event" outcomes have become increasingly popular, predominantly in the context of pharmacogenetics, where the survival endpoint could be death, disease remission or the occurrence of an adverse drug reaction. However, methodology and software that can efficiently handle the scale and complexity of genetic data from GWAS with time to event outcomes has not been extensively developed. RESULTS SurvivalGWAS_SV is an easy to use software implemented using C# and run on Linux, Mac OS X & Windows operating systems. SurvivalGWAS_SV is able to handle large scale genome-wide data, allowing for imputed genotypes by modelling time to event outcomes under a dosage model. Either a Cox proportional hazards or Weibull regression model is used for analysis. The software can adjust for multiple covariates and incorporate SNP-covariate interaction effects. CONCLUSIONS We introduce a new console application analysis tool for the analysis of GWAS with time to event outcomes. SurvivalGWAS_SV is compatible with high performance parallel computing clusters, thereby allowing efficient and effective analysis of large scale GWAS datasets, without incurring memory issues. With its particular relevance to pharmacogenetic GWAS, SurvivalGWAS_SV will aid in the identification of genetic biomarkers of patient response to treatment, with the ultimate goal of personalising therapeutic intervention for an array of diseases.
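Under a dosage model, each imputed genotype enters the regression as its expected allele count rather than as a hard genotype call. A minimal sketch, assuming genotype probabilities are already available as an (n, 3) array ordered P(AA), P(AB), P(BB) with B as the coded allele; the input layout and the function name are assumptions, not SurvivalGWAS_SV's file format or code.

```python
import numpy as np


def dosage(probs):
    """Expected count of the coded allele B from imputed genotype probabilities.

    probs: (n, 3) array-like of P(AA), P(AB), P(BB) per individual.
    Returns E[#B] = P(AB) + 2 * P(BB), a value in [0, 2] per individual.
    """
    probs = np.asarray(probs, dtype=float)
    if probs.ndim != 2 or probs.shape[1] != 3:
        raise ValueError("expected an (n, 3) array of genotype probabilities")
    if not np.allclose(probs.sum(axis=1), 1.0, atol=1e-3):
        raise ValueError("genotype probabilities must sum to 1 in every row")
    return probs @ np.array([0.0, 1.0, 2.0])  # weight each genotype by its allele count
```

The resulting vector is what would enter a Cox or Weibull model as the SNP covariate, so uncertain imputations contribute fractional allele counts instead of being forced to 0, 1 or 2.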
Affiliation(s)
- Hamzah Syed
- Department of Biostatistics, University of Liverpool, Liverpool, UK.
- Andrew P Morris
- Department of Biostatistics, University of Liverpool, Liverpool, UK
- Department of Molecular and Clinical Pharmacology, University of Liverpool, Liverpool, UK

7. SurvivalGWAS_Power: a user friendly tool for power calculations in pharmacogenetic studies with "time to event" outcomes. BMC Bioinformatics 2016; 17:523. [PMID: 27931206 PMCID: PMC5146816 DOI: 10.1186/s12859-016-1407-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/17/2016] [Accepted: 12/03/2016] [Indexed: 11/28/2022]
Abstract
Background Power calculators are currently available for the design of genetic association studies of binary phenotypes and quantitative traits, but not for “time to event” outcomes, which are of particular relevance in pharmacogenetics. With the rapid emergence of pharmacogenetic association studies of single nucleotide polymorphisms (SNPs), and the complexity of clinical outcomes they consider, there is a need for software to perform power calculations of time to event data over a range of design scenarios and analytical methodologies. Results We have developed the user friendly software tool SurvivalGWAS_Power to perform power calculations for time to event outcomes over a range of study designs and different analytical approaches. The software calculates the power to detect SNP association with a time to event outcome over a range of study design scenarios. The software enables analyses under a Cox proportional hazards model or Weibull regression model, and can account for treatment and SNP-treatment interaction effects. Simulated data sets can also be generated by SurvivalGWAS_Power to enable analyses with methods that are not currently supported by the power calculator, thereby increasing the flexibility of the software. Conclusions SurvivalGWAS_Power addresses the need for flexible and user-friendly software for power calculations for genetic association studies of time to event outcomes, with particular design features of relevance in pharmacogenetics.
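A quick analytic cross-check for such power calculations is the standard large-sample (Schoenfeld-type) approximation for a single covariate in a Cox model, in which power depends on the number of events rather than the number of participants. The sketch below assumes an additively coded SNP under Hardy-Weinberg equilibrium and uses a genome-wide threshold of 5e-8 as an assumed default. It is not the method implemented in SurvivalGWAS_Power, whose abstract also describes generating simulated data sets, and the function names are hypothetical.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def cox_snp_power(n_events, maf, hazard_ratio, alpha=5e-8):
    """Approximate power to detect an additively coded SNP in a Cox model.

    power ~ Phi(|log HR| * sd * sqrt(D) - z_{1-alpha/2}), where D is the number of
    events and sd is the SD of the allele count under Hardy-Weinberg equilibrium.
    The (negligible) probability of rejecting in the wrong direction is ignored.
    """
    if not 0 < maf < 1:
        raise ValueError("maf must be in (0, 1)")
    if hazard_ratio <= 0:
        raise ValueError("hazard_ratio must be positive")
    sd = sqrt(2 * maf * (1 - maf))
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(log(hazard_ratio)) * sd * sqrt(n_events) - z_alpha)


def events_needed(maf, hazard_ratio, power=0.8, alpha=5e-8):
    """Smallest number of events for the approximation above to reach `power`."""
    if not 0 < maf < 1:
        raise ValueError("maf must be in (0, 1)")
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    var_g = 2 * maf * (1 - maf)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 / (var_g * log(hazard_ratio) ** 2))
```

For example, `events_needed(0.2, 1.2)` gives the approximate number of events required for 80% power at minor allele frequency 0.2 and hazard ratio 1.2. Treatment and SNP-treatment interaction effects, which the tool supports, are outside this simple approximation.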

8. Lemieux Perreault LP, Legault MA, Asselin G, Dubé MP. genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools. Bioinformatics 2016; 32:3661-3663. [PMID: 27497439 PMCID: PMC5181529 DOI: 10.1093/bioinformatics/btw487] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/21/2015] [Revised: 06/30/2016] [Accepted: 07/18/2016] [Indexed: 11/24/2022]
Abstract
Summary: Genotype imputation is now commonly performed following genome-wide genotyping experiments. Imputation increases the density of analyzed genotypes in the dataset, enabling fine-mapping across the genome. However, the process of imputation using the most recent publicly available reference datasets can require considerable computation power and the management of hundreds of large intermediate files. We have developed genipe, a complete genome-wide imputation pipeline which includes automatic reporting, imputed data indexing and management, and a suite of statistical tests for imputed data commonly used in genetic epidemiology (Sequence Kernel Association Test, Cox proportional hazards for survival analysis, and linear mixed models for repeated measurements in longitudinal studies). Availability and Implementation: The genipe package is an open source Python software and is freely available for non-commercial use (CC BY-NC 4.0) at https://github.com/pgxcentre/genipe. Documentation and tutorials are available at http://pgxcentre.github.io/genipe. Contact: louis-philippe.lemieux.perreault@statgen.org or marie-pierre.dube@statgen.org Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Louis-Philippe Lemieux Perreault
- Beaulieu-Saucier Université de Montréal Pharmacogenomics Centre, Montreal Heart Institute Research Center, Montréal, Canada H1T 1C8
- Marc-André Legault
- Beaulieu-Saucier Université de Montréal Pharmacogenomics Centre, Montreal Heart Institute Research Center, Montréal, Canada H1T 1C8
- Department of Biochemistry and Molecular Medicine, Université de Montréal, Montreal, Canada H3T 1J4
- Géraldine Asselin
- Beaulieu-Saucier Université de Montréal Pharmacogenomics Centre, Montreal Heart Institute Research Center, Montréal, Canada H1T 1C8
- Marie-Pierre Dubé
- Beaulieu-Saucier Université de Montréal Pharmacogenomics Centre, Montreal Heart Institute Research Center, Montréal, Canada H1T 1C8
- Department of Medicine, Université de Montréal, Montreal, Canada H3T 1J4