1
Herrera-Luis E, Benke K, Volk H, Ladd-Acosta C, Wojcik GL. Gene-environment interactions in human health. Nat Rev Genet 2024; 25:768-784. [PMID: 38806721 DOI: 10.1038/s41576-024-00731-z] [Accepted: 04/03/2024] [Indexed: 05/30/2024]
Abstract
Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
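One common way to write down the "deviation from expectation" described in this abstract, assuming a quantitative trait, a single variant G and a single exposure E on the additive scale, is a linear model with a product term. The symbols below are illustrative and are not taken from the Review:

```latex
Y = \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\,(G \times E) + \varepsilon,
\qquad H_0:\ \beta_{GE} = 0
```

In this sketch the model without the product term gives the expectation from genetic and environmental effects alone, and a non-zero \beta_{GE} is the departure from it. Whether an interaction appears can depend on the measurement scale (for example additive versus multiplicative), so the choice of scale is itself an assumption.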
Affiliation(s)
- Esther Herrera-Luis
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Kelly Benke
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Heather Volk
  - Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Christine Ladd-Acosta
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Genevieve L Wojcik
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
2
Khrennikov A, Iryama S, Basieva I, Sato K. Quantum-like environment adaptive model for creation of phenotype. Biosystems 2024; 242:105261. [PMID: 38964651 DOI: 10.1016/j.biosystems.2024.105261] [Received: 06/03/2024] [Revised: 06/26/2024] [Accepted: 06/26/2024] [Indexed: 07/06/2024]
Abstract
The textbook conceptualization of phenotype creation, "genotype (G) + environment (E) + genotype & environment interactions (GE) ↦ phenotype (Ph)", is modeled with open quantum systems theory (OQST) or more generally with adaptive dynamics theory (ADT). The model is quantum-like, i.e., it is not about quantum physical processes in biosystems. Generally such modeling is about applications of the quantum formalism and methodology outside of physics. Macroscopic biosystems, in our case genotypes and phenotypes, are treated as information processors whose functioning matches the laws of quantum information theory. Phenotypes are the outputs of the E-adaptation processes described by the quantum master equation, the Gorini-Kossakowski-Sudarshan-Lindblad (GKSL) equation. Its stationary states correspond to phenotypes. We highlight the class of GKSL dynamics characterized by the camel-like graphs of (von Neumann) entropy: in the process of E-adaptation, the entropy (disorder) of the phenotype's state first increases and then falls, and a stable and well-ordered phenotype is created. Traits, an organism's phenotypic characteristics, are modeled within quantum measurement theory as generally unsharp observables given by positive operator valued measures (POVMs). This paper is also a review of the methods and mathematical apparatus of quantum information biology.
Affiliation(s)
- Andrei Khrennikov
  - Linnaeus University, International Center for Mathematical Modeling in Physics and Cognitive Sciences Växjö, SE-351 95, Sweden
- Satoshi Iryama
  - Tokyo University of Science, Faculty of Science and Technology, Department of Information Sciences, Noda City, Chiba 278-8510, Japan
- Irina Basieva
  - Linnaeus University, International Center for Mathematical Modeling in Physics and Cognitive Sciences Växjö, SE-351 95, Sweden
- Keiko Sato
  - Tokyo University of Science, Faculty of Science and Technology, Department of Information Sciences, Noda City, Chiba 278-8510, Japan
3
Pazokitoroudi A, Liu Z, Dahl A, Zaitlen N, Rosset S, Sankararaman S. A scalable and robust variance components method reveals insights into the architecture of gene-environment interactions underlying complex traits. Am J Hum Genet 2024; 111:1462-1480. [PMID: 38866020 PMCID: PMC11267529 DOI: 10.1016/j.ajhg.2024.05.015] [Received: 09/20/2023] [Revised: 05/15/2024] [Accepted: 05/15/2024] [Indexed: 06/14/2024] Open
Abstract
Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.
Affiliation(s)
- Ali Pazokitoroudi
  - Department of Computer Science, UCLA, Los Angeles, CA, USA; Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Zhengtong Liu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Andrew Dahl
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Noah Zaitlen
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Neurology, UCLA, Los Angeles, CA, USA
- Saharon Rosset
  - Department of Statistics, Tel-Aviv University, Tel-Aviv, Israel
- Sriram Sankararaman
  - Department of Computer Science, UCLA, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA; Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
4
Lin WY. Detecting gene-environment interactions from multiple continuous traits. Bioinformatics 2024; 40:btae419. [PMID: 38917408 PMCID: PMC11254352 DOI: 10.1093/bioinformatics/btae419] [Received: 01/25/2024] [Revised: 06/17/2024] [Accepted: 06/24/2024] [Indexed: 06/27/2024] Open
Abstract
MOTIVATION: Genetic variants present differential effects on humans according to various environmental exposures, the so-called "gene-environment interactions" (GxE). Many diseases can be diagnosed with multiple traits, such as obesity, diabetes, and dyslipidemia. I developed a multivariate scale test (MST) for detecting the GxE of a disease with several continuous traits. Given a significant MST result, I continued to search for which trait and which E enriched the GxE signals. Simulation studies were performed to compare MST with the univariate scale test (UST).
RESULTS: MST can gain more power than UST because of (1) integrating more traits with GxE information and (2) the less harsh penalty on multiple testing. However, if only a few traits account for GxE, MST may lose power due to aggregating non-informative traits into the test statistic. As an example, MST was applied to a discovery set of 93 708 Taiwan Biobank (TWB) individuals and a replication set of 25 200 TWB individuals. From among 2 570 487 SNPs with minor allele frequencies ≥5%, MST identified 18 independent variance quantitative trait loci (P < 2.4E-9 in the discovery cohort and P < 2.8E-5 in the replication cohort) and 41 GxE signals (P < .00027) based on eight trait domains (including 29 traits).
AVAILABILITY AND IMPLEMENTATION: https://github.com/WanYuLin/Multivariate-scale-test-MST.
Affiliation(s)
- Wan-Yu Lin
  - Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei 100, Taiwan
  - Master of Public Health Degree Program, College of Public Health, National Taiwan University, Taipei 100, Taiwan
5
Tiezzi F, Goda K, Morgante F. Using lifestyle information in polygenic modeling of blood pressure traits: a simple method to reduce bias. bioRxiv 2024:2024.06.05.597631. [PMID: 38895222 PMCID: PMC11185601 DOI: 10.1101/2024.06.05.597631] [Indexed: 06/21/2024]
Abstract
Complex traits are determined by the effects of multiple genetic variants, multiple environmental factors, and potentially their interaction. Predicting complex trait phenotypes from genotypes is a fundamental task in quantitative genetics that was pioneered in agricultural breeding for selection purposes. However, it has recently become important in human genetics. While prediction accuracy for some human complex traits is appreciable, this remains low for most traits. A promising way to improve prediction accuracy is by including not only genetic information but also environmental information in prediction models. However, environmental factors can, in turn, be genetically determined. This phenomenon gives rise to a correlation between the genetic and environmental components of the phenotype, which violates the assumption of independence between the genetic and environmental components of most statistical methods for polygenic modeling. In this work, we investigated the impact of including 27 lifestyle variables as well as genotype information (and their interaction) for predicting diastolic blood pressure, systolic blood pressure, and pulse pressure in older individuals in UK Biobank. The 27 lifestyle variables were included as either raw variables or adjusted by genetic and other non-genetic factors. The results show that including both lifestyle and genetic data improved prediction accuracy compared to using either piece of information alone. Both prediction accuracy and bias can improve substantially for some traits when the models account for the lifestyle variables after their proper adjustment. Our work confirms the utility of including environmental information in polygenic models of complex traits and highlights the importance of proper handling of the environmental variables.
Affiliation(s)
- Francesco Tiezzi
  - Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Florence, Italy
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Khushi Goda
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Fabio Morgante
  - Center for Human Genetics, Clemson University, Greenwood, SC, USA
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
6
Deng T, Li K, Du L, Liang M, Qian L, Xue Q, Qiu S, Xu L, Zhang L, Gao X, Lan X, Li J, Gao H. Genome-Wide Gene-Environment Interaction Analysis Identifies Novel Candidate Variants for Growth Traits in Beef Cattle. Animals (Basel) 2024; 14:1695. [PMID: 38891742 PMCID: PMC11171348 DOI: 10.3390/ani14111695] [Received: 04/16/2024] [Revised: 05/24/2024] [Accepted: 05/30/2024] [Indexed: 06/21/2024] Open
Abstract
Complex traits are widely considered to be the result of a compound regulation of genes, environmental factors, and genotype-by-environment interaction (G × E). The inclusion of G × E in genome-wide association analyses is essential to understand animal environmental adaptations and improve the efficiency of breeding decisions. Here, we systematically investigated the G × E of growth traits (including weaning weight, yearling weight, 18-month body weight, and 24-month body weight) with environmental factors (farm and temperature) using genome-wide genotype-by-environment interaction association studies (GWEIS) with a dataset of 1350 cattle. We validated the robust estimator's effectiveness in GWEIS and detected 29 independent interacting SNPs with a significance threshold of 1.67 × 10⁻⁶, indicating that these SNPs, which do not show main effects in traditional genome-wide association studies (GWAS), may have non-additive effects across genotypes but are obliterated by environmental means. The gene-based analysis using MAGMA identified three genes that overlapped with the GWEIS results exhibiting G × E, namely SMAD2, PALMD, and MECOM. Further, the results of functional exploration in gene-set analysis revealed the bio-mechanisms of how cattle growth responds to environmental changes, such as mitotic or cytokinesis, fatty acid β-oxidation, neurotransmitter activity, gap junction, and keratan sulfate degradation. This study not only reveals novel genetic loci and underlying mechanisms influencing growth traits but also transforms our understanding of environmental adaptation in beef cattle, thereby paving the way for more targeted and efficient breeding strategies.
Affiliation(s)
- Tianyu Deng
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
  - Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang 712100, China
- Keanning Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lili Du
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Mang Liang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Li Qian
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Qingqing Xue
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Shiyuan Qiu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lingyang Xu
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Lupei Zhang
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Xue Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Xianyong Lan
  - Shaanxi Key Laboratory of Molecular Biology for Agriculture, College of Animal Science and Technology, Northwest A&F University, Yangling, Xianyang 712100, China
- Junya Li
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
- Huijiang Gao
  - Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing 100193, China; (T.D.); (K.L.); (L.D.); (M.L.); (L.Q.); (Q.X.); (S.Q.); (L.X.); (L.Z.); (X.G.)
7
Dong Z, Jiang W, Li H, DeWan AT, Zhao H. LDER-GE estimates phenotypic variance component of gene-environment interactions in human complex traits accurately with GE interaction summary statistics and full LD information. Brief Bioinform 2024; 25:bbae335. [PMID: 38980374 PMCID: PMC11232466 DOI: 10.1093/bib/bbae335] [Received: 04/08/2024] [Revised: 06/05/2024] [Accepted: 06/26/2024] [Indexed: 07/10/2024] Open
Abstract
Gene-environment (GE) interactions are essential in understanding human complex traits. Identifying these interactions is necessary for deciphering the biological basis of such traits. In this study, we review state-of-the-art methods for estimating the proportion of phenotypic variance explained by genome-wide GE interactions and introduce a novel statistical method, Linkage-Disequilibrium Eigenvalue Regression for Gene-Environment interactions (LDER-GE). LDER-GE improves the accuracy of estimating the phenotypic variance component explained by genome-wide GE interactions using large-scale biobank association summary statistics. LDER-GE leverages the complete Linkage Disequilibrium (LD) matrix, as opposed to only the diagonal squared LD matrix utilized by LDSC (Linkage Disequilibrium Score)-based methods. Our extensive simulation studies demonstrate that LDER-GE performs better than LDSC-based approaches by enhancing statistical efficiency by ~23%. This improvement is equivalent to a sample size increase of around 51%. Additionally, LDER-GE effectively controls the type-I error rate and produces unbiased results. We conducted an analysis using UK Biobank data, comprising 307 259 unrelated European-ancestry subjects and 966 766 variants, across 217 environmental covariate-phenotype (E-Y) pairs. LDER-GE identified 34 significant E-Y pairs, whereas the LDSC-based method identified only 23, of which 22 overlapped with those from LDER-GE. Furthermore, we employed LDER-GE to estimate the aggregated variance component attributed to multiple GE interactions, leading to an increase in the explained phenotypic variance with GE interactions compared to considering main genetic effects only. Our results suggest that GE interactions have an important impact on human complex traits.
Affiliation(s)
- Zihan Dong
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, United States
  - Center for Perinatal, Pediatric and Environmental Epidemiology, 60 College Street, Yale School of Public Health, New Haven, CT 06510, United States
- Wei Jiang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, United States
- Hongyu Li
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, United States
- Andrew T DeWan
  - Center for Perinatal, Pediatric and Environmental Epidemiology, 60 College Street, Yale School of Public Health, New Haven, CT 06510, United States
  - Department of Chronic Disease Epidemiology, Yale School of Public Health, 60 College Street, New Haven, CT 06510, United States
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06510, United States
8
Durvasula A, Price AL. Distinct explanations underlie gene-environment interactions in the UK Biobank. medRxiv 2024:2023.09.22.23295969. [PMID: 37790574 PMCID: PMC10543037 DOI: 10.1101/2023.09.22.23295969] [Indexed: 10/05/2023]
Abstract
The role of gene-environment (GxE) interaction in disease and complex trait architectures is widely hypothesized, but currently unknown. Here, we apply three statistical approaches to quantify and distinguish three different types of GxE interaction for a given trait and E variable. First, we detect locus-specific GxE interaction by testing for genetic correlation r_g < 1 across E bins. Second, we detect genome-wide effects of the E variable on genetic variance by leveraging polygenic risk scores (PRS) to test for significant PRSxE in a regression of phenotypes on PRS, E, and PRSxE, together with differences in SNP-heritability across E bins. Third, we detect genome-wide proportional amplification of genetic and environmental effects as a function of the E variable by testing for significant PRSxE with no differences in SNP-heritability across E bins. Simulations show that these approaches achieve high sensitivity and specificity in distinguishing these three GxE scenarios. We applied our framework to 33 UK Biobank traits (25 quantitative traits and 8 diseases; average N = 325K) and 10 E variables spanning lifestyle, diet, and other environmental exposures. First, we identified 19 trait-E pairs with r_g significantly < 1 (FDR<5%) (average r_g = 0.95); for example, white blood cell count had r_g = 0.95 (s.e. 0.01) between smokers and non-smokers. Second, we identified 28 trait-E pairs with significant PRSxE and significant SNP-heritability differences across E bins; for example, BMI had a significant PRSxE for physical activity (P=4.6e-5) with 5% larger SNP-heritability in the largest versus smallest quintiles of physical activity (P=7e-4). Third, we identified 15 trait-E pairs with significant PRSxE with no SNP-heritability differences across E bins; for example, waist-hip ratio adjusted for BMI had a significant PRSxE effect for time spent watching television (P=5e-3) with no SNP-heritability differences. Across the three scenarios, 8 of the trait-E pairs involved disease traits, whose interpretation is complicated by scale effects. Analyses using biological sex as the E variable produced additional significant findings in each of the three scenarios. Overall, we infer a significant contribution of GxE and GxSex effects to complex trait and disease variance.
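The second and third approaches in this abstract rest on a regression of phenotype on PRS, E, and PRSxE. A minimal sketch of such a regression on simulated data follows, assuming ordinary least squares on standardized variables. The function names, effect sizes, and sample size are hypothetical choices for illustration, and this is not the authors' pipeline.

```python
import numpy as np

def fit_prs_by_e(y, prs, e):
    """OLS of y on [1, PRS, E, PRS*E]; returns coefficients, standard errors, z-scores."""
    X = np.column_stack([np.ones_like(prs), prs, e, prs * e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # classical OLS covariance of beta
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

def simulate(n=20000, b_int=0.1, seed=0):
    """Hypothetical data: standardized PRS and E, with a PRSxE effect of size b_int."""
    rng = np.random.default_rng(seed)
    prs = rng.standard_normal(n)
    e = rng.standard_normal(n)
    y = 0.3 * prs + 0.2 * e + b_int * prs * e + rng.standard_normal(n)
    return y, prs, e

if __name__ == "__main__":
    y, prs, e = simulate()
    beta, se, z = fit_prs_by_e(y, prs, e)
    # Index 3 is the PRSxE term.
    print(f"PRSxE estimate = {beta[3]:.3f}, s.e. = {se[3]:.3f}, z = {z[3]:.1f}")
```

A significant PRSxE coefficient on its own does not separate the second scenario from the third; as the abstract notes, that distinction also requires comparing SNP-heritability across E bins, which this sketch does not attempt.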
Affiliation(s)
- Arun Durvasula
  - Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  - Department of Genetics, Harvard Medical School, Cambridge, MA, USA
  - Department of Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Alkes L Price
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
9
Hui D, Dudek S, Kiryluk K, Walunas TL, Kullo IJ, Wei WQ, Tiwari HK, Peterson JF, Chung WK, Davis B, Khan A, Kottyan L, Limdi NA, Feng Q, Puckelwartz MJ, Weng C, Smith JL, Karlson EW, Jarvik GP, Ritchie MD. Risk factors affecting polygenic score performance across diverse cohorts. medRxiv 2024:2023.05.10.23289777. [PMID: 38645167 PMCID: PMC11030495 DOI: 10.1101/2023.05.10.23289777] [Indexed: 04/23/2024]
Abstract
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed effects of covariate stratification and interaction on body mass index (BMI) PGS (PGS_BMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best and worst performing quintiles for certain covariates. 28 covariates had significant PGS_BMI-covariate interaction effects, modifying PGS_BMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects: across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGS_BMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGS_BMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS_BMI performance and effects, we investigated ways to increase model performance taking into account non-linear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGS_BMI directly from GxAge GWAS effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS_BMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
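The stratified comparison described in this abstract (R² of the PGS within quintiles of a continuous covariate) can be sketched as follows. This is an illustration under simplifying assumptions: a single predictor, so that R² equals the squared Pearson correlation, and no adjustment for other covariates. The function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def r2_by_quintile(y, pgs, covariate):
    """Return the PGS R^2 (squared correlation with y) within each covariate quintile."""
    # Four interior cut points split the covariate into five roughly equal groups.
    edges = np.quantile(covariate, [0.2, 0.4, 0.6, 0.8])
    bins = np.searchsorted(edges, covariate, side="right")  # labels 0..4
    r2 = []
    for q in range(5):
        mask = bins == q
        r = np.corrcoef(pgs[mask], y[mask])[0, 1]
        r2.append(r * r)
    return np.array(r2)
```

Comparing the largest and smallest entries of the returned array gives the kind of between-quintile R² contrast the abstract reports. Assessing significance and replication would need additional steps (for example bootstrapping or independent cohorts) that are not shown here.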
Affiliation(s)
- Daniel Hui
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Scott Dudek
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
- Krzysztof Kiryluk
  - Division of Nephrology, Department of Medicine, Columbia University, NY, New York
- Theresa L. Walunas
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Wei-Qi Wei
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Hemant K. Tiwari
  - Department of Pediatrics, University of Alabama at Birmingham, Birmingham, AL
- Josh F. Peterson
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Wendy K. Chung
  - Departments of Pediatrics and Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY
- Brittney Davis
  - Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
- Atlas Khan
  - Division of Nephrology, Department of Medicine, Columbia University, NY, New York
- Leah Kottyan
  - The Center for Autoimmune Genomics and Etiology, Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH
- Nita A. Limdi
  - Department of Neurology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL
- Qiping Feng
  - Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Megan J. Puckelwartz
  - Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Chunhua Weng
  - Department of Biomedical Informatics, Vagelos College of Physicians & Surgeons, Columbia University, New York, NY
- Johanna L. Smith
  - Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN
- Elizabeth W. Karlson
  - Division of Rheumatology, Inflammation, and Immunity, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Gail P. Jarvik
  - Departments of Medicine (Medical Genetics) and Genome Sciences, University of Washington Medical Center, Seattle, WA
- Marylyn D. Ritchie
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA
10
Miao J, Wu Y, Lu Q. Statistical methods for gene-environment interaction analysis. Wiley Interdiscip Rev Comput Stat 2024; 16:e1635. [PMID: 38699459 PMCID: PMC11064894 DOI: 10.1002/wics.1635] [Received: 06/08/2023] [Accepted: 09/12/2023] [Indexed: 05/05/2024]
Abstract
Most human complex phenotypes result from multiple genetic and environmental factors and their interactions. Understanding the mechanisms by which genetic and environmental factors interact offers valuable insights into the genetic architecture of complex traits and holds great potential for advancing precision medicine. The emergence of large population biobanks has led to the development of numerous statistical methods aiming at identifying gene-environment interactions (G × E). In this review, we present state-of-the-art statistical methodologies for G × E analysis. We will survey a spectrum of approaches for single-variant G × E mapping, followed by various techniques for polygenic G × E analysis. We conclude this review with a discussion on the future directions and challenges in G × E research.
Affiliation(s)
- Jiacheng Miao
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, USA
- Yixuan Wu
  - University of Wisconsin–Madison, Madison, Wisconsin, USA
- Qiongshi Lu
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, USA
  - Department of Statistics, University of Wisconsin–Madison, Madison, Wisconsin, USA
  - Center for Demography of Health and Aging, University of Wisconsin–Madison, Madison, Wisconsin, USA
11
Pham DT, Westerman KE, Pan C, Chen L, Srinivasan S, Isganaitis E, Vajravelu ME, Bacha F, Chernausek S, Gubitosi-Klug R, Divers J, Pihoker C, Marcovina SM, Manning AK, Chen H. Re-analysis and meta-analysis of summary statistics from gene-environment interaction studies. Bioinformatics 2023; 39:btad730. [PMID: 38039147 PMCID: PMC10724851 DOI: 10.1093/bioinformatics/btad730] [Received: 08/16/2023] [Revised: 10/26/2023] [Accepted: 11/30/2023] [Indexed: 12/03/2023] Open
Abstract
MOTIVATION: Summary statistics from genome-wide association studies enable many valuable downstream analyses that are more efficient than individual-level data analysis while also reducing privacy concerns. As growing sample sizes enable better-powered analysis of gene-environment interactions, there is a need for gene-environment interaction-specific methods that manipulate and use summary statistics. RESULTS: We introduce two tools to facilitate such analysis, with a focus on statistical models containing multiple gene-exposure and/or gene-covariate interaction terms. REGEM (RE-analysis of GEM summary statistics) uses summary statistics from a single, multi-exposure genome-wide interaction study to derive analogous sets of summary statistics with arbitrary sets of exposures and interaction covariate adjustments. METAGEM (META-analysis of GEM summary statistics) extends current fixed-effects meta-analysis models to incorporate multiple exposures from multiple studies. We demonstrate the value and efficiency of these tools by exploring alternative methods of accounting for ancestry-related population stratification in a genome-wide interaction study in the UK Biobank as well as by conducting a multi-exposure genome-wide interaction study meta-analysis in cohorts from the diabetes-focused ProDiGY consortium. These programs help to maximize the value of summary statistics from diverse and complex gene-environment interaction studies. AVAILABILITY AND IMPLEMENTATION: REGEM and METAGEM are open-source projects freely available at https://github.com/large-scale-gxe-methods/REGEM and https://github.com/large-scale-gxe-methods/METAGEM.
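The fixed-effects meta-analysis that this abstract builds on can be illustrated for the simplest case: one interaction coefficient reported by several studies. The sketch below is a generic inverse-variance-weighted estimator, not the METAGEM implementation. The function name `fixed_effects_meta` and the input values are hypothetical, and the multi-exposure extension described in the abstract (which would pool coefficient vectors and their covariance matrices) is not shown.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool one coefficient across studies with inverse-variance weights.

    betas: per-study estimates of the same interaction coefficient
    ses:   per-study standard errors (same order as betas)
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical interaction estimates from two studies.
beta, se, z, p = fixed_effects_meta([0.1, 0.3], [0.1, 0.1])
print(beta, se, z, p)
```

With equal standard errors the pooled estimate is the plain average of the study estimates; in general, more precise studies pull the pooled value toward their own estimate.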
Affiliation(s)
- Duy T Pham
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Kenneth E Westerman
- Department of Medicine, Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA 02114, United States
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Medicine, Harvard Medical School, Boston, MA 02115, United States
| | - Cong Pan
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
| | - Ling Chen
- Department of Medicine, Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA 02114, United States
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
| | - Shylaja Srinivasan
- Department of Pediatrics, University of California, San Francisco, CA 94158, United States
| | - Elvira Isganaitis
- Research Division, Joslin Diabetes Center, Boston, MA 02115, United States
| | - Mary Ellen Vajravelu
- Department of Pediatrics, University of Pittsburgh School of Medicine, Pittsburgh, PA 15224, United States
| | - Fida Bacha
- Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, United States
| | - Steve Chernausek
- Department of Pediatrics, The University of Oklahoma College of Medicine, Oklahoma City, OK 73117, United States
| | - Rose Gubitosi-Klug
- Department of Pediatrics, Case Western Reserve University, Cleveland, OH 44106, United States
| | - Jasmin Divers
- Department of Foundations of Medicine, New York University, New York, NY 10016, United States
| | - Catherine Pihoker
- Department of Pediatrics, University of Washington School of Medicine, Seattle, WA 98105, United States
| | - Santica M Marcovina
- Northwest Lipid Metabolism and Diabetes Research Laboratories, Department of Medicine, University of Washington, Seattle, WA 98105, United States
| | - Alisa K Manning
- Department of Medicine, Clinical and Translational Epidemiology Unit, Mongan Institute, Massachusetts General Hospital, Boston, MA 02114, United States
- Metabolism Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States
- Department of Medicine, Harvard Medical School, Boston, MA 02115, United States
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, United States
|
12
|
Sun NA, Wang YU, Chu J, Han Q, Shen Y. Bayesian Approaches in Exploring Gene-environment and Gene-gene Interactions: A Comprehensive Review. Cancer Genomics Proteomics 2023; 20:669-678. [PMID: 38035701 PMCID: PMC10687732 DOI: 10.21873/cgp.20414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 12/02/2023]
Abstract
Rapid advancements in high-throughput biological techniques have facilitated the generation of high-dimensional omics datasets, which have provided a solid foundation for precision medicine and prognosis prediction. Nonetheless, the problem of missing heritability persists. To solve this problem, it is essential to explain the genetic structure of disease incidence risk and prognosis by incorporating interactions. The development of Bayesian theory has provided new approaches for developing models for interaction identification and estimation. Several Bayesian models have been developed to improve model accuracy and to identify main effects and gene-environment (G×E) and gene-gene (G×G) interactions. Studies based on single-nucleotide polymorphisms (SNPs) are significant for the exploration of rare and common variants. Models based on the effect heredity principle and group-based models are relatively flexible and do not require strict constraints when dealing with the hierarchical structure between the main effect and interactions (M-I). These models offer good interpretability of biological mechanisms. Machine learning-based Bayesian approaches are highly competitive in improving prediction accuracy. These models provide insights into the mechanisms underlying the occurrence and progression of complex diseases, identify more reliable biomarkers, and achieve higher predictive accuracy. In this paper, we provide a comprehensive review of these Bayesian approaches.
Affiliation(s)
- N A Sun
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Y U Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Jiadong Chu
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Qiang Han
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
| | - Yueping Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
|
13
|
Di Scipio M, Khan M, Mao S, Chong M, Judge C, Pathan N, Perrot N, Nelson W, Lali R, Di S, Morton R, Petch J, Paré G. A versatile, fast and unbiased method for estimation of gene-by-environment interaction effects on biobank-scale datasets. Nat Commun 2023; 14:5196. [PMID: 37626057 PMCID: PMC10457310 DOI: 10.1038/s41467-023-40913-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/23/2021] [Accepted: 08/16/2023] [Indexed: 08/27/2023]
Abstract
Identification of gene-by-environment interactions (GxE) is crucial to understand the interplay of environmental effects on complex traits. However, current methods evaluating GxE on biobank-scale datasets have limitations. We introduce MonsterLM, a multiple linear regression method that does not rely on model specification and provides unbiased estimates of variance explained by GxE. We demonstrate robustness of MonsterLM through comprehensive genome-wide simulations using real genetic data from 325,989 individuals. We estimate GxE using waist-to-hip-ratio, smoking, and exercise as the environmental variables on 13 outcomes (N = 297,529-325,989) in the UK Biobank. GxE variance is significant for 8 environment-outcome pairs, ranging from 0.009 to 0.071. The majority of GxE variance involves SNPs without strong marginal or interaction associations. We observe modest improvements in polygenic score prediction when incorporating GxE. Our results imply a significant contribution of GxE to complex trait variance and we show MonsterLM to be well-purposed to handle this with biobank-scale data.
Affiliation(s)
- Matteo Di Scipio
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
| | - Mohammad Khan
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
| | - Shihong Mao
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
| | - Michael Chong
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, ON, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, ON, Canada
| | - Conor Judge
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
| | - Nazia Pathan
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
| | - Nicolas Perrot
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
| | - Walter Nelson
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada
- Department of Statistical Sciences, University of Toronto, Toronto, ON, Canada
| | - Ricky Lali
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada
| | - Shuang Di
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
| | - Robert Morton
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, ON, Canada
| | - Jeremy Petch
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, ON, Canada
- Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, ON, Canada
- Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
| | - Guillaume Paré
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, ON, Canada.
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, ON, Canada.
- Department of Pathology and Molecular Medicine, McMaster University, Michael G. DeGroote School of Medicine, Hamilton, ON, Canada.
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, ON, Canada.
|
14
|
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (BETHESDA, MD.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics. The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
Affiliation(s)
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
| | - Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
| | - Daniel Weinreich
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
| | - Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Biostatistics, Brown University, Providence, RI 02903, USA
- Microsoft Research New England, Cambridge, MA 02142, USA
|
15
|
Zhou Z, Ku HC, Manning SE, Zhang M, Xing C. A Varying Coefficient Model to Jointly Test Genetic and Gene-Environment Interaction Effects. Behav Genet 2023; 53:374-382. [PMID: 36622576 PMCID: PMC10277225 DOI: 10.1007/s10519-022-10131-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Accepted: 12/18/2022] [Indexed: 01/10/2023]
Abstract
Most human traits are influenced by the interplay between genetic and environmental factors. Many statistical methods have been proposed to screen for gene-environment interaction (GxE) in the post genome-wide association study era. However, most of the existing methods assume a linear interaction between genetic and environmental factors toward phenotypic variations, which diminishes statistical power in the case of nonlinear GxE. In this paper, we present a flexible statistical procedure to detect GxE regardless of whether the underlying relationship is linear or not. By modeling the joint genetic and GxE effects as a varying-coefficient function of the environmental factor, the proposed model is able to capture dynamic trajectories of GxE. We employ a likelihood ratio test with a fast Monte Carlo algorithm for hypothesis testing. Simulations were conducted to evaluate validity and power of the proposed model in various settings. Real data analysis was performed to illustrate its power, in particular, in the case of nonlinear GxE.
Affiliation(s)
- Zhengyang Zhou
- Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX, USA.
| | - Hung-Chih Ku
- Department of Mathematical Sciences, DePaul University, Chicago, IL, USA
| | - Sydney E Manning
- Department of Pharmacotherapy, University of North Texas Health Science Center, Fort Worth, TX, USA
| | - Ming Zhang
- Department of Statistical Science, Southern Methodist University, Dallas, TX, USA
| | - Chao Xing
- McDermott Center for Human Growth and Development and Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, TX, USA.
|
16
|
Khramtsova EA, Wilson MA, Martin J, Winham SJ, He KY, Davis LK, Stranger BE. Quality control and analytic best practices for testing genetic models of sex differences in large populations. Cell 2023; 186:2044-2061. [PMID: 37172561 PMCID: PMC10266536 DOI: 10.1016/j.cell.2023.04.014] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/26/2022] [Revised: 01/31/2023] [Accepted: 04/07/2023] [Indexed: 05/15/2023]
Abstract
Phenotypic sex-based differences exist for many complex traits. In other cases, phenotypes may be similar, but underlying biology may vary. Thus, sex-aware genetic analyses are becoming increasingly important for understanding the mechanisms driving these differences. To this end, we provide a guide outlining the current best practices for testing various models of sex-dependent genetic effects in complex traits and disease conditions, noting that this is an evolving field. Insights from sex-aware analyses will not only teach us about the biology of complex traits but also aid in achieving the goals of precision medicine and health equity for all.
Affiliation(s)
- Ekaterina A Khramtsova
- Population Analytics and Insights, Data Science Analytics & Insights, Janssen R&D, Lower Gwynedd Township, PA, USA.
| | - Melissa A Wilson
- School of Life Sciences, Center for Evolution and Medicine, Biodesign Center for Mechanisms of Evolution, Arizona State University, Tempe, AZ 85282, USA
| | - Joanna Martin
- Centre for Neuropsychiatric Genetics and Genomics, Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
| | - Stacey J Winham
- Department of Quantitative Health Sciences, Division of Computational Biology, Mayo Clinic, Rochester, MN, USA
| | - Karen Y He
- Population Analytics and Insights, Data Science Analytics & Insights, Janssen R&D, Lower Gwynedd Township, PA, USA
| | - Lea K Davis
- Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Barbara E Stranger
- Center for Genetic Medicine, Department of Pharmacology, Northwestern University, Chicago, IL, USA.
|
17
|
Zhong W, Chhibber A, Luo L, Mehrotra DV, Shen J. A fast and powerful linear mixed model approach for genotype-environment interaction tests in large-scale GWAS. Brief Bioinform 2023; 24:6955097. [PMID: 36545787 DOI: 10.1093/bib/bbac547] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/07/2022] [Revised: 10/26/2022] [Accepted: 11/12/2022] [Indexed: 12/24/2022]
Abstract
Genotype-by-environment interaction (GEI or GxE) plays an important role in understanding complex human traits. However, it is usually challenging to detect GEI signals efficiently and accurately while adjusting for population stratification and sample relatedness in large-scale genome-wide association studies (GWAS). Here we propose a fast and powerful linear mixed model-based approach, fastGWA-GE, to test for GEI effect and G + GxE joint effect. Our extensive simulations show that fastGWA-GE outperforms other existing GEI test methods by controlling genomic inflation better, providing larger power and running hundreds to thousands of times faster. We performed a fastGWA-GE analysis of ~7.27 million variants on 452 249 individuals of European ancestry for 13 quantitative traits and five environment variables in the UK Biobank GWAS data and identified 96 significant signals (72 variants across 57 loci) with GEI test P-values < 1 × 10⁻⁹, including 27 novel GEI associations, which highlights the effectiveness of fastGWA-GE in GEI signal discovery in large-scale GWAS.
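To make the idea of a GEI test concrete, the following sketch fits the textbook single-variant interaction model y = b0 + bG·G + bE·E + bGE·G·E by ordinary least squares and returns coefficient estimates with standard errors. It is only an illustration under strong assumptions: it ignores the population stratification and sample relatedness that the linear mixed model in fastGWA-GE is designed to handle, and the names `gxe_ols` and `invert` and the toy data are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gxe_ols(y, g, e):
    """Fit y ~ 1 + G + E + G*E; return (coefficients, standard errors)."""
    rows = [[1.0, gi, ei, gi * ei] for gi, ei in zip(g, e)]
    n, p = len(rows), 4
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(r * r for r in resid) / (n - p)  # residual variance
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(p)]
    return beta, se


# Toy data: genotype dosage 0/1/2 crossed with four exposure levels.
g = [0, 1, 2] * 4
e = [-1, -1, -1, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y = [1 + 0.5 * gi + 0.2 * ei + 0.3 * gi * ei for gi, ei in zip(g, e)]
beta, se = gxe_ols(y, g, e)
print(beta)  # last entry is the interaction coefficient
```

The interaction test examines the last coefficient alone, whereas a joint G + GxE test of the kind the abstract mentions would assess the genotype and interaction coefficients together.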
Affiliation(s)
- Wujuan Zhong
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
| | - Aparna Chhibber
- Translational Bioinformatics, Bristol Myers Squibb, Lawrenceville, NJ 08540, USA
| | - Lan Luo
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
|
18
|
Ng HM, Jiang B, Wong KY. Penalized estimation of a class of single-index varying-coefficient models for integrative genomic analysis. Biom J 2023; 65:e2100139. [PMID: 35837982 DOI: 10.1002/bimj.202100139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 04/15/2022] [Accepted: 05/27/2022] [Indexed: 01/17/2023]
Abstract
Recent technological advances have made it possible to collect high-dimensional genomic data along with clinical data on a large number of subjects. In the studies of chronic diseases such as cancer, it is of great interest to integrate clinical and genomic data to build a comprehensive understanding of the disease mechanisms. Despite extensive studies on integrative analysis, it remains an ongoing challenge to model the interaction effects between clinical and genomic variables, due to high dimensionality of the data and heterogeneity across data types. In this paper, we propose an integrative approach that models interaction effects using a single-index varying-coefficient model, where the effects of genomic features can be modified by clinical variables. We propose a penalized approach for separate selection of main and interaction effects. Notably, the proposed methods can be applied to right-censored survival outcomes based on a Cox proportional hazards model. We demonstrate the advantages of the proposed methods through extensive simulation studies and provide applications to a motivating cancer genomic study.
Affiliation(s)
- Hoi Min Ng
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
| | - Binyan Jiang
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
| | - Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
|
19
|
Lu T, Forgetta V, Richards JB, Greenwood CMT. Genetic determinants of polygenic prediction accuracy within a population. Genetics 2022; 222:6762086. [PMID: 36250789 PMCID: PMC9713421 DOI: 10.1093/genetics/iyac158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/30/2022] [Accepted: 10/10/2022] [Indexed: 11/15/2022]
Abstract
Genomic risk prediction is on the emerging path toward personalized medicine. However, the accuracy of polygenic prediction varies strongly in different individuals. Based on up to 352,277 European ancestry participants in the UK Biobank, we constructed polygenic risk scores for 15 physiological and biochemical quantitative traits. We identified a total of 185 polygenic prediction variability quantitative trait loci for 11 traits by Levene's test among 254,376 unrelated individuals. We validated the effects of prediction variability quantitative trait loci using an independent test set of 58,927 individuals. For instance, a score aggregating 51 prediction variability quantitative trait locus variants for triglycerides had the strongest Spearman correlation of 0.185 (P-value < 1.0 × 10⁻³⁰⁰) with the squared prediction errors. We found a strong enrichment of complex genetic effects conferred by prediction variability quantitative trait loci compared to risk loci identified in genome-wide association studies, including 89 prediction variability quantitative trait loci exhibiting dominance effects. Incorporation of dominance effects into polygenic risk scores significantly improved polygenic prediction for triglycerides, low-density lipoprotein cholesterol, vitamin D, and platelet. In conclusion, we have discovered and profiled genetic determinants of polygenic prediction variability for 11 quantitative biomarkers. These findings may assist interpretation of genomic risk prediction in various contexts and encourage novel approaches for constructing polygenic risk scores with complex genetic effects.
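Levene's test, named in the abstract, compares the spread of a quantity across groups. The sketch below computes the classical (mean-centered) Levene statistic for per-individual values grouped by genotype at one variant. The abstract does not spell out which quantity was tested or how it was preprocessed, so using prediction residuals as input is an assumption, and the function name `levene_statistic` and the toy groups are hypothetical. A p-value would come from an F distribution with the returned degrees of freedom, which the Python standard library does not provide.

```python
def levene_statistic(groups):
    """Classical Levene statistic for equality of variances.

    groups: list of lists, one list of values per genotype group.
    Returns (W, df_between, df_within); under the null hypothesis W
    approximately follows an F(df_between, df_within) distribution.
    """
    k = len(groups)
    n_total = sum(len(grp) for grp in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    # Absolute deviations of each value from its own group mean.
    z = []
    for grp in groups:
        center = sum(grp) / len(grp)
        z.append([abs(v - center) for v in grp])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical residuals for two genotype groups with different spread.
w, df1, df2 = levene_statistic([[0, 1, 2], [0, 5, 10]])
print(w, df1, df2)
```

Replacing the group mean with the group median gives the Brown-Forsythe variant, which is less sensitive to skewed data.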
Affiliation(s)
- Tianyuan Lu
- Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada
- Quantitative Life Sciences Program, McGill University, Montreal, QC H3A 0G4, Canada
| | - Vincenzo Forgetta
- Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada
| | - John Brent Richards
- Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada
- Department of Human Genetics, McGill University, Montreal, QC H3A 0G4, Canada
- Department of Twin Research and Genetic Epidemiology, King's College London, London WC2R 2LS, UK
| | - Celia M T Greenwood
- Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, QC H3T 1E2, Canada
- Department of Human Genetics, McGill University, Montreal, QC H3A 0G4, Canada
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC H3A 0G4, Canada
- Gerald Bronfman Department of Oncology, McGill University, Montreal, QC H3A 0G4, Canada
|
20
|
Hofmeister RJ, Rubinacci S, Ribeiro DM, Buil A, Kutalik Z, Delaneau O. Parent-of-Origin inference for biobanks. Nat Commun 2022; 13:6668. [PMID: 36335127 PMCID: PMC9637181 DOI: 10.1038/s41467-022-34383-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022]
Abstract
Identical genetic variations can have different phenotypic effects depending on their parent of origin. Yet, studies focusing on parent-of-origin effects have been limited in terms of sample size due to the lack of parental genomes or known genealogies. We propose a probabilistic approach to infer the parent-of-origin of individual alleles that requires neither parental genomes nor prior knowledge of genealogy. Our model uses Identity-By-Descent sharing with second- and third-degree relatives to assign alleles to parental groups and leverages chromosome X data in males to distinguish maternal from paternal groups. We combine this with robust haplotype inference and haploid imputation to infer the parent-of-origin for 26,393 UK Biobank individuals. We screen 99 phenotypes for parent-of-origin effects and replicate the discoveries of 6 GWAS, confirming signals on body mass index, type 2 diabetes, standing height and multiple blood biomarkers, including the known maternal effect at the MEG3/DLK1 locus on platelet phenotypes. We also report a novel maternal effect at the TERT gene on telomere length, thereby providing new insights on the heritability of this phenotype. All our summary statistics are publicly available to help the community better characterize the molecular mechanisms leading to parent-of-origin effects and their implications for human health.
Affiliation(s)
- Robin J Hofmeister
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Simone Rubinacci
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Diogo M Ribeiro
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
| | - Alfonso Buil
- Institute of Biological Psychiatry, Mental Health Services, Copenhagen University Hospital, Copenhagen, Denmark
- Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Zoltán Kutalik
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- University Center for Primary Care and Public Health, University of Lausanne, Lausanne, Switzerland
| | - Olivier Delaneau
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
|
21
|
Hecker J, Prokopenko D, Moll M, Lee S, Kim W, Qiao D, Voorhies K, Kim W, Vansteelandt S, Hobbs BD, Cho MH, Silverman EK, Lutz SM, DeMeo DL, Weiss ST, Lange C. A robust and adaptive framework for interaction testing in quantitative traits between multiple genetic loci and exposure variables. PLoS Genet 2022; 18:e1010464. [PMID: 36383614 PMCID: PMC9668174 DOI: 10.1371/journal.pgen.1010464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Accepted: 10/04/2022] [Indexed: 11/17/2022]
Abstract
The identification and understanding of gene-environment interactions can provide insights into the pathways and mechanisms underlying complex diseases. However, testing for gene-environment interaction remains a challenge since (a) statistical power is often limited and (b) modeling of environmental effects is nontrivial, and model misspecification can lead to false positive interaction findings. To address the lack of statistical power, recent methods aim to identify interactions on an aggregated level using, for example, polygenic risk scores. While this strategy can increase the power to detect interactions, identifying contributing genes and pathways is difficult based on these relatively global results. Here, we propose RITSS (Robust Interaction Testing using Sample Splitting), a gene-environment interaction testing framework for quantitative traits that is based on sample splitting and robust test statistics. RITSS can incorporate sets of genetic variants and/or multiple environmental factors. Based on the user's choice of statistical/machine learning approaches, a screening step selects and combines potential interactions into scores with improved interpretability. In the testing step, the application of robust statistics minimizes the susceptibility to main effect misspecifications. Using extensive simulation studies, we demonstrate that RITSS controls the type 1 error rate in a wide range of scenarios, and we show how the screening strategy influences statistical power. In an application to lung function phenotypes and human height in the UK Biobank, RITSS identified highly significant interactions based on subcomponents of genetic risk scores. While the contributing single variant interaction signals are weak, our results indicate interaction patterns that result in strong aggregated effects, providing potential insights into underlying gene-environment interaction mechanisms.
Affiliation(s)
- Julian Hecker
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Dmitry Prokopenko
- Harvard Medical School, Boston, Massachusetts, United States of America
- Genetics and Aging Unit and McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Matthew Moll
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Sanghun Lee
- Department of Medical Consilience, Division of Medicine, Graduate School, Dankook University, Yongin, South Korea
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Dandi Qiao
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Kirsten Voorhies
- Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Population Medicine, PRecisiOn Medicine Translational Research (PROMoTeR) Center, Harvard Pilgrim Health Care, Boston, Massachusetts, United States of America
| | - Woori Kim
- Harvard Medical School, Boston, Massachusetts, United States of America
- Systems Biology and Computer Science Program, Ann Romney Center for Neurological Diseases, Department of Neurology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
| | - Stijn Vansteelandt
- Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
- Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom
| | - Brian D. Hobbs
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Michael H. Cho
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Edwin K. Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| | - Sharon M. Lutz
- Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Population Medicine, PRecisiOn Medicine Translational Research (PROMoTeR) Center, Harvard Pilgrim Health Care, Boston, Massachusetts, United States of America
| | - Dawn L. DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Christoph Lange
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
|
22
|
Song H, Wang X, Guo Y, Ding X. G × EBLUP: A novel method for exploring genotype by environment interactions and genomic prediction. Front Genet 2022; 13:972557. [PMID: 36171888 PMCID: PMC9510768 DOI: 10.3389/fgene.2022.972557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2022] [Accepted: 08/01/2022] [Indexed: 11/17/2022]
Abstract
Genotype by environment (G × E) interaction is fundamental in the biology of complex traits and diseases. However, most existing methods for genomic prediction tend to ignore G × E interaction (GEI). In this study, we proposed the genomic prediction method G × EBLUP, which accounts for GEI. G × EBLUP can also detect genome-wide single nucleotide polymorphisms (SNPs) subject to GEI. Using comprehensive simulations and analysis of real data from pigs and maize, we showed that G × EBLUP achieved higher efficiency in mapping GEI SNPs and higher prediction accuracy than the existing methods, and its superiority was more obvious when the GEI variance was large. In the pig data, G × EBLUP improved prediction accuracy for backfat thickness by 3% compared with GBLUP, whereas the trait days to 100 kg was not affected by GEI and G × EBLUP did not improve the accuracy of genomic prediction for that trait. A significant advantage was observed for G × EBLUP in maize; the prediction accuracy was improved by ∼5.0% and 7.7% for grain weight and water content, respectively. Furthermore, G × EBLUP was not influenced by the number of environment levels. It could determine a favourable environment using SNP Bayes factors for each environment, implying that it is a robust and useful method for market-specific animal and plant breeding. We proposed G × EBLUP, a novel method for the estimation of genomic breeding value by considering GEI. This method identified the genome-wide SNPs that were susceptible to GEI and yielded higher genomic prediction accuracies and lower mean squared error compared with the GBLUP method.
Affiliation(s)
- Hailiang Song
- Beijing Key Laboratory of Fisheries Biotechnology, Fisheries Science Institute, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China
| | - Xue Wang
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Yi Guo
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing, China
- *Correspondence: Xiangdong Ding, orcid.org/0000-0002-2684-2551
|
23
|
Zhai S, Zhang H, Mehrotra DV, Shen J. Pharmacogenomics polygenic risk score for drug response prediction using PRS-PGx methods. Nat Commun 2022; 13:5278. [PMID: 36075892 PMCID: PMC9458667 DOI: 10.1038/s41467-022-32407-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/07/2022] [Accepted: 07/27/2022] [Indexed: 11/23/2022]
Abstract
Polygenic risk scores (PRS) have been successfully developed for the prediction of human diseases and complex traits in the past years. For drug response prediction in randomized clinical trials, a common practice is to apply PRS built from a disease genome-wide association study (GWAS) directly to a corresponding pharmacogenomics (PGx) setting. Here, we show that such an approach relies on stringent assumptions about the prognostic and predictive effects of the selected genetic variants. We propose a shift from disease PRS to PGx PRS approaches by simultaneously modeling both the prognostic and predictive effects, and we make this shift possible by developing a series of PRS-PGx methods, including a novel Bayesian regression approach (PRS-PGx-Bayes). Simulation studies show that PRS-PGx methods generally outperform the disease PRS methods and that PRS-PGx-Bayes is superior to all other PRS-PGx methods. We further apply the PRS-PGx methods to PGx GWAS data from a large cardiovascular randomized clinical trial (IMPROVE-IT) to predict treatment-related LDL cholesterol reduction. The results demonstrate substantial improvement of PRS-PGx-Bayes in both prediction accuracy and the capability of capturing the treatment-specific predictive effects when compared with the disease PRS approaches. To try to predict an individual’s drug response using genetic data, most studies have used traditional polygenic risk score (PRS) methods. Here, the authors develop a pharmacogenomics-specific PRS method, which can improve drug response prediction and patient stratification in pharmacogenomics studies.
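The distinction between prognostic and predictive effects in this abstract can be written as a regression with a genotype-by-treatment term. The form below is a generic sketch and not necessarily the parameterization used by the PRS-PGx methods; the symbols $Y_i$ (drug response), $T_i$ (treatment arm), $G_{ij}$ (genotype of individual $i$ at variant $j$), $\beta_j$ and $\alpha_j$ are introduced here only for illustration.

```latex
Y_i = \mu + \tau T_i
      + \underbrace{\sum_{j=1}^{m} \beta_j G_{ij}}_{\text{prognostic score}}
      + \underbrace{\sum_{j=1}^{m} \alpha_j G_{ij}}_{\text{predictive score}} T_i
      + \varepsilon_i
```

Under this notation, the prognostic score shifts the outcome regardless of treatment, while the predictive score modifies the treatment effect. A PRS carried over from a disease GWAS supplies only one set of weights per variant, which is one way to see why applying it directly to a PGx setting requires assumptions about how $\beta_j$ and $\alpha_j$ relate.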
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ, 07065, USA
| | - Hong Zhang
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ, 07065, USA
| | - Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA, 19454, USA
| | - Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ, 07065, USA.
|
24
|
Shi G. Genome-wide variance quantitative trait locus analysis suggests small interaction effects in blood pressure traits. Sci Rep 2022; 12:12649. [PMID: 35879408 PMCID: PMC9314370 DOI: 10.1038/s41598-022-16908-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 07/18/2022] [Indexed: 11/09/2022]
Abstract
Genome-wide variance quantitative trait loci (vQTL) analysis complements genome-wide association study (GWAS) and has the potential to identify novel variants associated with the trait, explain additional trait variance and lead to the identification of factors that modulate the genetic effects. I conducted genome-wide analysis of the UK Biobank data and identified 27 vQTLs associated with systolic blood pressure (SBP), diastolic blood pressure (DBP) and pulse pressure (PP). The top single-nucleotide polymorphisms (SNPs) are enriched for expression QTLs (eQTLs) or splicing QTLs (sQTLs) annotated by GTEx, suggesting their regulatory roles in mediating the associations with blood pressure (BP). Of the 27 vQTLs, 14 are known BP-associated QTLs discovered by GWASs. The heteroscedasticity effects of the 13 novel vQTLs are larger than their genetic main effects, which were not detected by existing GWASs. The total R-squared of the 27 top SNPs due to variance heteroscedasticity is 0.28%, compared with 0.50% owing to their main effects. The overall effect size of the variance heteroscedasticity is small in GWAS SNPs compared with their main effects. For the 411, 384 and 285 GWAS SNPs associated with SBP, DBP and PP, respectively, their heteroscedasticity effects were 0.52%, 0.43%, and 0.16%, and their main effects were 5.13%, 5.61%, and 3.75%, respectively. The number and effects of the vQTLs are small, which suggests that the effects of gene-environment and gene-gene interactions are small. The main effects of the SNPs remain the major source of genetic variance for BP, which would probably be true for other complex traits as well.
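A vQTL is a variant whose genotype groups differ in trait variance rather than (or in addition to) trait mean. The sketch below illustrates the idea with a Brown-Forsythe statistic computed per SNP; this is a commonly used variance-heterogeneity test and is an assumption here, since the abstract does not state which test was applied. The function names, the simulated data and the per-genotype standard deviations are all hypothetical.

```python
import numpy as np

def brown_forsythe_f(y, genotype):
    """Brown-Forsythe F statistic for equal trait variance across genotype groups."""
    y = np.asarray(y, dtype=float)
    genotype = np.asarray(genotype)
    groups = [y[genotype == g] for g in np.unique(genotype)]
    # Absolute deviation of each observation from its own group's median
    z = [np.abs(grp - np.median(grp)) for grp in groups]
    n = np.array([len(zi) for zi in z])
    z_bar_i = np.array([zi.mean() for zi in z])
    z_bar = np.concatenate(z).mean()
    k, total = len(z), n.sum()
    # One-way ANOVA on the absolute deviations
    between = np.sum(n * (z_bar_i - z_bar) ** 2) / (k - 1)
    within = sum(np.sum((zi - m) ** 2) for zi, m in zip(z, z_bar_i)) / (total - k)
    return between / within

def simulate(n=3000, maf=0.3, sd_by_genotype=(1.0, 1.0, 1.0), seed=0):
    """Simulate a trait with no mean effect and a genotype-specific standard deviation."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n)          # genotype coded 0, 1, 2
    sd = np.asarray(sd_by_genotype)[g]
    y = rng.normal(0.0, sd)
    return y, g

y_null, g_null = simulate()
y_vqtl, g_vqtl = simulate(sd_by_genotype=(1.0, 1.5, 2.0))
print("F under equal variances:  ", brown_forsythe_f(y_null, g_null))
print("F under unequal variances:", brown_forsythe_f(y_vqtl, g_vqtl))
```

Because the simulated trait has no mean difference between genotypes, a standard GWAS test of means would not flag the second SNP, while the variance test would. This mirrors the abstract's point that some vQTLs have heteroscedasticity effects larger than their main effects.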
Affiliation(s)
- Gang Shi
- School of Telecommunications Engineering, Xidian University, 2 South Taibai Road, Xi'an, 710071, Shaanxi, China.
|
25
|
Abstract
Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and yet, the genetics of most traits is still poorly understood. In this review, we highlight the major open problems that need to be solved, and by discussing these challenges provide a primer to the field. We cover general issues such as population structure, epistasis and gene-environment interactions, data-related issues such as ancestry diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies, and polygenic risk scores. We emphasize the interconnectedness of these problems and suggest promising avenues to address them.
Affiliation(s)
- Nadav Brandes
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
| | - Omer Weissbrod
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
|
26
|
Li M, Zhang YW, Zhang ZC, Xiang Y, Liu MH, Zhou YH, Zuo JF, Zhang HQ, Chen Y, Zhang YM. A compressed variance component mixed model for detecting QTNs and QTN-by-environment and QTN-by-QTN interactions in genome-wide association studies. Mol Plant 2022; 15:630-650. [PMID: 35202864 DOI: 10.1016/j.molp.2022.02.012] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 09/04/2021] [Revised: 01/26/2022] [Accepted: 02/19/2022] [Indexed: 05/25/2023]
Abstract
Although genome-wide association studies are widely used to mine genes for quantitative traits, the effects to be estimated are confounded, and the methodologies for detecting interactions are imperfect. To address these issues, the mixed model proposed here first estimates the genotypic effects for AA, Aa, and aa, with a genotypic polygenic background replacing the additive and dominance polygenic backgrounds. Then, the estimated genotypic effects are partitioned into additive and dominance effects using a one-way analysis of variance model. This strategy was further expanded to cover QTN-by-environment interactions (QEIs) and QTN-by-QTN interactions (QQIs) using the same mixed-model framework. Thus, a three-variance-component mixed model was integrated with our multi-locus random-SNP-effect mixed linear model (mrMLM) method to establish a new methodological framework, 3VmrMLM, that detects all types of loci and estimates their effects. In Monte Carlo studies, 3VmrMLM correctly detected all types of loci and almost unbiasedly estimated their effects, with high powers and accuracies and a low false positive rate. In re-analyses of 10 traits in 1439 rice hybrids, detection of 269 known genes, 45 known gene-by-environment interactions, and 20 known gene-by-gene interactions strongly validated 3VmrMLM. Further analyses of known genes showed more small (67.49%), minor-allele-frequency (35.52%), and pleiotropic (30.54%) genes, with higher repeatability across datasets (54.36%) and more dominance loci. In addition, a heteroscedasticity mixed model for multiple environments and dimension reduction methods for a large number of environments were developed to detect QEIs, and variable selection under a polygenic background was proposed for QQI detection. This study provides a new approach for revealing the genetic architecture of quantitative traits.
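The step of partitioning genotypic effects into additive and dominance components can be illustrated with the textbook parameterization: the additive effect is half the difference between the two homozygote values, and the dominance effect is the heterozygote's deviation from the homozygote midpoint. This is a minimal sketch under that assumption; the one-way analysis of variance partition in 3VmrMLM may weight genotypes differently, and the function name is hypothetical.

```python
def additive_dominance(mu_aa, mu_Aa, mu_AA):
    """Split three genotypic values into additive (a) and dominance (d) effects."""
    midpoint = (mu_AA + mu_aa) / 2.0   # homozygote midpoint
    a = (mu_AA - mu_aa) / 2.0          # additive effect
    d = mu_Aa - midpoint               # dominance deviation of the heterozygote
    return a, d

# Purely additive locus: the heterozygote sits exactly at the midpoint
print(additive_dominance(1.0, 2.0, 3.0))
# Overdominant locus: the heterozygote exceeds both homozygotes
print(additive_dominance(0.0, 3.0, 2.0))
```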
Affiliation(s)
- Mei Li
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Ya-Wen Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China; State Key Laboratory of Cotton Biology, Anyang 455000, China
| | - Ze-Chang Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Yu Xiang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Ming-Hui Liu
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Ya-Hui Zhou
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Jian-Fang Zuo
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Han-Qing Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Ying Chen
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
| | - Yuan-Ming Zhang
- Crop Information Center, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, China.
|
27
|
Li C, Liang X, Cheng S, Wen Y, Pan C, Zhang H, Chen Y, Zhang J, Zhang Z, Yang X, Meng P, Zhang F. A multi-environments-gene interaction study of anxiety, depression and self-harm in the UK Biobank cohort. J Psychiatr Res 2022; 147:59-66. [PMID: 35026594 DOI: 10.1016/j.jpsychires.2022.01.009] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/12/2021] [Revised: 12/26/2021] [Accepted: 01/03/2022] [Indexed: 12/22/2022]
Abstract
The effects of gene-by-environment (G×E) interactions on complex diseases are significant, especially the superimposed effects of multiple environmental factors. However, research on the multi-environments-gene interactions of anxiety, depression, and self-harm is still limited. This study included white individuals (N = 66,041-74,482) from the UK Biobank. We fitted all environmental factors to a single environmental score (ES), and the estimated ES was used to calculate the multiplicative interaction effects between ES and genome-wide SNPs. Heritability was stratified by minor allele frequency (MAF) and linkage disequilibrium (LD). We found 10 loci with significant interaction effects, such as rs114830993 (PRICKLE2, P = 2.30 × 10⁻⁸), rs151323364 (ASTN2, P = 2.71 × 10⁻¹⁰) and rs536631793 (SYN3, P = 4.09 × 10⁻⁸). In addition, we found that G×E heritability contributes significantly to depression as measured by Patient Health Questionnaire-9 (PHQ-9) scores (h²G×E (female) = 6.1%, h²G×E (male) = 8.7%). Our research supports the important influence of multi-environments-gene interactions on anxiety, depression, and self-harm and provides clues for the prevention and etiological research of these conditions.
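The two-stage design described here, collapsing several exposures into one environmental score and then testing a multiplicative SNP × ES term, can be sketched with ordinary least squares. This is a minimal illustration, not the study's pipeline: the abstract does not say how the ES was fitted, so building it from the fitted values of a regression of the phenotype on the exposures is an assumption, and all function names and simulated effect sizes are hypothetical.

```python
import numpy as np

def environmental_score(y, E):
    """Collapse several exposures into one score: fitted values of y ~ E, standardized."""
    X = np.column_stack([np.ones(len(y)), E])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    es = X @ beta
    return (es - es.mean()) / es.std()

def snp_by_es_interaction(y, g, es):
    """OLS fit of y ~ 1 + g + es + g*es; returns (interaction estimate, standard error)."""
    X = np.column_stack([np.ones(len(y)), g, es, g * es])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # covariance of the estimates
    return beta[3], np.sqrt(cov[3, 3])

def simulate(n=5000, gxe=0.3, seed=1):
    """Simulate three exposures, one SNP, and a trait with an optional G×E term."""
    rng = np.random.default_rng(seed)
    E = rng.normal(size=(n, 3))
    g = rng.binomial(2, 0.3, size=n).astype(float)
    env = E @ np.array([0.5, 0.3, 0.2])
    y = 0.2 * g + env + gxe * g * env + rng.normal(size=n)
    return y, g, E

y, g, E = simulate()
es = environmental_score(y, E)
b_int, se_int = snp_by_es_interaction(y, g, es)
print(f"interaction estimate {b_int:.3f}, z = {b_int / se_int:.1f}")
```

In a genome-wide setting the second function would be applied to each SNP in turn with a genome-wide significance threshold. Fitting the ES and testing the interaction in the same sample, as done in this sketch, is a simplification that a real analysis would need to justify or avoid.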
Affiliation(s)
- Chun'e Li
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Xiao Liang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Shiqiang Cheng
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Yan Wen
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Chuyu Pan
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Huijie Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Yujing Chen
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Jingxi Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Zhen Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Xuena Yang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Peilin Meng
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China
| | - Feng Zhang
- Key Laboratory of Trace Elements and Endemic Diseases of National Health and Family Planning Commission, School of Public Health, Health Science Center, Xi'an Jiaotong University, Xi'an, China.
|
28
|
Li X, Song H, Zhang Z, Huang Y, Zhang Q, Ding X. The theory on and software simulating large-scale genomic data for genotype-by-environment interactions. BMC Genomics 2021; 22:877. [PMID: 34865618 PMCID: PMC8647494 DOI: 10.1186/s12864-021-08191-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2021] [Accepted: 11/19/2021] [Indexed: 11/10/2022]
Abstract
Background: With the emphasis on analysing genotype-by-environment interactions within the framework of genomic selection and genome-wide association analysis, there is an increasing demand for reliable tools that can be used to simulate large-scale genomic data in order to assess related approaches. Results: We proposed a theory to simulate large-scale genomic data on genotype-by-environment interactions and added this new function to our previously developed tool GPOPSIM. Additionally, a simulated threshold trait with large-scale genomic data was also added. The validation of the simulated data indicated that GPOPSIM2.0 is an efficient tool for mimicking the phenotypic data of quantitative traits, threshold traits, and genetically correlated traits with large-scale genomic data while taking genotype-by-environment interactions into account. Conclusions: This tool is useful for assessing methods for genotype-by-environment interactions and threshold traits.
Affiliation(s)
- Xiujin Li
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangdong, 510225, Guangzhou, People's Republic of China
| | - Hailiang Song
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, 510642, People's Republic of China
| | - Yunmao Huang
- Guangdong Provincial Key Laboratory of Waterfowl Healthy Breeding, College of Animal Science & Technology, Zhongkai University of Agriculture and Engineering, Guangdong, 510225, Guangzhou, People's Republic of China
| | - Qin Zhang
- Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, 271001, Taian, China
| | - Xiangdong Ding
- Key Laboratory of Animal Genetics and Breeding of the Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, College of Animal Science and Technology, China Agricultural University, 100193, Beijing, China.
|
29
|
Hartiala JA, Hilser JR, Biswas S, Lusis AJ, Allayee H. Gene-Environment Interactions for Cardiovascular Disease. Curr Atheroscler Rep 2021; 23:75. [PMID: 34648097 PMCID: PMC8903169 DOI: 10.1007/s11883-021-00974-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/12/2021] [Indexed: 10/20/2022]
Abstract
PURPOSE OF REVIEW: We provide an overview of recent findings with respect to gene-environment (GxE) interactions for cardiovascular disease (CVD) risk and discuss future opportunities for advancing the field. RECENT FINDINGS: Over the last several years, GxE interactions for CVD have mostly been identified for smoking and coronary artery disease (CAD) or related risk factors. By comparison, there is more limited evidence for GxE interactions between CVD outcomes and other exposures, such as physical activity, air pollution, diet, and sex. The establishment of large consortia and population-based cohorts, in combination with new computational tools and mouse genetics platforms, can potentially overcome some of the limitations that have hindered human GxE interaction studies and reveal additional association signals for CVD-related traits. The identification of novel GxE interactions is likely to provide a better understanding of the pathogenesis and genetic liability of CVD, with significant implications for healthy lifestyles and therapeutic strategies.
Affiliation(s)
- Jaana A Hartiala
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2250 Alcazar Street, CSC202, Los Angeles, CA, 90033, USA
| | - James R Hilser
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2250 Alcazar Street, CSC202, Los Angeles, CA, 90033, USA
- Department of Biochemistry and Molecular Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Subarna Biswas
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2250 Alcazar Street, CSC202, Los Angeles, CA, 90033, USA
- Department of Surgery, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA
| | - Aldons J Lusis
- Department of Medicine, David Geffen School of Medicine of UCLA, Los Angeles, CA, 90095, USA
- Department of Microbiology, David Geffen School of Medicine of UCLA, Los Angeles, CA, 90095, USA
- Department of Human Genetics, David Geffen School of Medicine of UCLA, Los Angeles, CA, 90095, USA
| | - Hooman Allayee
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, 2250 Alcazar Street, CSC202, Los Angeles, CA, 90033, USA.
- Department of Biochemistry and Molecular Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, 90033, USA.
|
30
|
Mbatchou J, Barnard L, Backman J, Marcketta A, Kosmicki JA, Ziyatdinov A, Benner C, O'Dushlaine C, Barber M, Boutkov B, Habegger L, Ferreira M, Baras A, Reid J, Abecasis G, Maxwell E, Marchini J. Computationally efficient whole-genome regression for quantitative and binary traits. Nat Genet 2021; 53:1097-1103. [PMID: 34017140 DOI: 10.1038/s41588-021-00870-7] [Citation(s) in RCA: 469] [Impact Index Per Article: 156.3] [Received: 06/21/2020] [Accepted: 04/13/2021] [Indexed: 11/08/2022]
Abstract
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
Affiliation(s)
- Aris Baras
- Regeneron Genetics Center, Tarrytown, NY, USA
|
31
|
Werme J, van der Sluis S, Posthuma D, de Leeuw CA. Genome-wide gene-environment interactions in neuroticism: an exploratory study across 25 environments. Transl Psychiatry 2021; 11:180. [PMID: 33753719 PMCID: PMC7985503 DOI: 10.1038/s41398-021-01288-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/15/2020] [Revised: 01/25/2021] [Accepted: 02/15/2021] [Indexed: 11/20/2022]
Abstract
Gene-environment interactions (GxE) are often suggested to play an important role in the aetiology of psychiatric phenotypes, yet so far, only a handful of genome-wide environment interaction studies (GWEIS) of psychiatric phenotypes have been conducted. Representing the most comprehensive effort of its kind to date, we used data from the UK Biobank to perform a series of GWEIS for neuroticism across 25 broadly conceptualised environmental risk factors (trauma, social support, drug use, physical health). We investigated interactions on the level of SNPs, genes, and gene-sets, and computed interaction-based polygenic risk scores (PRS) to predict neuroticism in an independent sample subset (N = 10,000). We found that the predictive ability of the interaction-based PRSs did not significantly improve beyond that of a traditional PRS based on SNP main effects from GWAS, but detected one variant and two gene-sets showing significant interaction signal after correction for the number of analysed environments. This study illustrates the possibilities and limitations of a comprehensive GWEIS in currently available sample sizes.
Affiliation(s)
- Josefin Werme
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands.
- Sophie van der Sluis
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands
- Danielle Posthuma
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands
- Department of Child and Adolescent Psychology and Psychiatry, section Complex Trait Genetics, Amsterdam Neuroscience, VU University Medical Center, Amsterdam, The Netherlands
- Christiaan A de Leeuw
- Department of Complex Trait Genetics, Centre for Neurogenomics and Cognitive Research, VU University, Amsterdam, The Netherlands.
|
32
|
Marderstein AR, Davenport ER, Kulm S, Van Hout CV, Elemento O, Clark AG. Leveraging phenotypic variability to identify genetic interactions in human phenotypes. Am J Hum Genet 2021; 108:49-67. [PMID: 33326753 PMCID: PMC7820920 DOI: 10.1016/j.ajhg.2020.11.016] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 08/07/2020] [Accepted: 11/23/2020] [Indexed: 12/13/2022]
Abstract
Although thousands of loci have been associated with human phenotypes, the role of gene-environment (GxE) interactions in determining individual risk of human diseases remains unclear. This is partly because of the severe erosion of statistical power resulting from the massive number of statistical tests required to detect such interactions. Here, we focus on improving the power of GxE tests by developing a statistical framework for assessing quantitative trait loci (QTLs) associated with the trait means and/or trait variances. When applying this framework to body mass index (BMI), we find that GxE discovery and replication rates are significantly higher when prioritizing genetic variants associated with the variance of the phenotype (vQTLs) compared to when assessing all genetic variants. Moreover, we find that vQTLs are enriched for associations with other non-BMI phenotypes having strong environmental influences, such as diabetes or ulcerative colitis. We show that GxE effects first identified in quantitative traits such as BMI can be used for GxE discovery in disease phenotypes such as diabetes. A clear conclusion is that strong GxE interactions mediate the genetic contribution to body weight and diabetes risk.
Affiliation(s)
- Andrew R Marderstein
- Tri-Institutional Program in Computational Biology & Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Institute of Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Emily R Davenport
- Department of Biology, Huck Institutes of the Life Sciences, Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Scott Kulm
- Institute of Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA
- Olivier Elemento
- Institute of Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA; Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY 10021, USA.
- Andrew G Clark
- Department of Computational Biology, Cornell University, Ithaca, NY 14850, USA.
|
33
|
Kerin M, Marchini J. A non-linear regression method for estimation of gene-environment heritability. Bioinformatics 2020; 36:5632-5639. [PMID: 33367483 PMCID: PMC8023682 DOI: 10.1093/bioinformatics/btaa1079] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/30/2020] [Revised: 11/27/2020] [Accepted: 12/16/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Gene-environment (GxE) interactions are one of the least studied aspects of the genetic architecture of human traits and diseases. The environment of an individual is inherently high dimensional, evolves through time and can be expensive and time consuming to measure. The UK Biobank study, with all 500,000 participants having undergone an extensive baseline questionnaire, represents a unique opportunity to assess GxE heritability for many traits and diseases in a well powered setting. RESULTS: We have developed a randomized Haseman-Elston non-linear regression method applicable when many environmental variables have been measured on each individual. The method (GPLEMMA) simultaneously estimates a linear environmental score (ES) and its GxE heritability. We compare the method via simulation to a whole-genome regression approach (LEMMA) for estimating GxE heritability. We show that GPLEMMA is more computationally efficient than LEMMA on large datasets, and produces results highly correlated with those from LEMMA when applied to simulated data and real data from the UK Biobank. AVAILABILITY: Software implementing the GPLEMMA method is available from https://jmarchini.org/gplemma/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matthew Kerin
- Wellcome Trust Center for Human Genetics, Oxford, UK
|