1
Zhou H, McPeek MS. Overcoming the "feast or famine" effect: improved interaction testing in genome-wide association studies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.02.13.580168. [PMID: 38405994 PMCID: PMC10888770 DOI: 10.1101/2024.02.13.580168] [Indexed: 02/27/2024]
Abstract
In genetic association analysis of complex traits, detection of interaction (either GxG or GxE) can help to elucidate the genetic architecture and biological mechanisms underlying the trait. Detection of interaction in a genome-wide interaction study (GWIS) can be methodologically challenging for various reasons, including a high burden of multiple comparisons when testing for epistasis between all possible pairs of a set of genomewide variants, as well as heteroscedasticity effects occurring in the presence of GxG or GxE interaction. In this paper, we address the problem of an even more striking phenomenon that we call the "feast or famine" effect that occurs when testing interaction in a genomewide context. We show that in any given GxE GWIS, the type 1 error of standard interaction tests performed genomewide can vary widely from the nominal level, where the actual type 1 error in any given GWIS varies as a predictable function of the observed trait and environmental values. Using standard methods, some GWISs will have systematically underinflated p-values ("feast"), and others will have systematically overinflated p-values ("famine"), which can lead to false detection of interaction, reduced power, inconsistent results across studies, and failure to replicate true signal. This startling phenomenon is specific to detection of interaction in a GWIS, and it may partly explain why such detection has often proved challenging and difficult to replicate. 
We show that the feast or famine effect occurs across a wide range of GxE analysis methods, including but not limited to (1) testing interaction in a linear or linear mixed model (LMM) using standard approaches such as t-tests/Wald tests, likelihood ratio tests, or score tests; (2) doing a combined interaction-association test in a linear model or LMM using standard approaches such as F-tests or likelihood ratio tests; (3) testing interaction with multiple environments or multiple SNPs, where these are modeled as random effects in an LMM using standard approaches; and (4) performing tests of interaction in a GWIS where significance is assessed using permutation of the trait residuals. We show theoretically that the key cause of this phenomenon is which variables are conditioned on in the analysis, and this suggests an approach to correct the problem by changing the way the conditioning is done. Using this insight, we have developed the TINGA method to adjust the interaction test statistics to make their p-values closer to uniform under the null hypothesis. In simulations we show that TINGA both controls type 1 error and improves power. TINGA allows for covariates and population structure through use of a linear mixed model and accounts for heteroscedasticity. We apply TINGA to detection of epistasis in a study of flowering time in Arabidopsis thaliana.
Affiliation(s)
- Huanlin Zhou
  - Department of Statistics, The University of Chicago, Chicago, Illinois, U.S.A
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, Illinois, U.S.A
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois, U.S.A
2
Fu B, Anand P, Anand A, Mefford J, Sankararaman S. A scalable adaptive quadratic kernel method for interpretable epistasis analysis in complex traits. Genome Res 2024; 34:1294-1303. [PMID: 39209554 PMCID: PMC11529862 DOI: 10.1101/gr.279140.124] [Received: 02/15/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024]
Abstract
Our knowledge of the contribution of genetic interactions (epistasis) to variation in human complex traits remains limited, partly due to the lack of efficient, powerful, and interpretable algorithms to detect interactions. Recently proposed approaches for set-based association tests show promise in improving the power to detect epistasis by examining the aggregated effects of multiple variants. Nevertheless, these methods either do not scale to large Biobank data sets or lack interpretability. We propose QuadKAST, a scalable algorithm focused on testing pairwise interaction effects (quadratic effects) within small to medium-sized sets of genetic variants (window size ≤100) on a trait and provide quantified interpretation of these effects. Comprehensive simulations show that QuadKAST is well-calibrated. Additionally, QuadKAST is highly sensitive in detecting loci with epistatic signals and accurate in its estimation of quadratic effects. We applied QuadKAST to 52 quantitative phenotypes measured in ≈300,000 unrelated white British individuals in the UK Biobank to test for quadratic effects within each of 9515 protein-coding genes. We detect 32 trait-gene pairs across 17 traits and 29 genes that demonstrate statistically significant signals of quadratic effects (accounting for the number of genes and traits tested). Across these trait-gene pairs, the proportion of trait variance explained by quadratic effects is comparable to additive effects, with five pairs having a ratio >1. Our method enables the detailed investigation of epistasis on a large scale, offering new insights into its role and importance.
Affiliation(s)
- Boyang Fu
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Prateek Anand
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Aakarsh Anand
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
- Joel Mefford
  - Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, Los Angeles, California 90024, USA
- Sriram Sankararaman
  - Department of Computer Science, University of California, Los Angeles, Los Angeles, California 90095, USA
  - Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
  - Department of Computational Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California 90095, USA
3
Lin HY, Mazumder H, Sarkar I, Huang PY, Eeles RA, Kote-Jarai Z, Muir KR, Schleutker J, Pashayan N, Batra J, Neal DE, Nielsen SF, Nordestgaard BG, Grönberg H, Wiklund F, MacInnis RJ, Haiman CA, Travis RC, Stanford JL, Kibel AS, Cybulski C, Khaw KT, Maier C, Thibodeau SN, Teixeira MR, Cannon-Albright L, Brenner H, Kaneva R, Pandha H, Park JY. Cluster effect for SNP-SNP interaction pairs for predicting complex traits. Sci Rep 2024; 14:18677. [PMID: 39134575 PMCID: PMC11319716 DOI: 10.1038/s41598-024-66311-7] [Received: 02/08/2024] [Accepted: 07/01/2024] [Indexed: 08/15/2024]
Abstract
Single nucleotide polymorphism (SNP) interactions are the key to improving polygenic risk scores. Previous studies reported several significant SNP-SNP interaction pairs that shared a common SNP to form a cluster, but some identified pairs might be false positives. This study aims to identify factors associated with the cluster effect of false positivity and develop strategies to enhance the accuracy of SNP-SNP interactions. The results showed the cluster effect is a major cause of false-positive findings of SNP-SNP interactions. This cluster effect is due to high correlations between a causal pair and null pairs in a cluster. The clusters with a hub SNP with a significant main effect and a large minor allele frequency (MAF) tended to have a higher false-positive rate. In addition, peripheral null SNPs in a cluster with a small MAF tended to enhance false positivity. We also demonstrated that using the modified significance criterion based on the 3 p-value rules and the bootstrap approach (3pRule + bootstrap) can reduce false positivity and maintain high true positivity. In addition, our results also showed that a pair without a significant main effect tends to have weak or no interaction. This study identified the cluster effect and suggested using the 3pRule + bootstrap approach to enhance SNP-SNP interaction detection accuracy.
Affiliation(s)
- Hui-Yi Lin
  - Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Harun Mazumder
  - Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Indrani Sarkar
  - Biostatistics and Data Science Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Po-Yu Huang
  - Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan
- Rosalind A Eeles
  - The Institute of Cancer Research, London, SM2 5NG, UK
  - Royal Marsden NHS Foundation Trust, London, SW3 6JJ, UK
- Kenneth R Muir
  - Division of Population Health, Health Services Research and Primary Care, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
- Johanna Schleutker
  - Institute of Biomedicine, University of Turku, Turku, Finland
  - Department of Medical Genetics, Genomics, Laboratory Division, Turku University Hospital, PO Box 52, 20521, Turku, Finland
- Nora Pashayan
  - Department of Applied Health Research, University College London, London, WC1E 7HB, UK
  - Centre for Cancer Genetic Epidemiology, Department of Oncology, University of Cambridge, Strangeways Laboratory, Worts Causeway, Cambridge, CB1 8RN, UK
- Jyotsna Batra
  - Australian Prostate Cancer Research Centre-Qld, Institute of Health and Biomedical Innovation and School of Biomedical Science, Queensland University of Technology, Brisbane, QLD, 4059, Australia
  - Translational Research Institute, Brisbane, QLD, 4102, Australia
- David E Neal
  - Nuffield Department of Surgical Sciences, University of Oxford, John Radcliffe Hospital, Room 6603, Level 6, Headley Way, Headington, Oxford, OX3 9DU, UK
  - Department of Oncology, University of Cambridge, Addenbrooke's Hospital, Hills Road, Box 279, Cambridge, CB2 0QQ, UK
  - Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
- Sune F Nielsen
  - Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
  - Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
- Børge G Nordestgaard
  - Faculty of Health and Medical Sciences, University of Copenhagen, 2200, Copenhagen, Denmark
  - Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev, 2200, Copenhagen, Denmark
- Henrik Grönberg
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
- Fredrik Wiklund
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, 171 77, Stockholm, Sweden
- Robert J MacInnis
  - Cancer Epidemiology Division, Cancer Council Victoria, 200 Victoria Parade, East Melbourne, 3002, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Grattan Street, Parkville, VIC, 3010, Australia
- Christopher A Haiman
  - Center for Genetic Epidemiology, Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, CA, 90015, USA
- Ruth C Travis
  - Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
- Janet L Stanford
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109-1024, USA
  - Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, 98195, USA
- Adam S Kibel
  - Division of Urologic Surgery, Brigham and Womens Hospital, 75 Francis Street, Boston, MA, 02115, USA
- Cezary Cybulski
  - International Hereditary Cancer Center, Department of Genetics and Pathology, Pomeranian Medical University, 70-115, Szczecin, Poland
- Kay-Tee Khaw
  - Clinical Gerontology Unit, University of Cambridge, Cambridge, CB2 2QQ, UK
- Christiane Maier
  - Humangenetik Tuebingen, Paul-Ehrlich-Str 23, 72076, Tuebingen, Germany
- Stephen N Thibodeau
  - Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55905, USA
- Manuel R Teixeira
  - Department of Laboratory Genetics, Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
  - Cancer Genetics Group, IPO Porto Research Center (CI-IPOP)/RISE@CI-IPOP (Health Research Network), Portuguese Oncology Institute of Porto (IPO Porto)/Porto Comprehensive Cancer Center, Porto, Portugal
  - School of Medicine and Biomedical Sciences (ICBAS), University of Porto, Porto, Portugal
- Lisa Cannon-Albright
  - Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, 84132, USA
  - George E. Wahlen Department of Veterans Affairs Medical Center, Salt Lake City, UT, 84148, USA
- Hermann Brenner
  - Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
  - German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), 69120, Heidelberg, Germany
  - Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 460, 69120, Heidelberg, Germany
- Radka Kaneva
  - Molecular Medicine Center, Department of Medical Chemistry and Biochemistry, Medical University of Sofia, Sofia, 2 Zdrave Str., 1431, Sofia, Bulgaria
- Hardev Pandha
  - The University of Surrey, Guildford, Surrey, GU2 7XH, UK
- Jong Y Park
  - Department of Cancer Epidemiology, Moffitt Cancer Center, 12902 Magnolia Drive, Tampa, FL, 33612, USA