1
|
Grinde KE, Browning BL, Reiner AP, Thornton TA, Browning SR. Adjusting for principal components can induce spurious associations in genome-wide association studies in admixed populations. bioRxiv 2024:2024.04.02.587682. [PMID: 38617337 PMCID: PMC11014513 DOI: 10.1101/2024.04.02.587682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2024]
Abstract
Principal component analysis (PCA) is widely used to control for population structure in genome-wide association studies (GWAS). Top principal components (PCs) typically reflect population structure, but challenges arise in deciding how many PCs are needed and ensuring that PCs do not capture other artifacts such as regions with atypical linkage disequilibrium (LD). In response to the latter, many groups suggest performing LD pruning or excluding known high LD regions prior to PCA. However, these suggestions are not universally implemented and the implications for GWAS are not fully understood, especially in the context of admixed populations. In this paper, we investigate the impact of pre-processing and the number of PCs included in GWAS models in African American samples from the Women's Health Initiative SNP Health Association Resource and two Trans-Omics for Precision Medicine Whole Genome Sequencing Project contributing studies (Jackson Heart Study and Genetic Epidemiology of Chronic Obstructive Pulmonary Disease Study). In all three samples, we find the first PC is highly correlated with genome-wide ancestry whereas later PCs often capture local genomic features. The pattern of which, and how many, genetic variants are highly correlated with individual PCs differs from what has been observed in prior studies focused on European populations and leads to distinct downstream consequences: adjusting for such PCs yields biased effect size estimates and elevated rates of spurious associations due to the phenomenon of collider bias. Excluding high LD regions identified in previous studies does not resolve these issues. LD pruning proves more effective, but the optimal choice of thresholds varies across datasets.
Altogether, our work highlights unique issues that arise when using PCA to control for ancestral heterogeneity in admixed populations and demonstrates the importance of careful pre-processing and diagnostics to ensure that PCs capturing multiple local genomic features are not included in GWAS models.
Affiliation(s)
- Kelsey E. Grinde
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, Saint Paul, Minnesota, 55105, USA
- Brian L. Browning
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, 98195, USA
- Alexander P. Reiner
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109, USA
  - Department of Epidemiology, University of Washington, Seattle, Washington, 98195, USA
- Timothy A. Thornton
  - Regeneron Genetics Center, Tarrytown, New York, 10591, USA
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
- Sharon R. Browning
  - Department of Biostatistics, University of Washington, Seattle, Washington, 98195, USA
|
2
|
Horimoto AR, Boyken LA, Blue EE, Grinde KE, Nafikov RA, Sohi HK, Nato AQ, Bis JC, Brusco LI, Morelli L, Ramirez A, Dalmasso MC, Temple S, Satizabal C, Browning SR, Seshadri S, Wijsman EM, Thornton TA. Admixture mapping implicates 13q33.3 as ancestry-of-origin locus for Alzheimer disease in Hispanic and Latino populations. HGG Adv 2023; 4:100207. [PMID: 37333771 PMCID: PMC10276158 DOI: 10.1016/j.xhgg.2023.100207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Accepted: 05/16/2023] [Indexed: 06/20/2023] Open
Abstract
Alzheimer disease (AD) is the most common form of senile dementia, with high incidence late in life in many populations including Caribbean Hispanic (CH) populations. Such admixed populations, descended from more than one ancestral population, can present challenges for genetic studies, including limited sample sizes and unique analytical constraints. Therefore, CH populations and other admixed populations have not been well represented in studies of AD, and much of the genetic variation contributing to AD risk in these populations remains unknown. Here, we conduct genome-wide analysis of AD in multiplex CH families from the Alzheimer Disease Sequencing Project (ADSP). We developed, validated, and applied an implementation of a logistic mixed model for admixture mapping with binary traits that leverages genetic ancestry to identify ancestry-of-origin loci contributing to AD. We identified three loci on chromosome 13q33.3 associated with reduced risk of AD, where associations were driven by Native American (NAM) ancestry. This AD admixture mapping signal spans the FAM155A, ABHD13, TNFSF13B, LIG4, and MYO16 genes and was supported by evidence for association in an independent sample from the Alzheimer's Genetics in Argentina-Alzheimer Argentina consortium (AGA-ALZAR) study with considerable NAM ancestry. We also provide evidence of NAM haplotypes and key variants within 13q33.3 that segregate with AD in the ADSP whole-genome sequencing data. Interestingly, the widely used genome-wide association study approach failed to identify associations in this region. Our findings underscore the potential of leveraging genetic ancestry diversity in recently admixed populations to improve genetic mapping, in this case for AD-relevant loci.
Affiliation(s)
- Lisa A. Boyken
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Elizabeth E. Blue
  - Division of Medical Genetics/Department of Medicine, University of Washington, Seattle, WA 98195, USA
  - Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Kelsey E. Grinde
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Mathematics, Statistics and Computer Science, Macalester College, Saint Paul, MN 55105, USA
- Rafael A. Nafikov
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Division of Medical Genetics/Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Harkirat K. Sohi
  - Division of Medical Genetics/Department of Medicine, University of Washington, Seattle, WA 98195, USA
  - Biomedical and Health Informatics Program, University of Washington, Seattle, WA 98195, USA
- Alejandro Q. Nato
  - Division of Medical Genetics/Department of Medicine, University of Washington, Seattle, WA 98195, USA
  - Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV 25755, USA
- Joshua C. Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
- Luis I. Brusco
  - CENECON - Center of Behavioural Neurology and Neuropsychiatry, School of Medicine, University of Buenos Aires, C1121A6B Buenos Aires, Argentina
- Laura Morelli
  - Laboratory of Brain Aging and Neurodegeneration-Fundación Instituto Leloir-IIBBA- National Scientific and Technical Research Council (CONICET), C1405BWE Ciudad Autónoma de Buenos Aires, Argentina
- Alfredo Ramirez
  - Division of Neurogenetics and Molecular Psychiatry, Department of Psychiatry and Psychotherapy, University of Cologne, Medical Faculty, 50937 Cologne, Germany
  - Department of Neurodegeneration and Gerontopsychiatry, University of Bonn, 53127 Bonn, Germany
  - German Center for Neurodegenerative Diseases (DZNE), 53127 Bonn, Germany
  - Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD) University of Cologne, 50674 Cologne, Germany
  - Department of Psychiatry, UT Health San Antonio, San Antonio, TX 78229, USA
  - Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, UT Health San Antonio, San Antonio, TX 78229, USA
- Maria Carolina Dalmasso
  - Division of Neurogenetics and Molecular Psychiatry, Department of Psychiatry and Psychotherapy, University of Cologne, Medical Faculty, 50937 Cologne, Germany
  - Neurosciences and Complex Systems Unit (EnyS), CONICET, Hospital El Cruce, National University A. Jauretche (UNAJ), B1888AAE Florencio Varela, Argentina
- Seth Temple
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
- Claudia Satizabal
  - Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, UT Health San Antonio, San Antonio, TX 78229, USA
  - Department of Population Health Sciences, University of Texas, San Antonio, TX 78229, USA
  - Department of Neurology, University of Texas, San Antonio, TX 78229, USA
- Sharon R. Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sudha Seshadri
  - Department of Neurology, University of Texas, San Antonio, TX 78229, USA
- Ellen M. Wijsman
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Division of Medical Genetics/Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Timothy A. Thornton
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Statistics, University of Washington, Seattle, WA 98195, USA
|
3
|
Barragan FA, Mills LJ, Raduski AR, Marcotte EL, Grinde KE, Spector LG, Williams LA. Genetic ancestry, differential gene expression, and survival in pediatric B-cell acute lymphoblastic leukemia. Cancer Med 2023; 12:4761-4772. [PMID: 36127808 PMCID: PMC9972134 DOI: 10.1002/cam4.5266] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/08/2022] [Revised: 08/23/2022] [Accepted: 09/01/2022] [Indexed: 11/09/2022] Open
Abstract
BACKGROUND Black children have lower incidence yet worse survival than White and Latinx children with B-cell acute lymphoblastic leukemia (B-ALL). It is unclear how reported race/ethnicity (RRE) is associated with death in B-ALL after accounting for differentially expressed genes associated with genetic ancestry. METHODS Using Phase 1 and 2 NCI TARGET B-ALL cases (N = 273; RRE-Black = 21, RRE-White = 162, RRE-Latinx = 69, RRE-Other = 9, RRE-Unknown = 12), we estimated proportions of African (AFR), European (EUR), and Amerindian (AMR) genetic ancestry. We estimated hazard ratios (HR) and 95% confidence intervals (95% CI) between ancestry and death while adjusting for RRE and clinical measures. We identified genes associated with genetic ancestry and adjusted for them in RRE and death associations. RESULTS Genetic ancestry varied within RRE (RRE-Black, AFR proportion: Mean: 78.5%, Range: 38.2%-93.6%; RRE-White, EUR proportion: Mean: 94%, Range: 1.6%-99.9%; RRE-Latinx, AMR proportion: Mean: 52.0%, Range: 1.2%-98.7%). We identified 10, 1, and 6 differentially expressed genes (adjusted p < 0.05) associated with AFR, AMR, and EUR ancestry proportion, respectively. We found AMR and AFR ancestry were statistically significantly associated with death (AMR each 10% increase HR: 1.05, 95% CI: 1.03-1.17; AFR each 10% increase HR: 1.03, 95% CI: 1.01-1.19). RRE differences in the risk of death were larger in magnitude upon adjustment for genes associated with genetic ancestry for RRE-Black, but not RRE-Latinx children (RRE-Black HR: 3.35, 95% CI: 1.31-8.53; RRE-Latinx HR: 1.47, 95% CI: 0.88-2.45). CONCLUSIONS Our work highlights B-ALL survival differences by RRE after adjusting for ancestry-associated differentially expressed genes, suggesting that other factors impacting survival are important.
Affiliation(s)
- Freddy A Barragan
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, St. Paul, Minnesota, USA
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
- Lauren J Mills
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
- Andrew R Raduski
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
- Erin L Marcotte
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, USA
- Kelsey E Grinde
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, St. Paul, Minnesota, USA
- Logan G Spector
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, USA
- Lindsay A Williams
  - Division of Epidemiology and Clinical Research, Department of Pediatrics, University of Minnesota, Minneapolis, Minnesota, USA
  - Masonic Cancer Center, University of Minnesota, Minneapolis, Minnesota, USA
|
4
|
Raffield LM, Lu AT, Szeto MD, Little A, Grinde KE, Shaw J, Auer PL, Cushman M, Horvath S, Irvin MR, Lange EM, Lange LA, Nickerson DA, Thornton TA, Wilson JG, Wheeler MM, Zakai NA, Reiner AP. Coagulation factor VIII: Relationship to cardiovascular disease risk and whole genome sequence and epigenome-wide analysis in African Americans. J Thromb Haemost 2020; 18:1335-1347. [PMID: 31985870 PMCID: PMC7274883 DOI: 10.1111/jth.14741] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/31/2019] [Revised: 01/02/2020] [Accepted: 01/21/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Prospective studies have suggested higher factor VIII (FVIII) levels are an independent risk factor for coronary heart disease (CHD) and stroke. However, limited information, including on genetic and epigenetic contributors to FVIII variation, is available specifically among African Americans (AAs), who have higher FVIII levels than Europeans. OBJECTIVES We measured FVIII levels in ~3400 AAs from the community-based Jackson Heart Study and assessed genetic, epigenetic, and epidemiological correlates of FVIII, as well as incident cardiovascular disease (CVD) associations. METHODS We assessed cross-sectional associations of FVIII with CVD risk factors as well as incident CHD, stroke, heart failure, and mortality associations. We additionally assessed associations with TOPMed whole genome sequencing data and an epigenome-wide methylation array. RESULTS Our results confirmed associations between FVIII and risk of incident CHD events and total mortality in AAs; mortality associations were largely independent of traditional risk factors. We also demonstrate an association of FVIII with incident heart failure, independent of B-type natriuretic peptide. Two genomic regions were strongly associated with FVIII (ABO and VWF). The index variant at VWF is specific to individuals of African descent and is distinct from the previously reported European VWF association signal. Epigenome-wide association analysis showed significant FVIII associations with several CpG sites in the ABO region. However, after adjusting for ABO genetic variants, ABO CpG sites were not significant. CONCLUSIONS Larger sample sizes of AAs will be required to discover additional genetic and epigenetic contributors to FVIII phenotypic variation, which may have consequences for CVD health disparities.
Affiliation(s)
- Laura M Raffield
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina
- Ake T Lu
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California
- Mindy D Szeto
  - Division of Biomedical Informatics and Personalized Medicine, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, Colorado
- Amarise Little
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Kelsey E Grinde
  - Department of Mathematics, Statistics, and Computer Science, Macalester College, St. Paul, Minnesota
- Jessica Shaw
  - Division of Biomedical Informatics and Personalized Medicine, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, Colorado
- Paul L Auer
  - Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin
- Mary Cushman
  - Department of Medicine, Larner College of Medicine at the University of Vermont, Burlington, Vermont
  - Department of Pathology & Laboratory Medicine, Larner College of Medicine at the University of Vermont, Burlington, Vermont
- Steve Horvath
  - Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, California
  - Department of Biostatistics, David Geffen School of Medicine, UCLA, Los Angeles, California
- Marguerite R Irvin
  - Department of Epidemiology, University of Alabama at Birmingham (UAB) School of Public Health, Birmingham, Alabama
- Ethan M Lange
  - Division of Biomedical Informatics and Personalized Medicine, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, Colorado
- Leslie A Lange
  - Division of Biomedical Informatics and Personalized Medicine, School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, Colorado
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, Washington
- James G Wilson
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi
- Neil A Zakai
  - Department of Medicine, Larner College of Medicine at the University of Vermont, Burlington, Vermont
  - Department of Pathology & Laboratory Medicine, Larner College of Medicine at the University of Vermont, Burlington, Vermont
- Alex P Reiner
  - Department of Epidemiology, University of Washington, Seattle, Washington
|
5
|
Grinde KE, Brown LA, Reiner AP, Thornton TA, Browning SR. Genome-wide Significance Thresholds for Admixture Mapping Studies. Am J Hum Genet 2019; 104:454-465. [PMID: 30773276 PMCID: PMC6407497 DOI: 10.1016/j.ajhg.2019.01.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/05/2018] [Accepted: 01/17/2019] [Indexed: 01/25/2023] Open
Abstract
Admixture mapping studies have become more common in recent years, due in part to technological advances and growing international efforts to increase the diversity of genetic studies. However, many open questions remain about appropriate implementation of admixture mapping studies, including how best to control for multiple testing, particularly in the presence of population structure. In this study, we develop a theoretical framework to characterize the correlation of local ancestry and admixture mapping test statistics in admixed populations with contributions from any number of ancestral populations and arbitrary population structure. Based on this framework, we develop an analytical approach for obtaining genome-wide significance thresholds for admixture mapping studies. We validate our approach via analysis of simulated traits with real genotype data for 8,064 unrelated African American and 3,425 Hispanic/Latina women from the Women's Health Initiative SNP Health Association Resource (WHI SHARe). In an application to these WHI SHARe data, our approach yields genome-wide significant p value thresholds of 2.1 × 10⁻⁵ and 4.5 × 10⁻⁶ for admixture mapping studies in the African American and Hispanic/Latina cohorts, respectively. Compared to other commonly used multiple testing correction procedures, our method is fast, easy to implement (using our publicly available R package), and controls the family-wise error rate even in structured populations. Importantly, we note that the appropriate admixture mapping significance threshold depends on the number of ancestral populations, generations since admixture, and population structure of the sample; as a result, significance thresholds are not, in general, transferable across studies.
Affiliation(s)
- Kelsey E Grinde
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Lisa A Brown
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Seattle Genetics, Bothell, WA 98021, USA
- Alexander P Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Sharon R Browning
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
6
|
Grinde KE, Qi Q, Thornton TA, Liu S, Shadyab AH, Chan KHK, Reiner AP, Sofer T. Generalizing polygenic risk scores from Europeans to Hispanics/Latinos. Genet Epidemiol 2019; 43:50-62. [PMID: 30368908 PMCID: PMC6330129 DOI: 10.1002/gepi.22166] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 04/30/2018] [Revised: 08/12/2018] [Accepted: 08/28/2018] [Indexed: 12/17/2022]
Abstract
Polygenic risk scores (PRSs) are weighted sums of risk allele counts of single-nucleotide polymorphisms (SNPs) associated with a disease or trait. PRSs are typically constructed based on published results from Genome-Wide Association Studies (GWASs), the majority of which have been performed in large populations of European ancestry (EA) individuals. Although many genotype-trait associations have generalized across populations, the optimal choice of SNPs and weights for PRSs may differ between populations due to different linkage disequilibrium (LD) and allele frequency patterns. We compare various approaches for PRS construction, using GWAS results from both large EA studies and a smaller study in Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos (HCHS/SOL, n = 12,803). We consider multiple approaches for selecting SNPs and for computing SNP weights. We study the performance of the resulting PRSs in an independent study of Hispanics/Latinos from the Women's Health Initiative (WHI, n = 3,582). We support our investigation with simulation studies of potential genetic architectures in a single locus. We observed that selecting variants based on EA GWASs generally performs well, except for blood pressure trait. However, the use of EA GWASs for weight estimation was suboptimal. Using non-EA GWAS results to estimate weights improved results.
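The definition of a PRS as a weighted sum of risk allele counts can be illustrated with a minimal sketch. The function name `polygenic_risk_score`, the three SNP weights, and the two genotype vectors below are hypothetical values chosen for illustration; they are not taken from the study, and the sketch does not reproduce any of the SNP selection or weight estimation approaches the abstract compares.

```python
def polygenic_risk_score(allele_counts, weights):
    """Return the weighted sum of risk allele counts for one individual.

    allele_counts: risk allele count per SNP, each 0, 1, or 2.
    weights: per-SNP weight (for example, a GWAS effect size estimate).
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1, or 2")
    return sum(w * c for w, c in zip(weights, allele_counts))


# Hypothetical weights for three SNPs and genotypes for two individuals.
weights = [0.5, 0.25, 1.0]
genotypes = {
    "person_a": [0, 1, 2],
    "person_b": [2, 2, 0],
}

scores = {
    person: polygenic_risk_score(counts, weights)
    for person, counts in genotypes.items()
}
print(scores)
```

Changing which SNPs enter the sum, or which GWAS supplies the weights, changes the score without changing this formula, which is why the choice of SNPs and weights is the focus of the comparison.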
Affiliation(s)
- Kelsey E. Grinde
  - Department of Biostatistics, University of Washington, Seattle, WA, USA
- Qibin Qi
  - Department of Epidemiology & Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Simin Liu
  - Department of Epidemiology, Brown University, Providence, RI, USA
- Aladdin H. Shadyab
  - Department of Family Medicine and Public Health, University of California San Diego, San Diego, CA, USA
- Kei Hang K. Chan
  - Department of Epidemiology, Brown University, Providence, RI, USA
  - Departments of Biomedical Sciences and Electronic Engineering, City University of Hong Kong, HKSAR
- Alexander P. Reiner
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Tamar Sofer
  - Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
  - Department of Medicine, Harvard Medical School, Boston, MA, USA
|
7
|
Grinde KE, Arbet J, Green A, O'Connell M, Valcarcel A, Westra J, Tintle N. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association. Front Genet 2017; 8:117. [PMID: 28959274 PMCID: PMC5603735 DOI: 10.3389/fgene.2017.00117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2017] [Accepted: 08/25/2017] [Indexed: 11/13/2022] Open
Abstract
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as "winner's curse." We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10⁻⁶) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures.
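The winner's curse bias described above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not the authors' method: it selects on the significance of a single normally distributed effect estimate rather than on a gene-based test, it does not implement the bootstrap correction, and the function name `simulate_winners_curse` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_winners_curse(true_effect=0.1, std_error=0.1, n_studies=20000,
                           z_threshold=1.96, seed=0):
    """Compare the mean effect estimate overall and among significant results.

    Each simulated study yields an unbiased estimate drawn from a normal
    distribution centred on true_effect. Keeping only estimates whose
    z-score exceeds the threshold in absolute value inflates their mean.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    significant = [b for b in estimates if abs(b / std_error) > z_threshold]
    return statistics.mean(estimates), statistics.mean(significant)


mean_all, mean_significant = simulate_winners_curse()
print(f"mean of all estimates:         {mean_all:.3f}")
print(f"mean of significant estimates: {mean_significant:.3f}")
```

With these hypothetical settings the test has low power, so the estimates that pass the threshold are mostly those that overshot the true effect; this matches the abstract's remark that the bias matters most when the initial test has low power.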
Affiliation(s)
- Kelsey E Grinde
  - Department of Biostatistics, University of Washington, Seattle, WA, United States
- Jaron Arbet
  - Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alden Green
  - Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, United States
- Michael O'Connell
  - Department of Biostatistics, University of Minnesota, Minneapolis, MN, United States
- Alessandra Valcarcel
  - Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, United States
- Jason Westra
  - Department of Statistics, Iowa State University, Ames, IA, United States
  - Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States
- Nathan Tintle
  - Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, IA, United States
|