1
|
Breeyear JH, Mautz BS, Keaton JM, Hellwege JN, Torstenson ES, Liang J, Bray MJ, Giri A, Warren HR, Munroe PB, Velez Edwards DR, Zhu X, Li C, Edwards TL. A new test for trait mean and variance detects unreported loci for blood-pressure variation. Am J Hum Genet 2024; 111:954-965. [PMID: 38614075 PMCID: PMC11080606 DOI: 10.1016/j.ajhg.2024.03.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/15/2024] Open
Abstract
Variability in quantitative traits has clinical, ecological, and evolutionary significance. Most genetic variants identified for complex quantitative traits have only a detectable effect on the mean of the trait. We have developed the mean-variance test (MVtest) to simultaneously model the mean and log-variance of a quantitative trait as functions of genotypes and covariates by using estimating equations. The advantages of MVtest include the facts that it can detect effect modification, that multiple testing can follow conventional thresholds, that it is robust to non-normal outcomes, and that association statistics can be meta-analyzed. In simulations, we show control of type I error of MVtest over several alternatives. We identified 51 and 37 previously unreported associations for effects on blood-pressure variance and mean, respectively, in the UK Biobank. Transcriptome-wide association studies revealed 633 significant unique gene associations with blood-pressure mean variance. MVtest is broadly applicable to studies of complex quantitative traits and provides an important opportunity to detect novel loci.
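As a rough sketch of the kind of model this abstract describes, the mean and the log-variance can each be written as a linear function of genotype and covariates. The symbols below ($g_i$ for genotype, $\mathbf{x}_i$ for covariates, and the coefficient names) are notation assumed here for illustration; they are not taken from the paper, and the estimating equations themselves are not reproduced.

```latex
\begin{aligned}
\mathrm{E}[Y_i \mid g_i, \mathbf{x}_i] &= \beta_0 + \beta_G\, g_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}_X, \\
\log \mathrm{Var}(Y_i \mid g_i, \mathbf{x}_i) &= \gamma_0 + \gamma_G\, g_i + \mathbf{x}_i^{\top}\boldsymbol{\gamma}_X .
\end{aligned}
```

In this notation, a variant with $\beta_G \neq 0$ shifts the trait mean, while one with $\gamma_G \neq 0$ multiplies the trait variance by $e^{\gamma_G}$ per allele.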
Affiliation(s)
- Joseph H Breeyear
- Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Biostatistics and Computational Biology Branch, Division of Intramural Research, National Institute of Environmental Health Sciences, Durham, NC, USA
- Brian S Mautz
- Population Analytics and Insights, Data Sciences, Janssen Research and Development, Spring House, PA, USA
- Jacob M Keaton
- Medical Genomics and Metabolic Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jacklyn N Hellwege
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Eric S Torstenson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Jingjing Liang
- Department of Pharmacy Practice and Science, University of Arizona, Tucson, AZ, USA
- Michael J Bray
- Department of Maternal and Fetal Medicine, Orlando Health, Orlando, FL, USA; Genetic Counseling Program, Bay Path University, Longmeadow, MA, USA
- Ayush Giri
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA; Division of Quantitative Sciences, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
- Helen R Warren
- Center of Clinical Pharmacology and Precision Medicine, Queen Mary University, London, England
- Patricia B Munroe
- Center of Clinical Pharmacology and Precision Medicine, Queen Mary University, London, England
- Digna R Velez Edwards
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA; Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Quantitative Sciences, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Chun Li
- Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA, USA
- Todd L Edwards
- Division of Epidemiology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
|
2
|
Zhang X, Bell JT. Detecting genetic effects on phenotype variability to capture gene-by-environment interactions: a systematic method comparison. G3 (BETHESDA, MD.) 2024; 14:jkae022. [PMID: 38289865 PMCID: PMC10989912 DOI: 10.1093/g3journal/jkae022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 01/16/2024] [Accepted: 01/19/2024] [Indexed: 02/01/2024]
Abstract
Genetically associated phenotypic variability has been widely observed across organisms and traits, including in humans. Both gene-gene and gene-environment interactions can lead to an increase in genetically associated phenotypic variability. Therefore, detecting the underlying genetic variants, or variance Quantitative Trait Loci (vQTLs), can provide novel insights into complex traits. Established approaches to detect vQTLs apply different methodologies from variance-only approaches to mean-variance joint tests, but a comprehensive comparison of these methods is lacking. Here, we review available methods to detect vQTLs in humans, carry out a simulation study to assess their performance under different biological scenarios of gene-environment interactions, and apply the optimal approaches for vQTL identification to gene expression data. Overall, with a minor allele frequency (MAF) of less than 0.2, the squared residual value linear model (SVLM) and the deviation regression model (DRM) are optimal when the data follow normal and non-normal distributions, respectively. In addition, the Brown-Forsythe (BF) test is one of the optimal methods when the MAF is 0.2 or larger, irrespective of phenotype distribution. Additionally, a larger sample size and more balanced sample distribution in different exposure categories increase the power of BF, SVLM, and DRM. Our results highlight vQTL detection methods that perform optimally under realistic simulation settings and show that their relative performance depends on the phenotype distribution, allele frequency, sample size, and the type of exposure in the interaction model underlying the vQTL.
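For readers unfamiliar with the Brown-Forsythe (BF) test named in this abstract, the statistic can be sketched as a one-way ANOVA F computed on absolute deviations from each genotype group's median. The function name `brown_forsythe_f` and the toy data are hypothetical; this is a minimal standard-library illustration that returns only the F statistic (no p-value), not the implementation used in the paper.

```python
from statistics import mean, median


def brown_forsythe_f(groups):
    """Return the Brown-Forsythe F statistic for a list of samples.

    Each element of `groups` holds the trait values for one genotype
    group. The statistic is a one-way ANOVA F computed on
    |y - group median|.
    """
    # Absolute deviation of every observation from its own group median
    devs = []
    for g in groups:
        m = median(g)
        devs.append([abs(y - m) for y in g])

    k = len(devs)                        # number of groups
    n_total = sum(len(d) for d in devs)  # total sample size
    grand_mean = sum(sum(d) for d in devs) / n_total

    # Between-group and within-group mean squares of the deviations
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in devs) / (k - 1)
    within = sum((z - mean(d)) ** 2 for d in devs for z in d) / (n_total - k)
    return between / within


# Toy example (made-up numbers): the second group is more spread out
f_stat = brown_forsythe_f([[1, 2, 3], [0, 5, 10]])
```

Groups that differ only in location give a statistic of zero, so the statistic responds to differences in spread rather than in mean.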
Affiliation(s)
- Xiaopu Zhang
- Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas’ Hospital, Westminster Bridge Road, London SE1 7EH, UK
- Jordana T Bell
- Department of Twin Research and Genetic Epidemiology, King's College London, St Thomas’ Hospital, Westminster Bridge Road, London SE1 7EH, UK
|
3
|
Chu W, Li R, Liu J, Reimherr M. Feature selection for generalized varying coefficient mixed-effect models with application to obesity GWAS. Ann Appl Stat 2020; 14:276-298. [PMID: 32802245 PMCID: PMC7426018 DOI: 10.1214/19-aoas1310] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/13/2023]
Abstract
Motivated by an empirical analysis of data from a genome-wide association study on obesity, measured by the body mass index (BMI), we propose a two-step gene-detection procedure for generalized varying coefficient mixed-effects models with ultrahigh dimensional covariates. The proposed procedure selects significant single nucleotide polymorphisms (SNPs) impacting the mean BMI trend, some of which have already been biologically proven to be "fat genes." The method also discovers SNPs that significantly influence the age-dependent variability of BMI. The proposed procedure takes into account individual variations of genetic effects and can also be directly applied to longitudinal data with continuous, binary or count responses. We employ Monte Carlo simulation studies to assess the performance of the proposed method and further carry out causal inference for the selected SNPs.
Affiliation(s)
- Runze Li
- Department of Statistics and the Methodology Center, Pennsylvania State University
- Jingyuan Liu
- MOE Key Laboratory of Econometrics, Department of Statistics, School of Economics, Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University
|
4
|
Duan R, Ning Y, Wang S, Lindsay BG, Carroll RJ, Chen Y. A fast score test for generalized mixture models. Biometrics 2019; 76:811-820. [PMID: 31863595 DOI: 10.1111/biom.13204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/10/2018] [Revised: 11/15/2019] [Accepted: 12/04/2019] [Indexed: 11/30/2022]
Abstract
In biomedical studies, testing for homogeneity between two groups, where one group is modeled by mixture models, is often of great interest. This paper considers the semiparametric exponential family mixture model proposed by Hong et al. (2017) and studies the score test for homogeneity under this model. The score test is nonregular in the sense that nuisance parameters disappear under the null hypothesis. To address this difficulty, we propose a modification of the score test, so that the resulting test enjoys the Wilks phenomenon. In finite samples, we show that with fixed nuisance parameters the score test is locally most powerful. In large samples, we establish the asymptotic power functions under two types of local alternative hypotheses. Our simulation studies illustrate that the proposed score test is powerful and computationally fast. We apply the proposed score test to a UK ovarian cancer DNA methylation dataset for identification of differentially methylated CpG sites.
Affiliation(s)
- Rui Duan
- Department of Biostatistics, Epidemiology, and Informatics, The University of Pennsylvania, Philadelphia, Pennsylvania
- Yang Ning
- Department of Statistical Science, Cornell University, Ithaca, New York
- Shuang Wang
- Department of Biostatistics, Columbia University, New York, New York
- Bruce G Lindsay
- Department of Statistics, Pennsylvania State University, State College, Pennsylvania
- Raymond J Carroll
- Department of Statistics, Texas A&M University, College Station, Texas
- Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, The University of Pennsylvania, Philadelphia, Pennsylvania
|
5
|
Deng WQ, Mao S, Kalnapenkis A, Esko T, Mägi R, Paré G, Sun L. Analytical strategies to include the X-chromosome in variance heterogeneity analyses: Evidence for trait-specific polygenic variance structure. Genet Epidemiol 2019; 43:815-830. [PMID: 31332826 DOI: 10.1002/gepi.22247] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/13/2019] [Revised: 06/07/2019] [Accepted: 06/13/2019] [Indexed: 12/12/2022]
Abstract
Genotype-stratified variance of a quantitative trait could differ in the presence of gene-gene or gene-environment interactions. Genetic markers associated with phenotypic variance are thus considered promising candidates for follow-up interaction or joint location-scale analyses. However, as in studies of main effects, the X-chromosome is routinely excluded from "whole-genome" scans due to analytical challenges. Specifically, as males carry only one copy of the X-chromosome, the inherent sex-genotype dependency could bias the trait-genotype association, through sexual dimorphism in quantitative traits with sex-specific means or variances. Here we investigate phenotypic variance heterogeneity associated with X-chromosome single nucleotide polymorphisms (SNPs) and propose valid and powerful strategies. Among those, a generalized Levene's test has adequate power and remains robust to sexual dimorphism. An alternative approach is a sex-stratified analysis but at the cost of slightly reduced power and modeling flexibility. We applied both methods to an Estonian study of gene expression quantitative trait loci (eQTL; n = 841), and two complex trait studies of height, hip, and waist circumferences, and body mass index from Multi-Ethnic Study of Atherosclerosis (MESA; n = 2,073) and UK Biobank (UKB; n = 327,393). Consistent with previous eQTL findings on mean, we found some but no conclusive evidence for cis regulators being enriched for variance association. SNP rs2681646 is associated with variance of waist circumference (p = 9.5E-07) at X-chromosome-wide significance in UKB, with a suggestive female-specific effect in MESA (p = 0.048). Collectively, an enrichment analysis using permutated UKB (p < 0.1) and MESA (p < 0.01) datasets, suggests a possible polygenic structure for the variance of human height.
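To see why an interaction can surface as genotype-dependent variance, consider a minimal worked example. The linear model and symbols below are assumptions made for illustration (an unobserved exposure $E$ that is independent of genotype $G$ and has variance $\sigma_E^2$, plus independent noise $\varepsilon$ with variance $\sigma^2$); they are not the authors' model.

```latex
\begin{aligned}
Y &= \beta_0 + \beta_G\, G + \beta_E\, E + \beta_{GE}\, G E + \varepsilon, \\
\mathrm{Var}(Y \mid G = g) &= (\beta_E + \beta_{GE}\, g)^2\, \sigma_E^2 + \sigma^2 .
\end{aligned}
```

When $\beta_{GE} = 0$ the conditional variance is the same for every genotype; otherwise it generally differs across genotypes, which is the signal a variance heterogeneity test looks for.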
Affiliation(s)
- Wei Q Deng
- Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, Canada
- Shihong Mao
- Department of Pathology and Molecular Medicine, Population Health Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada
- Anette Kalnapenkis
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Tõnu Esko
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia; Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts
- Reedik Mägi
- Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu, Estonia
- Guillaume Paré
- Department of Pathology and Molecular Medicine, Population Health Research Institute, Hamilton Health Sciences and McMaster University, Hamilton, Canada; Department of Pathology and Molecular Medicine, McMaster University, Hamilton, Canada
- Lei Sun
- Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, Canada; Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
|
6
|
Corty RW, Valdar W. QTL Mapping on a Background of Variance Heterogeneity. G3 (BETHESDA, MD.) 2018; 8:3767-3782. [PMID: 30389794 DOI: 10.1101/276980] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/28/2023]
Abstract
Standard QTL mapping procedures seek to identify genetic loci affecting the phenotypic mean while assuming that all individuals have the same residual variance. But when the residual variance differs systematically between groups, perhaps due to a genetic or environmental factor, such standard procedures can falter: in testing for QTL associations, they attribute too much weight to observations that are noisy and too little to those that are precise, resulting in reduced power and increased susceptibility to false positives. The negative effects of such "background variance heterogeneity" (BVH) on standard QTL mapping have received little attention until now, although the subject is closely related to work on the detection of variance-controlling genes. Here we use simulation to examine how BVH affects power and false positive rate for detecting QTL affecting the mean (mQTL), the variance (vQTL), or both (mvQTL). We compare linear regression for mQTL and Levene's test for vQTL, with tests more recently developed, including tests based on the double generalized linear model (DGLM), which can model BVH explicitly. We show that, when used in conjunction with a suitable permutation procedure, the DGLM-based tests accurately control false positive rate and are more powerful than the other tests. We also find that some adverse effects of BVH can be mitigated by applying a rank inverse normal transform. We apply our novel approach, which we term "mean-variance QTL mapping", to publicly available data on a mouse backcross and, after accommodating BVH driven by sire, detect a new mQTL for bodyweight.
Affiliation(s)
- Robert W Corty
- Department of Genetics
- Bioinformatics and Computational Biology Curriculum
- William Valdar
- Department of Genetics
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
|
7
|
Corty RW, Valdar W. QTL Mapping on a Background of Variance Heterogeneity. G3 (BETHESDA, MD.) 2018; 8:3767-3782. [PMID: 30389794 PMCID: PMC6288843 DOI: 10.1534/g3.118.200790] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/27/2018] [Accepted: 10/28/2018] [Indexed: 12/21/2022]
Abstract
Standard QTL mapping procedures seek to identify genetic loci affecting the phenotypic mean while assuming that all individuals have the same residual variance. But when the residual variance differs systematically between groups, perhaps due to a genetic or environmental factor, such standard procedures can falter: in testing for QTL associations, they attribute too much weight to observations that are noisy and too little to those that are precise, resulting in reduced power and increased susceptibility to false positives. The negative effects of such "background variance heterogeneity" (BVH) on standard QTL mapping have received little attention until now, although the subject is closely related to work on the detection of variance-controlling genes. Here we use simulation to examine how BVH affects power and false positive rate for detecting QTL affecting the mean (mQTL), the variance (vQTL), or both (mvQTL). We compare linear regression for mQTL and Levene's test for vQTL, with tests more recently developed, including tests based on the double generalized linear model (DGLM), which can model BVH explicitly. We show that, when used in conjunction with a suitable permutation procedure, the DGLM-based tests accurately control false positive rate and are more powerful than the other tests. We also find that some adverse effects of BVH can be mitigated by applying a rank inverse normal transform. We apply our novel approach, which we term "mean-variance QTL mapping", to publicly available data on a mouse backcross and, after accommodating BVH driven by sire, detect a new mQTL for bodyweight.
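The rank inverse normal transform mentioned in this abstract can be sketched in a few lines. The version below replaces each phenotype value with the standard normal quantile of its rank, using the Blom offset (c = 3/8) and average ranks for ties; both choices, and the function name `rank_inverse_normal`, are assumptions for illustration, since the abstract does not say which variant the authors used.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to normal scores through their ranks.

    The offset c = 3/8 (Blom) is an assumed default; tied values
    receive the average of the ranks they span.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of values tied with position i
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Toy example (made-up phenotype values)
scores = rank_inverse_normal([3.0, 1.0, 2.0])
```

Because only ranks enter the transform, extreme phenotype values are pulled in toward the bulk of the distribution, which is one plausible reason it can soften the effects of heterogeneous variance.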
Affiliation(s)
- Robert W Corty
- Department of Genetics
- Bioinformatics and Computational Biology Curriculum
- William Valdar
- Department of Genetics
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
|
8
|
Zhang T, Sun L. Beyond the traditional simulation design for evaluating type 1 error control: From the "theoretical" null to "empirical" null. Genet Epidemiol 2018; 43:166-179. [PMID: 30478944 PMCID: PMC6518945 DOI: 10.1002/gepi.22172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/04/2018] [Revised: 09/10/2018] [Accepted: 09/21/2018] [Indexed: 01/25/2023]
Abstract
When evaluating a newly developed statistical test, an important step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called "theoretical" null of no association. In practice, the whole-genome association analyses scan through a large number of genetic markers (Gs) for the ones associated with an outcome of interest (Y), where Y comes from an alternative while the majority of Gs are not associated with Y; the Y-G relationships are under the "empirical" null. This reality can be better represented by two other simulation designs, where design S1.1 simulates Y from an alternative model based on G, then evaluates its association with independently generated G_new; while design S1.2 evaluates the association between permutated Y and G. More than a decade ago, Efron (2004) noted the important distinction between the "theoretical" and "empirical" null in false discovery rate control. Using scale tests for variance heterogeneity, direct univariate, and multivariate interaction tests as examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, while simulation design S0 suggested that the method was accurate, designs S1.1 and S1.2 revealed its increased empirical T1E rate when applied in a real-data setting. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices for methods evaluation and T1E control interpretation.
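The three null designs described in this abstract can be sketched as data generators. Everything specific here is an assumption for illustration: the additive 0/1/2 genotype coding, the linear alternative model with effect `beta` and standard normal noise, and all function names. The abstract only fixes the logic of S0 (Y unrelated to G), S1.1 (Y from an alternative, tested against a fresh G_new), and S1.2 (permuted Y tested against the original G).

```python
import random


def draw_genotypes(n, maf, rng):
    # Additive coding: count of minor alleles (0, 1, or 2) per individual
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]


def simulate_alternative(g, beta, rng):
    # Assumed alternative model: Y depends linearly on G plus normal noise
    return [beta * gi + rng.gauss(0.0, 1.0) for gi in g]


def design_s0(n, maf, rng):
    # "Theoretical" null: Y is generated without reference to any genotype
    g = draw_genotypes(n, maf, rng)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return y, g


def design_s1_1(n, maf, beta, rng):
    # "Empirical" null: Y comes from an alternative based on G,
    # but is paired with an independently generated G_new
    g = draw_genotypes(n, maf, rng)
    y = simulate_alternative(g, beta, rng)
    g_new = draw_genotypes(n, maf, rng)
    return y, g_new


def design_s1_2(n, maf, beta, rng):
    # "Empirical" null: Y comes from an alternative based on G,
    # then Y is permuted to break its link with G
    g = draw_genotypes(n, maf, rng)
    y = simulate_alternative(g, beta, rng)
    y_perm = y[:]
    rng.shuffle(y_perm)
    return y_perm, g


rng = random.Random(2018)
y_null, g_null = design_s1_1(n=200, maf=0.3, beta=0.5, rng=rng)
```

Applying the test under study to many replicates from each generator and comparing rejection rates at a fixed threshold would then show whether T1E control under S0 carries over to S1.1 and S1.2.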
Affiliation(s)
- Ting Zhang
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Lei Sun
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, Ontario, Canada
|
9
|
Conley D, Johnson R, Domingue B, Dawes C, Boardman J, Siegal M. A sibling method for identifying vQTLs. PLoS One 2018; 13:e0194541. [PMID: 29617452 PMCID: PMC5884517 DOI: 10.1371/journal.pone.0194541] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/11/2017] [Accepted: 03/05/2018] [Indexed: 12/11/2022] Open
Abstract
The propensity of a trait to vary within a population may have evolutionary, ecological, or clinical significance. In the present study we deploy sibling models to offer a novel and unbiased way to ascertain loci associated with the extent to which phenotypes vary (variance-controlling quantitative trait loci, or vQTLs). Previous methods for vQTL-mapping either exclude genetically related individuals or treat genetic relatedness among individuals as a complicating factor addressed by adjusting estimates for non-independence in phenotypes. The present method uses genetic relatedness as a tool to obtain unbiased estimates of variance effects rather than as a nuisance. The family-based approach, which utilizes random variation between siblings in minor allele counts at a locus, also allows controls for parental genotype, mean effects, and non-linear (dominance) effects that may spuriously appear to generate variation. Simulations show that the approach performs as well as two existing methods (squared Z-score and DGLM) in controlling type I error rates when there is no unobserved confounding, and performs significantly better than these methods in the presence of small degrees of confounding. Using height and BMI as empirical applications, we investigate SNPs that alter within-family variation in height and BMI, as well as pathways that appear to be enriched. One significant SNP for BMI variability, in the MAST4 gene, replicated. Pathway analysis revealed one gene set, encoding members of several signaling pathways related to gap junction function, which appears significantly enriched for associations with within-family height variation in both datasets (while not enriched in analysis of mean levels). We recommend approximating laboratory random assignment of genotype using family data and more careful attention to the possible conflation of mean and variance effects.
Affiliation(s)
- Dalton Conley
- Department of Sociology, Princeton University, Princeton, NJ, United States of America
- Rebecca Johnson
- Department of Sociology, Princeton University, Princeton, NJ, United States of America
- Ben Domingue
- Graduate School of Education, Stanford University, Stanford, CA, United States of America
- Christopher Dawes
- Wilf Family Department of Politics, New York University, New York City, NY, United States of America
- Jason Boardman
- Institute for Behavioral Sciences, University of Colorado, Boulder, Boulder, CO, United States of America
- Mark Siegal
- Center for Genomics and Systems Biology, New York University, New York City, NY, United States of America
|
10
|
McAllister K, Mechanic LE, Amos C, Aschard H, Blair IA, Chatterjee N, Conti D, Gauderman WJ, Hsu L, Hutter CM, Jankowska MM, Kerr J, Kraft P, Montgomery SB, Mukherjee B, Papanicolaou GJ, Patel CJ, Ritchie MD, Ritz BR, Thomas DC, Wei P, Witte JS. Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 2017; 186:753-761. [PMID: 28978193 PMCID: PMC5860428 DOI: 10.1093/aje/kwx227] [Citation(s) in RCA: 116] [Impact Index Per Article: 16.6] [Received: 11/07/2016] [Revised: 03/14/2017] [Accepted: 03/16/2017] [Indexed: 12/25/2022] Open
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. There are opportunities in this field, particularly with respect to the incorporation of -omics and next-generation sequencing data and continual improvement in measures of environmental exposures implicated in complex disease outcomes. In a workshop called "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," held October 17-18, 2014, by the National Institute of Environmental Health Sciences and the National Cancer Institute in conjunction with the annual American Society of Human Genetics meeting, participants explored new approaches and tools that have been developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including the improved data analytical methods, environmental exposure assessment, and incorporation of functional data and annotations.
Affiliation(s)
- Leah E. Mechanic
- Correspondence to Dr. Leah E. Mechanic, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, 9609 Medical Center Drive, Room 4E104, MSC 9763, Bethesda, MD 20892 (e-mail: )
|
11
|
Gauderman WJ, Mukherjee B, Aschard H, Hsu L, Lewinger JP, Patel CJ, Witte JS, Amos C, Tai CG, Conti D, Torgerson DG, Lee S, Chatterjee N. Update on the State of the Science for Analytical Methods for Gene-Environment Interactions. Am J Epidemiol 2017; 186:762-770. [PMID: 28978192 PMCID: PMC5859988 DOI: 10.1093/aje/kwx228] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Received: 11/07/2016] [Revised: 04/24/2017] [Accepted: 04/25/2017] [Indexed: 12/14/2022] Open
Abstract
The analysis of gene-environment interaction (G×E) may hold the key for further understanding the etiology of many complex traits. The current availability of high-volume genetic data, the wide range in types of environmental data that can be measured, and the formation of consortiums of multiple studies provide new opportunities to identify G×E but also new analytical challenges. In this article, we summarize several statistical approaches that can be used to test for G×E in a genome-wide association study. These include traditional models of G×E in a case-control or quantitative trait study as well as alternative approaches that can provide substantially greater power. The latest methods for analyzing G×E with gene sets and with data in a consortium setting are summarized, as are issues that arise due to the complexity of environmental data. We provide some speculation on why detecting G×E in a genome-wide association study has thus far been difficult. We conclude with a description of software programs that can be used to implement most of the methods described in the paper.
Affiliation(s)
- W. James Gauderman
- Correspondence to Dr. W. James Gauderman, Division of Biostatistics, Department of Preventive Medicine, Keck School of Medicine, University of Southern California, 2001 North Soto Street, 202-K, Los Angeles, CA 90032 (e-mail: )
|
12
|
Hong C, Ning Y, Wei P, Cao Y, Chen Y. A semiparametric model for vQTL mapping. Biometrics 2017; 73:571-581. [PMID: 27861717 PMCID: PMC5780188 DOI: 10.1111/biom.12612] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/01/2014] [Revised: 07/01/2016] [Accepted: 08/01/2016] [Indexed: 11/30/2022]
Abstract
Quantitative trait locus analysis has been used as an important tool to identify markers where the phenotype or quantitative trait is linked with the genotype. Most existing tests for single locus association with quantitative traits aim at the detection of the mean differences across genotypic groups. However, recent research has revealed functional genetic loci that affect the variance of traits, known as variability-controlling quantitative trait loci. In addition, it has been suggested that many genotypes have both mean and variance effects, while the mean effects or variance effects alone may not be strong enough to be detected. The existing methods accounting for unequal variances include Levene's test, the Lepage test, and the D-test, but these suffer from a lack of robustness or a lack of power. We propose a semiparametric model and a novel pairwise conditional likelihood ratio test. Specifically, the semiparametric model is designed to identify the combined differences in higher moments among genotypic groups. The pairwise likelihood is constructed based on a conditioning procedure, where the unknown reference distribution is eliminated. We show that the proposed pairwise likelihood ratio test has a simple asymptotic chi-square distribution, which does not require permutation or bootstrap procedures. Simulation studies show that the proposed test performs well in controlling Type I errors and having competitive power in identifying the differences across genotypic groups. In addition, the proposed test has certain robustness to model mis-specifications. The proposed test is illustrated by an example of identifying both mean and variance effects in body mass index using the Framingham Heart Study data.
Affiliation(s)
- Chuan Hong
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- Yang Ning
- Department of Statistical Science, Cornell University, Ithaca, NY 14853, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Ying Cao
- Department of Biostatistics, The University of Texas School of Public Health, Houston, TX 77030, USA
- Yong Chen
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA 19104, USA
|
13
|
Castaldi PJ, Cho MH, Liang L, Silverman EK, Hersh CP, Rice K, Aschard H. Screening for interaction effects in gene expression data. PLoS One 2017; 12:e0173847. [PMID: 28301596 PMCID: PMC5354413 DOI: 10.1371/journal.pone.0173847] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/30/2016] [Accepted: 02/27/2017] [Indexed: 11/27/2022] Open
Abstract
Expression quantitative trait locus (eQTL) studies are a powerful tool for identifying genetic variants that affect levels of messenger RNA. Since gene expression is controlled by a complex network of gene-regulating factors, one way to identify these factors is to search for interaction effects between genetic variants and mRNA levels of transcription factors (TFs) and their respective target genes. However, identification of interaction effects in gene expression data poses a variety of methodological challenges, and it has become clear that such analyses should be conducted and interpreted with caution. Investigating the validity and interpretability of several interaction tests when screening for eQTL SNPs whose effect on the target gene expression is modified by the expression level of a transcription factor, we characterized two important methodological issues. First, we stress the scale-dependency of interaction effects and highlight that commonly applied transformations of gene expression data can induce or remove interactions, making interpretation of results more challenging. We then demonstrate that, in the setting of moderate to strong interaction effects on the order of what may be reasonably expected for eQTL studies, standard interaction screening can be biased due to heteroscedasticity induced by true interactions. Using simulation and real data analysis, we outline a set of reasonable minimum conditions and sample size requirements for reliable detection of variant-by-environment and variant-by-TF interactions using the heteroscedasticity-consistent covariance-based approach.
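The scale-dependency point in the abstract above can be illustrated with a small arithmetic sketch. It is not taken from the paper: the effect sizes and all names are hypothetical. The cell means are built to be purely additive on the log scale (no SNP-by-TF interaction), and the same means are then examined on the raw scale.

```python
import math


def interaction_contrast(cells):
    """Difference-in-differences across the four (genotype, exposure) cells."""
    return cells[(1, 1)] - cells[(1, 0)] - cells[(0, 1)] + cells[(0, 0)]


# Hypothetical effect sizes, chosen only for illustration.
SNP_EFFECT = 0.5
TF_EFFECT = 0.8

# Cell means that are purely additive on the log scale (no interaction term).
log_scale = {
    (g, e): SNP_EFFECT * g + TF_EFFECT * e for g in (0, 1) for e in (0, 1)
}
# The same cell means after exponentiating back to the raw expression scale.
raw_scale = {cell: math.exp(value) for cell, value in log_scale.items()}

log_contrast = interaction_contrast(log_scale)
raw_contrast = interaction_contrast(raw_scale)
print(f"log-scale contrast: {log_contrast:.3f}")
print(f"raw-scale contrast: {raw_contrast:.3f}")
```

The log-scale contrast is zero up to rounding, while the raw-scale contrast equals (e^0.5 - 1)(e^0.8 - 1), which is nonzero. The transformation alone creates or removes the apparent interaction.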
Affiliation(s)
- Peter J. Castaldi
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, United States of America
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Liming Liang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Craig P. Hersh
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Kenneth Rice
- Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
- Hugues Aschard
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France
|
14
|
Aschard H. A perspective on interaction effects in genetic association studies. Genet Epidemiol 2016; 40:678-688. [PMID: 27390122 PMCID: PMC5132101 DOI: 10.1002/gepi.21989] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 01/04/2016] [Revised: 05/20/2016] [Accepted: 06/05/2016] [Indexed: 11/29/2022]
Abstract
The identification of gene-gene and gene-environment interaction in human traits and diseases is an active area of research that generates high expectations and most often leads to high disappointment. This is partly explained by a misunderstanding of the inherent characteristics of standard regression-based interaction analyses. Here, I revisit and untangle major theoretical aspects of interaction tests in the special case of linear regression; in particular, I discuss variable coding schemes, interpretation of effect estimates, statistical power, and estimation of variance explained with regard to various hypothetical interaction patterns. Linking these components, it appears first that the simplest biological interaction models, in which the magnitude of a genetic effect depends on a common exposure, are among the most difficult to identify. Second, I highlight the demerits of the current strategy to evaluate the contribution of interaction effects to the variance of quantitative outcomes and argue for the use of new approaches to overcome this issue. Finally, I explore the advantages and limitations of multivariate interaction models, when testing for interaction between multiple SNPs and/or multiple exposures, over univariate approaches. Together, these new insights can be leveraged for future method development and to improve our understanding of the genetic architecture of multifactorial traits.
Affiliation(s)
- Hugues Aschard
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
|
15
|
Soave D, Corvol H, Panjwani N, Gong J, Li W, Boëlle PY, Durie PR, Paterson AD, Rommens JM, Strug LJ, Sun L. A Joint Location-Scale Test Improves Power to Detect Associated SNPs, Gene Sets, and Pathways. Am J Hum Genet 2015; 97:125-38. [PMID: 26140448 PMCID: PMC4572492 DOI: 10.1016/j.ajhg.2015.05.015] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 02/28/2015] [Accepted: 05/26/2015] [Indexed: 11/28/2022] Open
Abstract
Gene-based, pathway, and other multivariate association methods are motivated by the possibility of GxG and GxE interactions; however, accounting for such interactions is limited by the challenges associated with adequate modeling information. Here we propose an easy-to-implement joint location-scale (JLS) association testing framework for single-variant and multivariate analysis that accounts for interactions without explicitly modeling them. We apply the JLS method to a gene-set analysis of cystic fibrosis (CF) lung disease, which is influenced by multiple environmental and genetic factors. We identify and replicate an association between the constituents of the apical plasma membrane and CF lung disease (p = 0.0099 and p = 0.0180, respectively) and highlight a role for the SLC9A3-SLC9A3R1/2-EZR complex in contributing to CF lung disease. Many association studies could benefit from re-analysis with the JLS method that leverages complex genetic architecture for SNP, gene, and pathway identification. Analytical verification, simulation, and additional proof-of-principle applications support our approach.
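A joint location-scale test needs some rule for combining evidence from a mean (location) test and a variance (scale) test. One standard rule is Fisher's method, sketched below. The abstract above does not spell out the combination rule, so that choice, the function name, and the assumption that the two p-values are independent under the null are all assumptions made for illustration.

```python
import math


def fisher_combine(p_location, p_scale):
    """Combine two independent p-values with Fisher's method.

    The statistic W = -2 * (ln p_location + ln p_scale) is chi-square with
    4 degrees of freedom under the null when the two tests are independent
    (an assumption of this sketch). For 4 degrees of freedom the survival
    function has the closed form exp(-W/2) * (1 + W/2).
    """
    for p in (p_location, p_scale):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
    w = -2.0 * (math.log(p_location) + math.log(p_scale))
    half_w = w / 2.0
    return math.exp(-half_w) * (1.0 + half_w)


# Two individually borderline signals (hypothetical values).
joint_p = fisher_combine(0.05, 0.05)
print(f"joint p-value: {joint_p:.4f}")
```

Two modest signals reinforce each other here: the joint p-value is smaller than either input, which is the intuition behind testing location and scale together.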
Affiliation(s)
- David Soave
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Harriet Corvol
- Assistance Publique-Hôpitaux de Paris (AP-HP), Trousseau Hospital, Pediatric Pulmonology Department; Institut National de la Santé et la Recherche Médicale (INSERM), UMR_S 938, CDR Saint-Antoine, 75012 Paris, France; Sorbonne Universités, Université Pierre et Marie Curie (UPMC) Paris 06, 75005 Paris, France
- Naim Panjwani
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Jiafen Gong
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Weili Li
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Pierre-Yves Boëlle
- Sorbonne Universités, Université Pierre et Marie Curie (UPMC) Paris 06, 75005 Paris, France; AP-HP, Saint-Antoine Hospital, Biostatistics Department, INSERM, UMR_S 1136, 75012 Paris, France
- Peter R Durie
- Program in Physiology and Experimental Medicine, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Pediatrics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Andrew D Paterson
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada
- Johanna M Rommens
- Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Lisa J Strug
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada.
- Lei Sun
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; Department of Statistical Sciences, University of Toronto, Toronto, ON M5S 3G3, Canada.
|
16
|
Ye C, Jiang B, Zhang X, Liu JS. dslice: an R package for nonparametric testing of associations with application in QTL and gene set analysis. Bioinformatics 2015; 31:1842-4. [PMID: 25609796 DOI: 10.1093/bioinformatics/btv021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2014] [Accepted: 01/12/2015] [Indexed: 11/12/2022] Open
Abstract
Many statistical problems in bioinformatics and genetics can be formulated as the testing of associations between a categorical variable and a continuous variable. A dynamic slicing method was proposed for non-parametric dependence testing, which has been demonstrated to have higher power than traditional methods such as the Kolmogorov-Smirnov test. We introduce an R package, dslice, to facilitate the use of the dynamic slicing method in bioinformatic applications such as quantitative trait loci studies and gene set enrichment analysis. Availability and implementation: dslice is implemented in Rcpp and available in the Comprehensive R Archive Network. The package is distributed under the GNU General Public License (version 2 or later).
Affiliation(s)
- Chao Ye
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China, Department of Statistics, Harvard University, Cambridge, MA 02138, USA, School of Life Sciences, Tsinghua University, Beijing 100084, China and Center of Statistics, Tsinghua University, Beijing 100084, China
- Bo Jiang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China, Department of Statistics, Harvard University, Cambridge, MA 02138, USA, School of Life Sciences, Tsinghua University, Beijing 100084, China and Center of Statistics, Tsinghua University, Beijing 100084, China
- Xuegong Zhang
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China, Department of Statistics, Harvard University, Cambridge, MA 02138, USA, School of Life Sciences, Tsinghua University, Beijing 100084, China and Center of Statistics, Tsinghua University, Beijing 100084, China
- Jun S Liu
- MOE Key Laboratory of Bioinformatics, Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Department of Automation, Tsinghua University, Beijing 100084, China, Department of Statistics, Harvard University, Cambridge, MA 02138, USA, School of Life Sciences, Tsinghua University, Beijing 100084, China and Center of Statistics, Tsinghua University, Beijing 100084, China
|
17
|
Wang X, Zhang D, Tzeng JY. Pathway-guided identification of gene-gene interactions. Ann Hum Genet 2014; 78:478-91. [PMID: 25227508 PMCID: PMC4363308 DOI: 10.1111/ahg.12080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/18/2014] [Accepted: 07/03/2014] [Indexed: 12/26/2022]
Abstract
Assessing gene-gene interactions (GxG) at the gene level can permit examination of epistasis at biologically functional units with amplified interaction signals from marker-marker pairs. While current gene-based GxG methods tend to be designed for two or a few genes, for complex traits it is common to have a list of many candidate genes in which to explore GxG. We propose a regression model with pathway-guided regularization for detecting interactions among genes. Specifically, we use the principal components to summarize the SNP-SNP interactions between a gene pair, and use an L1 penalty that incorporates adaptive weights based on biological guidance and trait supervision to identify important main and interaction effects. Our approach aims to combine biological guidance and data adaptiveness, and yields credible findings that may help formulate biological hypotheses for further molecular studies. The proposed approach can be used to explore GxG with a list of many candidate genes and is applicable even when the sample size is smaller than the number of predictors studied. We evaluate the utility of the proposed method using simulation and real data analysis. The results suggest improved performance over methods not utilizing pathway and trait guidance.
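An L1 penalty with adaptive weights is commonly handled through a weighted soft-thresholding step, in which each coefficient is shrunk by its own weight times the tuning parameter. The abstract above does not describe the fitting algorithm, so the sketch below is only a generic illustration of that operator; the function name, the weights, and the example values are hypothetical.

```python
def adaptive_soft_threshold(z, lam, weight):
    """Weighted soft-thresholding: shrink z toward zero by lam * weight.

    A small weight (for example, an effect favored by prior biological
    guidance) is penalized lightly; a large weight is penalized heavily.
    """
    if lam < 0 or weight < 0:
        raise ValueError("lam and weight must be non-negative")
    threshold = lam * weight
    if z > threshold:
        return z - threshold
    if z < -threshold:
        return z + threshold
    return 0.0


# The same unpenalized estimate under a light and a heavy weight.
lightly_penalized = adaptive_soft_threshold(3.0, 1.0, 0.5)
heavily_penalized = adaptive_soft_threshold(3.0, 1.0, 4.0)
print(lightly_penalized, heavily_penalized)
```

With the light weight the effect survives with modest shrinkage, while the heavy weight sets it exactly to zero, which is how weights can steer selection toward effects with prior support.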
Affiliation(s)
- Xin Wang
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Daowen Zhang
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, North Carolina State University, Raleigh, NC, USA
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
|
18
|
Cao Y, Wei P, Bailey M, Kauwe JSK, Maxwell TJ. A versatile omnibus test for detecting mean and variance heterogeneity. Genet Epidemiol 2014; 38:51-59. [PMID: 24482837 DOI: 10.1002/gepi.21778] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 01/18/2023]
Abstract
Recent research has revealed loci that display variance heterogeneity through various means such as biological disruption, linkage disequilibrium (LD), gene-by-gene (G × G), or gene-by-environment interaction. We propose a versatile likelihood ratio test that allows joint testing for mean and variance heterogeneity (LRT(MV)) or either effect alone (LRT(M) or LRT(V)) in the presence of covariates. Using extensive simulations for our method and others, we found that all parametric tests were sensitive to nonnormality regardless of any trait transformations. Coupling our test with the parametric bootstrap solves this issue. Using simulations and empirical data from a known mean-only functional variant, we demonstrate how LD can produce variance-heterogeneity loci (vQTL) in a predictable fashion based on differential allele frequencies, high D', and relatively low r² values. We propose that a joint test for mean and variance heterogeneity is more powerful than a variance-only test for detecting vQTL. This takes advantage of loci that also have mean effects without sacrificing much power to detect variance-only effects. We discuss using vQTL as an approach to detect G × G interactions and also how vQTL are related to relationship loci, and how both can create prior hypotheses for each other and reveal the relationships between traits and possibly between components of a composite trait.
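The idea of a joint likelihood ratio test for mean and variance heterogeneity can be sketched in a stripped-down setting. The code below is not the authors' LRT(MV): it assumes a normal trait, two genotype groups, and no covariates, and it uses the asymptotic chi-square reference rather than the parametric bootstrap that the abstract recommends under nonnormality. All names and data are hypothetical.

```python
import math
from statistics import fmean


def mle_variance(values):
    """Maximum-likelihood variance (divides by n, not n - 1)."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def lrt_mean_variance(group_a, group_b):
    """Joint LRT: common mean and variance vs. group-specific mean and variance.

    Under normality, -2 log(LR) = N*ln(var_pooled) - sum_g n_g*ln(var_g),
    referred to a chi-square with 2 degrees of freedom, whose survival
    function is exp(-x/2). Groups with zero variance are not handled.
    """
    pooled = list(group_a) + list(group_b)
    stat = len(pooled) * math.log(mle_variance(pooled))
    for group in (group_a, group_b):
        stat -= len(group) * math.log(mle_variance(group))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    return stat, math.exp(-stat / 2.0)


# Hypothetical trait values for two genotype groups.
stat, p_value = lrt_mean_variance([0, 1, 2, 3], [10, 12, 14, 16])
print(f"LRT statistic = {stat:.2f}, p = {p_value:.2g}")
```

Because the alternative frees both the mean and the variance, a shift in either one (or both) increases the statistic, which is what lets a joint test pick up loci that a variance-only test would miss.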
Affiliation(s)
- Ying Cao
- Human Genetics Center, UT School of Public Health, Houston, TX 77030, USA; Division of Biostatistics, UT School of Public Health, Houston, TX 77030, USA
- Peng Wei
- Human Genetics Center, UT School of Public Health, Houston, TX 77030, USA; Division of Biostatistics, UT School of Public Health, Houston, TX 77030, USA
- Matthew Bailey
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- John S K Kauwe
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Taylor J Maxwell
- Human Genetics Center, UT School of Public Health, Houston, TX 77030, USA
|
19
|
Aschard H, Vilhjálmsson BJ, Greliche N, Morange PE, Trégouët DA, Kraft P. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am J Hum Genet 2014; 94:662-76. [PMID: 24746957 DOI: 10.1016/j.ajhg.2014.03.016] [Citation(s) in RCA: 118] [Impact Index Per Article: 11.8] [Received: 01/15/2014] [Accepted: 03/24/2014] [Indexed: 01/13/2023] Open
Abstract
Many human traits are highly correlated. This correlation can be leveraged to improve the power of genetic association tests to identify markers associated with one or more of the traits. Principal component analysis (PCA) is a useful tool that has been widely used for the multivariate analysis of correlated variables. PCA is usually applied as a dimension reduction method: the few top principal components (PCs) explaining most of total trait variance are tested for association with a predictor of interest, and the remaining components are not analyzed. In this study we review the theoretical basis of PCA and describe the behavior of PCA when testing for association between a SNP and correlated traits. We then use simulation to compare the power of various PCA-based strategies when analyzing up to 100 correlated traits. We show that contrary to widespread practice, testing only the top PCs often has low power, whereas combining signal across all PCs can have greater power. This power gain is primarily due to increased power to detect genetic variants with opposite effects on positively correlated traits and variants that are exclusively associated with a single trait. Relative to other methods, the combined-PC approach has close to optimal power in all scenarios considered while offering more flexibility and more robustness to potential confounders. Finally, we apply the proposed PCA strategy to the genome-wide association study of five correlated coagulation traits where we identify two candidate SNPs that were not found by the standard approach.
Affiliation(s)
- Hugues Aschard
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
- Bjarni J Vilhjálmsson
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA; Medical and Population Genetics Program, Broad Institute, Cambridge, MA 02142, USA
- Nicolas Greliche
- Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France
- David-Alexandre Trégouët
- Sorbonne Universités, UPMC Univ Paris 06, UMR_S 1166, 75005 Paris, France; INSERM, UMR_S 1166, Genomics and Physiopathology of Cardiovascular Diseases, 75013 Paris, France; Institute for Cardiometabolism and Nutrition (ICAN), 75013 Paris, France
- Peter Kraft
- Program in Genetic Epidemiology and Statistical Genetics, Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA
|