1. Tian Y, Zhang H, Bureau A, Hochner H, Chen J. Efficient inference of parent-of-origin effect using case-control mother-child genotype data. J Stat Plan Inference 2024; 233:106190. [PMID: 38818512 PMCID: PMC11135462 DOI: 10.1016/j.jspi.2024.106190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Parent-of-origin effect plays an important role in mammal development and disorder. Case-control mother-child pair genotype data can be used to detect parent-of-origin effect and is often convenient to collect in practice. Most existing methods for assessing parent-of-origin effect do not incorporate any covariates, which may be required to control for confounding factors. We propose to model the parent-of-origin effect through a logistic regression model, with predictors including maternal and child genotypes, parental origins, and covariates. The parental origins may not be fully inferred from genotypes of a target genetic marker, so we propose to use genotypes of markers tightly linked to the target marker to increase inference efficiency. A robust statistical inference procedure is developed based on a modified profile log-likelihood in a retrospective way. A computationally feasible expectation-maximization algorithm is devised to estimate all unknown parameters involved in the modified profile log-likelihood. This algorithm differs from the conventional expectation-maximization algorithm in the sense that it is based on a modified instead of the original profile log-likelihood function. The convergence of the algorithm is established under some mild regularity conditions. This expectation-maximization algorithm also allows convenient handling of missing child genotypes. Large sample properties, including weak consistency, asymptotic normality, and asymptotic efficiency, are established for the proposed estimator under some mild regularity conditions. Finite sample properties are evaluated through extensive simulation studies and the application to a real dataset.
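The remark that parental origins "may not be fully inferred from genotypes of a target genetic marker" can be illustrated with a small sketch. It is not the authors' estimation procedure; the function name `parental_origin` and the 0/1/2 minor-allele coding are assumptions made here for illustration. It only shows which mother-child genotype pairs determine the origin of the child's alleles and which leave it ambiguous.

```python
from typing import Optional, Tuple


def parental_origin(mother: int, child: int) -> Optional[Tuple[int, int]]:
    """Return (maternal, paternal) minor-allele counts transmitted to the child.

    Genotypes are coded as the number of minor alleles (0, 1 or 2); this coding
    is an assumption of the sketch. Returns None when the pair of genotypes does
    not determine the parental origin.
    """
    if mother not in (0, 1, 2) or child not in (0, 1, 2):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    if abs(mother - child) == 2:
        # A homozygous mother cannot have a child homozygous for the other allele.
        raise ValueError("Mendelian inconsistency between mother and child")
    if child == 0:
        return (0, 0)  # both transmitted alleles are major
    if child == 2:
        return (1, 1)  # both transmitted alleles are minor
    # Child is heterozygous: origin is known only if the mother is homozygous.
    if mother == 0:
        return (0, 1)  # minor allele must come from the father
    if mother == 2:
        return (1, 0)  # minor allele must come from the mother
    return None        # both heterozygous: origin is ambiguous
```

The ambiguous case (mother and child both heterozygous) is the one for which the abstract proposes borrowing information from tightly linked markers.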
Affiliation(s)
- Yuang Tian
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
- Hong Zhang
  - Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, Anhui, China
- Alexandre Bureau
  - Department of Social and Preventive Medicine, Université Laval, Québec, Canada
- Hagit Hochner
  - Braun School of Public Health, The Hebrew University of Jerusalem, Jerusalem, Israel
- Jinbo Chen
  - Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, U.S.A

2. Xie H, Cao X, Zhang S, Sha Q. Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies using multi-layer network. Bioinformatics 2023; 39:btad707. [PMID: 37991852 PMCID: PMC10697735 DOI: 10.1093/bioinformatics/btad707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 09/29/2023] [Accepted: 11/21/2023] [Indexed: 11/24/2023] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) are an essential tool for analyzing associations between phenotypes and single nucleotide polymorphisms (SNPs). Most binary phenotypes in large biobanks are extremely unbalanced, which leads to inflated type I error rates for many widely used association tests for joint analysis of multiple phenotypes. In this article, we first propose a novel method to construct a Multi-Layer Network (MLN) using individuals with at least one case status among all phenotypes. Then, we introduce a computationally efficient community detection method to group phenotypes into disjoint clusters based on the MLN. Finally, we propose a novel approach, MLN with Omnibus (MLN-O), to jointly analyse the association between phenotypes and a SNP. MLN-O uses the score test to test the association between each merged phenotype in a cluster and a SNP, then uses the Omnibus test to obtain an overall test statistic for the association between all phenotypes and a SNP. RESULTS Extensive simulation studies show that the proposed approach controls type I error rates and is more powerful than some existing methods. We also apply the proposed method to a real data set from the UK Biobank. Using phenotypes in Chapter XIII (Diseases of the musculoskeletal system and connective tissue) in the UK Biobank, we find that MLN-O identifies more significant SNPs than the other methods we compare it with. AVAILABILITY AND IMPLEMENTATION https://github.com/Hongjing-Xie/Multi-Layer-Network-with-Omnibus-MLN-O.
Affiliation(s)
- Hongjing Xie
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Xuewei Cao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Shuanglin Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States

3. St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022] [Indexed: 11/15/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits, and there is a need for unified and flexible methods to study the association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations than other existing methods for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
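For readers unfamiliar with copulas, the definition used in the abstract can be written out for the bivariate case. This is the standard statement of Sklar's theorem, not a formula taken from the paper; the symbols $F_1$, $F_2$, $C$ and the Gaussian example are notation chosen here for illustration, and the abstract does not say which copula family CBMAT uses.

```latex
% Sklar's theorem for a bivariate phenotype (Y_1, Y_2) with marginal
% distribution functions F_1 and F_2: there is a copula C with
\[
  F(y_1, y_2) \;=\; C\bigl(F_1(y_1),\, F_2(y_2)\bigr),
  \qquad C : [0,1]^2 \to [0,1],
\]
% where C is itself a joint distribution function with uniform margins:
\[
  C(u, 1) = u, \qquad C(1, v) = v, \qquad u, v \in [0,1].
\]
% One common illustrative choice is the Gaussian copula with correlation \rho:
\[
  C_\rho(u, v) \;=\; \Phi_\rho\bigl(\Phi^{-1}(u),\, \Phi^{-1}(v)\bigr),
\]
% with \Phi the standard normal CDF and \Phi_\rho the bivariate normal CDF.
```

Because the margins $F_1$ and $F_2$ enter only through the copula's arguments, each can be continuous or binary, which is what lets a single construction cover continuous, binary and mixed phenotype pairs.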
Affiliation(s)
- Julien St-Pierre
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada

4. Ray D, Loomis SJ, Venkataraghavan S, Tin A, Yu B, Chatterjee N, Selvin E, Duggal P. Characterizing common and rare variations in non-traditional glycemic biomarkers using multivariate approaches on multi-ancestry ARIC study. medRxiv 2023:2023.06.13.23289200. [PMID: 37398180 PMCID: PMC10312851 DOI: 10.1101/2023.06.13.23289200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
Glycated hemoglobin, fasting glucose, glycated albumin, and fructosamine are biomarkers that reflect different aspects of the glycemic process. Genetic studies of these glycemic biomarkers can shed light on unknown aspects of type 2 diabetes genetics and biology. While there exist several GWAS of glycated hemoglobin and fasting glucose, very few GWAS have focused on glycated albumin or fructosamine. We performed a multi-phenotype GWAS of glycated albumin and fructosamine from 7,395 White and 2,016 Black participants in the Atherosclerosis Risk in Communities (ARIC) study on the common variants from genotyped/imputed data. We found 2 genome-wide significant loci, one mapping to a known type 2 diabetes gene (ARAP1/STARD10, p = 2.8 × 10⁻⁸) and another mapping to a novel gene (UGT1A, p = 1.4 × 10⁻⁸) using multi-omics gene mapping strategies in diabetes-relevant tissues. We identified additional loci that were ancestry-specific (e.g., PRKCA from African ancestry individuals, p = 1.7 × 10⁻⁸) and sex-specific (TEX29 locus in males only, p = 3.0 × 10⁻⁸). Further, we implemented multi-phenotype gene-burden tests on whole-exome sequence data from 6,590 White and 2,309 Black ARIC participants. Eleven genes across different rare variant aggregation strategies were exome-wide significant only in multi-ancestry analysis. Four out of 11 genes had notable enrichment of rare predicted loss of function variants in African ancestry participants despite smaller sample size. Overall, 8 out of 15 loci/genes were implicated to influence these biomarkers via glycemic pathways. This study illustrates improved locus discovery and potential effector gene discovery by leveraging joint patterns of related biomarkers across the entire allele frequency spectrum in multi-ancestry analyses.
Most of the loci/genes we identified have not been previously implicated in studies of type 2 diabetes, and future investigation of the loci/genes potentially acting through glycemic pathways may help us better understand risk of developing type 2 diabetes.
Affiliation(s)
- Debashree Ray
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Sowmya Venkataraghavan
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
- Adrienne Tin
  - School of Medicine, University of Mississippi Medical Center, Jackson, MS
- Bing Yu
  - Department of Epidemiology, UTHealth School of Public Health, Houston, TX
- Nilanjan Chatterjee
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
  - Department of Oncology, School of Medicine, Johns Hopkins University, Baltimore, MD
- Elizabeth Selvin
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD
  - Welch Center for Prevention, Epidemiology, & Clinical Research, Johns Hopkins University, Baltimore, MD
- Priya Duggal
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD

5. Head ST, Leslie EJ, Cutler DJ, Epstein MP. POIROT: a powerful test for parent-of-origin effects in unrelated samples leveraging multiple phenotypes. Bioinformatics 2023; 39:btad199. [PMID: 37067493 PMCID: PMC10148680 DOI: 10.1093/bioinformatics/btad199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Revised: 04/03/2023] [Accepted: 04/06/2023] [Indexed: 04/18/2023] Open
Abstract
MOTIVATION There is widespread interest in identifying genetic variants that exhibit parent-of-origin effects (POEs) wherein the effect of an allele on phenotype expression depends on its parental origin. POEs can arise from different phenomena including genomic imprinting and have been documented for many complex traits. Traditional tests for POEs require family data to determine parental origins of transmitted alleles. As most genome-wide association studies (GWAS) sample unrelated individuals (where allelic parental origin is unknown), the study of POEs in such datasets requires sophisticated statistical methods that exploit genetic patterns we anticipate observing when POEs exist. We propose a method to improve discovery of POE variants in large-scale GWAS samples that leverages potential pleiotropy among multiple correlated traits often collected in such studies. Our method compares the phenotypic covariance matrix of heterozygotes to homozygotes based on a Robust Omnibus Test. We refer to our method as the Parent of Origin Inference using Robust Omnibus Test (POIROT) of multiple quantitative traits. RESULTS Through simulation studies, we compared POIROT to a competing univariate variance-based method which considers separate analysis of each phenotype. We observed POIROT to be well-calibrated with improved power to detect POEs compared to univariate methods. POIROT is robust to non-normality of phenotypes and can adjust for population stratification and other confounders. Finally, we applied POIROT to GWAS data from the UK Biobank using BMI and two cholesterol phenotypes. We identified 338 genome-wide significant loci for follow-up investigation. AVAILABILITY AND IMPLEMENTATION The code for this method is available at https://github.com/staylorhead/POIROT-POE.
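The core comparison described above, the phenotypic covariance matrix of heterozygotes versus that of homozygotes, can be sketched in a few lines. This is only the descriptive first step, not the POIROT test statistic or its Robust Omnibus Test; the function names and the 0/1/2 genotype coding are assumptions made for this sketch.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Unbiased sample covariance matrix of the phenotype rows (one row per person)."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two individuals to estimate a covariance")
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def covariance_by_zygosity(
    genotypes: Sequence[int], phenotypes: Sequence[Sequence[float]]
) -> Dict[str, Matrix]:
    """Split individuals by zygosity at one variant and estimate each group's covariance.

    Genotypes are assumed to be coded as minor-allele counts: 1 is heterozygous,
    0 and 2 are the two homozygous classes.
    """
    het = [ph for g, ph in zip(genotypes, phenotypes) if g == 1]
    hom = [ph for g, ph in zip(genotypes, phenotypes) if g in (0, 2)]
    return {"heterozygotes": covariance(het), "homozygotes": covariance(hom)}
```

The intuition is that, when a parent-of-origin effect exists, heterozygotes are a mixture of carriers of the maternally and the paternally inherited allele with different trait means, so their phenotypic (co)variances differ from those of homozygotes. A formal test still has to account for sampling variability, which this sketch does not attempt.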
Affiliation(s)
- S Taylor Head
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, United States
- Elizabeth J Leslie
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, United States
- David J Cutler
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, United States
- Michael P Epstein
  - Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, United States

6. Sajal IH, Biswas S. Bivariate quantitative Bayesian LASSO for detecting association of rare haplotypes with two correlated continuous phenotypes. Front Genet 2023; 14:1104727. [PMID: 36968609 PMCID: PMC10033866 DOI: 10.3389/fgene.2023.1104727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/21/2023] [Indexed: 03/12/2023] Open
Abstract
In genetic association studies, the multivariate analysis of correlated phenotypes offers statistical and biological advantages compared to analyzing one phenotype at a time. The joint analysis utilizes additional information contained in the correlation and avoids multiple testing. It also provides an opportunity to investigate and understand shared genetic mechanisms of multiple phenotypes. Bivariate logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotypes associated with two binary phenotypes or one binary and one continuous phenotype jointly. There is currently no haplotype association test available that can handle multiple continuous phenotypes. In this study, by employing the framework of bivariate LBL, we propose bivariate quantitative Bayesian LASSO (QBL) to detect rare haplotypes associated with two continuous phenotypes. Bivariate QBL removes unassociated haplotypes by regularizing the regression coefficients and utilizing a latent variable to model correlation between two phenotypes. We carry out extensive simulations to investigate the performance of bivariate QBL and compare it with that of a standard (univariate) haplotype association test, Haplo.score (applied twice to two phenotypes individually). Bivariate QBL performs better than Haplo.score in all simulations with varying degrees of power gain. We analyze Genetic Analysis Workshop 19 exome sequencing data on systolic and diastolic blood pressures and detect several rare haplotypes associated with the two phenotypes.

7. Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods that use variant annotations or external controls to improve statistical power, and particular challenges facing rare-variant analysis, such as accounting for population structure and extremely unbalanced case-control designs. We also review recent advances and challenges in rare-variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Affiliation(s)
- Wenan Chen
  - Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- Brandon J. Coombes
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Nicholas B. Larson
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
*Correspondence: Wenan Chen; Brandon J. Coombes; Nicholas B. Larson

8. Ray D, Vergara C, Taub MA, Wojcik G, Ladd‐Acosta C, Beaty TH, Duggal P. Benchmarking statistical methods for analyzing parent-child dyads in genetic association studies. Genet Epidemiol 2022; 46:266-284. [PMID: 35451532 PMCID: PMC9356976 DOI: 10.1002/gepi.22453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/16/2021] [Revised: 02/06/2022] [Accepted: 03/15/2022] [Indexed: 11/24/2022]
Abstract
Genetic association studies of child health outcomes often employ family-based study designs. One of the most popular family-based designs is the case-parent trio design that considers the smallest possible nuclear family consisting of two parents and their affected child. This trio design is particularly advantageous for studying relatively rare disorders because it is less prone to type 1 error inflation due to population stratification compared to population-based study designs (e.g., case-control studies). However, obtaining genetic data from both parents is difficult, from a practical perspective, and many large studies predominantly measure genetic variants in mother-child dyads. While some statistical methods for analyzing parent-child dyad data (most commonly involving mother-child pairs) exist, it is not clear if they provide the same advantage as trio methods in protecting against population stratification, or if a specific dyad design (e.g., case-mother dyads vs. case-mother/control-mother dyads) is more advantageous. In this article, we review existing statistical methods for analyzing genome-wide marker data on dyads and perform extensive simulation experiments to benchmark their type I errors and statistical power under different scenarios. We extend our evaluation to existing methods for analyzing a combination of case-parent trios and dyads together. We apply these methods on genotyped and imputed data from multiethnic mother-child pairs only, case-parent trios only or combinations of both dyads and trios from the Gene, Environment Association Studies consortium (GENEVA), where each family was ascertained through a child affected by nonsyndromic cleft lip with or without cleft palate. Results from the GENEVA study corroborate the findings from our simulation experiments. Finally, we provide recommendations for using statistical genetic association methods for dyads.
Affiliation(s)
- Debashree Ray
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Candelaria Vergara
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Margaret A. Taub
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Genevieve Wojcik
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Christine Ladd‐Acosta
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Terri H. Beaty
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA
- Priya Duggal
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA

9. Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for gene-based association testing with multiple traits in genome-wide association studies. Genet Epidemiol 2022; 46:63-72. [PMID: 34787916 PMCID: PMC8795481 DOI: 10.1002/gepi.22437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 09/28/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Although genome-wide association studies (GWAS) often collect data on multiple correlated traits for complex diseases, conventional gene-based analysis is usually univariate and therefore treats traits as uncorrelated. Multivariate analysis of multiple correlated traits can potentially increase the power to detect genes that affect some or all of these traits. In this study, we propose the multivariate hierarchically structured variable selection (HSVS-M) model, a flexible Bayesian model that tests the association of a gene with multiple correlated traits. With only summary statistics, HSVS-M can account for the correlations among genetic variants and among traits simultaneously and can also estimate the various directions and magnitudes of associations between a gene and multiple traits. Simulation studies show that HSVS-M substantially outperforms competing methods in various scenarios, particularly when variants in a gene are associated with a trait in similar directions and magnitudes. We applied HSVS-M to the summary statistics of a meta-analysis GWAS on four lipid traits from the Global Lipids Genetics Consortium and identified 15 genes that have also been confirmed as risk factors in previous studies.
Affiliation(s)
- Yi Yang (corresponding author)
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
  - Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Saonli Basu
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Lin Zhang
  - Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA

10. Ray D, Venkataraghavan S, Zhang W, Leslie EJ, Hetmanski JB, Weinberg SM, Murray JC, Marazita ML, Ruczinski I, Taub MA, Beaty TH. Pleiotropy method reveals genetic overlap between orofacial clefts at multiple novel loci from GWAS of multi-ethnic trios. PLoS Genet 2021; 17:e1009584. [PMID: 34242216 PMCID: PMC8270211 DOI: 10.1371/journal.pgen.1009584] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 11/23/2020] [Accepted: 05/06/2021] [Indexed: 12/19/2022] Open
Abstract
Based on epidemiologic and embryologic patterns, nonsyndromic orofacial clefts (OFCs), the most common craniofacial birth defects in humans, are commonly categorized into cleft lip with or without cleft palate (CL/P) and cleft palate alone (CP), which are traditionally considered to be etiologically distinct. However, some evidence of shared genetic risk in IRF6, GRHL3 and ARHGAP29 regions exists; only FOXE1 has been recognized as significantly associated with both CL/P and CP in genome-wide association studies (GWAS). We used a new statistical approach, PLACO (pleiotropic analysis under composite null), on a combined multi-ethnic GWAS of 2,771 CL/P and 611 CP case-parent trios. At the genome-wide significance threshold of 5 × 10⁻⁸, PLACO identified 1 locus in 1q32.2 (IRF6) that appears to increase risk for one OFC subgroup but decrease risk for the other. At a suggestive significance threshold of 10⁻⁶, we found 5 more loci with compelling candidate genes having opposite effects on CL/P and CP: 1p36.13 (PAX7), 3q29 (DLG1), 4p13 (LIMCH1), 4q21.1 (SHROOM3) and 17q22 (NOG). Additionally, we replicated the recognized shared locus 9q22.33 (FOXE1), and identified 2 loci in 19p13.12 (RAB8A) and 20q12 (MAFB) that appear to influence risk of both CL/P and CP in the same direction. We found that locus-specific effects may vary by racial/ethnic group at these regions of genetic overlap, and failed to find evidence of sex-specific differences. We confirmed shared etiology of the two OFC subtypes comprising CL/P, and additionally found suggestive evidence of differences in their pathogenesis at 2 loci of genetic overlap. Our novel findings include 6 new loci of genetic overlap between CL/P and CP; 3 new loci between pairwise OFC subtypes; and 4 loci not previously implicated in OFCs. Our in-silico validation showed PLACO is robust to subtype-specific effects, and can achieve massive power gains over existing approaches for identifying genetic overlap between disease subtypes.
In summary, we found suggestive evidence for new genetic regions and confirmed some recognized OFC genes either exerting shared risk or with opposite effects on risk to OFC subtypes.
Affiliation(s)
- Debashree Ray
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Sowmya Venkataraghavan
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Wanying Zhang
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Elizabeth J. Leslie
  - Department of Human Genetics, School of Medicine, Emory University, Atlanta, Georgia, United States of America
- Jacqueline B. Hetmanski
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Seth M. Weinberg
  - Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Jeffrey C. Murray
  - Department of Pediatrics, University of Iowa Carver College of Medicine, Iowa City, Iowa, United States of America
- Mary L. Marazita
  - Department of Oral Biology, School of Dental Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Center for Craniofacial and Dental Genetics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Ingo Ruczinski
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Margaret A. Taub
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
- Terri H. Beaty
  - Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
  - Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, United States of America
* Corresponding authors: Debashree Ray (DR); Terri H. Beaty (THB)

11. Bi W, Lee S. Scalable and Robust Regression Methods for Phenome-Wide Association Analysis on Large-Scale Biobank Data. Front Genet 2021; 12:682638. [PMID: 34211504 PMCID: PMC8239389 DOI: 10.3389/fgene.2021.682638] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/18/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023] Open
Abstract
With the advances in genotyping technologies and electronic health records (EHRs), large biobanks have been great resources to identify novel genetic associations and gene-environment interactions on a genome-wide and even a phenome-wide scale. To date, several phenome-wide association studies (PheWAS) have been performed on biobank data, which provides comprehensive insights into many aspects of human genetics and biology. Although inspiring, PheWAS on large-scale biobank data encounters new challenges including computational burden, unbalanced phenotypic distribution, and genetic relationship. In this paper, we first discuss these new challenges and their potential impact on data analysis. Then, we summarize approaches that are scalable and robust in GWAS and PheWAS. This review can serve as a practical guide for geneticists, epidemiologists, and other medical researchers to identify genetic variations associated with health-related phenotypes in large-scale biobank data analysis. Meanwhile, it can also help statisticians to gain a comprehensive and up-to-date understanding of the current technical tool development.
Affiliation(s)
- Wenjian Bi
  - Department of Medical Genetics, School of Basic Medical Sciences, Peking University, Beijing, China
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
- Seunggeun Lee
  - Graduate School of Data Science, Seoul National University, Seoul, South Korea

12. Zhu J, Fan Q, Deng W, Wang Y, Guo X. BTOB: Extending the Biased GWAS to Bivariate GWAS. Front Genet 2021; 12:654821. [PMID: 34025719 PMCID: PMC8134661 DOI: 10.3389/fgene.2021.654821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2021] [Accepted: 04/07/2021] [Indexed: 11/13/2022] Open
Abstract
In recent years, a number of publications have reported large-scale genome-wide association studies (GWASs) for human diseases or traits while adjusting for other heritable covariates. However, it is known that these GWASs are biased, which may lead to biased genetic estimates or even false positives. In this study, we provide a method called "BTOB" which extends the biased GWAS to bivariate GWAS by integrating the summary association statistics from the biased GWAS and the GWAS for the adjusted heritable covariate. We employ the proposed BTOB method to analyze the summary association statistics from the large-scale meta-GWASs for waist-to-hip ratio (WHR) and body mass index (BMI), and show that the proposed approach can help identify more susceptibility genes compared with the corresponding univariate GWASs. Theoretical results and simulations also confirm the validity and efficiency of the proposed BTOB method.
Affiliation(s)
- Junxian Zhu
- Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Qiao Fan
- Center for Quantitative Medicine, Duke-National University of Singapore Medical School, Singapore, Singapore
- Wenying Deng
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, United States
- Yimeng Wang
- Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
- Xiaobo Guo
- Department of Statistical Science, School of Mathematics, Sun Yat-sen University, Guangzhou, China
13
Zhan X, Banerjee K, Chen J. Variant-set association test for generalized linear mixed model. Genet Epidemiol 2021; 45:402-412. [PMID: 33604919 DOI: 10.1002/gepi.22378] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/20/2020] [Revised: 01/18/2021] [Accepted: 01/25/2021] [Indexed: 12/22/2022]
Abstract
Advances in high-throughput biotechnologies have culminated in a wide range of omics (such as genomics, epigenomics, transcriptomics, metabolomics, and metagenomics) studies, and increasing evidence in these studies indicates that the biological architecture of complex traits involves a large number of omics variants each with minor effects but collectively accounting for the full phenotypic variability. Thus, a major challenge in many "ome-wide" association analyses is to achieve adequate statistical power to identify multiple variants of small effect sizes, which is notoriously difficult for studies with relatively small-sample sizes. A small-sample adjustment incorporated in the kernel machine regression framework was proposed to solve this for association studies under various settings. However, such an adjustment in the generalized linear mixed model (GLMM) framework, which accounts for both sample relatedness and non-Gaussian outcomes, has not yet been attempted. In this study, we fill this gap by extending small-sample adjustment in kernel machine association test to GLMM. We propose a new Variant-Set Association Test (VSAT), a powerful and efficient analysis tool in GLMM, to examine the association between a set of omics variants and correlated phenotypes. The usefulness of VSAT is demonstrated using both numerical simulation studies and applications to data collected from multiple association studies. The software for implementing the proposed method in R is available at https://www.github.com/jchen1981/SSKAT.
Affiliation(s)
- Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Kalins Banerjee
- Department of Public Health Sciences, Pennsylvania State University, Hershey, Pennsylvania, USA
- Jun Chen
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
14
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. Computational and Mathematical Methods in Medicine 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods mainly focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power rates in comparison with the existing methods.
15
Majumdar A, Burch KS, Haldar T, Sankararaman S, Pasaniuc B, Gauderman WJ, Witte JS. A two-step approach to testing overall effect of gene-environment interaction for multiple phenotypes. Bioinformatics 2021; 36:5640-5648. [PMID: 33453114 DOI: 10.1093/bioinformatics/btaa1083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: While gene-environment (GxE) interactions contribute importantly to many different phenotypes, detecting such interactions requires well-powered studies and has proven difficult. To address this, we combine two approaches to improve GxE power: simultaneously evaluating multiple phenotypes and using a two-step analysis approach. Previous work shows that the power to identify a main genetic effect can be improved by simultaneously analyzing multiple related phenotypes. For a univariate phenotype, two-step methods produce higher power for detecting a GxE interaction compared to single-step analysis. Therefore, we propose a two-step approach to test for an overall GxE effect for multiple phenotypes. RESULTS: Using simulations we demonstrate that, when more than one phenotype has a GxE effect (i.e., GxE pleiotropy), our approach offers a substantial gain in power (18%-43%) to detect an aggregate-level GxE effect for a multivariate phenotype compared to an analogous two-step method to identify a GxE effect for a univariate phenotype. We applied the proposed approach to simultaneously analyze three lipids, LDL, HDL and triglyceride, with the frequency of alcohol consumption as the environmental factor in the UK Biobank. The method identified two loci with an overall GxE effect on the vector of lipids, one of which was missed by the competing approaches. AVAILABILITY: We provide an R package MPGE implementing the proposed approach which is available from CRAN: https://cran.r-project.org/web/packages/MPGE/index.html.
Affiliation(s)
- Arunabha Majumdar
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Kathryn S Burch
- Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- Sriram Sankararaman
- Department of Computer Science, University of California, Los Angeles, Los Angeles, CA, USA
- Bogdan Pasaniuc
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Bioinformatics Interdepartmental Program, University of California, Los Angeles, Los Angeles, CA, USA
- W James Gauderman
- Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- John S Witte
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
16
Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis including gain in power and better understanding of genetic etiology. However, when the phenotypes are of discordant types such as binary and continuous, the joint modeling is more challenging. Another research area of current interest is discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL) and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only and (2) strongly positively correlated and the target haplotype affects both phenotypes in a positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes including a rare one.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
17
Effect of non-normality and low count variants on cross-phenotype association tests in GWAS. Eur J Hum Genet 2019; 28:300-312. [PMID: 31582815 DOI: 10.1038/s41431-019-0514-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/03/2018] [Revised: 09/01/2019] [Accepted: 09/05/2019] [Indexed: 01/21/2023]
Abstract
Many complex human diseases, such as type 2 diabetes, are characterized by multiple underlying traits/phenotypes that have substantially shared genetic architecture. Multivariate analysis of correlated traits has the potential to increase the power of detecting underlying common genetic loci. Several cross-phenotype association methods have been proposed; some require individual-level data on traits and genotypes, while the others require only summary-level data. In this article, we explore whether non-normality of multivariate trait distribution affects the inference from some of the existing multi-trait methods and how that effect is dependent on the allele count of the genetic variant being tested. We find that most of these tests are susceptible to biases that lead to spurious association signals. Even after controlling for confounders that may contribute to non-normality and then applying inverse normal transformation on the residuals of each trait, these tests may have inflated type I errors for variants with low minor allele counts (MACs). A likelihood ratio test of association based on the ordinal regression of individual-level genotype conditional on the traits seems to be the least biased and can maintain type I error when the MAC is reasonably large (e.g., MAC > 30). Application of these methods to publicly available summary statistics of eight amino acid traits on European samples seems to exhibit systematic inflation (especially for variants with low MAC), which is consistent with our findings from simulation experiments.
18
Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
19
Dutta D, Gagliano Taliun SA, Weinstock JS, Zawistowski M, Sidore C, Fritsche LG, Cucca F, Schlessinger D, Abecasis GR, Brummett CM, Lee S. Meta-MultiSKAT: Multiple phenotype meta-analysis for region-based association test. Genet Epidemiol 2019; 43:800-814. [PMID: 31433078 DOI: 10.1002/gepi.22248] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/27/2019] [Accepted: 06/13/2019] [Indexed: 12/17/2022]
Abstract
The power of genetic association analyses can be increased by jointly meta-analyzing multiple correlated phenotypes. Here, we develop a meta-analysis framework, Meta-MultiSKAT, that uses summary statistics to test for association between multiple continuous phenotypes and variants in a region of interest. Our approach models the heterogeneity of effects between studies through a kernel matrix and performs a variance component test for association. Using a genotype kernel, our approach can test for rare variants and the combined effects of both common and rare variants. To achieve robust power, within Meta-MultiSKAT, we developed fast and accurate omnibus tests combining different models of genetic effects, functional genomic annotations, multiple correlated phenotypes, and heterogeneity across studies. In addition, Meta-MultiSKAT accommodates situations where studies do not share exactly the same set of phenotypes or have differing correlation patterns among the phenotypes. Simulation studies confirm that Meta-MultiSKAT can maintain the type-I error rate at the exome-wide level of 2.5 × 10⁻⁶. Further simulations under different models of association show that Meta-MultiSKAT can improve the power of detection from 23% to 38% on average over single phenotype-based meta-analysis approaches. We demonstrate the utility and improved power of Meta-MultiSKAT in the meta-analyses of four white blood cell subtype traits from the Michigan Genomics Initiative (MGI) and SardiNIA studies.
Affiliation(s)
- Diptavo Dutta
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Sarah A Gagliano Taliun
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Joshua S Weinstock
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Matthew Zawistowski
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Cagliari, Italy
- Lars G Fritsche
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Monserrato, Cagliari, Italy; Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
- David Schlessinger
- Laboratory of Genetics, National Institute on Aging, US National Institutes of Health, Baltimore, Maryland
- Gonçalo R Abecasis
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
- Chad M Brummett
- Division of Pain Medicine, Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, Michigan; Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan
- Seunggeun Lee
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan; Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, Michigan
20
Holleman AM, Broadaway KA, Duncan R, Todor A, Almli LM, Bradley B, Ressler KJ, Ghosh D, Mulle JG, Epstein MP. Powerful and Efficient Strategies for Genetic Association Testing of Symptom and Questionnaire Data in Psychiatric Genetic Studies. Sci Rep 2019; 9:7523. [PMID: 31101869 PMCID: PMC6525248 DOI: 10.1038/s41598-019-44046-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/10/2018] [Accepted: 05/01/2019] [Indexed: 11/09/2022]
Abstract
Genetic studies of psychiatric disorders often deal with phenotypes that are not directly measurable. Instead, researchers rely on multivariate symptom data from questionnaires and surveys like the PTSD Symptom Scale (PSS) and Beck Depression Inventory (BDI) to indirectly assess a latent phenotype of interest. Researchers subsequently collapse such multivariate questionnaire data into a univariate outcome to represent a surrogate for the latent phenotype. However, when a causal variant is only associated with a subset of collapsed symptoms, the effect will be challenging to detect using the univariate outcome. We describe a more powerful strategy for genetic association testing in this situation that jointly analyzes the original multivariate symptom data collectively using a statistical framework that compares similarity in multivariate symptom-scale data from questionnaires to similarity in common genetic variants across a gene. We use simulated data to demonstrate this strategy provides substantially increased power over standard approaches that collapse questionnaire data into a single surrogate outcome. We also illustrate our approach using GWAS data from the Grady Trauma Project and identify genes associated with BDI not identified using standard univariate techniques. The approach is computationally efficient, scales to genome-wide studies, and is applicable to correlated symptom data of arbitrary dimension.
Affiliation(s)
- Aaron M Holleman
- Department of Epidemiology, Emory University, Atlanta, GA, USA; Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA, USA
- Richard Duncan
- Department of Human Genetics, Emory University, Atlanta, GA, USA
- Andrei Todor
- Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA, USA; Department of Human Genetics, Emory University, Atlanta, GA, USA
- Lynn M Almli
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, USA
- Bekh Bradley
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, GA, USA; Clinical Psychologist, Mental Health Service Line, Department of Veterans Affairs Medical Center, Atlanta, GA, USA
- Kerry J Ressler
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, MA, USA
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
- Jennifer G Mulle
- Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA, USA; Department of Human Genetics, Emory University, Atlanta, GA, USA
- Michael P Epstein
- Center for Computational and Quantitative Genetics, Emory University, Atlanta, GA, USA; Department of Human Genetics, Emory University, Atlanta, GA, USA
21
Onogi A. Comparison of F-tests for Univariate and Multivariate Mixed-Effect Models in Genome-Wide Association Mapping. Front Genet 2019; 10:30. [PMID: 30778369 PMCID: PMC6369166 DOI: 10.3389/fgene.2019.00030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/29/2018] [Accepted: 01/17/2019] [Indexed: 01/24/2023]
Abstract
Genome-wide association mapping (GWA) has been widely applied to a variety of species to identify genomic regions responsible for quantitative traits. The use of multivariate information could enhance the detection power of GWA. Although mixed-effect models are frequently used for GWA, the utility of F-tests for multivariate mixed-effect models is not well-recognized. Thus, we compared the F-tests for univariate and multivariate mixed-effect models with simulations. The superiority of the multivariate F-test over the univariate test varied depending on three parameters: phenotypic correlation between variates (r), relative size of quantitative trait locus effects between variates (ad), and missing proportion of phenotypic records (mprop). Simulation results showed that, when mprop was low, the multivariate F-test outperformed the univariate test as r and ad differ, and as mprop increased, the multivariate F-test outperformed as ad increased. These observations were consistent with results of the analytical evaluation of the F-value. When mprop was at the maximum, i.e., when no individual had phenotypic values for multiple variates, as in the case of meta-analysis, the multivariate F-test gained more detection power as ad increased. Although using multivariate information in mixed-effect model contexts did not always ensure more detection power than with univariate tests, the multivariate F-test will be a method applied when multivariate data are available because it does not show inflation of signals and could lead to new findings.
Affiliation(s)
- Akio Onogi
- Institute of Crop Science, National Agriculture and Food Research Organization, Tsukuba, Japan; Japan Science and Technology Agency PRESTO, Kawaguchi, Japan
22
Dutta D, Scott L, Boehnke M, Lee S. Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes. Genet Epidemiol 2019; 43:4-23. [PMID: 30298564 PMCID: PMC6330125 DOI: 10.1002/gepi.22156] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/03/2018] [Revised: 07/12/2018] [Accepted: 07/15/2018] [Indexed: 12/13/2022]
Abstract
In genetic association analysis, a joint test of multiple distinct phenotypes can increase power to identify sets of trait-associated variants within genes or regions of interest. Existing multiphenotype tests for rare variants make specific assumptions about the patterns of association with underlying causal variants, and the violation of these assumptions can reduce power to detect association. Here, we develop a general framework for testing pleiotropic effects of rare variants on multiple continuous phenotypes using multivariate kernel regression (Multi-SKAT). Multi-SKAT models effect sizes of variants on the phenotypes through a kernel matrix and performs a variance component test of association. We show that many existing tests are equivalent to specific choices of kernel matrices with the Multi-SKAT framework. To increase power of detecting association across tests with different kernel matrices, we developed a fast and accurate approximation of the significance of the minimum observed P value across tests. To account for related individuals, our framework uses random effects for the kinship matrix. Using simulated data and amino acid and exome-array data from the METabolic Syndrome In Men (METSIM) study, we show that Multi-SKAT can improve power over the single-phenotype SKAT-O test and existing multiple-phenotype tests, while maintaining the Type I error rate.
Affiliation(s)
- Diptavo Dutta
- Department of Biostatistics, University of Michigan Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan Ann Arbor, Michigan, USA
- Laura Scott
- Department of Biostatistics, University of Michigan Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan Ann Arbor, Michigan, USA
- Michael Boehnke
- Department of Biostatistics, University of Michigan Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan Ann Arbor, Michigan, USA
- Seunggeun Lee
- Department of Biostatistics, University of Michigan Ann Arbor, Michigan, USA
- Center for Statistical Genetics, University of Michigan Ann Arbor, Michigan, USA
23
Liang X, Sha Q, Zhang S. Joint analysis of multiple phenotypes in association studies using allele-based clustering approach for non-normal distributions. Ann Hum Genet 2018; 82:389-395. [PMID: 29932453 PMCID: PMC6188849 DOI: 10.1111/ahg.12260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2017] [Revised: 03/15/2018] [Accepted: 05/11/2018] [Indexed: 11/29/2022]
Abstract
In the study of complex diseases, several correlated phenotypes are usually measured. There is also increasing evidence showing that testing the association between a single-nucleotide polymorphism (SNP) and multiple-dependent phenotypes jointly is often more powerful than analyzing only one phenotype at a time. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. In this paper, we develop an Allele-based Clustering Approach (ACA) for the joint analysis of multiple non-normal phenotypes in association studies. In ACA, we consider the alleles at a SNP of interest as a dependent variable with two classes, and the correlated phenotypes as predictors to predict the alleles at the SNP of interest. We perform extensive simulation studies to evaluate the performance of ACA and compare the power of ACA with the powers of Adaptive Fisher's Combination test, Trait-based Association Test that uses Extended Simes procedure, Fisher's Combination test, the standard MANOVA, and the joint model of Multiple Phenotypes. Our simulation studies show that the proposed method has correct type I error rates and is much more powerful than other methods for some non-normal distributions.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
24
Qi G, Chatterjee N. Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLoS Genet 2018; 14:e1007549. [PMID: 30289880 PMCID: PMC6192650 DOI: 10.1371/journal.pgen.1007549] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/23/2017] [Revised: 10/17/2018] [Accepted: 07/09/2018] [Indexed: 12/31/2022]
Abstract
Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize the non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, is robust to population stratification and leads to the desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging from 161,460 to 298,420 across individual traits), increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
|
25
|
Deng X, Wang B, Fisher V, Peloso G, Cupples A, Liu CT. Genome-wide association study for multiple phenotype analysis. BMC Proc 2018; 12:55. [PMID: 30263053 PMCID: PMC6156845 DOI: 10.1186/s12919-018-0135-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/02/2022]
Abstract
Genome-wide association studies often collect multiple phenotypes for complex diseases. Multivariate joint analyses have higher power to detect genetic variants compared with the marginal analysis of each phenotype and are also able to identify loci with pleiotropic effects. We extend the unified score-based association test to incorporate family structure, apply different approaches to analyze multiple traits in GAW20 real samples, and compare the results. Through simulation studies, we confirm that the Type I error rate of the pedigree-based unified score association test is appropriately controlled. In marginal analysis of triglyceride levels, we found 1 subgenome-wide significant variant on chromosome 6. Joint analyses identified several suggestive genome-wide significant signals, with the pedigree-based unified score association test yielding the greatest number of significant results.
Affiliation(s)
- Xuan Deng
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Biqi Wang
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Virginia Fisher
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Gina Peloso
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Adrienne Cupples
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
- Ching-Ti Liu
- Department of Biostatistics, School of Public Health, Boston University, 801 Massachusetts Avenue 3rd Floor, Boston, MA 02118 USA
|
26
|
Wang X, Boekstegers F, Brinster R. Methods and results from the genome-wide association group at GAW20. BMC Genet 2018; 19:79. [PMID: 30255814 PMCID: PMC6157187 DOI: 10.1186/s12863-018-0649-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
BACKGROUND: This paper summarizes the contributions from the Genome-wide Association Study group (GWAS group) of the GAW20. The GWAS group contributions focused on topics such as association tests, phenotype imputation, and application of empirical kinships. The goals of the GWAS group contributions were varied. A real or a simulated data set based on the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study was employed by different methods. Different outcomes and covariates were considered, and quality control procedures varied throughout the contributions. RESULTS: The consideration of heritability and family structure played a major role in some contributions. The inclusion of family information and adaptive weights based on data were found to improve power in genome-wide association studies. It was proven that gene-level approaches are more powerful than single-marker analysis. Other contributions focused on the comparison between pedigree-based kinship and empirical kinship matrices, and investigated similar results in heritability estimation, association mapping, and genomic prediction. A new approach for linkage mapping of triglyceride levels was able to identify a novel linkage signal. CONCLUSIONS: This summary paper reports on promising statistical approaches and findings of the members of the GWAS group applied to real and simulated data, which encompass the current topics of epigenetics and pharmacogenomics.
Affiliation(s)
- Xuexia Wang
- University of North Texas, GAB 459, 1155 Union Circle #311430, Denton, TX 76203 USA
- Felix Boekstegers
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
- Regina Brinster
- Institute of Medical Biometry and Informatics, University of Heidelberg, Im Neuenheimer Feld 130.3, 69120 Heidelberg, Germany
|
27
|
Zhu H, Zhang S, Sha Q. A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLoS One 2018; 13:e0190788. [PMID: 29329304 PMCID: PMC5766098 DOI: 10.1371/journal.pone.0190788] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/18/2017] [Accepted: 12/20/2017] [Indexed: 11/18/2022]
Abstract
Many complex diseases like diabetes, hypertension, metabolic syndrome, et cetera, are measured by multiple correlated phenotypes. However, most genome-wide association studies (GWAS) focus on one phenotype of interest or study multiple phenotypes separately for identifying genetic variants associated with complex diseases. Analyzing one phenotype or the related phenotypes separately may lose power due to ignoring the information obtained by combining phenotypes, such as the correlation between phenotypes. In order to increase statistical power to detect genetic variants associated with complex diseases, we develop a novel method to test a weighted combination of multiple phenotypes (WCmulP). We perform extensive simulation studies as well as real data (COPDGene) analysis to evaluate the performance of the proposed method. Our simulation results show that WCmulP has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the methods we compared. WCmulP also has an outstanding performance for identifying single-nucleotide polymorphisms (SNPs) associated with COPD-related phenotypes.
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
28
|
Ray D, Boehnke M. Methods for meta-analysis of multiple traits using GWAS summary statistics. Genet Epidemiol 2017; 42:134-145. [PMID: 29226385 DOI: 10.1002/gepi.22105] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 07/06/2017] [Revised: 10/27/2017] [Accepted: 11/08/2017] [Indexed: 12/21/2022]
Abstract
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses for disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated and a joint analysis may yield increased statistical power for association over multiple univariate analyses. Recently several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Although the existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar and sometimes greater power to detect association across a wide array of scenarios compared to existing methods, which are usually powerful for some specific association scenarios only. 
When applied to plasma lipids summary data from the METSIM and the T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond the ones identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are, indeed, associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
Affiliation(s)
- Debashree Ray
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
|
29
|
Lin N, Zhu Y, Fan R, Xiong M. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data. PLoS Comput Biol 2017; 13:e1005788. [PMID: 29040274 PMCID: PMC5659802 DOI: 10.1371/journal.pcbi.1005788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/24/2016] [Revised: 10/27/2017] [Accepted: 09/21/2017] [Indexed: 01/12/2023]
Abstract
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we proposed a new statistical method referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis, which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining the appropriate type 1 errors. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics.
Affiliation(s)
- Nan Lin
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Yun Zhu
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch (BBB), Division of Intramural Population Health Research (DIPHR), Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, United States of America
- Momiao Xiong
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
|
30
|
Zhang W, Yang L, Tang LL, Liu A, Mills JL, Sun Y, Li Q. GATE: an efficient procedure in study of pleiotropic genetic associations. BMC Genomics 2017; 18:552. [PMID: 28732532 PMCID: PMC5521155 DOI: 10.1186/s12864-017-3928-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/16/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022]
Abstract
Background: Association studies on human complex traits are well suited to identifying deleterious genetic markers. Compared to single-trait analyses, multiple-trait analyses can arguably make better use of the information on both traits and markers, and thus markedly improve the statistical power of association tests. Principal component analysis (PCA) is a well-known tool in multivariate analysis and can be applied to this task. Generally, PCA is first performed on all traits, and then a certain number of top principal components (PCs) that explain most of the trait variation are selected to construct the test statistics. However, in some situations, using only these top PCs loses important evidence from the discarded PCs and thus compromises the test. Methods: To overcome this drawback while keeping the advantages of using the top PCs, we propose a group accumulated test evidence (GATE) procedure. By dividing the PCs, sorted in descending order of their eigenvalues, into a few groups, GATE integrates the information of traits at the group level. Results: Simulation studies demonstrate the superiority of the proposed approach over several existing methods in terms of statistical power. Sometimes, the increase in power can reach 25%. These methods are further illustrated using the Heterogeneous Stock Mice data, which were collected from a quantitative genome-wide association study. Conclusions: Overall, GATE provides a powerful test for pleiotropic genetic associations.
Affiliation(s)
- Wei Zhang
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
- Liu Yang
- College of Geoscience and Surveying Engineering, China University of Mining and Technology, Beijing, China
- Larry L Tang
- Department of Statistics, George Mason University, Fairfax, VA, USA; Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, USA
- Aiyi Liu
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- James L Mills
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Yuanchang Sun
- Department of Mathematics and Statistics, Florida International University, Miami, FL, USA
- Qizhai Li
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
|
31
|
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/03/2023]
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
|
32
|
Ray D, Basu S. A novel association test for multiple secondary phenotypes from a case-control GWAS. Genet Epidemiol 2017; 41:413-426. [PMID: 28393390 DOI: 10.1002/gepi.22045] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 10/27/2016] [Revised: 12/22/2016] [Accepted: 02/05/2017] [Indexed: 12/13/2022]
Abstract
In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend in reusing existing large case-control data for exploring genetic associations of some additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment, and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we have proposed proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and adjusts estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS, and compare to some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study. 
Only POM-PS analysis of the T2D case-control sample seems to provide valid association signals.
Affiliation(s)
- Debashree Ray
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
33
|
Gordon D, Londono D, Patel P, Kim W, Finch SJ, Heiman GA. An Analytic Solution to the Computation of Power and Sample Size for Genetic Association Studies under a Pleiotropic Mode of Inheritance. Hum Hered 2017; 81:194-209. [PMID: 28315880 DOI: 10.1159/000457135] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/01/2016] [Accepted: 01/20/2017] [Indexed: 01/14/2023]
Abstract
Our motivation here is to calculate the power of 3 statistical tests used when there are genetic traits that operate under a pleiotropic mode of inheritance and when qualitative phenotypes are defined by use of thresholds for the multiple quantitative phenotypes. Specifically, we formulate a multivariate function that provides the probability that an individual has a vector of specific quantitative trait values conditional on having a risk locus genotype, and we apply thresholds to define qualitative phenotypes (affected, unaffected) and compute penetrances and conditional genotype frequencies based on the multivariate function. We extend the analytic power and minimum-sample-size-necessary (MSSN) formulas for 2 categorical data-based tests (genotype, linear trend test [LTT]) of genetic association to the pleiotropic model. We further compare the MSSN of the genotype test and the LTT with that of a multivariate ANOVA (Pillai). We approximate the MSSN for statistics by linear models using a factorial design and ANOVA. With ANOVA decomposition, we determine which factors most significantly change the power/MSSN for all statistics. Finally, we determine which test statistics have the smallest MSSN. In this work, MSSN calculations are for 2 traits (bivariate distributions) only (for illustrative purposes). We note that the calculations may be extended to address any number of traits. Our key findings are that the genotype test usually has lower MSSN requirements than the LTT. More inclusive thresholds (top/bottom 25% vs. top/bottom 10%) have higher sample size requirements. The Pillai test has a much larger MSSN than both the genotype test and the LTT, as a result of sample selection. With these formulas, researchers can specify how many subjects they must collect to localize genes for pleiotropic phenotypes.
Affiliation(s)
- Derek Gordon
- Department of Genetics, The State University of New Jersey, Piscataway, NJ, USA
|
34
|
An Adaptive Fisher's Combination Method for Joint Analysis of Multiple Phenotypes in Association Studies. Sci Rep 2016; 6:34323. [PMID: 27694844 PMCID: PMC5046106 DOI: 10.1038/srep34323] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/26/2016] [Accepted: 09/12/2016] [Indexed: 12/22/2022]
Abstract
Currently, the analyses of most genome-wide association studies (GWAS) have been performed on a single phenotype. There is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. Therefore, using only one single phenotype may lose statistical power to identify the underlying genetic mechanism. There is an increasing need to develop and apply powerful statistical tests to detect association between multiple phenotypes and a genetic variant. In this paper, we develop an Adaptive Fisher’s Combination (AFC) method for joint analysis of multiple phenotypes in association studies. The AFC method combines p-values obtained in standard univariate GWAS by using the optimal number of p-values which is determined by the data. We perform extensive simulations to evaluate the performance of the AFC method and compare the power of our method with the powers of TATES, Tippett’s method, Fisher’s combination test, MANOVA, MultiPhen, and SUMSCORE. Our simulation studies show that the proposed method has correct type I error rates and is either the most powerful test or comparable with the most powerful test. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
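For orientation, the classical Fisher's combination test that this abstract lists among its comparators can be sketched as below. This is not the AFC method itself: the adaptive choice of how many p-values to combine is not reproduced, and the function name `fisher_combined_pvalue` is hypothetical. The sketch also assumes independent p-values, which correlated phenotypes generally violate, so in practice the null distribution would need to be calibrated differently (for example by permutation).

```python
import math

def fisher_combined_pvalue(pvalues):
    """Classical Fisher combination of k p-values, assumed independent."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher's statistic X = -2 * sum(log p_i) is chi-square with 2k
    # degrees of freedom under the null; we work with X / 2 directly.
    half_x = -sum(math.log(p) for p in pvalues)
    # For even degrees of freedom 2k the survival function is closed-form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail probability.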
|