1
Dwivedi SL, Heslop‐Harrison P, Amas J, Ortiz R, Edwards D. Epistasis and pleiotropy-induced variation for plant breeding. Plant Biotechnology Journal 2024; 22:2788-2807. [PMID: 38875130 PMCID: PMC11536456 DOI: 10.1111/pbi.14405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 05/07/2024] [Accepted: 05/24/2024] [Indexed: 06/16/2024]
Abstract
Epistasis refers to nonallelic interaction between genes that causes bias in estimates of genetic parameters for a phenotype when two or more interacting genes affect the same trait. Partitioning of epistatic effects allows true estimation of the genetic parameters affecting phenotypes. Multigenic variation plays a central role in the evolution of complex characteristics, among which pleiotropy, where a single gene affects several phenotypic characters, has a large influence. While pleiotropic interactions provide functional specificity, they increase the challenge of gene discovery and functional analysis. Overcoming pleiotropy-based phenotypic trade-offs offers potential for assisting breeding for complex traits. Modelling higher order nonallelic epistatic interaction, pleiotropy and non-pleiotropy-induced variation, and genotype × environment interaction in genomic selection may provide new paths to increase the productivity and stress tolerance of the next generation of crop cultivars. Advances in statistical models, software and algorithm development, and genomic research have facilitated dissecting the nature and extent of pleiotropy and epistasis. We overview emerging approaches to exploit positive (and avoid negative) epistatic and pleiotropic interactions in a plant breeding context, including developing avenues of artificial intelligence, novel exploitation of large-scale genomics and phenomics data, and involvement of genes with minor effects to analyse epistatic interactions and pleiotropic quantitative trait loci, including missing heritability.
Affiliation(s)
- Pat Heslop‐Harrison
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
  - Department of Genetics and Genome Biology, Institute for Environmental Futures, University of Leicester, Leicester, UK
- Junrey Amas
  - Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
- Rodomiro Ortiz
  - Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
- David Edwards
  - Centre for Applied Bioinformatics, School of Biological Sciences, University of Western Australia, Perth, WA, Australia
2
Guo H, Li T, Wang Z. Pleiotropic genetic association analysis with multiple phenotypes using multivariate response best-subset selection. BMC Genomics 2023; 24:759. [PMID: 38082214 PMCID: PMC10712198 DOI: 10.1186/s12864-023-09820-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/20/2023] [Indexed: 12/18/2023] Open
Abstract
Genetic pleiotropy refers to the simultaneous association of a gene with multiple phenotypes. It is widely distributed across the whole genome and can help to understand the common genetic mechanism of diseases or traits. In this study, a pleiotropic association analysis method based on a multivariate response best-subset selection (MRBSS) model is proposed. Unlike in the traditional genetic association model, the high-dimensional genotypic data are viewed as response variables and the multiple phenotypic data as predictor variables. Moreover, the response best-subset selection procedure is converted into a 0-1 integer optimization problem by introducing a separation parameter and a tuning parameter. Furthermore, the model parameters are estimated by using the curve search under the modified Bayesian information criterion. Simulation experiments show that the proposed method MRBSS remarkably reduces the computational time, obtains higher statistical power under most of the considered scenarios, and controls the type I error rate at a low level. The application studies in the datasets of maize yield traits and pig lipid traits further verify its effectiveness.
Affiliation(s)
- Hongping Guo
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Tong Li
  - School of Mathematics and Statistics, Hubei Normal University, Huangshi, 435002, People's Republic of China
- Zixuan Wang
  - School of Mathematics and Statistics, South-Central Minzu University, Wuhan, 430074, People's Republic of China
3
St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022] [Indexed: 11/15/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or a mixture of discrete and continuous traits. However, most of the current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits. Therefore, there is a need to develop unified and flexible methods to study association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations compared with other existing methods, for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
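As a rough illustration of the copula idea described in this abstract (not the CBMAT test itself), the sketch below uses a Gaussian copula to couple a continuous and a binary phenotype: correlated normals are mapped to uniform margins, and each uniform is then pushed through the quantile function of the desired margin. The helper name `sample_mixed_pair`, the exponential margin, the prevalence, and the correlation value are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def sample_mixed_pair(rho, prevalence, rng):
    """Draw one (continuous, binary) phenotype pair linked by a Gaussian copula.

    Hypothetical helper for illustration only; it is not part of CBMAT.
    """
    # Two standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Probability integral transform: each u is Uniform(0, 1) marginally.
    u1 = min(STD_NORMAL.cdf(z1), 1.0 - 1e-12)  # clamp to keep log() finite
    u2 = STD_NORMAL.cdf(z2)
    # Impose the margins through their quantile functions.
    continuous = -math.log(1.0 - u1)             # Exponential(1): skewed, non-normal
    binary = 1 if u2 > 1.0 - prevalence else 0   # Bernoulli(prevalence)
    return continuous, binary

rng = random.Random(0)
pairs = [sample_mixed_pair(rho=0.6, prevalence=0.3, rng=rng) for _ in range(20000)]
print("binary mean:", sum(b for _, b in pairs) / len(pairs))
print("continuous mean:", sum(c for c, _ in pairs) / len(pairs))
```

The dependence between the two traits is carried entirely by `rho`, while each margin can be changed independently, which is the property that makes copulas convenient for mixed phenotypes.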
Affiliation(s)
- Julien St-Pierre
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
4
Anwar MY, Baldassari AR, Polikowsky HG, Sitlani CM, Highland HM, Chami N, Chen HH, Graff M, Howard AG, Jung SY, Petty LE, Wang Z, Zhu W, Buyske S, Cheng I, Kaplan R, Kooperberg C, Loos RJF, Peters U, McCormick JB, Fisher-Hoch SP, Avery CL, Taylor KC, Below JE, North KE. Genetic pleiotropy underpinning adiposity and inflammation in self-identified Hispanic/Latino populations. BMC Med Genomics 2022; 15:192. [PMID: 36088317 PMCID: PMC9464371 DOI: 10.1186/s12920-022-01352-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 09/02/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND: Concurrent variation in adiposity and inflammation suggests potential shared functional pathways and pleiotropic disease underpinning. Yet, exploration of pleiotropy in the context of adiposity-inflammation has been scarce, and none has included self-identified Hispanic/Latino populations. Given the high level of ancestral diversity in Hispanic American population, genetic studies may reveal variants that are infrequent/monomorphic in more homogeneous populations.
METHODS: Using multi-trait Adaptive Sum of Powered Score (aSPU) method, we examined individual and shared genetic effects underlying inflammatory (CRP) and adiposity-related traits (Body Mass Index [BMI]), and central adiposity (Waist to Hip Ratio [WHR]) in HLA participating in the Population Architecture Using Genomics and Epidemiology (PAGE) cohort (N = 35,871) with replication of effects in the Cameron County Hispanic Cohort (CCHC) which consists of Mexican American individuals.
RESULTS: Of the > 16 million SNPs tested, variants representing 7 independent loci were found to illustrate significant association with multiple traits. Two out of 7 variants were replicated at statistically significant level in multi-trait analyses in CCHC. The lead variant on APOE (rs439401) and rs11208712 were found to harbor multi-trait associations with adiposity and inflammation.
CONCLUSIONS: Results from this study demonstrate the importance of considering pleiotropy for improving our understanding of the etiology of the various metabolic pathways that regulate cardiovascular disease development.
Affiliation(s)
- Mohammad Yaser Anwar
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Antoine R Baldassari
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Hannah G Polikowsky
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Colleen M Sitlani
  - Department of Medicine, University of Washington, Seattle, WA, 98195, USA
- Heather M Highland
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Nathalie Chami
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Hung-Hsin Chen
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Mariaelisa Graff
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
- Annie Green Howard
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Su Yon Jung
  - Translational Sciences Section, School of Nursing, University of California, Los Angeles, Los Angeles, CA, 90095, USA
  - Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, 90095, USA
- Lauren E Petty
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Zhe Wang
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Wanying Zhu
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Steven Buyske
  - Department of Statistics, Rutgers University, Piscataway, NJ, 08854, USA
- Iona Cheng
  - Department of Epidemiology and Biostatistics, Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, 94115, USA
- Robert Kaplan
  - Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Ruth J F Loos
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Joseph B McCormick
  - School of Public Health, University of Texas Health Science Center at Houston, Brownsville Regional Campus, Brownsville, TX, 78520, USA
- Susan P Fisher-Hoch
  - School of Public Health, University of Texas Health Science Center at Houston, Brownsville Regional Campus, Brownsville, TX, 78520, USA
- Christy L Avery
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
  - Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27516, USA
- Kira C Taylor
  - Department of Epidemiology and Population Health, University of Louisville School of Public Health and Information Sciences, Louisville, KY, 40202, USA
- Jennifer E Below
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, 123 West Franklin Street, CVD Genetic Epidemiology Lab, Fl #4, Room A7, Chapel Hill, NC, 27599, USA
5
Sun J, Wang W, Zhang R, Duan H, Tian X, Xu C, Li X, Zhang D. Multivariate genome-wide association study of depression, cognition, and memory phenotypes and validation analysis identify 12 cross-ethnic variants. Transl Psychiatry 2022; 12:304. [PMID: 35907915 PMCID: PMC9338946 DOI: 10.1038/s41398-022-02074-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2021] [Revised: 07/15/2022] [Accepted: 07/19/2022] [Indexed: 11/10/2022] Open
Abstract
To date, little is known about the pleiotropic genetic variants among depression, cognition, and memory. The current research aimed to identify the potential pleiotropic single nucleotide polymorphisms (SNPs), genes, and pathways of the three phenotypes by conducting a multivariate genome-wide association study and an additional pleiotropy analysis among Chinese individuals and further validate the top variants in the UK Biobank (UKB). In the discovery phase, the participants were 139 pairs of dizygotic twins from the Qingdao Twins Registry. The genome-wide efficient mixed-model analysis identified 164 SNPs reaching suggestive significance (P < 1 × 10⁻⁵). Among them, rs3967317 (P = 1.21 × 10⁻⁸) exceeded the genome-wide significance level (P < 5 × 10⁻⁸) and was also demonstrated to be associated with depression and memory in pleiotropy analysis, followed by rs9863698, rs3967316, and rs9261381 (P = 7.80 × 10⁻⁸ to 5.68 × 10⁻⁷), which were associated with all three phenotypes. After imputation, a total of 457 SNPs reached suggestive significance. The top SNP chr6:24597173 was located in the KIAA0319 gene, which had biased expression in brain tissues. Genes and pathways related to metabolism, immunity, and neuronal systems demonstrated nominal significance (P < 0.05) in gene-based and pathway enrichment analyses. In the validation phase, 12 of the abovementioned SNPs reached the nominal significance level (P < 0.05) in the UKB. Among them, three SNPs were located in the KIAA0319 gene, and four SNPs were identified as significant expression quantitative trait loci in brain tissues. These findings may provide evidence for pleiotropic variants among depression, cognition, and memory and clues for further exploring the shared genetic pathogenesis of depression with Alzheimer's disease.
Affiliation(s)
- Jing Sun
  - Department of Epidemiology and Health Statistics, The School of Public Health of Qingdao University, Qingdao, Shandong Province, China
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Weijing Wang
  - Department of Epidemiology and Health Statistics, The School of Public Health of Qingdao University, Qingdao, Shandong Province, China
- Ronghui Zhang
  - Department of Epidemiology and Health Statistics, The School of Public Health of Qingdao University, Qingdao, Shandong Province, China
- Haiping Duan
  - Qingdao Municipal Center for Disease Control and Prevention, No. 175 Shandong Road, Shibei District, Qingdao, Shandong Province, China
- Xiaocao Tian
  - Qingdao Municipal Center for Disease Control and Prevention, No. 175 Shandong Road, Shibei District, Qingdao, Shandong Province, China
- Chunsheng Xu
  - Qingdao Municipal Center for Disease Control and Prevention, No. 175 Shandong Road, Shibei District, Qingdao, Shandong Province, China
- Xue Li
  - Department of Big Data in Health Science School of Public Health, Center of Clinical Big Data and Analytics of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Dongfeng Zhang
  - Department of Epidemiology and Health Statistics, The School of Public Health of Qingdao University, Qingdao, Shandong Province, China
6
Fu L, Wang Y, Li T, Yang S, Hu YQ. A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC. Front Genet 2022; 13:791920. [PMID: 35391794 PMCID: PMC8981031 DOI: 10.3389/fgene.2022.791920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 01/27/2022] [Indexed: 12/02/2022] Open
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. Generally, the one-phenotype, one-variant association study used in GWASs is not efficient in identifying variants with weak effects, indicating that more signals remain to be identified. Jointly analyzing multiple phenotypes is now recognized as an important approach to increase the statistical power for identifying genetic variants with weak effects on complex diseases, shedding new light on potential biological mechanisms. Therefore, hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) is developed to analyze multiple phenotypes simultaneously in association studies. There are two steps involved in HCDC. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Then, we use existing methods to estimate the genetic associations with the representative phenotypes rather than the individual phenotypes in every cluster. A variety of simulations are conducted to demonstrate the capacity of HCDC for boosting power. In most scenarios, existing methods that embed HCDC are either more powerful than, or comparable to, the same methods without HCDC. Additionally, applying existing methods with HCDC to obesity-related phenotypes from the Atherosclerosis Risk in Communities study uncovered several associated variants. Among these, UQCC1-rs1570004 is reported as a significant obesity signal for the first time, and its differential expression in subcutaneous fat, visceral fat, and muscle tissue is worthy of further functional studies.
Affiliation(s)
- Liwan Fu
  - Center for Non-communicable Disease Management, National Center for Children's Health, Beijing Children's Hospital, Capital Medical University, Beijing, China
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yuquan Wang
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Siqian Yang
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
  - State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
  - Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
7
Bielak LF, Peyser PA, Smith JA, Zhao W, Ruiz‐Narvaez EA, Kardia SLR, Harlow SD. Multivariate, region-based genetic analyses of facets of reproductive aging in White and Black women. Mol Genet Genomic Med 2022; 10:e1896. [PMID: 35179313 PMCID: PMC9000932 DOI: 10.1002/mgg3.1896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 01/14/2022] [Accepted: 01/31/2022] [Indexed: 01/28/2023] Open
Abstract
BACKGROUND: Age at final menstrual period (FMP) and the accompanying hormone trajectories across the menopause transition do not occur in isolation, but likely share molecular pathways. Understanding the genetics underlying the endocrinology of the menopause transition may be enhanced by jointly analyzing multiple interrelated traits.
METHODS: In a sample of 347 White and 164 Black women from the Study of Women's Health Across the Nation (SWAN), we investigated pleiotropic effects of 54 candidate genetic regions of interest (ROI) on 5 menopausal traits (age at FMP and premenopausal and postmenopausal levels of follicle-stimulating hormone and estradiol) using multivariate kernel regression (Multi-SKAT). A backward elimination procedure was used to identify which subset of traits was most strongly associated with a specific ROI.
RESULTS: In White women, the 20 kb ROI around rs10734411 was significantly associated with the multivariate distribution of age at FMP, premenopausal estradiol, and postmenopausal estradiol (omnibus p-value = .00004). This association did not replicate in the smaller sample of Black women.
CONCLUSION: This study using a region-based, multiple-trait approach suggests a shared genetic basis among multiple facets of reproductive aging.
Affiliation(s)
- Lawrence F. Bielak
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Patricia A. Peyser
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Jennifer A. Smith
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
  - Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan, USA
- Wei Zhao
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Edward A. Ruiz‐Narvaez
  - Department of Nutritional Sciences, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Sharon L. R. Kardia
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Sioban D. Harlow
  - Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
8
Aguate FM, Vazquez AI, Merriman TR, de Los Campos G. Mapping pleiotropic loci using a fast-sequential testing algorithm. Eur J Hum Genet 2021; 29:1762-1773. [PMID: 34145383 PMCID: PMC8633382 DOI: 10.1038/s41431-021-00911-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2021] [Revised: 04/27/2021] [Accepted: 05/19/2021] [Indexed: 02/07/2023] Open
Abstract
Pleiotropy (i.e., genes with effects on multiple traits) leads to genetic correlations between traits and contributes to the development of many syndromes. Identifying variants with pleiotropic effects on multiple health-related traits can improve the biological understanding of gene action and disease etiology, and can help to advance disease-risk prediction. Sequential testing is a powerful approach for mapping genes with pleiotropic effects. However, the existing methods and the available software do not scale to analyses involving millions of SNPs and large datasets. This has limited the adoption of sequential testing for pleiotropy mapping at large scale. In this study, we present a sequential test and software that can be used to test pleiotropy in large systems of traits with biobank-sized data. Using simulations, we show that the methods implemented in the software are powerful and have adequate type-I error rate control. To demonstrate the use of the methods and software, we present a whole-genome scan in search of loci with pleiotropic effects on seven traits related to metabolic syndrome (MetS) using UK-Biobank data (n~300 K distantly related white European participants). We found abundant pleiotropy and report 170, 44, and 18 genomic regions harboring SNPs with pleiotropic effects in at least two, three, and four of the seven traits, respectively. We validate our results using previous studies documented in the GWAS-catalog and using data from GTEx. Our results confirm previously reported loci and lead to several novel discoveries that link MetS-related traits through plausible biological pathways.
Affiliation(s)
- Fernando M Aguate
  - Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Ana I Vazquez
  - Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
- Tony R Merriman
  - Department of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Gustavo de Los Campos
  - Department of Epidemiology & Biostatistics, IQ - Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI, USA
  - Department of Statistics & Probability, Michigan State University, East Lansing, MI, USA
9
Fisch GS. Associating complex traits with genetic variants: polygenic risk scores, pleiotropy and endophenotypes. Genetica 2021; 150:183-197. [PMID: 34677750 DOI: 10.1007/s10709-021-00138-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Accepted: 10/07/2021] [Indexed: 11/29/2022]
Abstract
Genotype-phenotype causal modeling has evolved significantly since Johannsen's and Wright's original designs were published. The development of genomewide assays to interrogate and detect possible causal variants associated with complex traits has expanded the scope of genotype-phenotype research considerably. Clusters of causal variants discovered by genomewide assays and associated with complex traits have been used to develop polygenic risk scores to predict clinical diagnoses of multidimensional human disorders. However, genomewide investigations have met with many challenges to their research designs and statistical complexities which have hindered the reliability and validity of their predictions. Findings linked to differences in heritability estimates between causal clusters and complex traits among unrelated individuals remain a research area of some controversy. Causal models developed from case-control studies as opposed to experiments, as well as other issues concerning the genotype-phenotype causal model and the extent to which various forms of pleiotropy and the concept of the endophenotype add to its complexity, will be reviewed.
Affiliation(s)
- Gene S Fisch
  - Paul H. Chook Dept. of CIS & Statistics, CUNY/Baruch College, New York, NY, USA
10
Wang T, Lu H, Zeng P. Identifying pleiotropic genes for complex phenotypes with summary statistics from a perspective of composite null hypothesis testing. Brief Bioinform 2021; 23:6375058. [PMID: 34571531 DOI: 10.1093/bib/bbab389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/05/2021] [Revised: 08/06/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022] Open
Abstract
Pleiotropy has important implications for the genetic connection among complex phenotypes and facilitates our understanding of disease etiology. Genome-wide association studies provide an unprecedented opportunity to detect pleiotropic associations; however, efficient pleiotropy test methods are still lacking. We here consider pleiotropy identification from a methodological perspective of high-dimensional composite null hypothesis and propose a powerful gene-based method called MAIUP. MAIUP is constructed based on the traditional intersection-union test with two sets of independent P-values as input and follows a novel idea that was originally proposed under the high-dimensional mediation analysis framework. The key improvement of MAIUP is that it takes the composite null nature of the pleiotropy test into account by fitting a three-component mixture null distribution, which can ultimately generate well-calibrated P-values for effective control of family-wise error rate and false discovery rate. Another attractive advantage of MAIUP is its ability to effectively address the issue of overlapping subjects commonly encountered in association studies. Simulation studies demonstrate that compared with other methods, only MAIUP can maintain correct type I error control and has higher power across a wide range of scenarios. We apply MAIUP to detect shared associated genes among 14 psychiatric disorders with summary statistics and discover many new pleiotropic genes that would not be identified without accounting for the issue of composite null hypothesis testing. Functional and enrichment analyses offer additional evidence supporting the validity of these identified pleiotropic genes associated with psychiatric disorders. Overall, MAIUP represents an efficient method for pleiotropy identification.
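The traditional intersection-union test mentioned in this abstract can be sketched as follows: a gene is called pleiotropic only if it is associated with both traits, so the test P-value is the larger of the two per-trait P-values. This is a minimal sketch of the classical test only, not of MAIUP's mixture-null calibration; the function name, gene labels, and P-values are hypothetical.

```python
def intersection_union_pvalue(p_trait1, p_trait2):
    """Classical intersection-union test P-value for one gene.

    The null "no association with at least one trait" is rejected only
    when both per-trait tests reject, hence the maximum.
    """
    for p in (p_trait1, p_trait2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("P-values must lie in [0, 1]")
    return max(p_trait1, p_trait2)

# Hypothetical per-gene P-values for two traits (illustrative only).
gene_pvalues = {
    "geneA": (1e-6, 3e-4),  # associated with both traits
    "geneB": (1e-8, 0.2),   # associated with trait 1 only
    "geneC": (0.4, 0.7),    # associated with neither
}
alpha = 0.05 / len(gene_pvalues)  # Bonferroni threshold across genes
pleiotropic = [
    gene
    for gene, (p1, p2) in gene_pvalues.items()
    if intersection_union_pvalue(p1, p2) <= alpha
]
print(pleiotropic)  # prints ['geneA']
```

When neither trait is associated, the maximum of two independent uniform P-values tends to be larger than a single uniform P-value, so this plain version is conservative; that is the composite-null miscalibration the abstract says MAIUP is designed to correct.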
Affiliation(s)
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Haojie Lu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
11
Wang Y, Wu P, Tong X, Sun J. A weighted method for the exclusive hypothesis test with application to typhoon data. Can J Stat 2021. [DOI: 10.1002/cjs.11618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Yi Wang
- School of Statistics Beijing Normal University Beijing China
- Peng Wu
- School of Statistics Beijing Normal University Beijing China
- Xingwei Tong
- School of Statistics Beijing Normal University Beijing China
- Jianguo Sun
- Department of Statistics University of Missouri Columbia MO U.S.A
12
Fernandes SB, Zhang KS, Jamann TM, Lipka AE. How Well Can Multivariate and Univariate GWAS Distinguish Between True and Spurious Pleiotropy? Front Genet 2021; 11:602526. [PMID: 33584799 PMCID: PMC7873880 DOI: 10.3389/fgene.2020.602526] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/03/2020] [Accepted: 12/11/2020] [Indexed: 11/13/2022]
Abstract
Quantification of the simultaneous contributions of loci to multiple traits, a phenomenon called pleiotropy, is facilitated by the increased availability of high-throughput genotypic and phenotypic data. To understand the prevalence and nature of pleiotropy, the ability of multivariate and univariate genome-wide association study (GWAS) models to distinguish between pleiotropic and non-pleiotropic loci in linkage disequilibrium (LD) first needs to be evaluated. Therefore, we used publicly available maize and soybean genotypic data to simulate multiple pairs of traits that were either (i) controlled by quantitative trait nucleotides (QTNs) on separate chromosomes, (ii) controlled by QTNs in various degrees of LD with each other, or (iii) controlled by a single pleiotropic QTN. We showed that multivariate GWAS could not distinguish between QTNs in LD and a single pleiotropic QTN. In contrast, a unique QTN detection rate pattern was observed for univariate GWAS whenever the simulated QTNs were in high LD or pleiotropic. Collectively, these results suggest that multivariate and univariate GWAS should both be used to infer whether or not causal mutations underlying peak GWAS associations are pleiotropic. Therefore, we recommend that future studies use a combination of multivariate and univariate GWAS models, as both models could be useful for identifying and narrowing down candidate loci with potential pleiotropic effects for downstream biological experiments.
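The "degrees of LD" between two QTNs mentioned above are commonly summarized by the squared correlation r² between alleles at the two loci. The following is a minimal sketch of that standard statistic from phased haplotypes; it is not the simulation pipeline used in the study, and the function name and toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic loci.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n            # allele-1 frequency, locus A
    p_b = sum(b for _, b in haplotypes) / n            # allele-1 frequency, locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Toy phased haplotypes (hypothetical data).
complete_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]   # alleles always travel together
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]         # alleles are independent

print(ld_r2(complete_ld))
print(ld_r2(no_ld))
```

When r² approaches 1, two distinct QTNs produce nearly identical marker signals, which is the situation in which the abstract reports that multivariate GWAS could not separate linked QTNs from a single pleiotropic QTN.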
Affiliation(s)
- Samuel B. Fernandes
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Alexander E. Lipka
- Department of Crop Science, University of Illinois at Urbana-Champaign, Urbana, IL, United States
13
A powerful method for pleiotropic analysis under composite null hypothesis identifies novel shared loci between Type 2 Diabetes and Prostate Cancer. PLoS Genet 2020; 16:e1009218. [PMID: 33290408 PMCID: PMC7748289 DOI: 10.1371/journal.pgen.1009218] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 04/18/2020] [Revised: 12/18/2020] [Accepted: 10/22/2020] [Indexed: 12/24/2022]
Abstract
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236). 
We propose a new approach, PLACO, that uses aggregate-level genotype-phenotype association statistics (commonly referred to as GWAS summary statistics) to identify genetic variants that influence risk of two traits or diseases. It allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. We demonstrate that PLACO can achieve major power gain over alternative methods that are typically used. We applied PLACO to Type 2 Diabetes and Prostate Cancer summary data from two large case-control studies. Many previous studies have reported an inverse association of these two chronic diseases, suggesting shared risk factors; however, the shared genetic mechanisms underlying this association are poorly understood. PLACO identified a number of novel shared genetic regions that are not detected by individual trait analysis. Many of the loci implicated by PLACO increase risk for one disease while decreasing risk for the other. PLACO can similarly be used on other traits to shed light on shared genetic risk factors.
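The test statistic described in this entry, the product of a variant's Z-statistics from the two studies, can be illustrated with a small sketch. It calibrates the product only against a simulated null in which the variant affects neither trait. PLACO's actual null is a mixture that also covers variants associated with exactly one trait, and it is approximated analytically; neither part is reproduced here. Function names, the seed and the Z-values are hypothetical.

```python
import random


def product_z_null_draws(n_draws=20_000, seed=1):
    """Simulate Z1*Z2 for independent standard-normal Z1, Z2 (no association)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(n_draws)]


def empirical_pvalue(z1, z2, null_draws):
    """Two-sided Monte Carlo P-value for the observed product z1*z2."""
    observed = abs(z1 * z2)
    exceed = sum(1 for draw in null_draws if abs(draw) >= observed)
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (len(null_draws) + 1)


null_draws = product_z_null_draws()

# Hypothetical variants: one with moderate signal in both studies,
# one with a very strong signal in a single study.
p_shared = empirical_pvalue(3.0, -3.0, null_draws)
p_single = empirical_pvalue(6.0, 0.1, null_draws)
print(p_shared, p_single)
```

The product is large in magnitude only when both Z-statistics are away from zero, so the single-study signal (6.0, 0.1) is not flagged here. The opposite signs in the first variant correspond to the opposing risk directions the summary mentions.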
14
Knutson KA, Deng Y, Pan W. Implicating causal brain imaging endophenotypes in Alzheimer's disease using multivariable IWAS and GWAS summary data. Neuroimage 2020; 223:117347. [PMID: 32898681 PMCID: PMC7778364 DOI: 10.1016/j.neuroimage.2020.117347] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 04/04/2020] [Revised: 08/24/2020] [Accepted: 08/28/2020] [Indexed: 02/06/2023]
Abstract
Recent evidence suggests the existence of many undiscovered heritable brain phenotypes involved in Alzheimer's Disease (AD) pathogenesis. This finding necessitates methods for the discovery of causal brain changes in AD that integrate Magnetic Resonance Imaging measures and genotypic data. However, existing approaches for causal inference in this setting, such as the univariate Imaging Wide Association Study (UV-IWAS), suffer from inconsistent effect estimation and inflated Type I errors in the presence of genetic pleiotropy, the phenomenon in which a variant affects multiple causal intermediate risk phenotypes. In this study, we implement a multivariate extension to the IWAS model, namely MV-IWAS, to consistently estimate and test for the causal effects of multiple brain imaging endophenotypes from the Alzheimer's Disease Neuroimaging Initiative (ADNI) in the presence of pleiotropic and possibly correlated SNPs. We further extend MV-IWAS to incorporate variant-specific direct effects on AD, analogous to the existing Egger regression Mendelian Randomization approach, which allows for testing of remaining pleiotropy after adjusting for multiple intermediate pathways. We propose a convenient approach for implementing MV-IWAS that solely relies on publicly available GWAS summary data and a reference panel. Through simulations with either individual-level or summary data, we demonstrate the well controlled Type I errors and superior power of MV-IWAS over UV-IWAS in the presence of pleiotropic SNPs. We apply the summary statistic based tests to 1578 heritable imaging derived phenotypes (IDPs) from the UK Biobank. MV-IWAS detected numerous IDPs as possible false positives by UV-IWAS while uncovering many additional causal neuroimaging phenotypes in AD which are strongly supported by the existing literature.
Affiliation(s)
- Katherine A Knutson
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States
15
Selection probability of multivariate regularization to identify pleiotropic variants in genetic association studies. COMMUNICATIONS FOR STATISTICAL APPLICATIONS AND METHODS 2020. [DOI: 10.29220/csam.2020.27.5.535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
16
Deng Y, Pan W. A powerful and versatile colocalization test. PLoS Comput Biol 2020; 16:e1007778. [PMID: 32275709 PMCID: PMC7176287 DOI: 10.1371/journal.pcbi.1007778] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/16/2019] [Revised: 04/22/2020] [Accepted: 03/08/2020] [Indexed: 12/17/2022]
Abstract
Transcriptome-wide association studies (TWAS and PrediXcan) have been increasingly applied to detect associations between genetically predicted gene expressions and GWAS traits, which may suggest, but do not completely determine, causal genes for GWAS traits, due to the likely violation of their imposed strong assumptions for causal inference. Testing colocalization moves it closer to establishing causal relationships: if a GWAS trait and a gene's expression share the same associated SNP, it may suggest a regulatory (and thus putative causal) role of the SNP mediated through the gene on the GWAS trait. Accordingly, it is of interest to develop and apply various colocalization testing approaches. The existing approaches may each have some severe limitations. For instance, some methods test the null hypothesis that there is colocalization, which is not ideal because often the null hypothesis cannot be rejected simply due to limited statistical power (with too small sample sizes). Some other methods arbitrarily restrict the maximum number of causal SNPs in a locus, which may lead to loss of power in the presence of wide-spread allelic heterogeneity. Importantly, most methods cannot be applied to either GWAS/eQTL summary statistics or cases with more than two possibly correlated traits. Here we present a simple and general approach based on conditional analysis of a locus on multiple traits, overcoming the above and other shortcomings of the existing methods. We demonstrate that, compared with other methods, our new method can be applied to a wider range of scenarios and often performs better. We showcase its applications to both simulated and real data, including a large-scale Alzheimer's disease GWAS summary dataset and a gene expression dataset, and a large-scale blood lipid GWAS summary association dataset. An R package "jointsum" implementing the proposed method is publicly available on GitHub.
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
17
Neyhart JL, Lorenz AJ, Smith KP. Multi-trait Improvement by Predicting Genetic Correlations in Breeding Crosses. G3 (BETHESDA, MD.) 2019; 9:3153-3165. [PMID: 31358561 PMCID: PMC6778794 DOI: 10.1534/g3.119.400406] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 03/27/2019] [Accepted: 07/25/2019] [Indexed: 12/22/2022]
Abstract
The many quantitative traits of interest to plant breeders are often genetically correlated, which can complicate progress from selection. Improving multiple traits may be enhanced by identifying parent combinations - an important breeding step - that will deliver more favorable genetic correlations (rG). Modeling the segregation of genomewide markers with estimated effects may be one method of predicting rG in a cross, but this approach remains untested. Our objectives were to: (i) use simulations to assess the accuracy of genomewide predictions of rG and the long-term response to selection when selecting crosses on the basis of such predictions; and (ii) empirically measure the ability to predict genetic correlations using data from a barley (Hordeum vulgare L.) breeding program. Using simulations, we found that the accuracy to predict rG was generally moderate and influenced by trait heritability, population size, and genetic correlation architecture (i.e., pleiotropy or linkage disequilibrium). Among 26 barley breeding populations, the empirical prediction accuracy of rG was low (-0.012) to moderate (0.42), depending on trait complexity. Within a simulated plant breeding program employing indirect selection, choosing crosses based on predicted rG increased multi-trait genetic gain by 11-27% compared to selection on the predicted cross mean. Importantly, when the starting genetic correlation was negative, such cross selection mitigated or prevented an unfavorable response in the trait under indirect selection. Prioritizing crosses based on predicted genetic correlation can be a feasible and effective method of improving unfavorably correlated traits in breeding programs.
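A rough sketch of the idea of predicting rG from markers with estimated effects: compute each progeny line's genetic value for two traits as the sum of marker effects, then correlate the two sets of values. This is an illustration under simplifying assumptions (four hypothetical inbred progeny, three markers coded -1/1, made-up effects), not the prediction procedure evaluated in the study.

```python
from math import sqrt


def genetic_values(genotypes, effects):
    """Sum of marker effects for each line (genotypes coded -1 or 1)."""
    return [sum(g * a for g, a in zip(line, effects)) for line in genotypes]


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical progeny of one cross: rows are lines, columns are markers.
progeny = [
    [1, 1, -1],
    [1, -1, 1],
    [-1, 1, 1],
    [-1, -1, -1],
]
# Hypothetical estimated marker effects; marker 1 acts on both traits with
# opposite signs (antagonistic pleiotropy).
effects_trait1 = [0.5, 0.2, 0.0]
effects_trait2 = [-0.4, 0.0, 0.3]

g1 = genetic_values(progeny, effects_trait1)
g2 = genetic_values(progeny, effects_trait2)
r_g = correlation(g1, g2)
print(round(r_g, 3))
```

In this toy cross the shared marker with opposing effects drives a negative predicted rG. A cross segregating for different markers would give a different value, which is what makes ranking crosses by predicted rG possible in principle.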
Affiliation(s)
- Jeffrey L Neyhart
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Aaron J Lorenz
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
- Kevin P Smith
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108
18
Schaid DJ, Tong X, Batzler A, Sinnwell JP, Qing J, Biernacka JM. Multivariate generalized linear model for genetic pleiotropy. Biostatistics 2019; 20:111-128. [PMID: 29267957 DOI: 10.1093/biostatistics/kxx067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/27/2017] [Accepted: 11/05/2017] [Indexed: 02/07/2023]
Abstract
When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Mayo Clinic, Harwick 775, 200 First ST SW, Rochester, MN, USA
- Xingwei Tong
- School of Statistics, Beijing Normal University, Beijing, China
- Anthony Batzler
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jason P Sinnwell
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Jiang Qing
- School of Statistics, Beijing Normal University, Beijing, China
19
Abstract
The high mapping resolution of multiparental populations, combined with technology to measure tens of thousands of phenotypes, presents a need for quantitative methods to enhance understanding of the genetic architecture of complex traits. When multiple traits map to a common genomic region, knowledge of the number of distinct loci provides important insight into the underlying mechanism and can assist planning for subsequent experiments. We extend the method of Jiang and Zeng (1995), for testing pleiotropy with a pair of traits, to the case of more than two alleles. We also incorporate polygenic random effects to account for population structure. We use a parametric bootstrap to determine statistical significance. We apply our methods to a behavioral genetics data set from Diversity Outbred mice. Our methods have been incorporated into the R package qtl2pleio.
20
Yazdani A, Yazdani A, Elsea SH, Schaid DJ, Kosorok MR, Dangol G, Samiei A. Genome analysis and pleiotropy assessment using causal networks with loss of function mutation and metabolomics. BMC Genomics 2019; 20:395. [PMID: 31113383 PMCID: PMC6528192 DOI: 10.1186/s12864-019-5772-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 07/23/2018] [Accepted: 05/03/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND Many genome-wide association studies have detected genomic regions associated with traits, yet understanding the functional causes of association often remains elusive. Utilizing systems approaches and focusing on intermediate molecular phenotypes might facilitate biologic understanding. RESULTS The availability of exome sequencing of two populations of African-Americans and European-Americans from the Atherosclerosis Risk in Communities study allowed us to investigate the effects of annotated loss-of-function (LoF) mutations on 122 serum metabolites. To assess the findings, we built metabolomic causal networks for each population separately and utilized structural equation modeling. We then validated our findings with a set of independent samples. By use of methods based on concepts of Mendelian randomization of genetic variants, we showed that some of the affected metabolites are risk predictors in the causal pathway of disease. For example, LoF mutations in the gene KIAA1755 were identified to elevate the levels of eicosapentaenoate (p-value = 5E-14), an essential fatty acid clinically identified to increase essential hypertension. We showed that this gene is in the pathway to triglycerides, where both triglycerides and essential hypertension are risk factors of metabolomic disorder and heart attack. We also identified that the gene CLDN17, harboring loss-of-function mutations, had pleiotropic actions on metabolites from amino acid and lipid pathways. CONCLUSION Using systems biology approaches for the analysis of metabolomics and genetic data, we integrated several biological processes, which lead to findings that may functionally connect genetic variants with complex diseases.
Affiliation(s)
- Akram Yazdani
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, 10029 USA
- Sarah H. Elsea
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030 USA
- Daniel J. Schaid
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN 55905 USA
- Michael R. Kosorok
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599 USA
- Gita Dangol
- Health Science Center, The University of Texas MD Anderson Cancer Center, Austin, TX 77030 USA
- Ahmad Samiei
- Hasso Plattner Institute, 14482 Potsdam, Germany
- Climax Data Pattern, Boston, MA USA
21
Jiang Q, Zhang X, Wu M, Tong X. Testing economic “genetic pleiotropy” for Box-Cox linear model. COMMUN STAT-THEOR M 2019. [DOI: 10.1080/03610926.2019.1609036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/26/2022]
Affiliation(s)
- Qing Jiang
- Department of Mathematical Statistics, School of Statistics, Beijing Normal University, Beijing, China
- Xun Zhang
- Department of Mathematical Statistics, School of Statistics, Beijing Normal University, Beijing, China
- Min Wu
- School of Mathematics and Statistics, Hubei University of Science and Technology, Hubei, China
- Xingwei Tong
- Department of Mathematical Statistics, School of Statistics, Beijing Normal University, Beijing, China
22
The pleiotropic effect of rs7903146 on type 2 diabetes and ischemic stroke: a family-based study in a Chinese population. J Thromb Thrombolysis 2019; 48:303-314. [PMID: 30980227 DOI: 10.1007/s11239-019-01855-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
rs7903146, an established susceptibility variant for type 2 diabetes (T2D), has also been reported to be related to ischemic stroke (IS), though the evidence is conflicting. Furthermore, it remains unknown whether the genetic association with stroke is independent of T2D. In the current study, 1603 individuals across 986 families were included. The genetic pleiotropic effect on each outcome disease (T2D, overall IS, or each subtype) was assessed using multilevel logistic regression after adjustment for multiple covariates. Principal component of heritability (PCH) was also used to assess the pleiotropy by combining T2D and IS into one outcome for analysis. To identify the T2D-independent path out of the pleiotropic effect on IS, T2D status was additionally adjusted for in the models of the risk of IS or each subtype. Analyses of putative molecular pathways (dyslipidemia, hypertension, obesity and inflammation) and gene-lifestyle interactions were also performed. We found that the rs7903146_T allele was associated with a 77% higher risk of T2D, a 55% higher risk of IS, and a 70% higher risk of the large artery atherosclerosis (LAA) subtype. In particular, a T2D-independent genetic effect was identified that increased the risk of overall IS and LAA. No evidence of the molecular mechanisms or gene-lifestyle interactions behind the pleiotropic genetic effect was observed. In conclusion, our study provides evidence of a T2D-independent path within the pleiotropic effect of rs7903146 on IS. However, further studies are needed to validate the biological mechanisms behind the pleiotropic effect and its modification by lifestyle intervention.
23
Yazdani A, Yazdani A, Méndez Giráldez R, Aguilar D, Sartore L. A Multi-Trait Approach Identified Genetic Variants Including a Rare Mutation in RGS3 with Impact on Abnormalities of Cardiac Structure/Function. Sci Rep 2019; 9:5845. [PMID: 30971721 PMCID: PMC6458140 DOI: 10.1038/s41598-019-41362-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/27/2018] [Accepted: 03/05/2019] [Indexed: 01/29/2023]
Abstract
Heart failure is a major cause of premature death. Given the heterogeneity of the heart failure syndrome, identifying genetic determinants of cardiac function and structure may provide greater insights into heart failure. Despite progress in understanding the genetic basis of heart failure through genome wide association studies, the heritability of heart failure is not well understood. Gaining further insights into mechanisms that contribute to heart failure requires systematic approaches that go beyond single trait analysis. We integrated a Bayesian multi-trait approach and Bayesian networks for the analysis of 10 correlated traits of cardiac structure and function measured across 3387 individuals with whole exome sequence data. While single-trait based approaches did not find any significant genetic variant, by applying the integrative Bayesian multi-trait approach we identified 3 novel variants located in the genes RGS3, CHD3, and MRPL38, with significant impact on cardiac traits such as left ventricular volume index, parasternal long axis interventricular septum thickness, and mean left ventricular wall thickness. Among these, the rare variant NC_000009.11:g.116346115C > A (rs144636307) in RGS3 showed a pleiotropic effect on left ventricular mass index, left ventricular volume index and maximal left atrial anterior-posterior diameter, while RGS3 can inhibit TGF-beta signaling associated with left ventricle dilation and systolic dysfunction.
Affiliation(s)
- Akram Yazdani
- Department of Genetics and Genomic Science, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Climax Data Pattern, Boston, MA, USA
- Azam Yazdani
- School of Medicine, Boston University, Boston, MA, USA
- Raúl Méndez Giráldez
- Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Luca Sartore
- National Institute of Statistical Science, Washington, DC, USA
24
A Multivariate Genome-Wide Association Study of Wing Shape in Drosophila melanogaster. Genetics 2019; 211:1429-1447. [PMID: 30792267 DOI: 10.1534/genetics.118.301342] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 10/12/2018] [Accepted: 02/03/2019] [Indexed: 02/02/2023]
Abstract
Due to the complexity of genotype-phenotype relationships, simultaneous analyses of genomic associations with multiple traits will be more powerful and informative than a series of univariate analyses. However, in most cases, studies of genotype-phenotype relationships have been analyzed only one trait at a time. Here, we report the results of a fully integrated multivariate genome-wide association analysis of the shape of the Drosophila melanogaster wing in the Drosophila Genetic Reference Panel. Genotypic effects on wing shape were highly correlated between two different laboratories. We found 2396 significant SNPs using a 5% false discovery rate cutoff in the multivariate analyses, but just four significant SNPs in univariate analyses of scores on the first 20 principal component axes. One quarter of these initially significant SNPs retain their effects in regularized models that take into account population structure and linkage disequilibrium. A key advantage of multivariate analysis is that the direction of the estimated phenotypic effect is much more informative than a univariate one. We exploit this fact to show that the effects of knockdowns of genes implicated in the initial screen were on average more similar than expected under a null model. A subset of SNP effects were replicable in an unrelated panel of inbred lines. Association studies that take a phenomic approach, considering many traits simultaneously, are an important complement to the power of genomics.
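The abstract reports a 5% false discovery rate cutoff without naming the procedure. One widely used way to apply such a cutoff is the Benjamini-Hochberg step-up procedure, sketched here as an assumption rather than as the authors' method. The P-values are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, ascending P-values)
    with p_(k) <= fdr * k / m, then reject the k smallest P-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= fdr * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical SNP-level P-values (illustration only).
snp_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(snp_pvalues, fdr=0.05)
print(significant)
```

With six tests, the per-rank thresholds are 0.05 × k / 6, so only the two smallest P-values pass in this toy example.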
25
Grau-Perez M, Agha G, Pang Y, Bermudez JD, Tellez-Plaza M. Mendelian Randomization and the Environmental Epigenetics of Health: a Systematic Review. Curr Environ Health Rep 2019; 6:38-51. [DOI: 10.1007/s40572-019-0226-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]
26
Zhang X, Veturi Y, Verma S, Bone W, Verma A, Lucas A, Hebbring S, Denny JC, Stanaway IB, Jarvik GP, Crosslin D, Larson EB, Rasmussen-Torvik L, Pendergrass SA, Smoller JW, Hakonarson H, Sleiman P, Weng C, Fasel D, Wei WQ, Kullo I, Schaid D, Chung WK, Ritchie MD. Detecting potential pleiotropy across cardiovascular and neurological diseases using univariate, bivariate, and multivariate methods on 43,870 individuals from the eMERGE network. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2019; 24:272-283. [PMID: 30864329 PMCID: PMC6457436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The link between cardiovascular diseases and neurological disorders has been widely observed in the aging population. Disease prevention and treatment rely on understanding the potential genetic nexus of multiple diseases in these categories. In this study, we were interested in detecting pleiotropy, or the phenomenon in which a genetic variant influences more than one phenotype. Marker-phenotype association approaches can be grouped into univariate, bivariate, and multivariate categories based on the number of phenotypes considered at one time. Here we applied one statistical method per category followed by an eQTL colocalization analysis to identify potential pleiotropic variants that contribute to the link between cardiovascular and neurological diseases. We performed our analyses on ~530,000 common SNPs coupled with 65 electronic health record (EHR)-based phenotypes in 43,870 unrelated European adults from the Electronic Medical Records and Genomics (eMERGE) network. There were 31 variants identified by all three methods that showed significant associations across late onset cardiac- and neurologic- diseases. We further investigated functional implications of gene expression on the detected "lead SNPs" via colocalization analysis, providing a deeper understanding of the discovered associations. In summary, we present the framework and landscape for detecting potential pleiotropy using univariate, bivariate, multivariate, and colocalization methods. Further exploration of these potentially pleiotropic genetic variants will work toward understanding disease causing mechanisms across cardiovascular and neurological diseases and may assist in considering disease prevention as well as drug repositioning in future research.
Affiliation(s)
- Xinyuan Zhang
- Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA. *Authors contributed equally to this work
27
Tsepilov YA, Sharapov SZ, Zaytseva OO, Krumsiek J, Prehn C, Adamski J, Kastenmüller G, Wang-Sattler R, Strauch K, Gieger C, Aulchenko YS. A network-based conditional genetic association analysis of the human metabolome. Gigascience 2018; 7:5214749. [PMID: 30496450 PMCID: PMC6287100 DOI: 10.1093/gigascience/giy137] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/12/2017] [Accepted: 11/06/2018] [Indexed: 12/24/2022]
Abstract
Background: Genome-wide association studies have identified hundreds of loci that influence a wide variety of complex human traits; however, little is known regarding the biological mechanism of action of these loci. The recent accumulation of functional genomics (“omics”), including metabolomics data, has created new opportunities for studying the functional role of specific changes in the genome. Functional genomic data are characterized by their high dimensionality, the presence of (strong) statistical dependency between traits, and, potentially, complex genetic control. Therefore, the analysis of such data requires specific statistical genetics methods. Results: To facilitate our understanding of the genetic control of omics phenotypes, we propose a trait-centered, network-based conditional genetic association (cGAS) approach for identifying the direct effects of genetic variants on omics-based traits. For each trait of interest, we selected from a biological network a set of other traits to be used as covariates in the cGAS. The network can be reconstructed either from biological pathway databases (a mechanistic approach) or directly from the data, using a Gaussian graphical model applied to the metabolome (a data-driven approach). We derived mathematical expressions that allow comparison of the power of univariate analyses with conditional genetic association analyses. We then tested our approach using data from a population-based Cooperative Health Research in the region of Augsburg (KORA) study (n = 1,784 subjects, 1.7 million single-nucleotide polymorphisms) with measured data for 151 metabolites. Conclusions: We found that compared to single-trait analysis, performing a genetic association analysis that includes biologically relevant covariates can either gain or lose power, depending on specific pleiotropic scenarios, for which we provide empirical examples. 
In the context of analyzed metabolomics data, the mechanistic network approach had more power compared to the data-driven approach. Nevertheless, we believe that our analysis shows that neither a prior-knowledge-only approach nor a phenotypic-data-only approach is optimal, and we discuss possibilities for improvement.
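The contrast between a single-trait analysis and a conditional analysis with a neighbouring trait as covariate can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the cGAS implementation: the effect sizes, the allele frequency, the two metabolites `m1` and `m2`, and the helper `ols` are all hypothetical. The SNP is simulated to act on `m2` only through `m1`, so adjusting for `m1` should remove the indirect signal.

```python
import random

def ols(predictors, y):
    """Return [intercept, slope_1, ...] from least squares of y on the predictors."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    p = len(rows[0])
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination with pivoting.
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    c = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(a[r][k]))
        a[k], a[piv] = a[piv], a[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = a[r][k] / a[k][k]
            a[r] = [x - f * z for x, z in zip(a[r], a[k])]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (c[k] - sum(a[k][s] * b[s] for s in range(k + 1, p))) / a[k][k]
    return b

random.seed(1)
n = 8000
# Hypothetical SNP dosage (0/1/2) with allele frequency 0.3.
g = [(random.random() < 0.3) + (random.random() < 0.3) for _ in range(n)]
m1 = [0.5 * gi + random.gauss(0, 1) for gi in g]   # metabolite directly affected by the SNP
m2 = [0.8 * x + random.gauss(0, 1) for x in m1]    # neighbour affected only through m1

marginal = ols([g], m2)[1]         # indirect effect, expected near 0.5 * 0.8 = 0.4
conditional = ols([g, m1], m2)[1]  # SNP effect on m2 adjusted for m1, expected near 0
```

In this scenario conditioning separates a direct from an indirect effect; in other pleiotropic scenarios, as the abstract notes, the same adjustment can cost power.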
Affiliation(s)
- Y A Tsepilov
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Lavrentieva Ave. 10, 630090, Russia
- Natural Science Department, Novosibirsk State University, Novosibirsk, Pirogova Str. 1, 630090, Russia
- S Z Sharapov
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Lavrentieva Ave. 10, 630090, Russia
- Natural Science Department, Novosibirsk State University, Novosibirsk, Pirogova Str. 1, 630090, Russia
- O O Zaytseva
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Lavrentieva Ave. 10, 630090, Russia
- Natural Science Department, Novosibirsk State University, Novosibirsk, Pirogova Str. 1, 630090, Russia
- J Krumsiek
- Institute of Computational Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- C Prehn
- Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- J Adamski
- Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Institute of Experimental Genetics, Life and Food Science Center Weihenstephan, Technical University of Munich, Freising-Weihenstephan, Arcisstrasse 21, 80333, Germany
- German Center for Diabetes Research, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- G Kastenmüller
- Institute of Bioinformatics and Systems Biology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- R Wang-Sattler
- German Center for Diabetes Research, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Research Unit of Molecular Epidemiology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Institute of Epidemiology II, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- K Strauch
- Institute of Genetic Epidemiology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Butenandstrasse 5, 81377, Germany
- C Gieger
- German Center for Diabetes Research, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Research Unit of Molecular Epidemiology, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Institute of Epidemiology II, Helmholtz Center Munich - German Research Center for Environmental Health, Neuherberg, Ingolstadter Landstrasse 1, 85764, Germany
- Y S Aulchenko
- Institute of Cytology and Genetics SB RAS, Novosibirsk, Lavrentieva Ave. 10, 630090, Russia
- Natural Science Department, Novosibirsk State University, Novosibirsk, Pirogova Str. 1, 630090, Russia
- PolyOmica, 's-Hertogenbosch, Het Vlaggeschip 61, 5237 PA, The Netherlands
|
28
|
Abstract
It is useful to detect allelic heterogeneity (AH), i.e., the presence of multiple causal SNPs in a locus, which, for example, may guide the development of new methods for fine mapping and determine how to interpret an appearing epistasis. In contrast to Mendelian traits, the existence and extent of AH for complex traits had been largely unknown until Hormozdiari et al. proposed a Bayesian method, called causal variants identification in associated regions (CAVIAR), and uncovered widespread AH in complex traits. However, there are several limitations with CAVIAR. First, it assumes a maximum number of causal SNPs in a locus, typically up to six, to save computing time; this assumption, as will be shown, may influence the outcome. Second, its computational time can be too demanding to be feasible since it examines all possible combinations of causal SNPs (under the assumed upper bound). Finally, it outputs a posterior probability of AH, which may be difficult to calibrate with a commonly used nominal significance level. Here, we introduce an intersection-union test (IUT) based on a joint/conditional regression model with all the SNPs in a locus to infer AH. We also propose two sequential IUT-based testing procedures to estimate the number of causal SNPs. Our proposed methods are applicable to not only individual-level genotypic and phenotypic data, but also genome-wide association study (GWAS) summary statistics. We provide numerical examples based on both simulated and real data, including large-scale schizophrenia (SCZ) and high-density lipoprotein (HDL) GWAS summary data sets, to demonstrate the effectiveness of the new methods. In particular, for both the SCZ and HDL data, our proposed IUT not only was faster, but also detected more AH loci than CAVIAR. Our proposed methods are expected to be useful in further uncovering the extent of AH in complex traits.
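The decision rule of an intersection-union test can be sketched in a few lines. The null "at most one causal SNP in the locus" is a union of component nulls, so it is rejected only when every component null is rejected, and the overall p-value is the largest component p-value. How the component p-values are obtained from the joint/conditional regression is not described in the abstract; the function names and the numbers below are hypothetical.

```python
def iut_p_value(component_p_values):
    """Intersection-union test: the overall p-value is the largest
    component p-value, because every component null must be rejected."""
    if not component_p_values:
        raise ValueError("need at least one component p-value")
    return max(component_p_values)

def iut_reject(component_p_values, alpha=0.05):
    """Reject the union null at level alpha only if all components reject."""
    return iut_p_value(component_p_values) <= alpha

# Hypothetical three-SNP locus; entry j is assumed to test
# "no SNP other than SNP j has an effect" in the joint model.
strong_evidence = [1e-6, 3e-4, 2e-3]
weak_evidence = [1e-6, 0.40, 2e-3]
```

With these inputs the first locus would be declared heterogeneous at the 0.05 level and the second would not, since one component null is retained. Because the result is an ordinary p-value, it can be compared directly with a nominal significance level.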
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
|
29
|
Rudra P, Broadaway KA, Ware EB, Jhun MA, Bielak LF, Zhao W, Smith JA, Peyser PA, Kardia SL, Epstein MP, Ghosh D. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits. Genet Epidemiol 2018; 42:320-332. [PMID: 29601641 PMCID: PMC5980726 DOI: 10.1002/gepi.22121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2017] [Revised: 01/19/2018] [Accepted: 02/19/2018] [Indexed: 01/09/2023]
Abstract
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that possess cross-phenotype effects. In the presence of such effects, modeling both the phenotypes and rare variants collectively using multivariate models can achieve higher statistical power compared to univariate methods that either model each phenotype separately or perform separate tests for each variant. Several studies collect phenotypic data over time and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses using longitudinal outcomes. In order to fill this important gap, we propose an extension of Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable on a genome-wide scale due to the use of a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power over competing approaches and also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
Affiliation(s)
- Pratyaydipta Rudra
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO
- Erin B. Ware
- Department of Epidemiology, University of Michigan, Ann Arbor, MI
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI
- Min A. Jhun
- Department of Epidemiology, University of Michigan, Ann Arbor, MI
- Wei Zhao
- Department of Epidemiology, University of Michigan, Ann Arbor, MI
- Debashis Ghosh
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO
|
30
|
Winkler TW, Günther F, Höllerer S, Zimmermann M, Loos RJ, Kutalik Z, Heid IM. A joint view on genetic variants for adiposity differentiates subtypes with distinct metabolic implications. Nat Commun 2018; 9:1946. [PMID: 29769528 PMCID: PMC5956079 DOI: 10.1038/s41467-018-04124-9] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2017] [Accepted: 04/06/2018] [Indexed: 12/20/2022] Open
Abstract
The problem of the genetics of related phenotypes is often addressed by analyzing adjusted-model traits, but such traits warrant cautious interpretation. Here, we adopt a joint view of adiposity traits in ~322,154 subjects (GIANT consortium). We classify 159 signals associated with body mass index (BMI), waist-to-hip ratio (WHR), or WHR adjusted for BMI (WHRadjBMI) at P < 5 × 10⁻⁸, into four classes based on the direction of their effects on BMI and WHR. Our classes help differentiate adiposity genetics with respect to anthropometry, fat depots, and metabolic health. Class-specific Mendelian randomization reveals that variants associated with both WHR-decrease and BMI increase are linked to metabolically rather favorable adiposity through beneficial hip fat. Class-specific enrichment analyses implicate digestive systems as a pathway in adiposity genetics. Our results demonstrate that WHRadjBMI variants capture relevant effects of "unexpected fat distribution given the BMI" and that a joint view of the genetics underlying related phenotypes can inform on important biology.
Affiliation(s)
- Thomas W Winkler
- Department of Genetic Epidemiology, University of Regensburg, D-93051, Regensburg, Germany
- Felix Günther
- Department of Genetic Epidemiology, University of Regensburg, D-93051, Regensburg, Germany
- Statistical Consulting Unit StaBLab, Department of Statistics, Ludwig-Maximilians-Universität Munich, D-80539, Munich, Germany
- Simon Höllerer
- Department of Genetic Epidemiology, University of Regensburg, D-93051, Regensburg, Germany
- Martina Zimmermann
- Department of Genetic Epidemiology, University of Regensburg, D-93051, Regensburg, Germany
- Ruth JF Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, 10029, New York, NY, USA
- The Genetics of Obesity and Related Metabolic Traits Program, Icahn School of Medicine at Mount Sinai, 10029, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, 10029, New York, NY, USA
- Zoltán Kutalik
- Institute of Social and Preventive Medicine (IUMSP), Centre Hospitalier Universitaire Vaudois (CHUV), 1010, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Iris M Heid
- Department of Genetic Epidemiology, University of Regensburg, D-93051, Regensburg, Germany
|
31
|
Liang X, Sha Q, Rho Y, Zhang S. A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes. Genet Epidemiol 2018; 42:344-353. [PMID: 29682782 DOI: 10.1002/gepi.22124] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2017] [Revised: 02/01/2018] [Accepted: 02/19/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool to identify genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, in general, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than the univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the powers of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait-based association test that uses extended Simes procedure (TATES) with and without HCM. Our simulation studies show that using HCM is more powerful than not using HCM in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.
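The first of the two steps, clustering phenotypes and replacing each cluster by a representative, can be sketched as follows. This is an illustration under assumptions, not the authors' HCM: the distance 1 - r, average linkage, a fixed number of clusters, and the mean of z-scored phenotypes as representative are all choices made here for the example, and the toy phenotypes `a` to `d` are invented.

```python
from statistics import mean, pstdev

def corr(x, y):
    """Pearson correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def cluster_phenotypes(phenos, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r."""
    m = len(phenos)
    dist = [[1 - corr(phenos[i], phenos[j]) for j in range(m)] for i in range(m)]
    clusters = [[i] for i in range(m)]
    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = mean(dist[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]   # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]

def representative(phenos, cluster):
    """Per-subject mean of the z-scored phenotypes in one cluster (assumed choice)."""
    z = []
    for i in cluster:
        mu, sd = mean(phenos[i]), pstdev(phenos[i])
        z.append([(v - mu) / sd for v in phenos[i]])
    return [mean(col) for col in zip(*z)]

# Toy data: a and b move together, c and d move together.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
c = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]
d = [3.1, 0.9, 4.2, 1.1, 4.8, 2.1]
clusters = cluster_phenotypes([a, b, c, d], n_clusters=2)
reps = [representative([a, b, c, d], cl) for cl in clusters]
```

The second step would then pass `reps`, rather than the four original phenotypes, to an existing multivariate test.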
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Yeonwoo Rho
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
32
|
Salinas YD, Wang Z, DeWan AT. Statistical Analysis of Multiple Phenotypes in Genetic Epidemiologic Studies: From Cross-Phenotype Associations to Pleiotropy. Am J Epidemiol 2018; 187:855-863. [PMID: 29020254 DOI: 10.1093/aje/kwx296] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2017] [Accepted: 08/03/2017] [Indexed: 12/15/2022] Open
Abstract
In the context of genetics, pleiotropy refers to the phenomenon in which a single genetic locus affects more than 1 trait or disease. Genetic epidemiologic studies have identified loci associated with multiple phenotypes, and these cross-phenotype associations are often incorrectly interpreted as examples of pleiotropy. Pleiotropy is only one possible explanation for cross-phenotype associations. Cross-phenotype associations may also arise due to issues related to study design, confounder bias, or nongenetic causal links between the phenotypes under analysis. Therefore, it is necessary to dissect cross-phenotype associations carefully to uncover true pleiotropic loci. In this review, we describe statistical methods that can be used to identify robust statistical evidence of pleiotropy. First, we provide an overview of univariate and multivariate methods for discovery of cross-phenotype associations and highlight important considerations for choosing among available methods. Then, we describe how to dissect cross-phenotype associations by using mediation analysis. Pleiotropic loci provide insights into the mechanistic underpinnings of disease comorbidity, and they may serve as novel targets for interventions that simultaneously treat multiple diseases. Discerning between different types of cross-phenotype associations is necessary to realize the public health potential of pleiotropic loci.
Affiliation(s)
- Yasmmyn D Salinas
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Andrew T DeWan
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut
|
33
|
Deng Y, Pan W. Testing Genetic Pleiotropy with GWAS Summary Statistics for Marginal and Conditional Analyses. Genetics 2017; 207:1285-1299. [PMID: 28971959 PMCID: PMC5714448 DOI: 10.1534/genetics.117.300347] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2017] [Accepted: 09/29/2017] [Indexed: 11/18/2022] Open
Abstract
There is growing interest in testing genetic pleiotropy, which is when a single genetic variant influences multiple traits. Several methods have been proposed; however, these methods have some limitations. First, all the proposed methods are based on the use of individual-level genotype and phenotype data; in contrast, for logistical, and other, reasons, summary statistics of univariate SNP-trait associations are typically only available based on meta- or mega-analyzed large genome-wide association study (GWAS) data. Second, existing tests are based on marginal pleiotropy, which cannot distinguish between direct and indirect associations of a single genetic variant with multiple traits due to correlations among the traits. Hence, it is useful to consider conditional analysis, in which a subset of traits is adjusted for another subset of traits. For example, in spite of substantial lowering of low-density lipoprotein cholesterol (LDL) with statin therapy, some patients still maintain high residual cardiovascular risk, and, for these patients, it might be helpful to reduce their triglyceride (TG) level. For this purpose, in order to identify new therapeutic targets, it would be useful to identify genetic variants with pleiotropic effects on LDL and TG after adjusting the latter for LDL; otherwise, a pleiotropic effect of a genetic variant detected by a marginal model could simply be due to its association with LDL only, given the well-known correlation between the two types of lipids. Here, we develop a new pleiotropy testing procedure based only on GWAS summary statistics that can be applied for both marginal analysis and conditional analysis. Although the main technical development is based on published union-intersection testing methods, care is needed in specifying conditional models to avoid invalid statistical estimation and inference. 
In addition to the previously used likelihood ratio test, we also propose using generalized estimating equations under the working independence model for robust inference. We provide numerical examples based on both simulated and real data, including two large lipid GWAS summary association datasets based on ∼100,000 and ∼189,000 samples, respectively, to demonstrate the difference between marginal and conditional analyses, as well as the effectiveness of our new approach.
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota 55455
|
34
|
Lin N, Zhu Y, Fan R, Xiong M. A quadratically regularized functional canonical correlation analysis for identifying the global structure of pleiotropy with NGS data. PLoS Comput Biol 2017; 13:e1005788. [PMID: 29040274 PMCID: PMC5659802 DOI: 10.1371/journal.pcbi.1005788] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2016] [Revised: 10/27/2017] [Accepted: 09/21/2017] [Indexed: 01/12/2023] Open
Abstract
Investigating the pleiotropic effects of genetic variants can increase statistical power, provide important information to achieve deep understanding of the complex genetic structures of disease, and offer powerful tools for designing effective treatments with fewer side effects. However, the current multiple phenotype association analysis paradigm lacks breadth (number of phenotypes and genetic variants jointly analyzed at the same time) and depth (hierarchical structure of phenotype and genotypes). A key issue for high dimensional pleiotropic analysis is to effectively extract informative internal representation and features from high dimensional genotype and phenotype data. To explore correlation information of genetic variants, effectively reduce data dimensions, and overcome critical barriers in advancing the development of novel statistical methods and computational algorithms for genetic pleiotropic analysis, we propose a new statistical method referred to as a quadratically regularized functional CCA (QRFCCA) for association analysis which combines three approaches: (1) quadratically regularized matrix factorization, (2) functional data analysis and (3) canonical correlation analysis (CCA). Large-scale simulations show that the QRFCCA has a much higher power than that of the ten competing statistics while retaining appropriate type I error rates. To further evaluate performance, the QRFCCA and ten other statistics are applied to the whole genome sequencing dataset from the TwinsUK study. We identify a total of 79 genes with rare variants and 67 genes with common variants significantly associated with the 46 traits using QRFCCA. The results show that the QRFCCA substantially outperforms the ten other statistics.
Affiliation(s)
- Nan Lin
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
- Yun Zhu
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, United States of America
- Ruzong Fan
- Biostatistics and Bioinformatics Branch (BBB), Division of Intramural Population Health Research (DIPHR), Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, United States of America
- Momiao Xiong
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, United States of America
|
35
|
Deng Y, Pan W. Conditional analysis of multiple quantitative traits based on marginal GWAS summary statistics. Genet Epidemiol 2017; 41:427-436. [PMID: 28464407 PMCID: PMC5536980 DOI: 10.1002/gepi.22046] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2016] [Revised: 01/09/2017] [Accepted: 02/04/2017] [Indexed: 12/22/2022]
Abstract
There has been an increasing interest in joint association testing of multiple traits for possible pleiotropic effects. However, even in the presence of pleiotropy, most of the existing methods cannot distinguish direct and indirect effects of a genetic variant, say single-nucleotide polymorphism (SNP), on multiple traits, and a conditional analysis of a trait adjusting for other traits is perhaps the simplest and most common approach to addressing this question. However, without individual-level genotypic and phenotypic data but with only genome-wide association study (GWAS) summary statistics, as typical with most large-scale GWAS consortium studies, we are not aware of any existing method for such a conditional analysis. We propose such a conditional analysis, offering formulas of necessary calculations to fit a joint linear regression model for multiple quantitative traits. Furthermore, our method can also accommodate conditional analysis on multiple SNPs in addition to on multiple quantitative traits, which is expected to be useful for fine mapping. We provide numerical examples based on both simulated and real GWAS data to demonstrate the effectiveness of our proposed approach, and illustrate possible usefulness of conditional analysis by contrasting its result differences from those of standard marginal analyses.
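A worked two-trait case shows why marginal summary statistics can be enough for a conditional analysis. The sketch below is the textbook two-predictor least-squares identity, not necessarily the formulas derived in the paper. It assumes the SNP and both traits are standardized to unit variance, so that the marginal effects `b1` and `b2` equal the SNP-trait correlations, and that the trait correlation `r` is available from some other source. The function name and the input values are hypothetical.

```python
def conditional_effects(b1, b2, r):
    """Standardized coefficients of the model y1 ~ g + y2 from marginal quantities.

    b1, b2: marginal standardized effects of SNP g on traits y1 and y2
    r:      correlation between y1 and y2
    """
    denom = 1.0 - b2 ** 2
    if denom <= 0:
        raise ValueError("b2 must lie strictly between -1 and 1")
    beta_g = (b1 - r * b2) / denom    # SNP effect on y1 adjusted for y2
    beta_y2 = (r - b1 * b2) / denom   # effect of y2 on y1 adjusted for the SNP
    return beta_g, beta_y2

# Hypothetical inputs where the marginal SNP effect on y1 is fully
# explained by its effect on y2: b1 = r * b2.
beta_g, beta_y2 = conditional_effects(b1=0.10, b2=0.20, r=0.5)
```

Here the marginal analysis reports an association with y1 (b1 = 0.10), while the conditional SNP effect is zero, which is the kind of difference between marginal and conditional results the abstract describes.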
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
|
36
|
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
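The idea of summarizing trait-genotype association by a coefficient between two kernel matrices can be sketched with a normalized trace of centered kernels. This is one common coefficient of that kind and is offered as an assumption, not as the DKAT statistic itself; the linear kernels, the function names, and the toy data are hypothetical, and the analytical P-value step mentioned in the abstract is not shown.

```python
from math import sqrt

def linear_kernel(rows):
    """Pairwise inner products between subjects (one row per subject)."""
    return [[sum(u * v for u, v in zip(x, y)) for y in rows] for x in rows]

def center(k):
    """Double-center a symmetric kernel matrix."""
    n = len(k)
    row = [sum(r) / n for r in k]   # row means equal column means by symmetry
    grand = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def frobenius_inner(a, b):
    """Sum of elementwise products, equal to tr(AB) for symmetric matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def kernel_association(k, l):
    """Normalized similarity between two centered kernels."""
    kc, lc = center(k), center(l)
    return frobenius_inner(kc, lc) / sqrt(frobenius_inner(kc, kc) * frobenius_inner(lc, lc))

genotypes = [[0], [1], [2], [1]]                 # one SNP, four subjects
traits_linked = [[0.0], [2.0], [4.0], [2.0]]     # proportional to genotype
traits_unlinked = [[1.0], [0.0], [1.0], [0.0]]   # uncorrelated with genotype

k_geno = linear_kernel(genotypes)
linked = kernel_association(linear_kernel(traits_linked), k_geno)
unlinked = kernel_association(linear_kernel(traits_unlinked), k_geno)
```

Swapping `linear_kernel` for a kernel that encodes pathway or network structure among traits is where such an approach would accommodate structured, high-dimensional phenotypes.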
|