1
|
Ahrens CW, Murray K, Mazanec RA, Ferguson S, Jones A, Tissue DT, Byrne M, Borevitz JO, Rymer PD. Genomic determinants, architecture, and constraints in drought-related traits in Corymbia calophylla. BMC Genomics 2024; 25:640. [PMID: 38937661 PMCID: PMC11209971 DOI: 10.1186/s12864-024-10531-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 06/14/2024] [Indexed: 06/29/2024] Open
Abstract
BACKGROUND Drought adaptation is critical to many tree species persisting under climate change; however, our knowledge of the genetic basis for trees to adapt to drought is limited. This knowledge gap impedes our fundamental understanding of drought response and its application to forest production and conservation. To improve our understanding of the genomic determinants, architecture, and trait constraints, we assembled a reference genome and detected ~6.5 M variants in 432 phenotyped individuals for the foundational tree Corymbia calophylla. RESULTS We found 273 genomic variants determining traits with moderate heritability (SNP-based h² = 0.26-0.64). Significant variants were predominantly in gene regulatory elements distributed among several haplotype blocks across all chromosomes. Furthermore, traits were constrained by frequent epistatic and pleiotropic interactions. CONCLUSIONS Our results on the genetic basis for drought traits in Corymbia calophylla have several implications for the ability to adapt to climate change: (1) drought-related traits are controlled by complex genomic architectures with large haplotypes and epistatic and pleiotropic interactions; (2) the most significant variants determining drought-related traits occurred in regulatory regions; and (3) models incorporating epistatic interactions improve trait predictions. Our findings indicate that, despite moderate heritability, drought traits are likely constrained by complex genomic architecture, potentially limiting trees' response to climate change.
Affiliation(s)
- Collin W Ahrens
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, 2753, Australia.
- Cesar Australia, Brunswick, VIC, 3058, Australia.
| | - Kevin Murray
- Research School of Biology, Australian National University, Canberra, ACT, 2600, Australia
| | - Richard A Mazanec
- Biodiversity and Conservation Science, Western Australian Department of Biodiversity, Conservation and Attractions, Kensington, WA, 6151, Australia
| | - Scott Ferguson
- Research School of Biology, Australian National University, Canberra, ACT, 2600, Australia
| | - Ashley Jones
- Research School of Biology, Australian National University, Canberra, ACT, 2600, Australia
| | - David T Tissue
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, 2753, Australia
| | - Margaret Byrne
- Biodiversity and Conservation Science, Western Australian Department of Biodiversity, Conservation and Attractions, Kensington, WA, 6151, Australia
| | - Justin O Borevitz
- Research School of Biology, Australian National University, Canberra, ACT, 2600, Australia
| | - Paul D Rymer
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW, 2753, Australia
| |
|
2
|
Yuan M, Goovaerts S, Vanneste M, Matthews H, Hoskens H, Richmond S, Klein OD, Spritz RA, Hallgrimsson B, Walsh S, Shriver MD, Shaffer JR, Weinberg SM, Peeters H, Claes P. Mapping genes for human face shape: exploration of univariate phenotyping strategies. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.06.06.597731. [PMID: 38895298 PMCID: PMC11185724 DOI: 10.1101/2024.06.06.597731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/21/2024]
Abstract
Human facial shape, while strongly heritable, involves both genetic and structural complexity, necessitating precise phenotyping for accurate assessment. Common phenotyping strategies include simplifying 3D facial features into univariate traits such as anthropometric measurements (e.g., inter-landmark distances), unsupervised dimensionality reductions (e.g., principal component analysis (PCA) and auto-encoder (AE) approaches), and assessing resemblance to particular facial gestalts (e.g., syndromic facial archetypes). This study provides a comparative assessment of these strategies in genome-wide association studies (GWASs) of 3D facial shape. Specifically, we investigated inter-landmark distances, PCA and AE-derived latent dimensions, and facial resemblance to random, extreme, and syndromic gestalts within a GWAS of 8,426 individuals of recent European ancestry. Inter-landmark distances exhibit the highest SNP-based heritability as estimated via LD score regression, followed by AE dimensions. Conversely, resemblance scores to extreme and syndromic facial gestalts display the lowest heritability, in line with expectations. Notably, the aggregation of multiple GWASs on facial resemblance to random gestalts reveals the highest number of independent genetic loci. This novel, easy-to-implement phenotyping approach holds significant promise for capturing genetically relevant morphological traits derived from complex biomedical imaging datasets, and its applications extend beyond faces. Nevertheless, these different phenotyping strategies capture different genetic influences on craniofacial shape. Thus, it remains valuable to explore these strategies individually and in combination to gain a more comprehensive understanding of the genetic factors underlying craniofacial shape and related traits. 
Author Summary Advancements linking variation in the human genome to phenotypes have rapidly evolved in recent decades and have revealed that most human traits are influenced by genetic variants to at least some degree. While many traits, such as stature, are straightforward to acquire and investigate, the multivariate and multipartite nature of facial shape makes quantification more challenging. In this study, we compared the impact of different facial phenotyping approaches on gene mapping outcomes. Our findings suggest that the choice of facial phenotyping method has an impact on apparent trait heritability and the ability to detect genetic association signals. These results offer valuable insights into the importance of phenotyping in genetic investigations, especially when dealing with highly complex morphological traits.
|
3
|
St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022] [Indexed: 11/15/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits. Therefore, there is a need to develop unified and flexible methods to study association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations compared with other existing methods, for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
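The copula definition in the abstract above can be written out for the bivariate case. The block below is a sketch in generic notation (the symbols $Y_1, Y_2, F_1, F_2, C$ are assumptions chosen for illustration, not the paper's own notation); it states Sklar's theorem, which is the standard result that lets the marginal distributions of two phenotypes be modelled separately from their dependence.

```latex
% Bivariate copula: a joint CDF on [0,1]^2 with uniform margins.
C(u_1, u_2) = P(U_1 \le u_1,\; U_2 \le u_2),
\qquad U_1, U_2 \sim \mathrm{Uniform}(0, 1).

% Sklar's theorem: for phenotypes Y_1, Y_2 with marginal CDFs F_1, F_2,
% there exists a copula C such that the joint CDF factors as
F(y_1, y_2) = C\bigl(F_1(y_1),\, F_2(y_2)\bigr).
% C is unique when F_1 and F_2 are continuous; for binary or mixed
% phenotypes a copula still exists but need not be unique.
```

Under this factorization, each margin can be continuous or binary while the dependence between the two phenotypes is carried entirely by $C$.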
Affiliation(s)
- Julien St-Pierre
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
| | - Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
| |
|
4
|
Kim K, Jun TH, Ha BK, Wang S, Sun H. New statistical selection method for pleiotropic variants associated with both quantitative and qualitative traits. BMC Bioinformatics 2023; 24:381. [PMID: 37817069 PMCID: PMC10563219 DOI: 10.1186/s12859-023-05505-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Accepted: 09/28/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND Identification of pleiotropic variants associated with multiple phenotypic traits has received increasing attention in genetic association studies. Overlapping genetic associations from multiple traits help to detect weak genetic associations missed by single-trait analyses. Many statistical methods have been developed to identify pleiotropic variants, but most of them are limited to quantitative traits, even though pleiotropic effects on both quantitative and qualitative traits have been observed. This is a statistically challenging problem because there is no appropriate multivariate distribution to model both quantitative and qualitative data together. Alternatively, meta-analysis methods can be applied, which integrate summary statistics of individual variants associated with either a quantitative or a qualitative trait without accounting for correlations among genetic variants. RESULTS We propose a new statistical selection method based on a unified selection score quantifying how a genetic variant (i.e., a pleiotropic variant) associates with both quantitative and qualitative traits. In our extensive simulation studies, where various types of pleiotropic effects on both quantitative and qualitative traits were considered, we demonstrated that the proposed method outperforms the existing meta-analysis methods in terms of true positive selection. We also applied the proposed method to a peanut dataset with 6 quantitative and 2 qualitative traits, and a cowpea dataset with 2 quantitative and 6 qualitative traits. We were able to detect some potentially pleiotropic variants missed by the existing methods in both analyses. CONCLUSIONS The proposed method is able to locate pleiotropic variants associated with both quantitative and qualitative traits. It has been implemented in an R package 'UNISS', which can be downloaded from http://github.com/statpng/uniss.
Affiliation(s)
- Kipoong Kim
- Department of Statistics, Pusan National University, 46241, Busan, Korea
| | - Tae-Hwan Jun
- Department of Plant Bioscience, Pusan National University, 50463, Miryang, Korea
| | - Bo-Keun Ha
- Department of Applied Plant Science, Chonnam National University, 61186, Gwangju, Korea
| | - Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, 10032, USA
| | - Hokeun Sun
- Department of Statistics, Pusan National University, 46241, Busan, Korea
| |
|
5
|
Roth K, Pröll-Cornelissen MJ, Henne H, Appel AK, Schellander K, Tholen E, Große-Brinkhaus C. Multivariate genome-wide associations for immune traits in two maternal pig lines. BMC Genomics 2023; 24:492. [PMID: 37641029 PMCID: PMC10463314 DOI: 10.1186/s12864-023-09594-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 08/16/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND Immune traits are considered to serve as potential biomarkers for pig health. Medium to high heritabilities have been observed for some of the immune traits, suggesting genetic variability of these phenotypes. Consideration of previously established genetic correlations between immune traits can be used to identify pleiotropic genetic markers. Therefore, genome-wide association study (GWAS) approaches are required to explore the joint genetic foundation for health biomarkers. Usually, GWAS explores phenotypes in a univariate (uv), trait-by-trait manner. Besides two uv GWAS methods, four multivariate (mv) GWAS approaches were applied to combinations of 22 immune traits for Landrace (LR) and Large White (LW) pig lines. RESULTS In total, 433 (LR: 351, LW: 82) associations were identified with the uv approach implemented in PLINK and a Bayesian linear regression uv approach implemented in the BIMBAM software. Single nucleotide polymorphisms (SNPs) that were identified with both uv approaches (n = 32) were mostly associated with immune traits such as haptoglobin, red blood cell characteristics and cytokines, and were located in protein-coding genes. Mv GWAS approaches detected 647 associations for different mv immune trait combinations, which were summarized into 133 quantitative trait loci (QTL). SNPs for different trait combinations (n = 66) were detected with more than one mv method. Most of these SNPs are associated with red blood cell related immune trait combinations. Functional annotation of these QTL revealed 453 immune-relevant protein-coding genes. With uv methods, shared markers were not observed between the breeds, whereas mv approaches were able to detect two conjoint SNPs for LR and LW. Due to unmapped positions for these markers, their functional annotation was not clarified. CONCLUSIONS This study evaluated the joint genetic background of immune traits in LR and LW piglets through the application of various uv and mv GWAS approaches. In comparison to uv methods, mv methodologies identified more significant associations, which might reflect the pleiotropic background of the immune system more accurately. In genetic research of complex traits, the SNP effects are generally small. Furthermore, one genetic variant can affect several correlated immune traits at the same time, termed pleiotropy. As mv GWAS methods consider strong dependencies among traits, the power to detect SNPs can be boosted. Both methods revealed immune-relevant potential candidate genes. Our results indicate that no single test is able to detect all the different types of genetic effects in the most powerful manner; therefore, the methods should be applied in a complementary manner.
Affiliation(s)
- Katharina Roth
- Institute of Animal Science, University of Bonn, Endenicher Allee 15, 53115, Bonn, Germany
| | | | - Hubert Henne
- BHZP GmbH, An der Wassermühle 8, 21368, Dahlenburg-Ellringen, Germany
| | | | - Karl Schellander
- Institute of Animal Science, University of Bonn, Endenicher Allee 15, 53115, Bonn, Germany
| | - Ernst Tholen
- Institute of Animal Science, University of Bonn, Endenicher Allee 15, 53115, Bonn, Germany
| | | |
|
6
|
Li Y, Yang H, Guo J, Yang Y, Yu Q, Guo Y, Zhang C, Wang Z, Zuo P. Uncovering the candidate genes related to sheep body weight using multi-trait genome-wide association analysis. Front Vet Sci 2023; 10:1206383. [PMID: 37662987 PMCID: PMC10469697 DOI: 10.3389/fvets.2023.1206383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 08/04/2023] [Indexed: 09/05/2023] Open
Abstract
In sheep, body weight is an economically important trait. This study sought to map genetic loci related to weaning weight and yearling weight. To this end, single-trait and multi-trait genome-wide association studies (GWAS) were performed using a high-density 600 K single nucleotide polymorphism (SNP) chip. The results showed that 43 and 56 SNPs were significantly associated with weaning weight and yearling weight, respectively. A region associated with both weaning and yearling traits (OARX: 6.74-7.04 Mb) was identified, suggesting that the same genes could play a role in regulating both these traits. This region was found to contain three genes (TBL1X, SHROOM2 and GPR143). The most significant SNP was Affx-281066395, located at 6.94 Mb (p = 1.70 × 10⁻¹⁷), corresponding to the SHROOM2 gene. We also identified 93 novel SNPs related to sheep weight using multi-trait GWAS analysis. A new genomic region (OAR10: 76.04-77.23 Mb) with 22 significant SNPs was discovered. Combining transcriptomic data from multiple tissues and genomic data in sheep, we found that the HINT1, ASB11 and GPR143 genes may be involved in sheep body weight. Thus, multi-omic analysis is a valuable strategy for identifying candidate genes related to body weight.
Affiliation(s)
- Yunna Li
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Hua Yang
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Jing Guo
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Yonglin Yang
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Qian Yu
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Yuanyuan Guo
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Chaoxin Zhang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Zhipeng Wang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
| | - Peng Zuo
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- College of Science, Northeast Agricultural University, Harbin, China
| |
|
7
|
Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. CELL GENOMICS 2023; 3:100371. [PMID: 37601973 PMCID: PMC10435382 DOI: 10.1016/j.xgen.2023.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 05/04/2023] [Accepted: 07/07/2023] [Indexed: 08/22/2023]
Abstract
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.
Affiliation(s)
- Yidong Zhang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100006, China
| | - Xilin Jiang
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
- Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0SR, UK
- Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, UK
| | - Alexander J. Mentzer
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
| | - Gil McVean
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
| | - Gerton Lunter
- MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9700 RB, the Netherlands
| |
|
8
|
Ting BW, Wright FA, Zhou YH. Simultaneous modeling of multivariate heterogeneous responses and heteroskedasticity via a two-stage composite likelihood. Biom J 2023; 65:e2200029. [PMID: 37212427 PMCID: PMC10524370 DOI: 10.1002/bimj.202200029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2022] [Revised: 02/08/2023] [Accepted: 03/13/2023] [Indexed: 05/23/2023]
Abstract
Multivariate heterogeneous responses and heteroskedasticity have attracted increasing attention in recent years. In genome-wide association studies, effective simultaneous modeling of multiple phenotypes would improve statistical power and interpretability. However, a flexible common modeling system for heterogeneous data types can pose computational difficulties. Here we build upon a previous method for multivariate probit estimation using a two-stage composite likelihood that exhibits favorable computational time while retaining attractive parameter estimation properties. We extend this approach to incorporate multivariate responses of heterogeneous data types (binary and continuous), and possible heteroskedasticity. Although the approach has wide applications, it would be particularly useful for genomics, precision medicine, or individual biomedical prediction. Using a genomics example, we explore statistical power and confirm that the approach performs well for hypothesis testing and coverage percentages under a wide variety of settings. The approach has the potential to better leverage genomics data and provide interpretable inference for pleiotropy, in which a locus is associated with multiple traits.
Affiliation(s)
- Bryan W. Ting
- Bioinformatics Research Center, North Carolina State University, NC, USA
| | - Fred A. Wright
- Bioinformatics Research Center, North Carolina State University, NC, USA
- Department of Statistics, North Carolina State University, NC, USA
- Department of Biological Sciences, North Carolina State University, NC, USA
| | - Yi-Hui Zhou
- Bioinformatics Research Center, North Carolina State University, NC, USA
- Department of Statistics, North Carolina State University, NC, USA
- Department of Biological Sciences, North Carolina State University, NC, USA
| |
|
9
|
Sajal IH, Biswas S. Bivariate quantitative Bayesian LASSO for detecting association of rare haplotypes with two correlated continuous phenotypes. Front Genet 2023; 14:1104727. [PMID: 36968609 PMCID: PMC10033866 DOI: 10.3389/fgene.2023.1104727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 02/21/2023] [Indexed: 03/12/2023] Open
Abstract
In genetic association studies, the multivariate analysis of correlated phenotypes offers statistical and biological advantages compared to analyzing one phenotype at a time. The joint analysis utilizes additional information contained in the correlation and avoids multiple testing. It also provides an opportunity to investigate and understand shared genetic mechanisms of multiple phenotypes. Bivariate logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotypes associated with two binary phenotypes or one binary and one continuous phenotype jointly. There is currently no haplotype association test available that can handle multiple continuous phenotypes. In this study, by employing the framework of bivariate LBL, we propose bivariate quantitative Bayesian LASSO (QBL) to detect rare haplotypes associated with two continuous phenotypes. Bivariate QBL removes unassociated haplotypes by regularizing the regression coefficients and utilizing a latent variable to model correlation between two phenotypes. We carry out extensive simulations to investigate the performance of bivariate QBL and compare it with that of a standard (univariate) haplotype association test, Haplo.score (applied twice to two phenotypes individually). Bivariate QBL performs better than Haplo.score in all simulations with varying degrees of power gain. We analyze Genetic Analysis Workshop 19 exome sequencing data on systolic and diastolic blood pressures and detect several rare haplotypes associated with the two phenotypes.
|
10
|
Xie H, Cao X, Zhang S, Sha Q. Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies. Genet Epidemiol 2023; 47:185-197. [PMID: 36691904 DOI: 10.1002/gepi.22513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 11/16/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
In genome-wide association studies (GWAS) for thousands of phenotypes in biobanks, most binary phenotypes have substantially fewer cases than controls. Many widely used approaches for joint analysis of multiple phenotypes produce inflated type I error rates for such extremely unbalanced case-control phenotypes. In this research, we develop a method to jointly analyze multiple unbalanced case-control phenotypes to circumvent this issue. We first group multiple phenotypes into different clusters based on a hierarchical clustering method, then we merge the phenotypes in each cluster into a single phenotype. In each cluster, we use the saddlepoint approximation to estimate the p value of an association test between the merged phenotype and a single nucleotide polymorphism (SNP), which eliminates the issue of inflated type I error rate of the test for extremely unbalanced case-control phenotypes. Finally, we use the Cauchy combination method to obtain an integrated p value for all clusters to test the association between multiple phenotypes and a SNP. We use extensive simulation studies to evaluate the performance of the proposed approach. The results show that the proposed approach can control type I error rate very well and is more powerful than other available methods. We also apply the proposed approach to phenotypes in category IX (diseases of the circulatory system) in the UK Biobank. We find that the proposed approach can identify more significant SNPs than the other viable methods with which we compared it.
Affiliation(s)
- Hongjing Xie
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| |
|
11
|
Reinert S. Quantitative genetics of pleiotropy and its potential for plant sciences. JOURNAL OF PLANT PHYSIOLOGY 2022; 276:153784. [PMID: 35944292 DOI: 10.1016/j.jplph.2022.153784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/30/2022] [Revised: 07/14/2022] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Affiliation(s)
- Stephan Reinert
- Friedrich-Alexander-University Erlangen-Nürnberg, Department of Biology, Division of Biochemistry, Biocomputing Lab, Staudtstraße 5, 91058, Erlangen, Germany.
| |
|
12
|
Tan VY, Timpson NJ. The UK Biobank: A Shining Example of Genome-Wide Association Study Science with the Power to Detect the Murky Complications of Real-World Epidemiology. Annu Rev Genomics Hum Genet 2022; 23:569-589. [PMID: 35508184 DOI: 10.1146/annurev-genom-121321-093606] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Genome-wide association studies (GWASs) have successfully identified thousands of genetic variants that are reliably associated with human traits. Although GWASs are restricted to certain variant frequencies, they have improved our understanding of the genetic architecture of complex traits and diseases. The UK Biobank (UKBB) has brought substantial analytical opportunity and performance to association studies. The dramatic expansion of many GWAS sample sizes afforded by the inclusion of UKBB data has improved the power of estimation of effect sizes but, critically, has done so in a context where phenotypic depth and precision enable outcome dissection and the application of epidemiological approaches. However, at the same time, the availability of such a large, well-curated, and deeply measured population-based collection has the capacity to increase our exposure to the many complications and inferential complexities associated with GWASs and other analyses. In this review, we discuss the impact that UKBB has had in the GWAS era, some of the opportunities that it brings, and exemplar challenges that illustrate the reality of using data from this world-leading resource.
Affiliation(s)
- Vanessa Y Tan
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom;
- Bristol Medical School, University of Bristol, Bristol, United Kingdom
- Nicholas J Timpson
- Medical Research Council (MRC) Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom;
- Bristol Medical School, University of Bristol, Bristol, United Kingdom
13
Wang M, Zhang S, Sha Q. A computationally efficient clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. PLoS One 2022; 17:e0260911. [PMID: 35482827 PMCID: PMC9049312 DOI: 10.1371/journal.pone.0260911] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/17/2021] [Accepted: 04/13/2022] [Indexed: 11/18/2022] Open
Abstract
There has been increasing interest in the joint analysis of multiple phenotypes in genome-wide association studies (GWAS) because jointly analyzing multiple phenotypes may increase statistical power to detect genetic variants associated with complex diseases or traits. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes in genetic association studies, including the Clustering Linear Combination (CLC) method. The CLC method works particularly well with phenotypes that have natural groupings, but because the number of clusters for a given data set is unknown, the final test statistic of the CLC method is the minimum p-value among all p-values of the CLC test statistics obtained from each possible number of clusters. Therefore, a simulation procedure needs to be used to evaluate the p-value of the final test statistic. This makes the CLC method computationally demanding. We develop a new method called computationally efficient CLC (ceCLC) to test the association between multiple phenotypes and a genetic variant. Instead of using the minimum p-value as the test statistic in the CLC method, ceCLC uses the Cauchy combination test to combine all p-values of the CLC test statistics obtained from each possible number of clusters. The test statistic of ceCLC approximately follows a standard Cauchy distribution, so the p-value can be obtained from the cumulative distribution function without the need for the simulation procedure. Through extensive simulation studies and application on the COPDGene data, the results demonstrate that the type I error rates of ceCLC are effectively controlled in different simulation settings and ceCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
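The Cauchy combination step described in this abstract can be illustrated with a minimal sketch. It shows only the generic Cauchy combination test (a weighted sum of tangent-transformed p-values compared against a standard Cauchy distribution), not the ceCLC implementation itself. The function name `cauchy_combination`, the equal default weights, and the example p-values are assumptions made for illustration; computing the per-cluster CLC p-values is not shown.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination test.

    Hypothetical helper for illustration; not the ceCLC authors' code.
    Returns (statistic, combined_p_value).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / k] * k
    if len(weights) != k or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match p_values and sum to 1")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    # Each p-value is mapped to a standard Cauchy variate under the null.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values))
    # Upper-tail probability of the standard Cauchy distribution.
    combined_p = 0.5 - math.atan(stat) / math.pi
    return stat, combined_p


# Made-up p-values, one per candidate number of clusters.
example_stat, example_p = cauchy_combination([0.04, 0.20, 0.35])
```

Because the combined p-value comes from a closed-form tail probability, no simulation loop is needed, which is the computational saving the abstract refers to.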
Affiliation(s)
- Meida Wang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
- Shuanglin Zhang
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
- Qiuying Sha
- Mathematical Sciences, Michigan Technological University, Houghton, MI, United States of America
14
Fu L, Wang Y, Li T, Yang S, Hu YQ. A Novel Hierarchical Clustering Approach for Joint Analysis of Multiple Phenotypes Uncovers Obesity Variants Based on ARIC. Front Genet 2022; 13:791920. [PMID: 35391794 PMCID: PMC8981031 DOI: 10.3389/fgene.2022.791920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 01/27/2022] [Indexed: 12/02/2022] Open
Abstract
Genome-wide association studies (GWASs) have successfully discovered numerous variants underlying various diseases. Generally, the one-phenotype, one-variant association approach in GWASs is inefficient at identifying variants with weak effects, indicating that many signals remain unidentified. Jointly analyzing multiple phenotypes is now recognized as an important approach to increase the statistical power for identifying genetic variants with weak effects on complex diseases, shedding new light on potential biological mechanisms. Therefore, hierarchical clustering based on different methods for calculating correlation coefficients (HCDC) is developed to analyze multiple phenotypes simultaneously in association studies. There are two steps involved in HCDC. First, a clustering approach based on the similarity matrix between two groups of phenotypes is applied to choose a representative phenotype in each cluster. Then, we use existing methods to estimate the genetic associations with the representative phenotypes rather than the individual phenotypes in every cluster. A variety of simulations are conducted to demonstrate the capacity of HCDC for boosting power. Existing methods that embed HCDC are either more powerful than or comparable to those without HCDC in most scenarios. Additionally, applying existing methods with HCDC to obesity-related phenotypes from the Atherosclerosis Risk in Communities (ARIC) study uncovered several associated variants. Among these, UQCC1-rs1570004 is reported as a significant obesity signal for the first time, whose differential expression in subcutaneous fat, visceral fat, and muscle tissue is worthy of further functional studies.
Affiliation(s)
- Liwan Fu
- Center for Non-communicable Disease Management, National Center for Children's Health, Beijing Children's Hospital, Capital Medical University, Beijing, China; State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yuquan Wang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Tingting Li
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Siqian Yang
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yue-Qing Hu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
15
Liu W, Xu Y, Wang A, Huang T, Liu Z. The eigen higher criticism and eigen Berk–Jones tests for multiple trait association studies based on GWAS summary statistics. Genet Epidemiol 2021; 46:89-104. [DOI: 10.1002/gepi.22439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 09/10/2021] [Accepted: 10/21/2021] [Indexed: 11/11/2022]
Affiliation(s)
- Wei Liu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an, China
- Yuyang Xu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Anqi Wang
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
- Tao Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China
- Institute for Artificial Intelligence, Center for Intelligent Public Health, Peking University, Beijing, China
- Key Laboratory of Molecular Cardiovascular Diseases, Peking University, Ministry of Education, Beijing, China
- Zhonghua Liu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong SAR, China
16
Associating Multivariate Traits with Genetic Variants Using Collapsing and Kernel Methods with Pedigree- or Population-Based Studies. Comput Math Methods Med 2021; 2021:8812282. [PMID: 33628328 PMCID: PMC7889379 DOI: 10.1155/2021/8812282] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 01/02/2021] [Accepted: 01/08/2021] [Indexed: 11/18/2022]
Abstract
In genetic association analysis, several relevant phenotypes or multivariate traits with different types of components are usually collected to study complex or multifactorial diseases. Over the past few years, jointly testing for association between multivariate traits and multiple genetic variants has become more popular because it can increase statistical power to identify causal genes in pedigree- or population-based studies. However, most of the existing methods mainly focus on testing genetic variants associated with multiple continuous phenotypes. In this investigation, we develop a framework for identifying the pleiotropic effects of genetic variants on multivariate traits by using collapsing and kernel methods with pedigree- or population-structured data. The proposed framework is applicable to the burden test, the kernel test, and the omnibus test for autosomes and the X chromosome. The proposed multivariate trait association methods can accommodate continuous phenotypes or binary phenotypes and further can adjust for covariates. Simulation studies show that the performance of our methods is satisfactory with respect to the empirical type I error rates and power rates in comparison with the existing methods.
17
Yuan X, Biswas S. Detecting rare haplotype association with two correlated phenotypes of binary and continuous types. Stat Med 2021; 40:1877-1900. [PMID: 33438281 DOI: 10.1002/sim.8877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2020] [Revised: 11/18/2020] [Accepted: 12/25/2020] [Indexed: 11/10/2022]
Abstract
Multiple correlated traits/phenotypes are often collected in genetic association studies and they may share a common genetic mechanism. Joint analysis of correlated phenotypes has well-known advantages over one-at-a-time analysis, including a gain in power and a better understanding of genetic etiology. However, when the phenotypes are of discordant types, such as binary and continuous, joint modeling is more challenging. Another research area of current interest is the discovery of rare genetic variants. Currently there is no method available for detecting association of rare (or common) haplotypes with multiple discordant phenotypes jointly. Our goal is to fill this gap specifically for two discordant phenotypes. We consider a rare haplotype association method for a binary phenotype, logistic Bayesian LASSO (univariate LBL), and its extension for two correlated binary phenotypes (bivariate LBL-2B). Under this framework, we propose a haplotype association test with binary and continuous phenotypes jointly (bivariate LBL-BC). Specifically, we use a latent variable to induce correlation between the two phenotypes. We carry out extensive simulations to investigate bivariate LBL-BC and compare it with univariate LBL and bivariate LBL-2B. In most settings, bivariate LBL-BC performs the best. In only two situations, bivariate LBL-BC has similar performance: when the two phenotypes are (1) weakly or not correlated and the target haplotype affects the binary phenotype only, and (2) strongly positively correlated and the target haplotype affects both phenotypes in a positive direction. Finally, we apply the method to a data set on lung cancer and nicotine dependence and detect several haplotypes, including a rare one.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas, USA
18
Deng Y, Wu S, Fan H. Genome-wide pathway-based quantitative multiple phenotypes analysis. PLoS One 2020; 15:e0240910. [PMID: 33175855 PMCID: PMC7657528 DOI: 10.1371/journal.pone.0240910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2020] [Accepted: 10/06/2020] [Indexed: 11/18/2022] Open
Abstract
For complex diseases, genome-wide pathway association studies have become increasingly promising. Currently, however, pathway-based association analysis mainly focuses on a single phenotype, which may be insufficient to describe complex diseases and physiological processes. This work proposes a combination model to evaluate the association between a pathway and multiple phenotypes and to reduce the run time based on asymptotic results. For a single phenotype, we propose a semi-supervised maximum kernel-based U-statistics (mSKU) method for pathway-based association analysis. For multiple phenotypes, we propose the Fisher combination function with dependent phenotypes (FC) to transform the p-values between the pathway and each marginal phenotype individually to achieve pathway-based multiple phenotypes analysis. With real data from the Alzheimer Disease Neuroimaging Initiative (ADNI) study and Human Liver Cohort (HLC) study, the FC-mSKU method allows us to specify which pathways are specific to a single phenotype or contribute to common genetic constructions of multiple phenotypes. If we only focus on single-phenotype tests, we may miss some findings for etiology studies. Through extensive simulation studies, the FC-mSKU method demonstrates its advantages compared with its counterparts.
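For orientation, the classical Fisher combination that the FC function builds on can be sketched as follows. This is a minimal sketch of the textbook method under the assumption of independent p-values, so it is not the dependence-adjusted FC procedure of the abstract, and it would likely be anti-conservative for correlated phenotypes. The function name `fisher_combination` and the example p-values are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination of independent p-values.

    Hypothetical helper for illustration; assumes independence, unlike
    the dependent-phenotype FC method described in the abstract.
    Returns (statistic, combined_p_value).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    # Under the null, -2 * sum(log p) follows chi-square with 2k df.
    stat = -2.0 * sum(math.log(p) for p in p_values)

    # Closed-form chi-square upper tail for even degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total


# Made-up marginal p-values, one per phenotype.
example_stat, example_p = fisher_combination([0.05, 0.05])
```

The closed-form tail keeps the sketch dependency-free; with SciPy available, `scipy.stats.combine_pvalues` offers the same classical method.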
Affiliation(s)
- Yamin Deng
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China; Division of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Shiman Wu
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China
- Huifang Fan
- Statistics Center, First Hospital of Shanxi Medical University, Taiyuan, China
19
Wen Y, Lu Q. An optimal kernel-based multivariate U-statistic to test for associations with multiple phenotypes. Biostatistics 2020; 23:705-720. [PMID: 33108446 DOI: 10.1093/biostatistics/kxaa049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/10/2020] [Revised: 09/24/2020] [Accepted: 10/03/2020] [Indexed: 11/13/2022] Open
Abstract
Set-based analysis that jointly considers multiple predictors in a group has been broadly conducted for association tests. However, the power of such tests can be sensitive to the distribution of phenotypes and to the underlying relationships between predictors and outcomes. Moreover, most set-based methods are designed for single-trait analysis, making it hard to explore pleiotropic effects and borrow information when multiple phenotypes are available. Here, we propose a kernel-based multivariate U-statistic (KMU) that is robust and powerful in testing the association between a set of predictors and multiple outcomes. We employed a rank-based kernel function for the outcomes, which makes our method robust to various outcome distributions. Rather than selecting a single kernel, our test statistic is built on multiple kernels selected in a data-driven manner, and thus is capable of capturing various complex relationships between predictors and outcomes. The asymptotic properties of our test statistic have been developed. Through simulations, we have demonstrated that KMU has controlled type I error and higher power than its counterparts. We further showed its practical utility by analyzing whole-genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative study, where novel genes have been detected to be associated with imaging phenotypes.
Affiliation(s)
- Y Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Qing Lu
- Department of Biostatistics, College of Public Health, University of Florida, Gainesville, FL, USA
20
Rice BR, Fernandes SB, Lipka AE. Multi-Trait Genome-Wide Association Studies Reveal Loci Associated with Maize Inflorescence and Leaf Architecture. Plant Cell Physiol 2020; 61:1427-1437. [PMID: 32186727 DOI: 10.1093/pcp/pcaa039] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/24/2020] [Accepted: 03/17/2020] [Indexed: 05/23/2023]
Abstract
Maize inflorescence is a complex phenotype that involves the physical and developmental interplay of multiple traits. Given the evidence that genes could pleiotropically contribute to several of these traits, we used publicly available maize data to assess the ability of multivariate genome-wide association study (GWAS) approaches to identify pleiotropic quantitative trait loci (pQTL). Our analysis of 23 publicly available inflorescence and leaf-related traits in a diversity panel of n = 281 maize lines genotyped with 376,336 markers revealed that the two multivariate GWAS approaches we tested were capable of identifying pQTL in genomic regions coinciding with similar associations found in previous studies. We then conducted a parallel simulation study on the same individuals, where it was shown that multivariate GWAS approaches yielded a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches for all evaluated simulation settings except for when the correlated simulated traits had a heritability of 0.9. We therefore conclude that the implementation of state-of-the-art multivariate GWAS approaches is a useful tool for dissecting pleiotropy and their more widespread implementation could facilitate the discovery of genes and other biological mechanisms underlying maize inflorescence.
Affiliation(s)
- Brian R Rice
- Department of Crop Sciences, University of Illinois, Urbana, IL, USA
- Alexander E Lipka
- Department of Crop Sciences, University of Illinois, Urbana, IL, USA
21
Wang T, Li J, Gao X, Song W, Chen C, Yao D, Ma J, Xu L, Ma Y. Genome-wide association study of milk components in Chinese Holstein cows using single nucleotide polymorphism. Livest Sci 2020. [DOI: 10.1016/j.livsci.2020.103951] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
22
Sha Q, Wang Z, Zhang X, Zhang S. A clustering linear combination approach to jointly analyze multiple phenotypes for GWAS. Bioinformatics 2020; 35:1373-1379. [PMID: 30239574 DOI: 10.1093/bioinformatics/bty810] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 10/28/2017] [Revised: 08/29/2018] [Accepted: 09/18/2018] [Indexed: 12/16/2022] Open
Abstract
SUMMARY There is increasing interest in joint analysis of multiple phenotypes for genome-wide association studies (GWASs) for the following reasons. First, cohorts usually collect multiple phenotypes and complex diseases are usually measured by multiple correlated intermediate phenotypes. Second, jointly analyzing multiple phenotypes may increase statistical power for detecting genetic variants associated with complex diseases. Third, there is increasing evidence showing that pleiotropy is a widespread phenomenon in complex diseases. In this paper, we develop a clustering linear combination (CLC) method to jointly analyze multiple phenotypes for GWASs. In the CLC method, we first cluster individual statistics into positively correlated clusters and then combine the individual statistics linearly within each cluster and combine the between-cluster terms in a quadratic form. CLC is not only robust to different signs of the means of individual statistics, but also reduces the degrees of freedom of the test statistic. We also theoretically prove that if we can cluster the individual statistics correctly, CLC is the most powerful test among all tests with certain quadratic forms. Our simulation results show that CLC is either the most powerful test or has similar power to the most powerful test among the tests we compared, and CLC is much more powerful than other tests when effect sizes align with inferred clusters. We also evaluate the performance of CLC through a real case study. AVAILABILITY AND IMPLEMENTATION R code for implementing our method is available at http://www.math.mtu.edu/~shuzhang/software.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Xiao Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, USA
23
Li X, Zhang S, Sha Q. Joint analysis of multiple phenotypes using a clustering linear combination method based on hierarchical clustering. Genet Epidemiol 2020; 44:67-78. [PMID: 31541490 PMCID: PMC7480017 DOI: 10.1002/gepi.22263] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/27/2019] [Revised: 07/19/2019] [Accepted: 08/28/2019] [Indexed: 12/24/2022]
Abstract
Emerging evidence suggests that a genetic variant can affect multiple phenotypes, especially in complex human diseases. Therefore, joint analysis of multiple phenotypes may offer new insights into disease etiology. Recently, many statistical methods have been developed for joint analysis of multiple phenotypes, including the clustering linear combination (CLC) method. Because the number of clusters for a given data set is unknown, a simulation procedure must be used to evaluate the p-value of the final test statistic of CLC. This makes the CLC method computationally demanding. In this paper, we use a stopping criterion to determine the number of clusters in the CLC method. We have named our method hierarchical clustering CLC (HCLC). HCLC has an asymptotic distribution, which makes it computationally efficient and applicable to genome-wide association studies. Extensive simulations together with the COPDGene data analysis have been used to assess the type I error rates and power of our proposed method. Our simulation results demonstrate that the type I error rates of HCLC are effectively controlled in different realistic settings. HCLC either outperforms all other methods or has statistical power that is very close to the most powerful method with which it has been compared.
Affiliation(s)
- Xueling Li
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
24
Yuan X, Biswas S. Bivariate logistic Bayesian LASSO for detecting rare haplotype association with two correlated phenotypes. Genet Epidemiol 2019; 43:996-1017. [PMID: 31544985 DOI: 10.1002/gepi.22258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/12/2019] [Revised: 07/31/2019] [Accepted: 08/09/2019] [Indexed: 11/08/2022]
Abstract
In genetic association studies, joint modeling of related traits/phenotypes can utilize the correlation between them and thereby provide more power and uncover additional information about genetic etiology. Moreover, detecting rare genetic variants is of current scientific interest as a key to missing heritability. Logistic Bayesian LASSO (LBL) has been proposed recently to detect rare haplotype variants using case-control data, that is, a single binary phenotype. As there is currently no haplotype association method that can handle multiple binary phenotypes, we extend LBL to fill this gap. We develop a bivariate model by using a latent variable to induce correlation between the two outcomes. We carry out extensive simulations to investigate the bivariate LBL and compare it with the univariate LBL. The bivariate LBL performs better than or similar to the univariate LBL in most settings. It has the highest gain in power when a haplotype is associated with both traits and it affects at least one trait in a direction opposite to the direction of the correlation between the traits. We analyze two data sets, Genetic Analysis Workshop 19 sequence data on systolic and diastolic blood pressures and a genome-wide association data set on lung cancer and smoking, and detect several associated rare haplotypes.
Affiliation(s)
- Xiaochen Yuan
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
- Swati Biswas
- Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas
25
Joint Analysis of Multiple Phenotypes in Association Studies based on Cross-Validation Prediction Error. Sci Rep 2019; 9:1073. [PMID: 30705317 PMCID: PMC6355816 DOI: 10.1038/s41598-018-37538-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/05/2018] [Accepted: 11/19/2018] [Indexed: 01/28/2023] Open
Abstract
In genome-wide association studies (GWAS), joint analysis of multiple phenotypes could have increased statistical power over analyzing each phenotype individually to identify genetic variants that are associated with complex diseases. With this motivation, several statistical methods that jointly analyze multiple phenotypes have been developed, such as O’Brien’s method, Trait-based Association Test that uses Extended Simes procedure (TATES), multivariate analysis of variance (MANOVA), and joint model of multiple phenotypes (MultiPhen). However, the performance of these methods under a wide range of scenarios is not consistent: one test may be powerful in some situations, but not in the others. Thus, one challenge in joint analysis of multiple phenotypes is to construct a test that could maintain good performance across different scenarios. In this article, we develop a novel statistical method to test associations between a genetic variant and Multiple Phenotypes based on cross-validation Prediction Error (MultP-PE). Extensive simulations are conducted to evaluate the type I error rates and to compare the power performance of MultP-PE with various existing methods. The simulation studies show that MultP-PE controls type I error rates very well and has consistently higher power than the tests we compared in all simulation scenarios. We conclude with the recommendation for the use of MultP-PE for its good performance in association studies with multiple phenotypes.
26
Leppäaho E, Renvall H, Salmela E, Kere J, Salmelin R, Kaski S. Discovering heritable modes of MEG spectral power. Hum Brain Mapp 2019; 40:1391-1402. [PMID: 30600573 PMCID: PMC6590382 DOI: 10.1002/hbm.24454] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/11/2018] [Revised: 09/27/2018] [Accepted: 10/19/2018] [Indexed: 12/14/2022] Open
Abstract
Brain structure and many brain functions are known to be genetically controlled, but direct links between neuroimaging measures and their underlying cellular-level determinants remain largely undiscovered. Here, we adopt a novel computational method for examining potential similarities in high-dimensional brain imaging data between siblings. We examine oscillatory brain activity measured with magnetoencephalography (MEG) in 201 healthy siblings and apply Bayesian reduced-rank regression to extract a low-dimensional representation of familial features in the participants' spectral power structure. Our results show that the structure of the overall spectral power at 1-90 Hz is a highly conspicuous feature that not only relates siblings to each other but also has very high consistency within participants' own data, irrespective of the exact experimental state of the participant. The analysis is extended by seeking genetic associations for low-dimensional descriptions of the oscillatory brain activity. The observed variability in the MEG spectral power structure was associated with SDK1 (sidekick cell adhesion molecule 1) and suggestively with several other genes that function, for example, in brain development. The current results highlight the potential of sophisticated computational methods in combining molecular and neuroimaging levels for exploring brain functions, even for high-dimensional data limited to a few hundred participants.
Affiliation(s)
- Eemeli Leppäaho
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
- Hanna Renvall
- Department of Neuroscience and Biomedical Engineering, Aalto University, Helsinki, Finland; Aalto NeuroImaging, Aalto University, Helsinki, Finland
- Elina Salmela
- Department of Biosciences, University of Helsinki, Helsinki, Finland
- Juha Kere
- Molecular Neurology Research Program, University of Helsinki, Folkhälsan Institute of Genetics, Helsinki, Finland; Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; School of Basic and Medical Biosciences, King's College London, Guy's Hospital, London, United Kingdom
- Riitta Salmelin
- Department of Neuroscience and Biomedical Engineering, Aalto University, Helsinki, Finland; Aalto NeuroImaging, Aalto University, Helsinki, Finland
- Samuel Kaski
- Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Helsinki, Finland
27
PCA-Based Multiple-Trait GWAS Analysis: A Powerful Model for Exploring Pleiotropy. Animals (Basel) 2018; 8:ani8120239. [PMID: 30562943 PMCID: PMC6316348 DOI: 10.3390/ani8120239] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/30/2018] [Revised: 11/26/2018] [Accepted: 11/28/2018] [Indexed: 11/16/2022] Open
Abstract
Principal component analysis (PCA) is a potential approach that can be applied in multiple-trait genome-wide association studies (GWAS) to explore pleiotropy, as well as increase the power of quantitative trait loci (QTL) detection. In this study, the relationship of test single nucleotide polymorphisms (SNPs) was determined between single-trait GWAS and PCA-based GWAS. We found that the estimated effects of pleiotropic quantitative trait nucleotides (QTNs), $\hat{\beta}^{*}$, were in most cases larger than the single-trait model estimates ($\hat{\beta}_1$ and $\hat{\beta}_2$). Analysis using the simulated data showed that PCA-based multiple-trait GWAS has improved statistical power for detecting QTL compared to single-trait GWAS. When the minor allele frequency (MAF) of QTNs was greater than 0.2, the PCA-based model had a significant advantage in detecting the pleiotropic QTNs, but as the MAF decreased from 0.2 toward 0, the advantage began to disappear. In addition, as the linkage disequilibrium (LD) of the pleiotropic QTNs decreased, its detection ability declined in the co-localization effect model. Furthermore, on the real data of 1141 Simmental cattle, we applied the PCA model to the multiple-trait GWAS analysis and identified a QTL that was consistent with a candidate gene, MCHR2, which was associated with presoma muscle development in cattle. In summary, PCA-based multiple-trait GWAS is an efficient model for exploring pleiotropic QTNs in quantitative traits.
|
28
|
Zhang H, Liu D, Zhao J, Bi X. Modeling Hybrid Traits for Comorbidity and Genetic Studies of Alcohol and Nicotine Co-Dependence. Ann Appl Stat 2018; 12:2359-2378. [PMID: 30666272 PMCID: PMC6338437 DOI: 10.1214/18-aoas1156] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/21/2023]
Abstract
We propose a novel multivariate model for analyzing hybrid traits and identifying genetic factors for comorbid conditions. Comorbidity is a common phenomenon in mental health in which an individual suffers from multiple disorders simultaneously. For example, in the Study of Addiction: Genetics and Environment (SAGE), alcohol and nicotine addiction were recorded through multiple assessments that we refer to as hybrid traits. Statistical inference for studying the genetic basis of hybrid traits has not been well-developed. Recent rank-based methods have been utilized for conducting association analyses of hybrid traits but do not inform the strength or direction of effects. To overcome this limitation, a parametric modeling framework is imperative. Although such parametric frameworks have been proposed in theory, they are neither well-developed nor extensively used in practice due to their reliance on complicated likelihood functions that have high computational complexity. Many existing parametric frameworks tend to instead use pseudo-likelihoods to reduce computational burdens. Here, we develop a model fitting algorithm for the full likelihood. Our extensive simulation studies demonstrate that inference based on the full likelihood can control the type-I error rate, and gains power and improves the effect size estimation when compared with several existing methods for hybrid models. These advantages remain even if the distribution of the latent variables is misspecified. After analyzing the SAGE data, we identify three genetic variants (rs7672861, rs958331, rs879330) that are significantly associated with the comorbidity of alcohol and nicotine addiction at the chromosome-wide level. Moreover, our approach has greater power in this analysis than several existing methods for hybrid traits. Although the analysis of the SAGE data motivated us to develop the model, it can be broadly applied to analyze any hybrid responses.
Affiliation(s)
- Heping Zhang
- Susan Dwight Bliss Professor, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Dungang Liu
- Assistant Professor, Department of Operations, Business Analytics and Information Systems, University of Cincinnati Lindner College of Business, Cincinnati, OH 45221
- Jiwei Zhao
- Assistant Professor, Department of Biostatistics, State University of New York at Buffalo, Buffalo, NY 14214
- Xuan Bi
- Postdoctoral Associate, Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
|
29
|
Liang X, Sha Q, Zhang S. Joint analysis of multiple phenotypes in association studies using allele-based clustering approach for non-normal distributions. Ann Hum Genet 2018; 82:389-395. [PMID: 29932453 PMCID: PMC6188849 DOI: 10.1111/ahg.12260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2017] [Revised: 03/15/2018] [Accepted: 05/11/2018] [Indexed: 11/29/2022]
Abstract
In the study of complex diseases, several correlated phenotypes are usually measured. There is also increasing evidence showing that testing the association between a single-nucleotide polymorphism (SNP) and multiple-dependent phenotypes jointly is often more powerful than analyzing only one phenotype at a time. Therefore, developing statistical methods to test for genetic association with multiple phenotypes has become increasingly important. In this paper, we develop an Allele-based Clustering Approach (ACA) for the joint analysis of multiple non-normal phenotypes in association studies. In ACA, we consider the alleles at a SNP of interest as a dependent variable with two classes, and the correlated phenotypes as predictors to predict the alleles at the SNP of interest. We perform extensive simulation studies to evaluate the performance of ACA and compare the power of ACA with the powers of Adaptive Fisher's Combination test, Trait-based Association Test that uses Extended Simes procedure, Fisher's Combination test, the standard MANOVA, and the joint model of Multiple Phenotypes. Our simulation studies show that the proposed method has correct type I error rates and is much more powerful than other methods for some non-normal distributions.
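Fisher's Combination test is one of the comparison methods named in the abstract above. As a point of reference, the classical form of that test can be sketched as follows. This is not the ACA method itself and not the authors' code. The classical test assumes the per-phenotype p-values are independent, which correlated phenotypes generally violate. The function name is hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combination of k p-values assumed independent.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form
        P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!
    Returns (statistic, combined_p_value).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    term = 1.0   # running value of (x/2)**i / i!
    total = 1.0  # partial sum of the series, starting at i = 0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical per-phenotype p-values for a single SNP.
    print(fisher_combination([0.04, 0.10, 0.30]))
```

A single p-value is returned unchanged, which is a quick sanity check on the tail formula.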
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
|
30
|
Moon S, Lee Y, Won S, Lee J. Multiple genotype-phenotype association study reveals intronic variant pair on SIDT2 associated with metabolic syndrome in a Korean population. Hum Genomics 2018; 12:48. [PMID: 30382898 PMCID: PMC6211397 DOI: 10.1186/s40246-018-0180-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/05/2018] [Accepted: 10/08/2018] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Metabolic syndrome is a risk factor for type 2 diabetes and cardiovascular disease. We identified common genetic variants that alter the risk for metabolic syndrome in the Korean population. To isolate these variants, we conducted a multiple-genotype and multiple-phenotype genome-wide association analysis using the family-based quasi-likelihood score (MFQLS) test. For this analysis, we used 7211 and 2838 genotyped study subjects for discovery and replication, respectively. We also performed a multiple-genotype and multiple-phenotype analysis of a gene-based single-nucleotide polymorphism (SNP) set. RESULTS We found an association between metabolic syndrome and an intronic SNP pair, rs7107152 and rs1242229, in the SIDT2 gene at 11q23.3. Both SNPs correlate with the expression of SIDT2 and TAGLN, whose products promote insulin secretion and lipid metabolism, respectively. This SNP pair showed statistical significance at the replication stage. CONCLUSIONS Our findings provide insight into an underlying mechanism that contributes to metabolic syndrome.
Affiliation(s)
- Sanghoon Moon
- Division of Genome Research, Center for Genome Science, Korea National Institute of Health, Cheongju, Chungcheongbuk-do, 28159, South Korea
- Young Lee
- Division of Genome Research, Center for Genome Science, Korea National Institute of Health, Cheongju, Chungcheongbuk-do, 28159, South Korea; Veterans Medical Research Institute, Veterans Health Service Medical Center, Seoul, 05368, South Korea
- Sungho Won
- Department of Public Health Science, Seoul National University, Seoul, 08826, South Korea
- Juyoung Lee
- Division of Genome Research, Center for Genome Science, Korea National Institute of Health, Cheongju, Chungcheongbuk-do, 28159, South Korea
|
31
|
Qi G, Chatterjee N. Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits. PLoS Genet 2018; 14:e1007549. [PMID: 30289880 PMCID: PMC6192650 DOI: 10.1371/journal.pgen.1007549] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 12/23/2017] [Revised: 10/17/2018] [Accepted: 07/09/2018] [Indexed: 12/31/2022] Open
Abstract
Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize the non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations, and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, is robust to population stratification, and leads to the desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging from 161,460 to 298,420 across individual traits), increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.
Affiliation(s)
- Guanghao Qi
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America; Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
|
32
|
Wang Z, Sha Q, Fang S, Zhang K, Zhang S. Testing an optimally weighted combination of common and/or rare variants with multiple traits. PLoS One 2018; 13:e0201186. [PMID: 30048520 PMCID: PMC6062080 DOI: 10.1371/journal.pone.0201186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 03/25/2018] [Accepted: 07/10/2018] [Indexed: 12/25/2022] Open
Abstract
Recently, joint analysis of multiple traits has become popular because it can increase statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence indicating that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single genetic variant. However, because they analyze one variant at a time, these methods may not be ideal for rare variant association studies, given the allelic heterogeneity as well as the extreme rarity of rare variants. In this article, we developed a statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Zhenchuan Wang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shurong Fang
- Department of Mathematics and Computer Science, John Carroll University, University Heights, Ohio, United States of America
- Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
33
|
Hackinger S, Zeggini E. Statistical methods to detect pleiotropy in human complex traits. Open Biol 2018; 7:rsob.170125. [PMID: 29093210 PMCID: PMC5717338 DOI: 10.1098/rsob.170125] [Citation(s) in RCA: 78] [Impact Index Per Article: 13.0] [Received: 05/22/2017] [Accepted: 09/29/2017] [Indexed: 12/13/2022] Open
Abstract
In recent years pleiotropy, the phenomenon of one genetic locus influencing several traits, has become a widely researched field in human genetics. With the increasing availability of genome-wide association study summary statistics, as well as the establishment of deeply phenotyped sample collections, it is now possible to systematically assess the genetic overlap between multiple traits and diseases. In addition to increasing power to detect associated variants, multi-trait methods can also aid our understanding of how different disorders are aetiologically linked by highlighting relevant biological pathways. A plethora of available tools to perform such analyses exists, each with their own advantages and limitations. In this review, we outline some of the currently available methods to conduct multi-trait analyses. First, we briefly introduce the concept of pleiotropy and outline the current landscape of pleiotropy research in human genetics; second, we describe analytical considerations and analysis methods; finally, we discuss future directions for the field.
|
34
|
Gao X, Liu J, Gong P, Wang J, Fang W, Yan H, Zhu L, Zhou X. Identifying new susceptibility genes on dopaminergic and serotonergic pathways for the framing effect in decision-making. Soc Cogn Affect Neurosci 2018; 12:1534-1544. [PMID: 28431168 PMCID: PMC5629826 DOI: 10.1093/scan/nsx062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/01/2016] [Accepted: 04/17/2017] [Indexed: 01/03/2023] Open
Abstract
The framing effect refers to the tendency to be risk-averse when options are presented positively but risk-seeking when the same options are presented negatively during decision-making. This effect has been found to be modulated by the serotonin transporter gene (SLC6A4) and the catechol-o-methyltransferase gene (COMT) polymorphisms, which are on the dopaminergic and serotonergic pathways and which are associated with affective processing. The current study aimed to identify new genetic variations of genes on dopaminergic and serotonergic pathways that may contribute to individual differences in the susceptibility to framing. Using genome-wide association data and the gene-based principal components regression method, we examined genetic variations of 26 genes on the pathways in 1317 Chinese Han participants. Consistent with previous studies, we found that the genetic variations of the SLC6A4 gene and the COMT gene were associated with the framing effect. More importantly, we demonstrated that the genetic variations of the aromatic-L-amino-acid decarboxylase (DDC) gene, which is involved in the synthesis of both dopamine and serotonin, contributed to individual differences in the susceptibility to framing. Our findings shed light on the understanding of the genetic basis of affective decision-making.
Affiliation(s)
- Xiaoxue Gao
- Center for Brain and Cognitive Sciences; School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China
- Jinting Liu
- China Center for Special Economic Zone Research; Research Centre for Brain Function and Psychological Science, Shenzhen University, Guangdong 518060, China
- Pingyuan Gong
- Key Laboratory of Resource Biology and Biotechnology in Western China (Ministry of Education), Northwest University, Shaanxi 710069, China
- Junhui Wang
- Research Institute of Educational Technology, South China Normal University, Guangdong 510631, China
- Wan Fang
- Peking-Tsinghua Center for Life Sciences; School of Life Sciences
- Hongming Yan
- Peking-Tsinghua Center for Life Sciences; School of Life Sciences
- Lusha Zhu
- Center for Brain and Cognitive Sciences; Peking-Tsinghua Center for Life Sciences; PKU-IDG/McGovern Institute for Brain Research
- Xiaolin Zhou
- Center for Brain and Cognitive Sciences; School of Psychological and Cognitive Sciences, Peking University, Beijing 100871, China; PKU-IDG/McGovern Institute for Brain Research; Key Laboratory of Machine Perception (Ministry of Education); Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, 100871, China
|
35
|
Guo X, Zhu J, Fan Q, He M, Wang X, Zhang H. A univariate perspective of multivariate genome-wide association analysis. Genet Epidemiol 2018; 42:470-479. [PMID: 29781551 DOI: 10.1002/gepi.22128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/05/2017] [Revised: 03/26/2018] [Accepted: 03/30/2018] [Indexed: 01/11/2023]
Abstract
Multiple correlated phenotypes are frequently collected in genome-wide association studies (GWASs), and a systematic, simultaneous analysis of multiple phenotypes can integrate the signals from single phenotypes, thereby increasing the power to detect genetic signals. However, fundamental questions remain open, including the conditions under which, and the reasons why, the multivariate analysis is beneficial, and how a highly significant signal arises in the multivariate analysis. To understand these issues, we propose to decompose the multivariate model into a series of simple univariate models. This transformation offers a clearer quantitative analysis of the circumstances under which a multivariate approach can be beneficial for the bivariate phenotypes case. A real data analysis is employed to illustrate how to interpret the signals arising from multivariate GWASs.
Affiliation(s)
- Xiaobo Guo
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China; Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China; Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, Melbourne, Victoria, Australia
- Junxian Zhu
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China; Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China
- Qiao Fan
- DUKE-National University of Singapore Graduate Medical School, Singapore, Singapore
- Mingguang He
- Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, Melbourne, Victoria, Australia; State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-Sen University, Guangzhou, China
- Xueqin Wang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China; Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China; Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Heping Zhang
- Department of Statistical Science, School of Mathematics, Sun Yat-Sen University, Guangzhou, China; Southern China Center for Statistical Science, Sun Yat-Sen University, Guangzhou, China; Department of Biostatistics, Yale University School of Public Health, New Haven, Connecticut, United States of America
|
36
|
Liang X, Sha Q, Rho Y, Zhang S. A hierarchical clustering method for dimension reduction in joint analysis of multiple phenotypes. Genet Epidemiol 2018; 42:344-353. [PMID: 29682782 DOI: 10.1002/gepi.22124] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 11/03/2017] [Revised: 02/01/2018] [Accepted: 02/19/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide association studies (GWAS) have become a very effective research tool to identify genetic variants underlying various complex diseases. In spite of the success of GWAS in identifying thousands of reproducible associations between genetic variants and complex disease, the association between genetic variants and a single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be potentially more powerful than the univariate analysis, and can shed new light on underlying biological mechanisms of complex diseases. In this paper, we develop a novel variable reduction method using a hierarchical clustering method (HCM) for joint analysis of multiple phenotypes in association studies. The proposed method involves two steps. The first step applies a dimension reduction technique by using a representative phenotype for each cluster of phenotypes. Then, existing methods are used in the second step to test the association between genetic variants and the representative phenotypes rather than the individual phenotypes. We perform extensive simulation studies to compare the powers of multivariate analysis of variance (MANOVA), joint model of multiple phenotypes (MultiPhen), and trait-based association test that uses extended Simes procedure (TATES) when used with HCM against their powers without HCM. Our simulation studies show that using HCM is more powerful than not using HCM in most scenarios. We also illustrate the usefulness of HCM by analyzing whole-genome genotyping data from a lung function study.
Affiliation(s)
- Xiaoyu Liang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Yeonwoo Rho
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
37
|
Salinas YD, Wang Z, DeWan AT. Statistical Analysis of Multiple Phenotypes in Genetic Epidemiologic Studies: From Cross-Phenotype Associations to Pleiotropy. Am J Epidemiol 2018; 187:855-863. [PMID: 29020254 DOI: 10.1093/aje/kwx296] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 02/07/2017] [Accepted: 08/03/2017] [Indexed: 12/15/2022] Open
Abstract
In the context of genetics, pleiotropy refers to the phenomenon in which a single genetic locus affects more than 1 trait or disease. Genetic epidemiologic studies have identified loci associated with multiple phenotypes, and these cross-phenotype associations are often incorrectly interpreted as examples of pleiotropy. Pleiotropy is only one possible explanation for cross-phenotype associations. Cross-phenotype associations may also arise due to issues related to study design, confounder bias, or nongenetic causal links between the phenotypes under analysis. Therefore, it is necessary to dissect cross-phenotype associations carefully to uncover true pleiotropic loci. In this review, we describe statistical methods that can be used to identify robust statistical evidence of pleiotropy. First, we provide an overview of univariate and multivariate methods for discovery of cross-phenotype associations and highlight important considerations for choosing among available methods. Then, we describe how to dissect cross-phenotype associations by using mediation analysis. Pleiotropic loci provide insights into the mechanistic underpinnings of disease comorbidity, and they may serve as novel targets for interventions that simultaneously treat multiple diseases. Discerning between different types of cross-phenotype associations is necessary to realize the public health potential of pleiotropic loci.
Affiliation(s)
- Yasmmyn D Salinas
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut
- Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut
- Andrew T DeWan
- Department of Chronic Disease Epidemiology, Yale School of Public Health, New Haven, Connecticut
|
38
|
Fast and Accurate Genome-Wide Association Test of Multiple Quantitative Traits. Comput Math Methods Med 2018; 2018:2564531. [PMID: 29743933 PMCID: PMC5878919 DOI: 10.1155/2018/2564531] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/27/2017] [Accepted: 01/24/2018] [Indexed: 02/03/2023]
Abstract
Multiple correlated traits are often collected in genetic studies. By jointly analyzing multiple traits, we can increase power by aggregating multiple weak effects and reveal additional insights into the genetic architecture of complex human diseases. In this article, we propose a multivariate linear regression-based method to test the joint association of multiple quantitative traits. It is flexible to accommodate any covariates, has very accurate control of type I errors, and offers very competitive performance. We also discuss fast and accurate significance p value computation especially for genome-wide association studies with small-to-medium sample sizes. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to genome-wide association analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) study. We found some very interesting associations with diabetes traits which have not been reported before. We implemented the proposed methods in a publicly available R package.
|
39
|
Chen L, Wang Y, Zhou Y. Association analysis of multiple traits by an approach of combining P values. J Genet 2018. [DOI: 10.1007/s12041-018-0885-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|
40
|
Association analysis of rare and common variants with multiple traits based on variable reduction method. Genet Res (Camb) 2018; 100:e2. [PMID: 29386084 DOI: 10.1017/s0016672317000052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022] Open
Abstract
Pleiotropy, the effect of one variant on multiple traits, is widespread in complex diseases. Joint analysis of multiple traits can improve statistical power to detect genetic variants and uncover the underlying genetic mechanism. Currently, a large number of existing methods target one common variant or only rare variants. Increasing evidence shows that complex diseases are caused by common and rare variants. Here we propose a region-based method, based on a variable reduction method (abbreviated as MULVR), to test the association of both rare and common variants with multiple traits. However, in the presence of noise traits, the MULVR method may lose power, so we propose the MULVR-O method, which jointly analyses the optimal number of traits associated with genetic variants by the MULVR method, to guard against the effect of noise traits. Extensive simulation studies show that our proposed method (MULVR-O) can be applied not only to multiple quantitative traits but also to qualitative traits, and is more powerful than several other comparison methods in most scenarios. An application to two genes (SHBG and CHRM3) and two phenotypes (systolic blood pressure and diastolic blood pressure) from the GAW19 dataset illustrates that our proposed methods (MULVR and MULVR-O) are feasible and efficient as region-based methods.
|
41
|
Zhu H, Zhang S, Sha Q. A novel method to test associations between a weighted combination of phenotypes and genetic variants. PLoS One 2018; 13:e0190788. [PMID: 29329304 PMCID: PMC5766098 DOI: 10.1371/journal.pone.0190788] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/18/2017] [Accepted: 12/20/2017] [Indexed: 11/18/2022] Open
Abstract
Many complex diseases like diabetes, hypertension, metabolic syndrome, et cetera, are measured by multiple correlated phenotypes. However, most genome-wide association studies (GWAS) focus on one phenotype of interest or study multiple phenotypes separately for identifying genetic variants associated with complex diseases. Analyzing one phenotype or the related phenotypes separately may lose power due to ignoring the information obtained by combining phenotypes, such as the correlation between phenotypes. In order to increase statistical power to detect genetic variants associated with complex diseases, we develop a novel method to test a weighted combination of multiple phenotypes (WCmulP). We perform extensive simulation studies as well as real data (COPDGene) analysis to evaluate the performance of the proposed method. Our simulation results show that WCmulP has correct type I error rates and is either the most powerful test or comparable to the most powerful test among the methods we compared. WCmulP also has an outstanding performance for identifying single-nucleotide polymorphisms (SNPs) associated with COPD-related phenotypes.
Affiliation(s)
- Huanhuan Zhu
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
| |
|
42
|
Wang L, Damrauer SM, Zhang H, Zhang AX, Xiao R, Moore JH, Chen J. Phenotype validation in electronic health records based genetic association studies. Genet Epidemiol 2017; 41:790-800. [PMID: 29023970 DOI: 10.1002/gepi.22080] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/19/2016] [Revised: 06/30/2017] [Accepted: 08/01/2017] [Indexed: 12/13/2022]
Abstract
The linkage between electronic health records (EHRs) and genotype data makes it feasible to study the genetic susceptibility of a wide range of disease phenotypes. Although EHR-derived phenotype data are subject to misclassification, they have been shown to be useful for discovering susceptibility genes, particularly in the setting of phenome-wide association studies (PheWAS). It is essential to characterize discovered associations using gold standard phenotype data obtained by chart review. In this work, we propose a genotype stratified case-control sampling strategy to select subjects for phenotype validation. We develop a closed-form maximum-likelihood estimator for the odds ratio parameters and a score statistic for testing genetic association using the combined validated and error-prone EHR-derived phenotype data, and assess the extent of power improvement provided by this approach. Compared with case-control sampling based only on EHR-derived phenotype data, our genotype stratified strategy maintains nominal type I error rates and results in higher power for detecting associations. It also corrects the bias in the odds ratio parameter estimates and reduces the corresponding variance, especially when the minor allele frequency is small.
Affiliation(s)
- Lu Wang
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Scott M Damrauer
- Division of Vascular Surgery and Endovascular Therapy, Hospital of the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.,Department of Surgery, Corporal Michael Crescenz VA Medical Center, Philadelphia, Pennsylvania, United States of America
| | - Hong Zhang
- Institute of Biostatistics, Fudan University, Shanghai, P.R. China
| | - Alan X Zhang
- Sidwell Friends School, Washington, DC, United States of America
| | - Rui Xiao
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jason H Moore
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.,Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jinbo Chen
- Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| |
|
43
|
Xiang R, MacLeod IM, Bolormaa S, Goddard ME. Genome-wide comparative analyses of correlated and uncorrelated phenotypes identify major pleiotropic variants in dairy cattle. Sci Rep 2017; 7:9248. [PMID: 28835686 PMCID: PMC5569018 DOI: 10.1038/s41598-017-09788-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 04/05/2017] [Accepted: 07/31/2017] [Indexed: 11/10/2022] Open
Abstract
While single nucleotide polymorphisms (SNPs) associated with multiple phenotypes have been reported, knowledge of the pleiotropy of uncorrelated phenotypes is minimal. Principal components (PCs) and uncorrelated Cholesky transformed traits (CTs) were constructed using 25 raw traits (RTs) of 2841 dairy bulls. Multi-trait meta-analyses of single-trait genome-wide association studies for RTs, PCs and CTs in bulls were validated in 6821 cows. Most PCs and CTs had substantial estimates of heritability, suggesting that genes affect phenotype via diverse pathways. Phenotypic orthogonalizations did not eliminate pleiotropy: the meta-analysis achieved an agreement of significant pleiotropic SNPs (p < 1 × 10⁻⁵, n = 368) between RTs (416), PCs (466) and CTs (425). From this overlap we identified 21 lead SNPs with 100% validation rate containing two clusters: one consisted of DGAT1 (chr14:1.8 M+), MGST1 (chr5:93 M+), PAEP (chr11:103 M+) and GPAT4 (chr27:36 M+) affecting protein, milk and fat yield and the other included CSN2 (chr6:87 M+), MUC1 (chr3:15.6 M), GHR (chr20:31.2 M+) and SDC2 (chr14:70 M+) affecting protein and milk yield. Combining beef cattle data identified correlated SNPs representing CAPN1 (chr29:44 M+) and CAST (chr7:96 M+) loci affecting beef tenderness, showing pleiotropic effects in dairy cattle. Our findings show that SNPs with a large effect on one trait are likely to have small effects on other uncorrelated traits.
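As an illustration of the Cholesky orthogonalization mentioned in this abstract, the following minimal Python sketch decorrelates a few simulated traits. It assumes the transformed traits are obtained as t = L⁻¹(y − mean), where L is the lower-triangular Cholesky factor of the sample trait covariance; the abstract does not spell out the exact procedure, and the simulated data and all function names are hypothetical.

```python
import math
import random

def covariance(rows):
    """Column means and sample covariance of trait records (one row per animal)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(k)] for a in range(k)]
    return means, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def cholesky_transform(rows):
    """Assumed transform: solve L t = (y - mean) per animal by forward substitution."""
    means, cov = covariance(rows)
    low = cholesky(cov)
    out = []
    for r in rows:
        t = []
        for i in range(len(r)):
            s = sum(low[i][m] * t[m] for m in range(i))
            t.append((r[i] - means[i] - s) / low[i][i])
        out.append(t)
    return out

# Simulated raw traits: the first two share a common component, the third is independent.
rng = random.Random(0)
raw = []
for _ in range(200):
    shared = rng.gauss(0, 1)
    raw.append([shared + rng.gauss(0, 0.5), shared + rng.gauss(0, 0.5), rng.gauss(0, 1)])

transformed = cholesky_transform(raw)
_, cov_t = covariance(transformed)  # numerically the identity matrix
```

Under this transform the first new trait is the first raw trait rescaled, and each later trait is what remains of the corresponding raw trait after removing its linear dependence on the earlier ones, so the result depends on trait order. Each transformed trait could then be passed to a single-trait GWAS as the abstract describes.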
Affiliation(s)
- Ruidong Xiang
- Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, Victoria, 3010, Australia.
- AgriBio, Department Economic Development, Jobs, Transport & Resources, Bundoora, Victoria, 3083, Australia.
| | - Iona M MacLeod
- AgriBio, Department Economic Development, Jobs, Transport & Resources, Bundoora, Victoria, 3083, Australia
| | - Sunduimijid Bolormaa
- AgriBio, Department Economic Development, Jobs, Transport & Resources, Bundoora, Victoria, 3083, Australia
- Cooperative Research Centre for Sheep Industry Innovation, Armidale, NSW 2351, Australia
| | - Michael E Goddard
- Faculty of Veterinary & Agricultural Science, University of Melbourne, Parkville, Victoria, 3010, Australia
- AgriBio, Department Economic Development, Jobs, Transport & Resources, Bundoora, Victoria, 3083, Australia
| |
|
44
|
Zhang W, Yang L, Tang LL, Liu A, Mills JL, Sun Y, Li Q. GATE: an efficient procedure in study of pleiotropic genetic associations. BMC Genomics 2017; 18:552. [PMID: 28732532 PMCID: PMC5521155 DOI: 10.1186/s12864-017-3928-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/16/2017] [Accepted: 07/06/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Association studies of human complex traits are well suited to identifying deleterious genetic markers. Compared to single-trait analyses, multiple-trait analyses can arguably make better use of the information on both traits and markers, and thus markedly improve the statistical power of association tests. Principal component analysis (PCA) is a well-known tool in multivariate analysis and can be applied to this task. Generally, PCA is first performed on all traits and then a certain number of top principal components (PCs) that explain most of the trait variation are selected to construct the test statistics. However, in some situations, using only these top PCs loses important evidence carried by the discarded PCs and thus compromises the power of the test. Methods: To overcome this drawback while keeping the advantages of using the top PCs, we propose a group accumulated test evidence (GATE) procedure. By dividing the PCs, which are sorted in descending order of their eigenvalues, into a few groups, GATE integrates the information of traits at the group level. Results: Simulation studies demonstrate the superiority of the proposed approach over several existing methods in terms of statistical power. Sometimes, the increase in power can reach 25%. These methods are further illustrated using the Heterogeneous Stock Mice data, which were collected from a quantitative genome-wide association study. Conclusions: Overall, GATE provides a powerful test for pleiotropic genetic associations.
Affiliation(s)
- Wei Zhang
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.,Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
| | - Liu Yang
- College of Geoscience and Surveying Engineering, China University of Mining and Technology, Beijing, China
| | - Larry L Tang
- Department of Statistics, George Mason University, Fairfax, VA, USA.,Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, USA
| | - Aiyi Liu
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - James L Mills
- Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
| | - Yuanchang Sun
- Department of Mathematics and Statistics, Florida International University, Miami, FL, USA
| | - Qizhai Li
- Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China.
| |
|
45
|
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/03/2023] Open
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
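The two-kernel idea in this abstract can be sketched in a few lines of Python. The sketch assumes linear kernels and a normalized trace coefficient between the two centered kernel matrices (an RV-type coefficient); the abstract does not give the exact coefficient DKAT uses. The abstract also states that DKAT obtains an analytical P-value without resampling, so the permutation P-value below is only a simple stand-in for illustration. The simulated data and all function names are hypothetical.

```python
import random

def linear_kernel(rows):
    """n x n similarity matrix of inner products between subjects."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]

def center(k):
    """Double-center a kernel matrix: H K H with H = I - 11^T / n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def association(kc, lc, order=None):
    """trace(Kc Lc) / (||Kc||_F ||Lc||_F), optionally with subjects of Kc reordered."""
    n = len(kc)
    idx = order if order is not None else list(range(n))
    cross = sum(kc[idx[i]][idx[j]] * lc[i][j] for i in range(n) for j in range(n))
    norm_k = sum(x * x for r in kc for x in r) ** 0.5
    norm_l = sum(x * x for r in lc for x in r) ** 0.5
    return cross / (norm_k * norm_l)

def permutation_p_value(kc, lc, n_perm=199, seed=1):
    """Stand-in for the analytical P-value: shuffle subjects in one kernel."""
    rng = random.Random(seed)
    observed = association(kc, lc)
    idx = list(range(len(kc)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if association(kc, lc, idx) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Simulated data: 60 subjects, 5 variants coded 0/1/2, 3 traits driven by variants 0 and 1.
rng = random.Random(0)
n = 60
genotypes = [[sum(rng.random() < 0.4 for _ in range(2)) for _ in range(5)] for _ in range(n)]
traits = [[g[0] + g[1] + rng.gauss(0, 0.5) for _ in range(3)] for g in genotypes]

k_geno = center(linear_kernel(genotypes))
k_trait = center(linear_kernel(traits))
coefficient, p_value = permutation_p_value(k_geno, k_trait)
print(coefficient, p_value)
```

Because both inputs enter only through subject-by-subject similarity matrices, the same code runs unchanged when the number of traits or variants exceeds the number of subjects, which is the property the abstract highlights. Other kernels could be substituted for `linear_kernel` to encode trait or variant structure.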
|
46
|
Verhulst B, Maes HH, Neale MC. GW-SEM: A Statistical Package to Conduct Genome-Wide Structural Equation Modeling. Behav Genet 2017; 47:345-359. [PMID: 28299468 DOI: 10.1007/s10519-017-9842-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 04/16/2016] [Accepted: 02/17/2017] [Indexed: 12/12/2022]
Abstract
Improving the accuracy of phenotyping through the use of advanced psychometric tools will increase the power to find significant associations with genetic variants and expand the range of possible hypotheses that can be tested on a genome-wide scale. Multivariate methods, such as structural equation modeling (SEM), are valuable in the phenotypic analysis of psychiatric and substance use phenotypes, but these methods have not been integrated into standard genome-wide association analyses because fitting a SEM at each single nucleotide polymorphism (SNP) along the genome was hitherto considered to be too computationally demanding. By developing a method that can efficiently fit SEMs, it is possible to expand the set of models that can be tested. This is particularly necessary in psychiatric and behavioral genetics, where the statistical methods are often handicapped by phenotypes with large components of stochastic variance. Due to the enormous amount of data that genome-wide scans produce, the statistical methods used to analyze the data are relatively elementary, do not directly correspond with the rich theoretical development, and lack the potential to test more complex hypotheses about the measurement of, and interaction between, comorbid traits. In this paper, we present a method to test the association of a SNP with multiple phenotypes or a latent construct on a genome-wide basis using a diagonally weighted least squares (DWLS) estimator for four common SEMs: a one-factor model, a one-factor residuals model, a two-factor model, and a latent growth model. We demonstrate that the DWLS parameters and p-values strongly correspond with the more traditional full information maximum likelihood parameters and p-values. We also present the timing of simulations and power analyses and a comparison with an existing multivariate GWAS software package.
Affiliation(s)
- Brad Verhulst
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA.
| | - Hermine H Maes
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
| | - Michael C Neale
- Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, USA
| |
|
47
|
Kim J, Pan W. Adaptive testing for multiple traits in a proportional odds model with applications to detect SNP-brain network associations. Genet Epidemiol 2017; 41:259-277. [PMID: 28191669 DOI: 10.1002/gepi.22033] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 04/05/2016] [Revised: 10/07/2016] [Accepted: 10/31/2016] [Indexed: 12/15/2022]
Abstract
There has been increasing interest in developing more powerful and flexible statistical tests to detect genetic associations with multiple traits, such as those arising from neuroimaging genetic studies. Most existing methods treat a single trait or multiple traits as the response while treating an SNP as a predictor coded under an additive inheritance mode. In this paper, we follow an earlier approach in treating an SNP as an ordinal response while treating traits as predictors in a proportional odds model (POM). In this way, it is not only easier to handle mixed types of traits, e.g., some quantitative and some binary, but it is also potentially more robust to the commonly adopted additive inheritance mode. More importantly, we develop an adaptive test in a POM so that it can maintain high power across many possible situations. Compared to the existing methods treating multiple traits as responses, e.g., in a generalized estimating equation (GEE) approach, the proposed method can be applied to a high-dimensional setting where the number of phenotypes (p) can be larger than the sample size (n), in addition to the usual small-p setting. The promising performance of the proposed method was demonstrated with applications to the Alzheimer's Disease Neuroimaging Initiative (ADNI) data, in which either structural MRI driven phenotypes or resting-state functional MRI (rs-fMRI) derived brain functional connectivity measures were used as phenotypes. The applications led to the identification of several top SNPs of biological interest. Furthermore, simulation studies showed competitive performance of the new method, especially for p > n.
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | -
- Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
| |
|
48
|
Lutz SM, Fingerlin TE, Hokanson JE, Lange C. A general approach to testing for pleiotropy with rare and common variants. Genet Epidemiol 2017; 41:163-170. [PMID: 27900789 PMCID: PMC5472207 DOI: 10.1002/gepi.22011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/19/2016] [Revised: 08/01/2016] [Accepted: 09/19/2016] [Indexed: 12/22/2022]
Abstract
Through genome-wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap of genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for the association with the principal components of the phenotypes rather than the individual phenotypes. However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET-B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism of interest). Although the method examines the genetic overlap between the two quantitative phenotypes, the extension to binary phenotypes, three or more phenotypes, and rare variants is not straightforward. We provide two approaches to formally test this pleiotropic relationship in multiple scenarios. These approaches depend on permuting the phenotypes of interest and comparing the set of observed P-values to the set of permuted P-values in relation to the origin (e.g., a vector of zeros) either using the Hausdorff metric or a cutoff-based approach. These approaches are appropriate for categorical and quantitative phenotypes, more than two phenotypes, common variants and rare variants. We evaluate these approaches under various simulation scenarios and apply them to the COPDGene study, a case-control study of chronic obstructive pulmonary disease in current and former smokers.
Affiliation(s)
- Sharon M Lutz
- Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Tasha E Fingerlin
- Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
| | - John E Hokanson
- Department of Epidemiology, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Christoph Lange
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| |
|
49
|
Mägi R, Suleimanov YV, Clarke GM, Kaakinen M, Fischer K, Prokopenko I, Morris AP. SCOPA and META-SCOPA: software for the analysis and aggregation of genome-wide association studies of multiple correlated phenotypes. BMC Bioinformatics 2017; 18:25. [PMID: 28077070 PMCID: PMC5225593 DOI: 10.1186/s12859-016-1437-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/21/2016] [Accepted: 12/17/2016] [Indexed: 11/10/2022] Open
Abstract
Background: Genome-wide association studies (GWAS) of single nucleotide polymorphisms (SNPs) have been successful in identifying loci contributing genetic effects to a wide range of complex human diseases and quantitative traits. The traditional approach to GWAS analysis is to consider each phenotype separately, despite the fact that many diseases and quantitative traits are correlated with each other, and often measured in the same sample of individuals. Multivariate analyses of correlated phenotypes have been demonstrated, by simulation, to increase power to detect association with SNPs, and thus may enable improved detection of novel loci contributing to diseases and quantitative traits. Results: We have developed the SCOPA software to enable GWAS analysis of multiple correlated phenotypes. The software implements “reverse regression” methodology, which treats the genotype of an individual at a SNP as the outcome and the phenotypes as predictors in a general linear model. SCOPA can be applied to quantitative traits and categorical phenotypes, and can accommodate imputed genotypes under a dosage model. The accompanying META-SCOPA software enables meta-analysis of association summary statistics from SCOPA across GWAS. Application of SCOPA to two GWAS of high- and low-density lipoprotein cholesterol, triglycerides and body mass index, and subsequent meta-analysis with META-SCOPA, highlighted stronger association signals than univariate phenotype analysis at established lipid and obesity loci. The META-SCOPA meta-analysis also revealed a novel signal of association at genome-wide significance for triglycerides mapping to GPC5 (lead SNP rs71427535, p = 1.1 × 10⁻⁸), which has not been reported in previous large-scale GWAS of lipid traits. Conclusions: The SCOPA and META-SCOPA software enable discovery and dissection of multiple phenotype association signals through implementation of a powerful reverse regression approach.
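The "reverse regression" idea described in this abstract can be sketched in plain Python: the genotype dosage at one SNP is regressed on all phenotypes, and a joint F statistic compares that model with an intercept-only model. This is only an illustration under stated assumptions and not the SCOPA implementation; the software's own test statistic and any model-selection steps are not given in the abstract, and the simulated data and function names are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def reverse_regression_f(dosage, phenotypes):
    """Regress SNP dosage on phenotypes; return coefficients and the joint F statistic."""
    n, k = len(dosage), len(phenotypes[0])
    design = [[1.0] + list(p) for p in phenotypes]  # intercept plus k phenotypes
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k + 1)] for a in range(k + 1)]
    xty = [sum(r[a] * g for r, g in zip(design, dosage)) for a in range(k + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in design]
    rss_full = sum((g - f) ** 2 for g, f in zip(dosage, fitted))
    mean_g = sum(dosage) / n
    rss_null = sum((g - mean_g) ** 2 for g in dosage)
    f_stat = ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))
    return beta, f_stat

# Simulated data: one SNP influences two correlated phenotypes; a second SNP is unlinked.
rng = random.Random(0)
n = 300
snp = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
unlinked = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
phenos = []
for g in snp:
    y1 = 0.6 * g + rng.gauss(0, 1)
    y2 = 0.5 * y1 + rng.gauss(0, 1)
    y3 = rng.gauss(0, 1)
    phenos.append([y1, y2, y3])

beta, f_assoc = reverse_regression_f(snp, phenos)
_, f_null = reverse_regression_f(unlinked, phenos)
print(f_assoc, f_null)
```

Placing the genotype on the left-hand side means the phenotypes can be any mix of quantitative and coded categorical variables without changing the model, and fractional imputed dosages work as the outcome unchanged. Under the usual linear-model assumptions the statistic can be referred to an F distribution with k and n − k − 1 degrees of freedom to obtain a P-value.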
Affiliation(s)
- Reedik Mägi
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | - Yury V Suleimanov
- Computation-based Science and Technology Research Center, Cyprus Institute, Nicosia, Cyprus.,Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Geraldine M Clarke
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | | | - Krista Fischer
- Estonian Genome Center, University of Tartu, Tartu, Estonia
| | | | - Andrew P Morris
- Estonian Genome Center, University of Tartu, Tartu, Estonia.,Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.,Department of Biostatistics, University of Liverpool, Liverpool, UK
| |
|
50
|
Abstract
For over a decade, genome-wide association studies (GWAS) have been a major tool for detecting genetic variants underlying complex traits. Recent studies have demonstrated that the same variant or gene can be associated with multiple traits, and such associations are termed cross-phenotype (CP) associations. CP association analysis can improve statistical power by searching for variants that contribute to multiple traits, which is often relevant to pleiotropy. In this chapter, we discuss existing statistical methods for analyzing association between a single marker and multivariate phenotypes, we introduce a general approach, CPASSOC, to detect the CP associations, and explain how to conduct the analysis in practice.
Affiliation(s)
- Xiaoyin Li
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA.
| | - Xiaofeng Zhu
- Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| |
|