1
|
Wang P, Xu X, Li M, Lou XY, Xu S, Wu B, Gao G, Yin P, Liu N. Gene-based association tests in family samples using GWAS summary statistics. Genet Epidemiol 2024; 48:103-113. [PMID: 38317324 DOI: 10.1002/gepi.22548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 11/18/2023] [Accepted: 01/08/2024] [Indexed: 02/07/2024]
Abstract
Genome-wide association studies (GWAS) have led to rapid growth in detecting genetic variants associated with various phenotypes. Owing to a great number of publicly accessible GWAS summary statistics, and the difficulty in obtaining individual-level genotype data, many existing gene-based association tests have been adapted to require only GWAS summary statistics rather than individual-level data. However, these association tests are restricted to unrelated individuals and thus do not apply to family samples directly. Moreover, due to its flexibility and effectiveness, the linear mixed model has been increasingly utilized in GWAS to handle correlated data, such as family samples. However, it remains unknown how to perform gene-based association tests in family samples using the GWAS summary statistics estimated from the linear mixed model. In this study, we show that, when family size is negligible compared to the total sample size, the diagonal block structure of the kinship matrix makes it possible to approximate the correlation matrix of marginal Z scores by linkage disequilibrium matrix. Based on this result, current methods utilizing summary statistics for unrelated individuals can be directly applied to family data without any modifications. Our simulation results demonstrate that this proposed strategy controls the type 1 error rate well in various situations. Finally, we exemplify the usefulness of the proposed approach with a dental caries GWAS data set.
Collapse
Affiliation(s)
- Peng Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hubei, People's Republic of China
| | - Xiao Xu
- Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
| | - Ming Li
- Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
| | - Xiang-Yang Lou
- Department of Biostatistics, College of Public Health and Health Professions and College of Medicine, University of Florida, Gainesville, Florida, USA
| | - Siqi Xu
- Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, Hong Kong
| | - Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA
| | - Guimin Gao
- Department of Public Health Sciences, University of Chicago, Chicago, Illinois, USA
| | - Ping Yin
- Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Hubei, People's Republic of China
| | - Nianjun Liu
- Department of Epidemiology and Biostatistics, Indiana University School of Public Health-Bloomington, Bloomington, Indiana, USA
| |
Collapse
|
2
|
Liu Z, Xu J, Tan J, Li X, Zhang F, Ouyang W, Wang S, Huang Y, Li S, Pan X. Genetic overlap for ten cardiovascular diseases: A comprehensive gene-centric pleiotropic association analysis and Mendelian randomization study. iScience 2023; 26:108150. [PMID: 37908310 PMCID: PMC10613921 DOI: 10.1016/j.isci.2023.108150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 08/13/2023] [Accepted: 10/02/2023] [Indexed: 11/02/2023] Open
Abstract
Recent studies suggest that pleiotropic effects may explain the genetic architecture of cardiovascular diseases (CVDs). We conducted a comprehensive gene-centric pleiotropic association analysis for ten CVDs using genome-wide association study (GWAS) summary statistics to identify pleiotropic genes and pathways that may underlie multiple CVDs. We found shared genetic mechanisms underlying the pathophysiology of CVDs, with over two-thirds of the diseases exhibiting common genes and single-nucleotide polymorphisms (SNPs). Significant positive genetic correlations were observed in more than half of paired CVDs. Additionally, we investigated the pleiotropic genes shared between different CVDs, as well as their functional pathways and distribution in different tissues. Moreover, six hub genes, including ALDH2, XPO1, HSPA1L, ESR2, WDR12, and RAB1A, as well as 26 targeted potential drugs, were identified. Our study provides further evidence for the pleiotropic effects of genetic variants on CVDs and highlights the importance of considering pleiotropy in genetic association studies.
Collapse
Affiliation(s)
- Zeye Liu
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Jing Xu
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Jiangshan Tan
- Key Laboratory of Pulmonary Vascular Medicine, National Clinical Research Center of Cardiovascular Diseases, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
| | - Xiaofei Li
- Department of Cardiology, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Fengwen Zhang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Wenbin Ouyang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Shouzheng Wang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Yuan Huang
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Shoujun Li
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Xiangbin Pan
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| |
Collapse
|
3
|
Majumdar S, Basu S, McGue M, Chatterjee S. Simultaneous selection of multiple important single nucleotide polymorphisms in familial genome wide association studies data. Sci Rep 2023; 13:8476. [PMID: 37231056 PMCID: PMC10213008 DOI: 10.1038/s41598-023-35379-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Accepted: 05/17/2023] [Indexed: 05/27/2023] Open
Abstract
We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNP) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed as 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially the ones with weak effects. In this paper, we propose a computationally efficient model selection approach-based on the e-values framework-for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome computational bottleneck of traditional model selection methods, our method trains one single model, and utilizes a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform gene-level analysis in Minnesota Center for Twin and Family Research (MCTFR) dataset using our method to detect several SNPs using this that have been implicated to be associated with alcohol consumption.
Collapse
Affiliation(s)
- Subhabrata Majumdar
- University of Minnesota Twin Cities, Minneapolis, USA.
- AI Risk and Vulnerability Alliance, Seattle, USA.
| | - Saonli Basu
- University of Minnesota Twin Cities, Minneapolis, USA
| | - Matt McGue
- University of Minnesota Twin Cities, Minneapolis, USA
| | | |
Collapse
|
4
|
Xu H, Shao Z, Zhang S, Liu X, Zeng P. How can childhood maltreatment affect post-traumatic stress disorder in adult: Results from a composite null hypothesis perspective of mediation analysis. Front Psychiatry 2023; 14:1102811. [PMID: 36970281 PMCID: PMC10033829 DOI: 10.3389/fpsyt.2023.1102811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/19/2022] [Accepted: 02/20/2023] [Indexed: 03/11/2023] Open
Abstract
BackgroundA greatly growing body of literature has revealed the mediating role of DNA methylation in the influence path from childhood maltreatment to psychiatric disorders such as post-traumatic stress disorder (PTSD) in adult. However, the statistical method is challenging and powerful mediation analyses regarding this issue are lacking.MethodsTo study how the maltreatment in childhood alters long-lasting DNA methylation changes which further affect PTSD in adult, we here carried out a gene-based mediation analysis from a perspective of composite null hypothesis in the Grady Trauma Project (352 participants and 16,565 genes) with childhood maltreatment as exposure, multiple DNA methylation sites as mediators, and PTSD or its relevant scores as outcome. We effectively addressed the challenging issue of gene-based mediation analysis by taking its composite null hypothesis testing nature into consideration and fitting a weighted test statistic.ResultsWe discovered that childhood maltreatment could substantially affected PTSD or PTSD-related scores, and that childhood maltreatment was associated with DNA methylation which further had significant roles in PTSD and these scores. Furthermore, using the proposed mediation method, we identified multiple genes within which DNA methylation sites exhibited mediating roles in the influence path from childhood maltreatment to PTSD-relevant scores in adult, with 13 for Beck Depression Inventory and 6 for modified PTSD Symptom Scale, respectively.ConclusionOur results have the potential to confer meaningful insights into the biological mechanism for the impact of early adverse experience on adult diseases; and our proposed mediation methods can be applied to other similar analysis settings.
Collapse
Affiliation(s)
- Haibo Xu
- Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
- School of Management, Xuzhou Medical University, Xuzhou, China
- *Correspondence: Haibo Xu,
| | - Zhonghe Shao
- Department of Epidemiology and Biostatistics, Ministry of Education Key Laboratory of Environment and Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Xin Liu
- Center for Mental Health Education and Research, Xuzhou Medical University, Xuzhou, China
- School of Management, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Ping Zeng,
| |
Collapse
|
5
|
Alamin M, Sultana MH, Lou X, Jin W, Xu H. Dissecting Complex Traits Using Omics Data: A Review on the Linear Mixed Models and Their Application in GWAS. PLANTS (BASEL, SWITZERLAND) 2022; 11:3277. [PMID: 36501317 PMCID: PMC9739826 DOI: 10.3390/plants11233277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 09/21/2022] [Revised: 11/23/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
Genome-wide association study (GWAS) is the most popular approach to dissecting complex traits in plants, humans, and animals. Numerous methods and tools have been proposed to discover the causal variants for GWAS data analysis. Among them, linear mixed models (LMMs) are widely used statistical methods for regulating confounding factors, including population structure, resulting in increased computational proficiency and statistical power in GWAS studies. Recently more attention has been paid to pleiotropy, multi-trait, gene-gene interaction, gene-environment interaction, and multi-locus methods with the growing availability of large-scale GWAS data and relevant phenotype samples. In this review, we have demonstrated all possible LMMs-based methods available in the literature for GWAS. We briefly discuss the different LMM methods, software packages, and available open-source applications in GWAS. Then, we include the advantages and weaknesses of the LMMs in GWAS. Finally, we discuss the future perspective and conclusion. The present review paper would be helpful to the researchers for selecting appropriate LMM models and methods quickly for GWAS data analysis and would benefit the scientific society.
Collapse
Affiliation(s)
- Md. Alamin
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | | | - Xiangyang Lou
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Wenfei Jin
- Department of Biology, School of Life Sciences, Southern University of Science and Technology, Shenzhen 518055, China
| | - Haiming Xu
- Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
| |
Collapse
|
6
|
Chen W, Coombes BJ, Larson NB. Recent advances and challenges of rare variant association analysis in the biobank sequencing era. Front Genet 2022; 13:1014947. [PMID: 36276986 PMCID: PMC9582646 DOI: 10.3389/fgene.2022.1014947] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2022] [Accepted: 09/22/2022] [Indexed: 12/04/2022] Open
Abstract
Causal variants for rare genetic diseases are often rare in the general population. Rare variants may also contribute to common complex traits and can have much larger per-allele effect sizes than common variants, although power to detect these associations can be limited. Sequencing costs have steadily declined with technological advancements, making it feasible to adopt whole-exome and whole-genome profiling for large biobank-scale sample sizes. These large amounts of sequencing data provide both opportunities and challenges for rare-variant association analysis. Herein, we review the basic concepts of rare-variant analysis methods, the current state-of-the-art methods in utilizing variant annotations or external controls to improve the statistical power, and particular challenges facing rare variant analysis such as accounting for population structure, extremely unbalanced case-control design. We also review recent advances and challenges in rare variant analysis for familial sequencing data and for more complex phenotypes such as survival data. Finally, we discuss other potential directions for further methodology investigation.
Collapse
Affiliation(s)
- Wenan Chen
- Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, United States
- *Correspondence: Wenan Chen, ; Brandon J. Coombes, ; Nicholas B. Larson,
| | - Brandon J. Coombes
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Wenan Chen, ; Brandon J. Coombes, ; Nicholas B. Larson,
| | - Nicholas B. Larson
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Wenan Chen, ; Brandon J. Coombes, ; Nicholas B. Larson,
| |
Collapse
|
7
|
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been previously performed to evaluate the effectiveness of these methods. RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison for 22 commonly-used summary-statistics based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and score-based variance component tests (e.g., sequence kernel association test) were much powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (e.g., harmonic mean P value method and aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can be scalable to biobank-scale data although some might be relatively slow. CONCLUSION In conclusion, we hope that our findings can offer an important guidance on how to choose appropriate multilocus association analysis methods in post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/ .
Collapse
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
8
|
Li S, Li S, Su S, Zhang H, Shen J, Wen Y. Gene Region Association Analysis of Longitudinal Quantitative Traits Based on a Function-On-Function Regression Model. Front Genet 2022; 13:781740. [PMID: 35265102 PMCID: PMC8899465 DOI: 10.3389/fgene.2022.781740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022] Open
Abstract
In the process of growth and development in life, gene expressions that control quantitative traits will turn on or off with time. Studies of longitudinal traits are of great significance in revealing the genetic mechanism of biological development. With the development of ultra-high-density sequencing technology, the associated analysis has tremendous challenges to statistical methods. In this paper, a longitudinal functional data association test (LFDAT) method is proposed based on the function-on-function regression model. LFDAT can simultaneously treat phenotypic traits and marker information as continuum variables and analyze the association of longitudinal quantitative traits and gene regions. Simulation studies showed that: 1) LFDAT performs well for both linkage equilibrium simulation and linkage disequilibrium simulation, 2) LFDAT has better performance for gene regions (include common variants, low-frequency variants, rare variants and mixture), and 3) LFDAT can accurately identify gene switching in the growth and development stage. The longitudinal data of the Oryza sativa projected shoot area is analyzed by LFDAT. It showed that there is the advantage of quick calculations. Further, an association analysis was conducted between longitudinal traits and gene regions by integrating the micro effects of multiple related variants and using the information of the entire gene region. LFDAT provides a feasible method for studying the formation and expression of longitudinal traits.
Collapse
Affiliation(s)
- Shijing Li
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shiqin Li
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Shaoqiang Su
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hui Zhang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiayu Shen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yongxian Wen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| |
Collapse
|
9
|
Lu H, Qiao J, Shao Z, Wang T, Huang S, Zeng P. A comprehensive gene-centric pleiotropic association analysis for 14 psychiatric disorders with GWAS summary statistics. BMC Med 2021; 19:314. [PMID: 34895209 PMCID: PMC8667366 DOI: 10.1186/s12916-021-02186-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a few of single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain not completely clear. METHODS We analyzed 14 psychiatric disorders using summary statistics available from the largest GWASs by far. We first applied the cross-trait linkage disequilibrium score regression (LDSC) to estimate genetic correlation between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and utilized a novel method called PLACO for pleiotropy mapping. We ultimately implemented functional analysis for identified pleiotropic genes and used Mendelian randomization for detecting causal association between these disorders. RESULTS We confirmed extensive genetic correlation among psychiatric disorders, based on which these disorders can be grouped into three diverse categories. We detected a large number of pleiotropic genes including 5884 associations and 2424 unique genes and found that differentially expressed pleiotropic genes were significantly enriched in pancreas, liver, heart, and brain, and that the biological process of these genes was remarkably enriched in regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of identified pleiotropic loci. We further demonstrated that among all the identified pleiotropic genes there were 342 unique ones linked with 6353 drugs with drug-gene interaction which can be classified into distinct types including inhibitor, agonist, blocker, antagonist, and modulator. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality commonly drove the observed co-existence of these disorders. CONCLUSIONS Our study is among the first large-scale effort to characterize gene-level pleiotropy among a greatly expanded set of psychiatric disorders and provides important insight into shared genetic etiology underlying these disorders. The findings would inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way to effective drug targets for clinical treatment.
Collapse
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
10
|
Lu H, Wei Y, Jiang Z, Zhang J, Wang T, Huang S, Zeng P. Integrative eQTL-weighted hierarchical Cox models for SNP-set based time-to-event association studies. J Transl Med 2021; 19:418. [PMID: 34627275 PMCID: PMC8502405 DOI: 10.1186/s12967-021-03090-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2021] [Accepted: 09/26/2021] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND Integrating functional annotations into SNP-set association studies has been proven a powerful analysis strategy. Statistical methods for such integration have been developed for continuous and binary phenotypes; however, the SNP-set integrative approaches for time-to-event or survival outcomes are lacking. METHODS We here propose IEHC, an integrative eQTL (expression quantitative trait loci) hierarchical Cox regression, for SNP-set based survival association analysis by modeling effect sizes of genetic variants as a function of eQTL via a hierarchical manner. Three p-values combination tests are developed to examine the joint effects of eQTL and genetic variants after a novel decorrelated modification of statistics for the two components. An omnibus test (IEHC-ACAT) is further adapted to aggregate the strengths of all available tests. RESULTS Simulations demonstrated that the IEHC joint tests were more powerful if both eQTL and genetic variants contributed to association signal, while IEHC-ACAT was robust and often outperformed other approaches across various simulation scenarios. When applying IEHC to ten TCGA cancers by incorporating eQTL from relevant tissues of GTEx, we revealed that substantial correlations existed between the two types of effect sizes of genetic variants from TCGA and GTEx, and identified 21 (9 unique) cancer-associated genes which would otherwise be missed by approaches not incorporating eQTL. CONCLUSION IEHC represents a flexible, robust, and powerful approach to integrate functional omics information to enhance the power of identifying association signals for the survival risk of complex human cancers.
Collapse
Affiliation(s)
- Haojie Lu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yongyue Wei
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, China
| | - Zhou Jiang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jinhui Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
11
|
Family-based gene-environment interaction using sequence kernel association test (FGE-SKAT) for complex quantitative traits. Sci Rep 2021; 11:7431. [PMID: 33795796 PMCID: PMC8016937 DOI: 10.1038/s41598-021-86871-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 03/22/2021] [Indexed: 11/30/2022] Open
Abstract
After the genome-wide association studies (GWAS) era, whole-genome sequencing is highly engaged in identifying the association of complex traits with rare variations. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve the effects of genetic and environmental factors and the complex interactions between genes and the environment. Therefore, in this research, a novel method is proposed to detect gene and gene-environment interactions in a complex family-based association study with various correlated structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), which is freely available as supplementary material for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and the superior statistical power. The FGE-SKAT was applied to the whole genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and discovered concordant and discordant regions compared to the methods without considering gene by environment interactions.
Collapse
|
12
|
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background Clinical and epidemiological studies have suggested systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbidities and common genetic etiologies can partly explain such coexistence. However, shared genetic determinations underlying the two diseases remain largely unknown. Methods Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through the linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted the pleiotropy-informed association using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results We found there existed a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) via LDSC between RA and SLE. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely newly novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple auto-immune relevant pathways such as inositol phosphate metabolic process, membrane and glucagon signaling pathway. Conclusion This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding of molecular mechanism underlying the comorbidity of the two diseases.
Collapse
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Meng Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
Collapse
|
13
|
Li W, Zheng M, Zhao G, Wang J, Liu J, Wang S, Feng F, Liu D, Zhu D, Li Q, Guo L, Guo Y, Liu R, Wen J. Identification of QTL regions and candidate genes for growth and feed efficiency in broilers. Genet Sel Evol 2021; 53:13. [PMID: 33549052 PMCID: PMC7866652 DOI: 10.1186/s12711-021-00608-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2019] [Accepted: 01/26/2021] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Feed accounts for about 70% of the total cost of poultry meat production. Residual feed intake (RFI) has become the preferred measure of feed efficiency because it is phenotypically independent of growth rate and body weight. In this study, our aim was to estimate genetic parameters and identify quantitative trait loci (QTL) for feed efficiency in 3314 purebred broilers using a genome-wide association study. Broilers were genotyped using a custom 55 K single nucleotide polymorphism (SNP) array. RESULTS Estimates of genomic heritability for seven growth and feed efficiency traits, including body weight at 28 days of age (BW28), BW42, average daily feed intake (ADFI), RFI, and RFI adjusted for weight of abdominal fat (RFIa), ranged from 0.12 to 0.26. Eleven genome-wide significant SNPs and 15 suggestively significant SNPs were detected, of which 19 clustered around two genomic regions. A region on chromosome 16 (2.34-2.66 Mb) was associated with both BW28 and BW42, and the most significant SNP in this region, AX_101003762, accounted for 7.6% of the genetic variance of BW28. The other region, on chromosome 1 (91.27-92.43 Mb) was associated with RFI and ADFI, and contains the NSUN3 and EPHA6 as candidate genes. The most significant SNP in this region, AX_172588157, accounted for 4.4% of the genetic variance of RFI. In addition, a genomic region containing the gene AGK on chromosome 1 was found to be associated with RFIa. The NSUN3 and AGK genes were found to be differentially expressed in breast muscle, thigh muscle, and abdominal fat between male broilers with high and low RFI. CONCLUSIONS We identified QTL regions for BW28 and BW42 (spanning 0.32 Mb) and RFI (spanning 1.16 Mb). The NSUN3, EPHA6, and AGK were identified as the most likely candidate genes for these QTL. These genes are involved in mitochondrial function and behavioral regulation. These results contribute to the identification of candidate genes and variants for growth and feed efficiency in poultry.
Collapse
Affiliation(s)
- Wei Li
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Maiqing Zheng
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Guiping Zhao
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Jie Wang
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Jie Liu
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Shunli Wang
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Furong Feng
- Foshan Gaoming Xinguang Agricultural and Animal Industrials Corporation, Foshan, 528515 China
| | - Dawei Liu
- Foshan Gaoming Xinguang Agricultural and Animal Industrials Corporation, Foshan, 528515 China
| | - Dan Zhu
- Foshan Gaoming Xinguang Agricultural and Animal Industrials Corporation, Foshan, 528515 China
| | - Qinghe Li
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Liping Guo
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Yuming Guo
- College of Animal Science and Technology, China Agricultural University, Beijing, 100193 China
| | - Ranran Liu
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| | - Jie Wen
- State Key Laboratory of Animal Nutrition; Key Laboratory of Animal (Poultry) Genetics Breeding and Reproduction, Ministry of Agriculture, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
| |
Collapse
|
14
|
Gao C, Sha Q, Zhang S, Zhang K. MF-TOWmuT: Testing an optimally weighted combination of common and rare variants with multiple traits using family data. Genet Epidemiol 2020; 45:64-81. [PMID: 33047835 DOI: 10.1002/gepi.22355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2020] [Revised: 08/03/2020] [Accepted: 08/18/2020] [Indexed: 11/11/2022]
Abstract
With rapid advancements of sequencing technologies and accumulations of electronic health records, a large number of genetic variants and multiple correlated human complex traits have become available in many genetic association studies. Thus, it becomes necessary and important to develop new methods that can jointly analyze the association between multiple genetic variants and multiple traits. Compared with methods that only use a single marker or trait, the joint analysis of multiple genetic variants and multiple traits is more powerful since such an analysis can fully incorporate the correlation structure of genetic variants and/or traits and their mutual dependence patterns. However, most of existing methods that simultaneously analyze multiple genetic variants and multiple traits are only applicable to unrelated samples. We develop a new method called MF-TOWmuT to detect association of multiple phenotypes and multiple genetic variants in a genomic region with family samples. MF-TOWmuT is based on an optimally weighted combination of variants. Our method can be applied to both rare and common variants and both qualitative and quantitative traits. Our simulation results show that (1) the type I error of MF-TOWmuT is preserved; (2) MF-TOWmuT outperforms two existing methods such as Multiple Family-based Quasi-Likelihood Score Test and Multivariate Family-based Rare Variant Association Test in terms of power. We also illustrate the usefulness of MF-TOWmuT by analyzing genotypic and phenotipic data from the Genetics of Kidneys in Diabetes study. R program is available at https://github.com/gaochengPRC/MF-TOWmuT.
Collapse
Affiliation(s)
- Cheng Gao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Kui Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| |
Collapse
|
15
|
Hamazaki K, Iwata H. RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLoS Comput Biol 2020; 16:e1007663. [PMID: 32059004 PMCID: PMC7046296 DOI: 10.1371/journal.pcbi.1007663] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Revised: 02/27/2020] [Accepted: 01/18/2020] [Indexed: 11/18/2022] Open
Abstract
Difficulty in detecting rare variants is one of the problems in conventional genome-wide association studies (GWAS). The problem is closely related to the complex gene compositions comprising multiple alleles, such as haplotypes. Several single nucleotide polymorphism (SNP) set approaches have been proposed to solve this problem. These methods, however, have been rarely discussed in connection with haplotypes. In this study, we developed a novel SNP-set method named "RAINBOW" and applied the method to haplotype-based GWAS by regarding a haplotype block as a SNP-set. Combining haplotype block estimation and SNP-set GWAS, haplotype-based GWAS can be conducted without prior information of haplotypes. We prepared 100 datasets of simulated phenotypic data and real marker genotype data of Oryza sativa subsp. indica, and performed GWAS of the datasets. We compared the power of our method, the conventional single-SNP GWAS, the conventional haplotype-based GWAS, and the conventional SNP-set GWAS. Our proposed method was shown to be superior to these in three aspects: (1) controlling false positives; (2) in detecting causal variants without relying on the linkage disequilibrium if causal variants were genotyped in the dataset; and (3) it showed greater power than the other methods, i.e., it was able to detect causal variants that were not detected by the others, primarily when the causal variants were located very close to each other, and the directions of their effects were opposite. By using the SNP-set approach as in this study, we expect that detecting not only rare variants but also genes with complex mechanisms, such as genes with multiple causal variants, can be realized. RAINBOW was implemented as an R package named "RAINBOWR" and is available from CRAN (https://cran.r-project.org/web/packages/RAINBOWR/index.html) and GitHub (https://github.com/KosukeHamazaki/RAINBOWR).
Collapse
Affiliation(s)
- Kosuke Hamazaki
- Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
| | - Hiroyoshi Iwata
- Department of Agricultural and Environmental Biology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan
- * E-mail:
| |
Collapse
|
16
|
Koh H, Li Y, Zhan X, Chen J, Zhao N. A Distance-Based Kernel Association Test Based on the Generalized Linear Mixed Model for Correlated Microbiome Studies. Front Genet 2019; 10:458. [PMID: 31156711 PMCID: PMC6532659 DOI: 10.3389/fgene.2019.00458] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2019] [Accepted: 04/30/2019] [Indexed: 12/12/2022] Open
Abstract
Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota on diverse host traits of interest (e. g., health/disease status, medical intervention, behavioral/environmental factor). Such study designs are useful to properly control for potential confounders or the sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, subjects including repeated measures) tend to be correlated so that statistical methods based on the independence assumption cannot be used. For the correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. Similar to prior distance-based kernel association tests (e.g., microbiome regression-based kernel association test), the use of ecological distances gives a high power to cSKAT. However, cSKAT is limited to handling Gaussian traits [e.g., body mass index (BMI)] and a single chosen distance measure at a time. The power of cSKAT differs a lot by which distance measure is used. However, choosing an optimal distance measure is challenging because of the unknown nature of the true association. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), Binomial (e.g., disease status, treatment/placebo) or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, so as to avoid the need to choose the optimal distance measure. Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool with its broad applicability to diverse types of traits, robust power and valid statistical inference.
Collapse
Affiliation(s)
- Hyunwook Koh
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Yutong Li
- School of Physics, Peking University, Beijing, China
| | - Xiang Zhan
- Department of Public Health Sciences, Pennsylvania State University, Hershey, PA, United States
| | - Jun Chen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Ni Zhao
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| |
Collapse
|
17
|
Chiu CY, Yuan F, Zhang BS, Yuan A, Li X, Fang HB, Lange K, Weeks DE, Wilson AF, Bailey-Wilson JE, Musolf AM, Stambolian D, Lakhal-Chaieb ML, Cook RJ, McMahon FJ, Amos CI, Xiong M, Fan R. Linear mixed models for association analysis of quantitative traits with next-generation sequencing data. Genet Epidemiol 2019; 43:189-206. [PMID: 30537345 PMCID: PMC6375753 DOI: 10.1002/gepi.22177] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2018] [Revised: 08/27/2018] [Accepted: 09/26/2018] [Indexed: 01/01/2023]
Abstract
We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F -statistics and χ 2 likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F -distributed statistics provide a good control of the type I error rate. The F -test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F -statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels α = 0.01 and 0.05 . For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.
Collapse
Affiliation(s)
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Fang Yuan
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
| | - Bing-Song Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Xin Li
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Hong-Bin Fang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California
| | - Daniel E Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Anthony M Musolf
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Dwight Stambolian
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
| | | | - Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, Ontario, Quebec, Canada
| | - Francis J McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, University of Waterloo, National Institute of Mental Health, NIH, Bethesda, Maryland
| | | | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas
| | - Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
| |
Collapse
|
18
|
Chen H, Huffman JE, Brody JA, Wang C, Lee S, Li Z, Gogarten SM, Sofer T, Bielak LF, Bis JC, Blangero J, Bowler RP, Cade BE, Cho MH, Correa A, Curran JE, de Vries PS, Glahn DC, Guo X, Johnson AD, Kardia S, Kooperberg C, Lewis JP, Liu X, Mathias RA, Mitchell BD, O’Connell JR, Peyser PA, Post WS, Reiner AP, Rich SS, Rotter JI, Silverman EK, Smith JA, Vasan RS, Wilson JG, Yanek LR, Redline S, Smith NL, Boerwinkle E, Borecki IB, Cupples LA, Laurie CC, Morrison AC, Rice KM, Lin X, Rice KM, Lin X. Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am J Hum Genet 2019; 104:260-274. [PMID: 30639324 DOI: 10.1016/j.ajhg.2018.12.012] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2018] [Accepted: 12/17/2018] [Indexed: 12/12/2022] Open
Abstract
With advances in whole-genome sequencing (WGS) technology, more advanced statistical methods for testing genetic association with rare variants are being developed. Methods in which variants are grouped for analysis are also known as variant-set, gene-based, and aggregate unit tests. The burden test and sequence kernel association test (SKAT) are two widely used variant-set tests, which were originally developed for samples of unrelated individuals and later have been extended to family data with known pedigree structures. However, computationally efficient and powerful variant-set tests are needed to make analyses tractable in large-scale WGS studies with complex study samples. In this paper, we propose the variant-set mixed model association tests (SMMAT) for continuous and binary traits using the generalized linear mixed model framework. These tests can be applied to large-scale WGS studies involving samples with population structure and relatedness, such as in the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program. SMMATs share the same null model for different variant sets, and a virtue of this null model, which includes covariates only, is that it needs to be fit only once for all tests in each genome-wide analysis. Simulation studies show that all the proposed SMMATs correctly control type I error rates for both continuous and binary traits in the presence of population structure and relatedness. We also illustrate our tests in a real data example of analysis of plasma fibrinogen levels in the TOPMed program (n = 23,763), using the Analysis Commons, a cloud-based computing platform.
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
| |
Collapse
|
19
|
Runcie DE, Crawford L. Fast and flexible linear mixed models for genome-wide genetics. PLoS Genet 2019; 15:e1007978. [PMID: 30735486 PMCID: PMC6383949 DOI: 10.1371/journal.pgen.1007978] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 02/21/2019] [Accepted: 01/21/2019] [Indexed: 11/18/2022] Open
Abstract
Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared to existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first is focused on accounting for spatial variability and non-additive genetic variance while scanning for QTL; and the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.
Collapse
Affiliation(s)
- Daniel E. Runcie
- Department of Plant Sciences, University of California Davis, Davis, California, United States of America
| | - Lorin Crawford
- Department of Biostatistics, Brown University, Providence, Rhode Island, United States of America
| |
Collapse
|
20
|
Qi W, Allen AS, Li YJ. Family-based association tests for rare variants with censored traits. PLoS One 2019; 14:e0210870. [PMID: 30682063 PMCID: PMC6347269 DOI: 10.1371/journal.pone.0210870] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2018] [Accepted: 12/27/2018] [Indexed: 11/30/2022] Open
Abstract
We propose a set of family-based burden and kernel tests for censored traits (FamBAC and FamKAC). Here, censored traits refer to time-to-event outcomes, for instance, age-at-onset of a disease. To model censored traits in family-based designs, we used the frailty model, which incorporated not only fixed genetic effects of rare variants in a region of interest but also random polygenic effects shared within families. We first partitioned genotype scores of rare variants into orthogonal between- and within-family components, and then derived their corresponding efficient score statistics from the frailty model. Finally, FamBAC and FamKAC were constructed by aggregating the weighted efficient scores of the within-family components across rare variants and subjects. FamBAC collapsed rare variants within subject first to form a burden test that followed a chi-squared distribution; whereas FamKAC was a variant component test following a mixture of chi-squared distributions. For FamKAC, p-values can be computed by permutation tests or for computational efficiency by approximation methods. Through simulation studies, we showed that type I error was correctly controlled by FamBAC for various variant weighting schemes (0.0371 to 0.0527). However, FamKAC type I error rates based on approximation methods were deflated (max 0.0376) but improved by permutation tests. Our simulations also demonstrated that burden test FamBAC had higher power than kernel test FamKAC when high proportion (e.g. ≥ 80%) of causal variants had effects in the same direction. In contrast, when the effects of causal variants on the censored trait were in mixed directions, FamKAC outperformed FamBAC and had comparable or higher power than an existing method, RVFam. Our proposed framework has the flexibility to accommodate general nuclear families, and can be used to analyze sequence data for censored traits such as age-at-onset of a complex disease of interest.
Collapse
Affiliation(s)
- Wenjing Qi
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Duke Molecular Physiology Institute, Duke University, Durham, NC, United States of America
| | - Andrew S. Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Center for Statistical Genetics and Genomics, Duke University, Durham, NC, United States of America
| | - Yi-Ju Li
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, United States of America
- Duke Molecular Physiology Institute, Duke University, Durham, NC, United States of America
- * E-mail:
| |
Collapse
|
21
|
Larson NB, Chen J, Schaid DJ. A review of kernel methods for genetic association studies. Genet Epidemiol 2019; 43:122-136. [PMID: 30604442 DOI: 10.1002/gepi.22180] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2018] [Revised: 11/09/2018] [Accepted: 11/26/2018] [Indexed: 12/17/2022]
Abstract
Evaluating the association of multiple genetic variants with a trait of interest by use of kernel-based methods has made a significant impact on how genetic association analyses are conducted. An advantage of kernel methods is that they tend to be robust when the genetic variants have effects that are a mixture of positive and negative effects, as well as when there is a small fraction of causal variants. Another advantage is that kernel methods fit within the framework of mixed models, providing flexible ways to adjust for additional covariates that influence traits. Herein, we review the basic ideas behind the use of kernel methods for genetic association analysis as well as recent methodological advancements for different types of traits, multivariate traits, pedigree data, and longitudinal data. Finally, we discuss opportunities for future research.
Collapse
Affiliation(s)
- Nicholas B Larson
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Jun Chen
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| | - Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota
| |
Collapse
|
22
|
Zhan X, Xue L, Zheng H, Plantinga A, Wu MC, Schaid DJ, Zhao N, Chen J. A small‐sample kernel association test for correlated data with application to microbiome association studies. Genet Epidemiol 2018; 42:772-782. [DOI: 10.1002/gepi.22160] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2018] [Revised: 06/27/2018] [Accepted: 07/15/2018] [Indexed: 01/11/2023]
Affiliation(s)
- Xiang Zhan
- Department of Public Health SciencesPennsylvania State UniversityHershey Pennsylvania
| | - Lingzhou Xue
- Department of StatisticsPennsylvania State UniversityUniversity Park Pennsylvania
| | - Haotian Zheng
- Department of Mathematical SciencesTsinghua UniversityBeijing China
| | - Anna Plantinga
- Department of BiostatisticsUniversity of WashingtonSeattle Washington
| | - Michael C. Wu
- Department of BiostatisticsUniversity of WashingtonSeattle Washington
- Division of Public Health SciencesFred Hutchinson Cancer Research CenterSeattle Washington
| | - Daniel J. Schaid
- Division of Biomedical Statistics and InformaticsMayo ClinicRochester Minnesota
| | - Ni Zhao
- Department of BiostatisticsJohns Hopkins UniversityBaltimore Maryland
| | - Jun Chen
- Division of Biomedical Statistics and InformaticsMayo ClinicRochester Minnesota
- Center for Individualized MedicineMayo ClinicRochester Minnesota
| |
Collapse
|
23
|
Wu X, Guan T, Liu DJ, León Novelo LG, Bandyopadhyay D. ADAPTIVE-WEIGHT BURDEN TEST FOR ASSOCIATIONS BETWEEN QUANTITATIVE TRAITS AND GENOTYPE DATA WITH COMPLEX CORRELATIONS. Ann Appl Stat 2018; 12:1558-1582. [PMID: 30214655 DOI: 10.1214/17-aoas1121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
High-throughput sequencing has often been used to screen samples from pedigrees or with population structure, producing genotype data with complex correlations rendered from both familial relation and linkage disequilibrium. With such data, it is critical to account for these genotypic correlations when assessing the contribution of variants by gene or pathway. Recognizing the limitations of existing association testing methods, we propose Adaptive-weight Burden Test (ABT), a retrospective, mixed-model test for genetic association of quantitative traits on genotype data with complex correlations. This method makes full use of genotypic correlations across both samples and variants, and adopts "data-driven" weights to improve power. We derive the ABT statistic and its explicit distribution under the null hypothesis, and demonstrate through simulation studies that it is generally more powerful than the fixed-weight burden test and family-based SKAT in various scenarios, controlling for the type I error rate. Further investigation reveals the connection of ABT with kernel tests, as well as the adaptability of its weights to the direction of genetic effects. The application of ABT is illustrated by a whole genome analysis of genes with common and rare variants associated with fasting glucose from the NHLBI "Grand Opportunity" Exome Sequencing Project.
Collapse
Affiliation(s)
- Xiaowei Wu
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
| | - Ting Guan
- Department of Statistics, Virginia Tech, 250 Drillfield Drive, MC0439, Blacksburg, VA 24061, USA
| | - Dajiang J Liu
- Department of Public Health Sciences, Hershey Institute of Personalized Medicine, Pennsylvania State University College of Medicine, Hershey, PA 17033, USA
| | - Luis G León Novelo
- Department of Biostatistics, School of Public Health, University of Texas Health Science Center, Houston, TX 77030, USA
| | | |
Collapse
|
24
|
Lumley T, Brody J, Peloso G, Morrison A, Rice K. FastSKAT: Sequence kernel association tests for very large sets of markers. Genet Epidemiol 2018; 42:516-527. [PMID: 29932245 PMCID: PMC6129408 DOI: 10.1002/gepi.22136] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2018] [Revised: 04/30/2018] [Accepted: 05/10/2018] [Indexed: 11/06/2022]
Abstract
The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n3 . As SKAT is often used when n > 10 4 , this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximations to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome-wide association by class of histone marker.
Collapse
Affiliation(s)
| | - Jennifer Brody
- Cardiovascular Health Research Unit, University of Washington
| | - Gina Peloso
- Department of Biostatistics, Boston University
| | | | - Kenneth Rice
- Department of Biostatistics, University of Washington
| |
Collapse
|
25
|
Wang X, Zhang Z, Morris N, Cai T, Lee S, Wang C, Yu TW, Walsh CA, Lin X. Rare variant association test in family-based sequencing studies. Brief Bioinform 2018; 18:954-961. [PMID: 27677958 DOI: 10.1093/bib/bbw083] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2016] [Indexed: 12/20/2022] Open
Abstract
The objective of this article is to introduce valid and robust methods for the analysis of rare variants for family-based exome chips, whole-exome sequencing or whole-genome sequencing data. Family-based designs provide unique opportunities to detect genetic variants that complement studies of unrelated individuals. Currently, limited methods and software tools have been developed to assist family-based association studies with rare variants, especially for analyzing binary traits. In this article, we address this gap by extending existing burden and kernel-based gene set association tests for population data to related samples, with a particular emphasis on binary phenotypes. The proposed approach blends the strengths of kernel machine methods and generalized estimating equations. Importantly, the efficient generalized kernel score test can be applied as a mega-analysis framework to combine studies with different designs. We illustrate the application of the proposed method using data from an exome sequencing study of autism. Methods discussed in this article are implemented in an R package 'gskat', which is available on CRAN and GitHub.
Collapse
|
26
|
Novel Methods for Family-Based Genetic Studies. Methods Mol Biol 2018. [PMID: 29876895 DOI: 10.1007/978-1-4939-7868-7_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
The recent development of microarray and sequencing technology allows identification of disease susceptibility genes. Although the genome-wide association studies (GWAS) have successfully identified many genetic markers related to human diseases, the traditional statistical methods are not powerful to detect rare genetic markers. The rare genetic markers are usually grouped together and tested at the set level. One of such methods is the sequence kernel association test (SKAT), which has been commonly used in the rare genetic marker analysis. In recent publications, SKAT has been extended to be applicable for family-based rare variant analysis. Here, I present three published statistical approaches for family-based rare variant analysis for: 1. continuous traits, 2. binary traits, and 3. multiple correlated traits.
Collapse
|
27
|
Randolph TW, Zhao S, Copeland W, Hullar M, Shojaie A. KERNEL-PENALIZED REGRESSION FOR ANALYSIS OF MICROBIOME DATA. Ann Appl Stat 2018; 12:540-566. [PMID: 30224943 PMCID: PMC6138053 DOI: 10.1214/17-aoas1102] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
The analysis of human microbiome data is often based on dimension-reduced graphical displays and clusterings derived from vectors of microbial abundances in each sample. Common to these ordination methods is the use of biologically motivated definitions of similarity. Principal coordinate analysis, in particular, is often performed using ecologically defined distances, allowing analyses to incorporate context-dependent, non-Euclidean structure. In this paper, we go beyond dimension-reduced ordination methods and describe a framework of high-dimensional regression models that extends these distance-based methods. In particular, we use kernel-based methods to show how to incorporate a variety of extrinsic information, such as phylogeny, into penalized regression models that estimate taxonspecific associations with a phenotype or clinical outcome. Further, we show how this regression framework can be used to address the compositional nature of multivariate predictors comprised of relative abundances; that is, vectors whose entries sum to a constant. We illustrate this approach with several simulations using data from two recent studies on gut and vaginal microbiomes. We conclude with an application to our own data, where we also incorporate a significance test for the estimated coefficients that represent associations between microbial abundance and a percent fat.
Collapse
|
28
|
Powerful Genetic Association Analysis for Common or Rare Variants with High-Dimensional Structured Traits. Genetics 2017. [PMID: 28642271 DOI: 10.1534/genetics.116.199646] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
Many genetic association studies collect a wide range of complex traits. As these traits may be correlated and share a common genetic mechanism, joint analysis can be statistically more powerful and biologically more meaningful. However, most existing tests for multiple traits cannot be used for high-dimensional and possibly structured traits, such as network-structured transcriptomic pathway expressions. To overcome potential limitations, in this article we propose the dual kernel-based association test (DKAT) for testing the association between multiple traits and multiple genetic variants, both common and rare. In DKAT, two individual kernels are used to describe the phenotypic and genotypic similarity, respectively, between pairwise subjects. Using kernels allows for capturing structure while accommodating dimensionality. Then, the association between traits and genetic variants is summarized by a coefficient which measures the association between two kernel matrices. Finally, DKAT evaluates the hypothesis of nonassociation with an analytical P-value calculation without any computationally expensive resampling procedures. By collapsing information in both traits and genetic variants using kernels, the proposed DKAT is shown to have a correct type-I error rate and higher power than other existing methods in both simulation studies and application to a study of genetic regulation of pathway gene expressions.
Collapse
|
29
|
Wang Z, Xu K, Zhang X, Wu X, Wang Z. Longitudinal SNP-set association analysis of quantitative phenotypes. Genet Epidemiol 2017; 41:81-93. [PMID: 27859628 PMCID: PMC5154867 DOI: 10.1002/gepi.22016] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2016] [Revised: 08/10/2016] [Accepted: 09/19/2016] [Indexed: 02/06/2023]
Abstract
Many genetic epidemiological studies collect repeated measurements over time. This design not only provides a more accurate assessment of disease condition, but allows us to explore the genetic influence on disease development and progression. Thus, it is of great interest to study the longitudinal contribution of genes to disease susceptibility. Most association testing methods for longitudinal phenotypes are developed for single variant, and may have limited power to detect association, especially for variants with low minor allele frequency. We propose Longitudinal SNP-set/sequence kernel association test (LSKAT), a robust, mixed-effects method for association testing of rare and common variants with longitudinal quantitative phenotypes. LSKAT uses several random effects to account for the within-subject correlation in longitudinal data, and allows for adjustment for both static and time-varying covariates. We also present a longitudinal trait burden test (LBT), where we test association between the trait and the burden score in linear mixed models. In simulation studies, we demonstrate that LBT achieves high power when variants are almost all deleterious or all protective, while LSKAT performs well in a wide range of genetic models. By making full use of trait values from repeated measures, LSKAT is more powerful than several tests applied to a single measurement or average over all time points. Moreover, LSKAT is robust to misspecification of the covariance structure. We apply the LSKAT and LBT methods to detect association with longitudinally measured body mass index in the Framingham Heart Study, where we are able to replicate association with a circadian gene NR1D2.
Collapse
Affiliation(s)
- Zhong Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Baker Institute for Animal Health, Cornell University, Ithaca, New York, United States of America
- Center for Computational Biology, Beijing Forestry University, Beijing, China
| | - Ke Xu
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
| | - Xinyu Zhang
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
| | - Xiaowei Wu
- Department of Statistics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, United States of America
| | - Zuoheng Wang
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| |
Collapse
|
30
|
Bureau A, Croteau J. When Is an Endophenotype Useful to Detect Association to a Disease? Exploring the Relationships between Disease Status, Endophenotype and Genetic Polymorphisms. Hum Hered 2016; 81:11-25. [PMID: 27475094 DOI: 10.1159/000446475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2016] [Accepted: 04/26/2016] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES To investigate the conditions and analysis strategies required so that endophenotypes related to a disease help discover genetic variants involved in the disease. METHODS The association with disease susceptibility variants is examined as a function of the relationships between disease status, endophenotype values and the genotype at another disease or endophenotype susceptibility locus assumed to be previously known, using approximate linear models of allele frequencies as a function of these variables and simulations in the context of family studies when the endophenotype is dichotomous. RESULTS Under genetic mechanisms where the risk allele of the tested locus has an effect exclusively in subjects with the endophenotype, the risk allele frequency differences between affected and unaffected subjects are much greater in the subset of subjects with an endophenotype impairment than in those without such an impairment, and power gains are obtained when testing the association under a joint disease-endophenotype model, both with two-locus or single-locus tests. However, with moderate main effect on the risk of disease or endophenotype impairment, testing directly the association between risk allele and disease or endophenotype is more powerful than testing under a joint disease-endophenotype model. CONCLUSIONS Joint modeling of disease and endophenotype should be used only in parallel with standard disease association testing.
Collapse
Affiliation(s)
- Alexandre Bureau
- Département de médecine sociale et préventive, Université Laval, Québec, Qué., Canada
| | | |
Collapse
|
31
|
Darst BF, Engelman CD. Transmission and decorrelation methods for detecting rare variants using sequencing data from related individuals. BMC Proc 2016; 10:203-207. [PMID: 27980637 PMCID: PMC5133523 DOI: 10.1186/s12919-016-0031-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND Advances in whole genome sequencing have enabled the investigation of rare variants, which could explain some of the missing heritability that genome-wide association studies are unable to detect. Most methods to detect associations with rare variants are developed for unrelated individuals; however, several methods exist that utilize family studies and could have better power to detect such associations. METHODS Using whole genome sequencing data and simulated phenotypes provided by the organizers of the Genetic Analysis Workshop 19 (GAW19), we compared family-based methods that test for associations between rare and common variants with a quantitative trait. This was done using 2 fairly novel methods: family-based association test for rare variants (FBAT-RV), which is a transmission-based method that utilizes the transmission of genetic information from parent to offspring; and Minimum p value Optimized Nuisance parameter Score Test Extended to Relatives (MONSTER), which is a decorrelation method that instead attempts to adjust for relatedness using a regression-based method. We also considered family-based association test linear combination (FBAT-LC) and FBAT-Min P, which are slightly older methods that do not allow for the weighting of rare or common variants, but contrast some of the limitations of FBAT-RV. RESULTS MONSTER had much higher overall power than FBAT-RV and FBAT-Min P. Interestingly, FBAT-LC had similar overall power as MONSTER. MONSTER had the highest power for a gene accounting for a larger percent of the phenotypic variance, whereas MONSTER and FBAT-LC both had the highest power for a gene accounting for moderate variance. FBAT-LC had the highest power for a gene accounting for the least variance. CONCLUSIONS Based on the simulated data from GAW19, MONSTER and FBAT-LC were the most powerful of the methods assessed. However, there are limitations to each of these methods that should be carefully considered when conducting an analysis of rare variants in related individuals. This emphasizes the need for methods that can incorporate the advantages of each of these methods into 1 family-based association test for rare variants.
Collapse
Affiliation(s)
- Burcu F. Darst
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
| | - Corinne D. Engelman
- University of Wisconsin, Madison, WI USA
- Department of Population Health Sciences, University of Wisconsin School of Medicine and Public Health, Madison, WI USA
| |
Collapse
|
32
|
Malzahn D, Friedrichs S, Bickeböller H. Comparing strategies for combined testing of rare and common variants in whole sequence and genome-wide genotype data. BMC Proc 2016; 10:269-273. [PMID: 27980648 PMCID: PMC5133495 DOI: 10.1186/s12919-016-0042-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
We used our extension of the kernel score test to family data to analyze real and simulated baseline systolic blood pressure in extended pedigrees. We compared the power for different kernels and for different weightings of genetic markers. Moreover, we compared the power of rare and common markers with 3 strategies for joint testing and on marker panels with different densities. Marker weights had much greater influence on power than the kernel chosen. Inverse minor allele frequency weights often increased power on common markers but could decrease power on rare markers. Furthermore, defining the gene region based on linkage disequilibrium blocks often yielded robust power of joint tests of rare and common markers.
Collapse
Affiliation(s)
- Dörthe Malzahn
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
| | - Stefanie Friedrichs
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
| | - Heike Bickeböller
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Humboldtallee 32, 37073 Göttingen, Germany
| |
Collapse
|
33
|
Zhang Q, Guldbrandtsen B, Calus MPL, Lund MS, Sahana G. Comparison of gene-based rare variant association mapping methods for quantitative traits in a bovine population with complex familial relationships. Genet Sel Evol 2016; 48:60. [PMID: 27534618 PMCID: PMC4989328 DOI: 10.1186/s12711-016-0238-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2016] [Accepted: 08/04/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND There is growing interest in the role of rare variants in the variation of complex traits due to increasing evidence that rare variants are associated with quantitative traits. However, association methods that are commonly used for mapping common variants are not effective to map rare variants. Besides, livestock populations have large half-sib families and the occurrence of rare variants may be confounded with family structure, which makes it difficult to disentangle their effects from family mean effects. We compared the power of methods that are commonly applied in human genetics to map rare variants in cattle using whole-genome sequence data and simulated phenotypes. We also studied the power of mapping rare variants using linear mixed models (LMM), which are the method of choice to account for both family relationships and population structure in cattle. RESULTS We observed that the power of the LMM approach was low for mapping a rare variant (defined as those that have frequencies lower than 0.01) with a moderate effect (5 to 8 % of phenotypic variance explained by multiple rare variants that vary from 5 to 21 in number) contributing to a QTL with a sample size of 1000. In contrast, across the scenarios studied, statistical methods that are specialized for mapping rare variants increased power regardless of whether multiple rare variants or a single rare variant underlie a QTL. Different methods for combining rare variants in the test single nucleotide polymorphism set resulted in similar power irrespective of the proportion of total genetic variance explained by the QTL. However, when the QTL variance is very small (only 0.1 % of the total genetic variance), these specialized methods for mapping rare variants and LMM generally had no power to map the variants within a gene with sample sizes of 1000 or 5000. CONCLUSIONS We observed that the methods that combine multiple rare variants within a gene into a meta-variant generally had greater power to map rare variants compared to LMM. Therefore, it is recommended to use rare variant association mapping methods to map rare genetic variants that affect quantitative traits in livestock, such as bovine populations.
Collapse
Affiliation(s)
- Qianqian Zhang
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark. .,Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, The Netherlands.
| | - Bernt Guldbrandtsen
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
| | - Mario P L Calus
- Animal Breeding and Genomics Centre, Wageningen UR Livestock Research, Wageningen, The Netherlands
| | - Mogens Sandø Lund
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
| | - Goutam Sahana
- Department of Molecular Biology and Genetics, Center for Quantitative Genetics and Genomics, Aarhus University, Tjele, 8830, Denmark
| |
Collapse
|
34
|
Jeng XJ, Daye ZJ, Lu W, Tzeng JY. Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level. PLoS Comput Biol 2016; 12:e1004993. [PMID: 27355347 PMCID: PMC4927097 DOI: 10.1371/journal.pcbi.1004993] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2015] [Accepted: 05/21/2016] [Indexed: 11/24/2022] Open
Abstract
Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic researches as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high-dimensionality, significances of causal variants cannot easily stand out from those of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a few but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate for a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances. Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information. Next-generation sequencing technologies have allowed genetic association studies of complex traits at the single base-pair resolution, where most genetic variants have extremely low mutation frequencies. These rare variants have been the focus of modern statistical-computational genomics due to their potential to explain missing disease heritability. The identification of individual rare variants associated with diseases can provide new biological insights and enable the precise delineation of disease mechanisms. However, due to the extreme rarity of mutations and large numbers of variants, significances of causative variants tend to be mixed inseparably with a few noncausative ones, and standard multiple testing procedures controlling for false positives fail to provide a meaningful way to include a large proportion of the causative variants. To address the challenge of detecting weak biological signals, we propose a novel statistical procedure, based on false-negative control, to provide a practical approach for variant inclusion in large-scale sequencing studies. By determining those variants that can be confidently dispatched as noncausative, the proposed procedure offers an objective selection of a modest number of potentially causative variants at the single-locus level. Results can be further prioritized or used to infer disease-associated genes with annotation information.
Collapse
Affiliation(s)
- Xinge Jessie Jeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Zhongyin John Daye
- Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona, United States of America
| | - Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Jung-Ying Tzeng
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- * E-mail:
| |
Collapse
|
35
|
The impact of genotype calling errors on family-based studies. Sci Rep 2016; 6:28323. [PMID: 27328765 PMCID: PMC4916415 DOI: 10.1038/srep28323] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2016] [Accepted: 05/31/2016] [Indexed: 02/07/2023] Open
Abstract
Family-based sequencing studies have unique advantages in enriching rare variants, controlling population stratification, and improving genotype calling. Standard genotype calling algorithms are less likely to call rare variants correctly, often mistakenly calling heterozygotes as reference homozygotes. The consequences of such non-random errors on association tests for rare variants are unclear, particularly in transmission-based tests. In this study, we investigated the impact of genotyping errors on rare variant association tests of family-based sequence data. We performed a comprehensive analysis to study how genotype calling errors affect type I error and statistical power of transmission-based association tests using a variety of realistic parameters in family-based sequencing studies. In simulation studies, we found that biased genotype calling errors yielded not only an inflation of type I error but also a power loss of association tests. We further confirmed our observation using exome sequence data from an autism project. We concluded that non-symmetric genotype calling errors need careful consideration in the analysis of family-based sequence data and we provided practical guidance on ameliorating the test bias.
Collapse
|
36
|
Yan Q, Weeks DE, Tiwari HK, Yi N, Zhang K, Gao G, Lin WY, Lou XY, Chen W, Liu N. Rare-Variant Kernel Machine Test for Longitudinal Data from Population and Family Samples. Hum Hered 2016; 80:126-38. [PMID: 27161037 DOI: 10.1159/000445057] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2015] [Accepted: 02/24/2016] [Indexed: 01/12/2023] Open
Abstract
OBJECTIVE The kernel machine (KM) test reportedly performs well in the set-based association test of rare variants. Many studies have been conducted to measure phenotypes at multiple time points, but the standard KM methodology has only been available for phenotypes at a single time point. In addition, family-based designs have been widely used in genetic association studies; therefore, the data analysis method used must appropriately handle familial relatedness. A rare-variant test does not currently exist for longitudinal data from family samples. Therefore, in this paper, we aim to introduce an association test for rare variants, which includes multiple longitudinal phenotype measurements for either population or family samples. METHODS This approach uses KM regression based on the linear mixed model framework and is applicable to longitudinal data from either population (L-KM) or family samples (LF-KM). RESULTS In our population-based simulation studies, L-KM has good control of Type I error rate and increased power in all the scenarios we considered compared with other competing methods. Conversely, in the family-based simulation studies, we found an inflated Type I error rate when L-KM was applied directly to the family samples, whereas LF-KM retained the desired Type I error rate and had the best power performance overall. Finally, we illustrate the utility of our proposed LF-KM approach by analyzing data from an association study between rare variants and blood pressure from the Genetic Analysis Workshop 18 (GAW18). CONCLUSION We propose a method for rare-variant association testing in population and family samples using phenotypes measured at multiple time points for each subject. The proposed method has the best power performance compared to competing approaches in our simulation study.
Collapse
Affiliation(s)
- Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, Pa., USA
| | | | | | | | | | | | | | | | | | | |
Collapse
|
37
|
Oualkacha K, Lakhal-Chaieb L, Greenwood CM. Software Application Profile: RVPedigree: a suite of family-based rare variant association tests for normally and non-normally distributed quantitative traits. Int J Epidemiol 2016; 45:402-7. [PMID: 27085080 DOI: 10.1093/ije/dyw047] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/18/2016] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION RVPedigree (Rare Variant association tests in Pedigrees) implements a suite of programs facilitating genome-wide analysis of association between a quantitative trait and autosomal region-based genetic variation. The main features here are the ability to appropriately test for association of rare variants with non-normally distributed quantitative traits, and also to appropriately adjust for related individuals, either from families or from population structure and cryptic relatedness. IMPLEMENTATION RVPedigree is available as an R package. GENERAL FEATURES The package includes calculation of kinship matrices, various options for coping with non-normality, three different ways of estimating statistical significance incorporating triaging to enable efficient use of the most computationally-intensive calculations, and a parallelization option for genome-wide analysis. AVAILABILITY The software is available from the Comprehensive R Archive Network [CRAN.R-project.org] under the name 'RVPedigree' and at [https://github.com/GreenwoodLab]. It has been published under General Public License (GPL) version 3 or newer.
Collapse
Affiliation(s)
- Karim Oualkacha
- Départment de mathématiques, Université du Québec à Montréal, QC, Canada
| | - Lajmi Lakhal-Chaieb
- Département de mathématiques et de statistique, Université Laval, Québec, QC, Canada
| | - Celia Mt Greenwood
- Lady Davis Institute, Jewish General Hospital, Montréal, QC, Canada Departments of Oncology, Epidemiology, Biostatistics & Occupational Health, and Human Genetics, McGill University, Montreal, QC, Canada Ludmer Centre for Neuroinformatics and Mental Health, Montreal, QC, Canada
| |
Collapse
|
38
|
A method for analyzing multiple continuous phenotypes in rare variant association studies allowing for flexible correlations in variant effects. Eur J Hum Genet 2016; 24:1344-51. [PMID: 26860061 PMCID: PMC4989219 DOI: 10.1038/ejhg.2016.8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2015] [Revised: 12/22/2015] [Accepted: 12/30/2015] [Indexed: 01/05/2023] Open
Abstract
For region-based sequencing data, power to detect genetic associations can be improved through analysis of multiple related phenotypes. With this motivation, we propose a novel test to detect association simultaneously between a set of rare variants, such as those obtained by sequencing in a small genomic region, and multiple continuous phenotypes. We allow arbitrary correlations among the phenotypes and build on a linear mixed model by assuming the effects of the variants follow a multivariate normal distribution with a zero mean and a specific covariance matrix structure. In order to account for the unknown correlation parameter in the covariance matrix of the variant effects, a data-adaptive variance component test based on score-type statistics is derived. As our approach can calculate the P-value analytically, the proposed test procedure is computationally efficient. Broad simulations and an application to the UK10K project show that our proposed multivariate test is generally more powerful than univariate tests, especially when there are pleiotropic effects or highly correlated phenotypes.
Collapse
|
39
|
Friedrichs S, Malzahn D, Pugh EW, Almeida M, Liu XQ, Bailey JN. Filtering genetic variants and placing informative priors based on putative biological function. BMC Genet 2016; 17 Suppl 2:8. [PMID: 26866982 PMCID: PMC4895695 DOI: 10.1186/s12863-015-0313-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.
Collapse
Affiliation(s)
- Stefanie Friedrichs
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
| | - Dörthe Malzahn
- Department of Genetic Epidemiology, University Medical Center, Georg-August University Göttingen, Göttingen, Germany.
| | - Elizabeth W Pugh
- Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
| | - Marcio Almeida
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX, USA.
| | - Xiao Qing Liu
- Department of Obstetrics, Gynecology, and Reproductive Sciences, Department of Biochemistry and Medical Genetics, Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada.
- Children's Hospital Research Institute of Manitoba, Winnipeg, MB, Canada.
| | - Julia N Bailey
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, USA.
- Epilepsy Genetics/Genomics Laboratory, West Los Angeles Veterans Administration, Los Angeles, CA, USA.
| |
Collapse
|
40
|
Kim J, Bai Y, Pan W. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics. Genet Epidemiol 2015; 39:651-63. [PMID: 26493956 DOI: 10.1002/gepi.21931] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2015] [Accepted: 08/12/2015] [Indexed: 01/01/2023]
Abstract
We study the problem of testing for single marker-multiple phenotype associations based on genome-wide association study (GWAS) summary statistics without access to individual-level genotype and phenotype data. For most published GWASs, because obtaining summary data is substantially easier than accessing individual-level phenotype and genotype data, while often multiple correlated traits have been collected, the problem studied here has become increasingly important. We propose a powerful adaptive test and compare its performance with some existing tests. We illustrate its applications to analyses of a meta-analyzed GWAS dataset with three blood lipid traits and another with sex-stratified anthropometric traits, and further demonstrate its potential power gain over some existing methods through realistic simulation studies. We start from the situation with only one set of (possibly meta-analyzed) genome-wide summary statistics, then extend the method to meta-analysis of multiple sets of genome-wide summary statistics, each from one GWAS. We expect the proposed test to be useful in practice as more powerful than or complementary to existing methods.
Collapse
Affiliation(s)
- Junghi Kim
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Yun Bai
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
Collapse
|
41
|
Associating Multivariate Quantitative Phenotypes with Genetic Variants in Family Samples with a Novel Kernel Machine Regression Method. Genetics 2015; 201:1329-39. [PMID: 26482791 DOI: 10.1534/genetics.115.178590] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2015] [Accepted: 10/04/2015] [Indexed: 11/18/2022] Open
Abstract
The recent development of sequencing technology allows identification of association between the whole spectrum of genetic variants and complex diseases. Over the past few years, a number of association tests for rare variants have been developed. Jointly testing for association between genetic variants and multiple correlated phenotypes may increase the power to detect causal genes in family-based studies, but familial correlation needs to be appropriately handled to avoid an inflated type I error rate. Here we propose a novel approach for multivariate family data using kernel machine regression (denoted as MF-KM) that is based on a linear mixed-model framework and can be applied to a large range of studies with different types of traits. In our simulation studies, the usual kernel machine test has inflated type I error rates when applied directly to familial data, while our proposed MF-KM method preserves the expected type I error rates. Moreover, the MF-KM method has increased power compared to methods that either analyze each phenotype separately while considering family structure or use only unrelated founders from the families. Finally, we illustrate our proposed methodology by analyzing whole-genome genotyping data from a lung function study.
Collapse
|
42
|
Xu Z, Pan W. Approximate score-based testing with application to multivariate trait association analysis. Genet Epidemiol 2015. [PMID: 26198454 DOI: 10.1002/gepi.21911] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
For genome-wide association studies and DNA sequencing studies, several powerful score-based tests, such as kernel machine regression and sum of powered score tests, have been proposed in the last few years. However, extensions of these score-based tests to more complex models, such as mixed-effects models for analysis of multiple and correlated traits, have been hindered by the unavailability of the score vector, due to either no output from statistical software or no closed-form solution at all. We propose a simple and general method to asymptotically approximate the score vector based on an asymptotically normal and consistent estimate of a parameter vector to be tested and its (consistent) covariance matrix. The proposed method is applicable to both maximum-likelihood estimation and estimating function-based approaches. We use the derived approximate score vector to extend several score-based tests to mixed-effects models. We demonstrate the feasibility and possible power gains of these tests in association analysis of multiple and correlated quantitative or binary traits with both real and simulated data. The proposed method is easy to implement with a wide applicability.
Collapse
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
| | | |
Collapse
|
43
|
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022] Open
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to the samples of related individuals. To this end, we additionally include a random polygene effects in functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Collapse
Affiliation(s)
- Gulnara R. Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Nadezhda M. Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Tatiana I. Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|
44
|
Casale FP, Rakitsch B, Lippert C, Stegle O. Efficient set tests for the genetic analysis of correlated traits. Nat Methods 2015; 12:755-8. [PMID: 26076425 DOI: 10.1038/nmeth.3439] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2014] [Accepted: 05/18/2015] [Indexed: 01/17/2023]
Abstract
Set tests are a powerful approach for genome-wide association testing between groups of genetic variants and quantitative traits. We describe mtSet (http://github.com/PMBio/limix), a mixed-model approach that enables joint analysis across multiple correlated traits while accounting for population structure and relatedness. mtSet effectively combines the benefits of set tests with multi-trait modeling and is computationally efficient, enabling genetic analysis of large cohorts (up to 500,000 individuals) and multiple traits.
Collapse
Affiliation(s)
- Francesco Paolo Casale
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Barbara Rakitsch
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| | - Christoph Lippert
- 1] Microsoft Research, Los Angeles, California, USA. [2] Human Longevity, Inc., Mountain View, California, USA
| | - Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
| |
Collapse
|
45
|
Huang YT, Liang L, Moffatt MF, Cookson WOCM, Lin X. iGWAS: Integrative Genome-Wide Association Studies of Genetic and Genomic Data for Disease Susceptibility Using Mediation Analysis. Genet Epidemiol 2015; 39:347-56. [PMID: 25997986 DOI: 10.1002/gepi.21905] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2015] [Revised: 03/23/2015] [Accepted: 04/07/2015] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies (GWAS) have been a standard practice in identifying single nucleotide polymorphisms (SNPs) for disease susceptibility. We propose a new approach, termed integrative GWAS (iGWAS) that exploits the information of gene expressions to investigate the mechanisms of the association of SNPs with a disease phenotype, and to incorporate the family-based design for genetic association studies. Specifically, the relations among SNPs, gene expression, and disease are modeled within the mediation analysis framework, which allows us to disentangle the genetic effect on a disease phenotype into two parts: an effect mediated through a gene expression (mediation effect, ME) and an effect through other biological mechanisms or environment-mediated mechanisms (alternative effect, AE). We develop omnibus tests for the ME and AE that are robust to underlying true disease models. Numerical studies show that the iGWAS approach is able to facilitate discovering genetic association mechanisms, and outperforms the SNP-only method for testing genetic associations. We conduct a family-based iGWAS of childhood asthma that integrates genetic and genomic data. The iGWAS approach identifies six novel susceptibility genes (MANEA, MRPL53, LYCAT, ST8SIA4, NDFIP1, and PTCH1) using the omnibus test with false discovery rate less than 1%, whereas no gene using SNP-only analyses survives with the same cut-off. The iGWAS analyses further characterize that genetic effects of these genes are mostly mediated through their gene expressions. In summary, the iGWAS approach provides a new analytic framework to investigate the mechanism of genetic etiology, and identifies novel susceptibility genes of childhood asthma that were biologically meaningful.
Collapse
Affiliation(s)
- Yen-Tsung Huang
- Departments of Epidemiology and Biostatistics, Brown University, Providence, Rhode Island, United States of America
| | - Liming Liang
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
| | - Miriam F Moffatt
- National Heart and Lung Institute, Imperial College London, London, United Kingdom
| | | | - Xihong Lin
- Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
| |
Collapse
|
46
|
Lin KH, Zöllner S. Robust and Powerful Affected Sibpair Test for Rare Variant Association. Genet Epidemiol 2015; 39:325-33. [PMID: 25966809 DOI: 10.1002/gepi.21903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2014] [Revised: 03/25/2015] [Accepted: 04/01/2015] [Indexed: 11/09/2022]
Abstract
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case-control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family-based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family-based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using general model for effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case-control study for variants with summed risk allele frequency f < 0.05; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene-gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.
Collapse
Affiliation(s)
- Keng-Han Lin
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America.,Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America.,Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America.,Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
| |
Collapse
|
47
|
Broadaway KA, Duncan R, Conneely KN, Almli LM, Bradley B, Ressler KJ, Epstein MP. Kernel Approach for Modeling Interaction Effects in Genetic Association Studies of Complex Quantitative Traits. Genet Epidemiol 2015; 39:366-75. [PMID: 25885490 DOI: 10.1002/gepi.21901] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Revised: 02/16/2015] [Accepted: 02/27/2015] [Indexed: 12/29/2022]
Abstract
The etiology of complex traits likely involves the effects of genetic and environmental factors, along with complicated interaction effects between them. Consequently, there has been interest in applying genetic association tests of complex traits that account for potential modification of the genetic effect in the presence of an environmental factor. One can perform such an analysis using a joint test of gene and gene-environment interaction. An optimal joint test would be one that remains powerful under a variety of models ranging from those of strong gene-environment interaction effect to those of little or no gene-environment interaction effect. To fill this demand, we have extended a kernel machine based approach for association mapping of multiple SNPs to consider joint tests of gene and gene-environment interaction. The kernel-based approach for joint testing is promising, because it incorporates linkage disequilibrium information from multiple SNPs simultaneously in analysis and permits flexible modeling of interaction effects. Using simulated data, we show that our kernel machine approach typically outperforms the traditional joint test under strong gene-environment interaction models and further outperforms the traditional main-effect association test under models of weak or no gene-environment interaction effects. We illustrate our test using genome-wide association data from the Grady Trauma Project, a cohort of highly traumatized, at-risk individuals, which has previously been investigated for interaction effects.
Collapse
Affiliation(s)
- K Alaine Broadaway
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| | - Richard Duncan
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| | - Karen N Conneely
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| | - Lynn M Almli
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America
| | - Bekh Bradley
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America.,Department of Veterans Affairs, Atlanta VA Medical Center, Atlanta, Georgia, United States of America
| | - Kerry J Ressler
- Department of Psychiatry and Behavioral Sciences, Emory University, Atlanta, Georgia, United States of America
| | - Michael P Epstein
- Department of Human Genetics, Emory University, Atlanta, Georgia, United States of America
| |
Collapse
|
48
|
Wen X. Bayesian model comparison in genetic association analysis: linear mixed modeling and SNP set testing. Biostatistics 2015; 16:701-12. [PMID: 25796429 DOI: 10.1093/biostatistics/kxv009] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2014] [Accepted: 02/08/2015] [Indexed: 11/12/2022] Open
Abstract
We consider the problems of hypothesis testing and model comparison under a flexible Bayesian linear regression model whose formulation is closely connected with the linear mixed effect model and the parametric models for Single Nucleotide Polymorphism (SNP) set analysis in genetic association studies. We derive a class of analytic approximate Bayes factors and illustrate their connections with a variety of frequentist test statistics, including the Wald statistic and the variance component score statistic. Taking advantage of Bayesian model averaging and hierarchical modeling, we demonstrate some distinct advantages and flexibilities in the approaches utilizing the derived Bayes factors in the context of genetic association studies. We demonstrate our proposed methods using real or simulated numerical examples in applications of single SNP association testing, multi-locus fine-mapping and SNP set association testing.
Collapse
Affiliation(s)
- Xiaoquan Wen
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
49
|
Yan Q, Tiwari HK, Yi N, Gao G, Zhang K, Lin WY, Lou XY, Cui X, Liu N. A Sequence Kernel Association Test for Dichotomous Traits in Family Samples under a Generalized Linear Mixed Model. Hum Hered 2015; 79:60-8. [PMID: 25791389 DOI: 10.1159/000375409] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2014] [Accepted: 01/21/2015] [Indexed: 01/15/2023] Open
Abstract
OBJECTIVE The existing methods for identifying multiple rare variants underlying complex diseases in family samples are underpowered. Therefore, we aim to develop a new set-based method for an association study of dichotomous traits in family samples. METHODS We introduce a framework for testing the association of genetic variants with diseases in family samples based on a generalized linear mixed model. Our proposed method is based on a kernel machine regression and can be viewed as an extension of the sequence kernel association test (SKAT and famSKAT) for application to family data with dichotomous traits (F-SKAT). RESULTS Our simulation studies show that the original SKAT has inflated type I error rates when applied directly to family data. By contrast, our proposed F-SKAT has the correct type I error rate. Furthermore, in all of the considered scenarios, F-SKAT, which uses all family data, has higher power than both SKAT, which uses only unrelated individuals from the family data, and another method, which uses all family data. CONCLUSION We propose a set-based association test that can be used to analyze family data with dichotomous phenotypes while handling genetic variants with the same or opposite directions of effects as well as any types of family relationships.
Collapse
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Ala., USA
| | | | | | | | | | | | | | | | | |
Collapse
|
50
|
Feng S, Pistis G, Zhang H, Zawistowski M, Mulas A, Zoledziewska M, Holmen OL, Busonero F, Sanna S, Hveem K, Willer C, Cucca F, Liu DJ, Abecasis GR. Methods for association analysis and meta-analysis of rare variants in families. Genet Epidemiol 2015; 39:227-38. [PMID: 25740221 DOI: 10.1002/gepi.21892] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2014] [Revised: 01/03/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
Collapse
Affiliation(s)
- Shuang Feng
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|