1
|
Ghavi Hossein-Zadeh N. An overview of recent technological developments in bovine genomics. Vet Anim Sci 2024; 25:100382. [PMID: 39166173 PMCID: PMC11334705 DOI: 10.1016/j.vas.2024.100382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/22/2024] Open
Abstract
Cattle are regarded as highly valuable animals because of their milk, beef, dung, fur, and ability to draft. The scientific community has tried a number of strategies to improve the genetic makeup of bovine germplasm. To ensure higher returns for the dairy and beef industries, researchers face their greatest challenge in improving commercially important traits. One of the biggest developments in the last few decades in the creation of instruments for cattle genetic improvement is the discovery of the genome. Breeding livestock is being revolutionized by genomic selection made possible by the availability of medium- and high-density single nucleotide polymorphism (SNP) arrays coupled with sophisticated statistical techniques. It is becoming easier to access high-dimensional genomic data in cattle. Continuously declining genotyping costs and an increase in services that use genomic data to increase return on investment have both made a significant contribution to this. The field of genomics has come a long way thanks to groundbreaking discoveries such as radiation-hybrid mapping, in situ hybridization, synteny analysis, somatic cell genetics, cytogenetic maps, molecular markers, association studies for quantitative trait loci, high-throughput SNP genotyping, whole-genome shotgun sequencing to whole-genome mapping, and genome editing. These advancements have had a significant positive impact on the field of cattle genomics. This manuscript aimed to review recent advances in genomic technologies for cattle breeding and future prospects in this field.
Collapse
Affiliation(s)
- Navid Ghavi Hossein-Zadeh
- Department of Animal Science, Faculty of Agricultural Sciences, University of Guilan, Rasht, 41635-1314, Iran
| |
Collapse
|
2
|
Wang J, Zhou F, Li C, Yin N, Liu H, Zhuang B, Huang Q, Wen Y. Gene Association Analysis of Quantitative Trait Based on Functional Linear Regression Model with Local Sparse Estimator. Genes (Basel) 2023; 14:genes14040834. [PMID: 37107592 PMCID: PMC10137544 DOI: 10.3390/genes14040834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2023] [Revised: 03/27/2023] [Accepted: 03/28/2023] [Indexed: 04/03/2023] Open
Abstract
Functional linear regression models have been widely used in the gene association analysis of complex traits. These models retain all the genetic information in the data and take full advantage of spatial information in genetic variation data, which leads to brilliant detection power. However, the significant association signals identified by the high-power methods are not all the real causal SNPs, because it is easy to regard noise information as significant association signals, leading to a false association. In this paper, a method based on the sparse functional data association test (SFDAT) of gene region association analysis is developed based on a functional linear regression model with local sparse estimation. The evaluation indicators CSR and DL are defined to evaluate the feasibility and performance of the proposed method with other indicators. Simulation studies show that: (1) SFDAT performs well under both linkage equilibrium and linkage disequilibrium simulation; (2) SFDAT performs successfully for gene regions (including common variants, low-frequency variants, rare variants and mix variants); (3) With power and type I error rates comparable to OLS and Smooth, SFDAT has a better ability to handle the zero regions. The Oryza sativa data set is analyzed by SFDAT. It is shown that SFDAT can better perform gene association analysis and eliminate the false positive of gene localization. This study showed that SFDAT can lower the interference caused by noise while maintaining high power. SFDAT provides a new method for the association analysis between gene regions and phenotypic quantitative traits.
Collapse
Affiliation(s)
- Jingyu Wang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Fujie Zhou
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Cheng Li
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Ning Yin
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Huiming Liu
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Binxian Zhuang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Qingyu Huang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Yongxian Wen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Correspondence:
| |
Collapse
|
3
|
Li S, Li S, Su S, Zhang H, Shen J, Wen Y. Gene Region Association Analysis of Longitudinal Quantitative Traits Based on a Function-On-Function Regression Model. Front Genet 2022; 13:781740. [PMID: 35265102 PMCID: PMC8899465 DOI: 10.3389/fgene.2022.781740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2021] [Accepted: 01/04/2022] [Indexed: 11/13/2022] Open
Abstract
In the process of growth and development in life, gene expressions that control quantitative traits will turn on or off with time. Studies of longitudinal traits are of great significance in revealing the genetic mechanism of biological development. With the development of ultra-high-density sequencing technology, the associated analysis has tremendous challenges to statistical methods. In this paper, a longitudinal functional data association test (LFDAT) method is proposed based on the function-on-function regression model. LFDAT can simultaneously treat phenotypic traits and marker information as continuum variables and analyze the association of longitudinal quantitative traits and gene regions. Simulation studies showed that: 1) LFDAT performs well for both linkage equilibrium simulation and linkage disequilibrium simulation, 2) LFDAT has better performance for gene regions (include common variants, low-frequency variants, rare variants and mixture), and 3) LFDAT can accurately identify gene switching in the growth and development stage. The longitudinal data of the Oryza sativa projected shoot area is analyzed by LFDAT. It showed that there is the advantage of quick calculations. Further, an association analysis was conducted between longitudinal traits and gene regions by integrating the micro effects of multiple related variants and using the information of the entire gene region. LFDAT provides a feasible method for studying the formation and expression of longitudinal traits.
Collapse
Affiliation(s)
- Shijing Li
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Shiqin Li
- School of Life Science and Technology, ShanghaiTech University, Shanghai, China
| | - Shaoqiang Su
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Hui Zhang
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Jiayu Shen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| | - Yongxian Wen
- College of Computer and Information Science, Fujian Agriculture and Forestry University, Fuzhou, China.,> Institute of Statistics and Application, Fujian Agriculture and Forestry University, Fuzhou, China
| |
Collapse
|
4
|
Simulation Research on the Methods of Multi-Gene Region Association Analysis Based on a Functional Linear Model. Genes (Basel) 2022; 13:genes13030455. [PMID: 35328009 PMCID: PMC8954869 DOI: 10.3390/genes13030455] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2022] [Revised: 02/26/2022] [Accepted: 02/27/2022] [Indexed: 11/16/2022] Open
Abstract
Genome-wide association analysis is an important approach to identify genetic variants associated with complex traits. Complex traits are not only affected by single gene loci, but also by the interaction of multiple gene loci. Studies of association between gene regions and quantitative traits are of great significance in revealing the genetic mechanism of biological development. There have been a lot of studies on single-gene region association analysis, but the application of functional linear models in multi-gene region association analysis is still less. In this paper, a functional multi-gene region association analysis test method is proposed based on the functional linear model. From the three directions of common multi-gene region method, multi-gene region weighted method and multi-gene region loci weighted method, that test method is studied combined with computer simulation. The following conclusions are obtained through computer simulation: (a) The functional multi-gene region association analysis test method has higher power than the functional single gene region association analysis test method; (b) The functional multi-gene region weighted method performs better than the common functional multi-gene region method; (c) the functional multi-gene region loci weighted method is the best method for association analysis on three directions of the common multi-gene region method; (d) the performance of the Step method and Multi-gene region loci weighted Step for multi-gene regions is the best in general. Functional multi-gene region association analysis test method can theoretically provide a feasible method for the study of complex traits affected by multiple genes.
Collapse
|
5
|
A Constrained Generalized Functional Linear Model for Multi-Loci Genetic Mapping. STATS 2021. [DOI: 10.3390/stats4030033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
In genome-wide association studies (GWAS), efficient incorporation of linkage disequilibria (LD) among densely typed genetic variants into association analysis is a critical yet challenging problem. Functional linear models (FLM), which impose a smoothing structure on the coefficients of correlated covariates, are advantageous in genetic mapping of multiple variants with high LD. Here we propose a novel constrained generalized FLM (cGFLM) framework to perform simultaneous association tests on a block of linked SNPs with various trait types, including continuous, binary and zero-inflated count phenotypes. The new cGFLM applies a set of inequality constraints on the FLM to ensure model identifiability under different genetic codings. The method is implemented via B-splines, and an augmented Lagrangian algorithm is employed for parameter estimation. For hypotheses testing, a test statistic that accounts for the model constraints was derived, following a mixture of chi-square distributions. Simulation results show that cGFLM is effective in identifying causal loci and gene clusters compared to several competing methods based on single markers and SKAT-C. We applied the proposed method to analyze a candidate gene-based COGEND study and a large-scale GWAS data on dental caries risk.
Collapse
|
6
|
Zhang B, Chiu CY, Yuan F, Sang T, Cook RJ, Wilson AF, Bailey-Wilson JE, Chew EY, Xiong M, Fan R. Gene-based analysis of bi-variate survival traits via functional regressions with applications to eye diseases. Genet Epidemiol 2021; 45:455-470. [PMID: 33645812 DOI: 10.1002/gepi.22381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2020] [Revised: 01/15/2021] [Accepted: 02/08/2021] [Indexed: 11/12/2022]
Abstract
Genetic studies of two related survival outcomes of a pleiotropic gene are commonly encountered but statistical models to analyze them are rarely developed. To analyze sequencing data, we propose mixed effect Cox proportional hazard models by functional regressions to perform gene-based joint association analysis of two survival traits motivated by our ongoing real studies. These models extend fixed effect Cox models of univariate survival traits by incorporating variations and correlation of multivariate survival traits into the models. The associations between genetic variants and two survival traits are tested by likelihood ratio test statistics. Extensive simulation studies suggest that type I error rates are well controlled and power performances are stable. The proposed models are applied to analyze bivariate survival traits of left and right eyes in the age-related macular degeneration progression.
Collapse
Affiliation(s)
- Bingsong Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA
| | - Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee, USA.,Computational and Statistical Genomics Branch, National Human Genome, Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
| | - Fang Yuan
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, People's Republic of China
| | - Tian Sang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA.,School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, China
| | - Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, Ontario, Canada
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome, Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome, Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
| | - Emily Y Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, NIH, Bethesda, Maryland, USA
| | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas, USA
| | - Ruzong Fan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA.,Computational and Statistical Genomics Branch, National Human Genome, Research Institute, National Institutes of Health (NIH), Baltimore, Maryland, USA
| |
Collapse
|
7
|
Xiang Y, Xiang X, Li Y. Identifying rare variants for quantitative traits in extreme samples of population via Kullback-Leibler distance. BMC Genet 2020; 21:130. [PMID: 33234108 PMCID: PMC7687851 DOI: 10.1186/s12863-020-00951-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2020] [Accepted: 11/10/2020] [Indexed: 11/23/2022] Open
Abstract
Background The rapid development of sequencing technology and simultaneously the availability of large quantities of sequence data has facilitated the identification of rare variant associated with quantitative traits. However, existing statistical methods depend on certain assumptions and thus lacking uniform power. The present study focuses on mapping rare variant associated with quantitative traits. Results In the present study, we proposed a two-stage strategy to identify rare variant of quantitative traits using phenotype extreme selection design and Kullback-Leibler distance, where the first stage was association analysis and the second stage was fine mapping. We presented a statistic and a linkage disequilibrium measure for the first stage and the second stage, respectively. Theory analysis and simulation study showed that (1) the power of the proposed statistic for association analysis increased with the stringency of the sample selection and was affected slightly by non-causal variants and opposite effect variants, (2) the statistic here achieved higher power than three commonly used methods, and (3) the linkage disequilibrium measure for fine mapping was independent of the frequencies of non-causal variants and simply dependent on the frequencies of causal variants. Conclusions We conclude that the two-stage strategy here can be used effectively to mapping rare variant associated with quantitative traits.
Collapse
Affiliation(s)
- Yang Xiang
- School of Mathematics and Computational Science, Huaihua University, Huaihua, Hunan, 418008, People's Republic of China.,Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, Huaihua University, Huaihua, 418008, China.,Key Laboratory of Hunan Higher Education for Western Hunan Medicinal Plant and Ethnobotany, Huaihua University, Huaihua, 418008, China
| | - Xinrong Xiang
- School of Mathematics and Statistics, Hunan Normal University, Changsha, Hunan, 410081, People's Republic of China
| | - Yumei Li
- School of Mathematics and Computational Science, Huaihua University, Huaihua, Hunan, 418008, People's Republic of China. .,Key Laboratory of Research and Utilization of Ethnomedicinal Plant Resources of Hunan Province, Huaihua University, Huaihua, 418008, China. .,Key Laboratory of Hunan Higher Education for Western Hunan Medicinal Plant and Ethnobotany, Huaihua University, Huaihua, 418008, China.
| |
Collapse
|
8
|
Jiang Y, Chiu CY, Yan Q, Chen W, Gorin MB, Conley YP, Lakhal-Chaieb ML, Cook RJ, Amos CI, Wilson AF, Bailey-Wilson JE, McMahon FJ, Vazquez AI, Yuan A, Zhong X, Xiong M, Weeks DE, Fan R. Gene-Based Association Testing of Dichotomous Traits With Generalized Functional Linear Mixed Models Using Extended Pedigrees: Applications to Age-Related Macular Degeneration. J Am Stat Assoc 2020; 116:531-545. [PMID: 34321704 PMCID: PMC8315575 DOI: 10.1080/01621459.2020.1799809] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2017] [Revised: 07/09/2020] [Accepted: 07/17/2020] [Indexed: 10/23/2022]
Abstract
Genetics plays a role in age-related macular degeneration (AMD), a common cause of blindness in the elderly. There is a need for powerful methods for carrying out region-based association tests between a dichotomous trait like AMD and genetic variants on family data. Here, we apply our new generalized functional linear mixed models (GFLMM) developed to test for gene-based association in a set of AMD families. Using common and rare variants, we observe significant association with two known AMD genes: CFH and ARMS2. Using rare variants, we find suggestive signals in four genes: ASAH1, CLEC6A, TMEM63C, and SGSM1. Intriguingly, ASAH1 is down-regulated in AMD aqueous humor, and ASAH1 deficiency leads to retinal inflammation and increased vulnerability to oxidative stress. These findings were made possible by our GFLMM which model the effect of a major gene as a fixed mean, the polygenic contributions as a random variation, and the correlation of pedigree members by kinship coefficients. Simulations indicate that the GFLMM likelihood ratio tests (LRTs) accurately control the Type I error rates. The LRTs have similar or higher power than existing retrospective kernel and burden statistics. Our GFLMM-based statistics provide a new tool for conducting family-based genetic studies of complex diseases. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
Collapse
Affiliation(s)
- Yingda Jiang
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
| | - Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
| | - Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
| | - Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Children’s Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, PA
| | - Michael B. Gorin
- Department of Ophthalmology, David Geffen School of Medicine, UCLA Stein Eye Institute, Los Angeles, CA
| | - Yvette P. Conley
- Department of Health Promotion and Development, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
| | | | - Richard J. Cook
- Department of Statistics and Actuarial Science, Waterloo, ON, Canada
| | | | - Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
| | - Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
| | - Francis J. McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, National Institute of Mental Health, NIH, Bethesda, MD
| | - Ana I. Vazquez
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI
| | - Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
| | - Xiaogang Zhong
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
| | - Momiao Xiong
- Human Genetics Center, University of Texas, Houston, TX
| | - Daniel E. Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA
| | - Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Baltimore, MD
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, DC
| |
Collapse
|
9
|
Cai X, Chang LB, Potter J, Song C. Adaptive Fisher method detects dense and sparse signals in association analysis of SNV sets. BMC Med Genomics 2020; 13:46. [PMID: 32241265 PMCID: PMC7118831 DOI: 10.1186/s12920-020-0684-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND With the development of next generation sequencing (NGS) technology and genotype imputation methods, statistical methods have been proposed to test a set of genomic variants together to detect if any of them is associated with the phenotype or disease. In practice, within the set, there is an unknown proportion of variants truly causal or associated with the disease. There is a demand for statistical methods with high power in both dense and sparse scenarios, where the proportion of causal or associated variants is large or small respectively. RESULTS We propose a new association test - weighted Adaptive Fisher (wAF) that can adapt to both dense and sparse scenarios by adding weights to the Adaptive Fisher (AF) method we developed before. Using simulation, we show that wAF enjoys comparable or better power to popular methods such as sequence kernel association tests (SKAT and SKAT-O) and adaptive SPU (aSPU) test. We apply wAF to a publicly available schizophrenia dataset, and successfully detect thirteen genes. Among them, three genes are supported by existing literature; six are plausible as they either relate to other neurological diseases or have relevant biological functions. CONCLUSIONS The proposed wAF method is a powerful disease-variants association test in both dense and sparse scenarios. Both simulation studies and real data analysis indicate the potential of wAF for new biological findings.
Collapse
Affiliation(s)
- Xiaoyu Cai
- Department of Statistics, The Ohio State University, 1948 Neil Ave., Columbus, OH 43210, US
| | - Lo-Bin Chang
- Department of Statistics, The Ohio State University, 1948 Neil Ave., Columbus, OH 43210, US
| | - Jordan Potter
- Department of Mathematics and Statistics, Kenyon College, 201 N College Rd., Gambier, Ohio 43022, US
| | - Chi Song
- College of Public Health, Division of Biostatistics, The Ohio State University, 1841 Neil Ave., 208E Cunz Hall, Columbus, OH 43210, US
| |
Collapse
|
10
|
Wei Y, Liu Y, Sun T, Chen W, Ding Y. Gene-based association analysis for bivariate time-to-event data through functional regression with copula models. Biometrics 2019; 76:619-629. [PMID: 31625595 DOI: 10.1111/biom.13165] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Accepted: 10/08/2019] [Indexed: 11/28/2022]
Abstract
Several gene-based association tests for time-to-event traits have been proposed recently to detect whether a gene region (containing multiple variants), as a set, is associated with the survival outcome. However, for bivariate survival outcomes, to the best of our knowledge, there is no statistical method that can be directly applied for gene-based association analysis. Motivated by a genetic study to discover the gene regions associated with the progression of a bilateral eye disease, age-related macular degeneration (AMD), we implement a novel functional regression (FR) method under the copula framework. Specifically, the effects of variants within a gene region are modeled through a functional linear model, which then contributes to the marginal survival functions within the copula. Generalized score test statistics are derived to test for the association between bivariate survival traits and the genetic region. Extensive simulation studies are conducted to evaluate the type I error control and power performance of the proposed approach, with comparisons to several existing methods for a single survival trait, as well as the marginal Cox FR model using the robust sandwich estimator for bivariate survival traits. Finally, we apply our method to a large AMD study, the Age-related Eye Disease Study, and to identify the gene regions that are associated with AMD progression.
Collapse
Affiliation(s)
- Yue Wei
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Yi Liu
- Department of Biostatistics and Data Sciences, Boehringer Ingelheim, Ridgefield, Connecticut
| | - Tao Sun
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Wei Chen
- Department of Pediatrics, Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania
| | - Ying Ding
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
| |
Collapse
|
11
|
Chiu CY, Zhang B, Wang S, Shao J, Lakhal-Chaieb ML, Cook RJ, Wilson AF, Bailey-Wilson JE, Xiong M, Fan R. Gene-based association analysis of survival traits via functional regression-based mixed effect cox models for related samples. Genet Epidemiol 2019; 43:952-965. [PMID: 31502722 DOI: 10.1002/gepi.22254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2019] [Revised: 06/26/2019] [Accepted: 07/16/2019] [Indexed: 01/09/2023]
Abstract
The importance to integrate survival analysis into genetics and genomics is widely recognized, but only a small number of statisticians have produced relevant work toward this study direction. For unrelated population data, functional regression (FR) models have been developed to test for association between a quantitative/dichotomous/survival trait and genetic variants in a gene region. In major gene association analysis, these models have higher power than sequence kernel association tests. In this paper, we extend this approach to analyze censored traits for family data or related samples using FR based mixed effect Cox models (FamCoxME). The FamCoxME model effect of major gene as fixed mean via functional data analysis techniques, the local gene or polygene variations or both as random, and the correlation of pedigree members by kinship coefficients or genetic relationship matrix or both. The association between the censored trait and the major gene is tested by likelihood ratio tests (FamCoxME FR LRT). Simulation results indicate that the LRT control the type I error rates accurately/conservatively and have good power levels when both local gene or polygene variations are modeled. The proposed methods were applied to analyze a breast cancer data set from the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). The FamCoxME provides a new tool for gene-based analysis of family-based studies or related samples.
Collapse
Affiliation(s)
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee
| | - Bingsong Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Shuqi Wang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Jingyi Shao
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | | | - Richard J Cook
- Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland
| | - Momiao Xiong
- Department of Biostatistics, Human Genetics Center, University of Texas-Houston, Houston, Texas
| | - Ruzong Fan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| |
Collapse
|
12
|
Geng P, Tong X, Lu Q. An integrative U method for joint analysis of multi-level omic data. BMC Genet 2019; 20:40. [PMID: 30967125 PMCID: PMC6457037 DOI: 10.1186/s12863-019-0742-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2018] [Accepted: 03/20/2019] [Indexed: 11/30/2022] Open
Abstract
Background The advance of high-throughput technologies has made it cost-effective to collect diverse types of omic data in large-scale clinical and biological studies. While the collection of the vast amounts of multi-level omic data from these studies provides a great opportunity for genetic research, the high dimensionality of omic data and complex relationships among multi-level omic data bring tremendous analytic challenges. Results To address these challenges, we develop an integrative U (IU) method for the design and analysis of multi-level omic data. While non-parametric methods make less model assumptions and are flexible for analyzing different types of phenotypes and omic data, they have been less developed for association analysis of omic data. The IU method is a nonparametric method that can accommodate various types of omic and phenotype data, and consider interactive relationship among different levels of omic data. Through simulations and a real data application, we compare the IU test with commonly used variance component tests. Conclusions Results show that the proposed test attains more robust type I error performance and higher empirical power than variance component tests under various types of phenotypes and different underlying interaction effects. Electronic supplementary material The online version of this article (10.1186/s12863-019-0742-z) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Pei Geng
- Department of Mathematics, Illinois State University, Normal, IL, 61761, USA
| | - Xiaoran Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.
| |
Collapse
|
13
|
Svishcheva GR. A generalized model for combining dependent SNP-level summary statistics and its extensions to statistics of other levels. Sci Rep 2019; 9:5461. [PMID: 30940856 PMCID: PMC6445108 DOI: 10.1038/s41598-019-41827-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2018] [Accepted: 03/06/2019] [Indexed: 11/12/2022] Open
Abstract
Here I propose a fundamentally new flexible model to reveal the association between a trait and a set of genetic variants in a genomic region/gene. This model was developed for the situation when original individual-level phenotype and genotype data are not available, but the researcher possesses the results of statistical analyses conducted on these data (namely, SNP-level summary Z score statistics and SNP-by-SNP correlations). The new model was analytically derived from the classical multiple linear regression model applied for the region-based association analysis of individual-level phenotype and genotype data by using the linear compression of data, where the SNP-by-SNP correlations are among the explanatory variables, and the summary Z score statistics are categorized as the response variables. I analytically show that the regional association analysis methods developed within the framework of the classical multiple linear regression model with additive effects of genetic variants can be reformulated in terms of the new model without the loss of information. The results obtained from the regional association analysis utilizing the classical model and those derived using the proposed model are identical when SNP-by-SNP correlations and SNP-level statistics are estimated from the same genetic data.
Collapse
Affiliation(s)
- Gulnara R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090, Russia. .,Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia.
| |
Collapse
|
14
|
Chiu CY, Yuan F, Zhang BS, Yuan A, Li X, Fang HB, Lange K, Weeks DE, Wilson AF, Bailey-Wilson JE, Musolf AM, Stambolian D, Lakhal-Chaieb ML, Cook RJ, McMahon FJ, Amos CI, Xiong M, Fan R. Linear mixed models for association analysis of quantitative traits with next-generation sequencing data. Genet Epidemiol 2019; 43:189-206. [PMID: 30537345 PMCID: PMC6375753 DOI: 10.1002/gepi.22177] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2018] [Revised: 08/27/2018] [Accepted: 09/26/2018] [Indexed: 01/01/2023]
Abstract
We develop linear mixed models (LMMs) and functional linear mixed models (FLMMs) for gene-based tests of association between a quantitative trait and genetic variants on pedigrees. The effects of a major gene are modeled as a fixed effect, the contributions of polygenes are modeled as a random effect, and the correlations of pedigree members are modeled via inbreeding/kinship coefficients. F -statistics and χ 2 likelihood ratio test (LRT) statistics based on the LMMs and FLMMs are constructed to test for association. We show empirically that the F -distributed statistics provide a good control of the type I error rate. The F -test statistics of the LMMs have similar or higher power than the FLMMs, kernel-based famSKAT (family-based sequence kernel association test), and burden test famBT (family-based burden test). The F -statistics of the FLMMs perform well when analyzing a combination of rare and common variants. For small samples, the LRT statistics of the FLMMs control the type I error rate well at the nominal levels α = 0.01 and 0.05 . For moderate/large samples, the LRT statistics of the FLMMs control the type I error rates well. The LRT statistics of the LMMs can lead to inflated type I error rates. The proposed models are useful in whole genome and whole exome association studies of complex traits.
Collapse
Affiliation(s)
- Chi-Yang Chiu
- Division of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, Tennessee
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Fang Yuan
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
| | - Bing-Song Zhang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Ao Yuan
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Xin Li
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Hong-Bin Fang
- Department of Biostatistics, Bioinformatics, and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia
| | - Kenneth Lange
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California
| | - Daniel E Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Anthony M Musolf
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
| | - Dwight Stambolian
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania
| | | | - Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, Ontario, Quebec, Canada
| | - Francis J McMahon
- Human Genetics Branch and Genetic Basis of Mood and Anxiety Disorders Section, University of Waterloo, National Institute of Mental Health, NIH, Bethesda, Maryland
| | | | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas
| | - Ruzong Fan
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health (NIH), Bethesda, Maryland
- Department of Biochemistry and Molecular Biology, School of Basic Medicine, Kunming Medical University, Kunming, Yunnan, China
| |
Collapse
|
15
|
Belonogova NM, Svishcheva GR, Wilson JF, Campbell H, Axenovich TI. Weighted functional linear regression models for gene-based association analysis. PLoS One 2018; 13:e0190486. [PMID: 29309409 PMCID: PMC5757938 DOI: 10.1371/journal.pone.0190486] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Accepted: 12/17/2017] [Indexed: 11/19/2022] Open
Abstract
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
Collapse
Affiliation(s)
- Nadezhda M. Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Gulnara R. Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
| | - James F. Wilson
- Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Edinburgh, Scotland
| | - Harry Campbell
- Centre for Global Health Research, Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
| | - Tatiana I. Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|
16
|
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field.The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants.One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Collapse
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK. .,University College London, London, UK.
| | - Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
| |
Collapse
|
17
|
Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models. Eur J Hum Genet 2016; 25:350-359. [PMID: 28000696 DOI: 10.1038/ejhg.2016.170] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2016] [Revised: 07/26/2016] [Accepted: 09/27/2016] [Indexed: 11/09/2022] Open
Abstract
To analyze next-generation sequencing data, multivariate functional linear models are developed for a meta-analysis of multiple studies to connect genetic variant data to multiple quantitative traits adjusting for covariates. The goal is to take the advantage of both meta-analysis and pleiotropic analysis in order to improve power and to carry out a unified association analysis of multiple studies and multiple traits of complex disorders. Three types of approximate F -distributions based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants. Simulation analysis is performed to evaluate false-positive rates and power of the proposed tests. The proposed methods are applied to analyze lipid traits in eight European cohorts. It is shown that it is more advantageous to perform multivariate analysis than univariate analysis in general, and it is more advantageous to perform meta-analysis of multiple studies instead of analyzing the individual studies separately. The proposed models require individual observations. The value of the current paper can be seen at least for two reasons: (a) the proposed methods can be applied to studies that have individual genotype data; (b) the proposed methods can be used as a criterion for future work that uses summary statistics to build test statistics to meta-analyze the data.
Collapse
|
18
|
Chiu CY, Jung J, Wang Y, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Mills JL, Boehnke M, Xiong M, Fan R. A comparison study of multivariate fixed models and Gene Association with Multiple Traits (GAMuT) for next-generation sequencing. Genet Epidemiol 2016; 41:18-34. [PMID: 27917525 DOI: 10.1002/gepi.22014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2016] [Revised: 09/01/2016] [Accepted: 09/19/2016] [Indexed: 01/23/2023]
Abstract
In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models that perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods.
Collapse
Affiliation(s)
- Chi-Yang Chiu
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, USA
| | - Jeesun Jung
- Laboratory of Epidemiology and Biometry, National Institute on Alcohol, Abuse and Alcoholism, NIH, Bethesda, MD, USA
| | - Yifan Wang
- Center for Drug Evaluation and Research, Food and Drug Administration, Silver Spring, MD, USA
| | - Daniel E Weeks
- Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD, USA
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
| | - James L Mills
- Epidemiology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, USA
| | - Michael Boehnke
- Department of Biostatistics, School of Public Health, The University of Michigan, Ann Arbor, MI, USA
| | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, TX, USA
| | - Ruzong Fan
- Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, MD, USA
| |
Collapse
|
19
|
Wang P, Rahman M, Jin L, Xiong M. A new statistical framework for genetic pleiotropic analysis of high dimensional phenotype data. BMC Genomics 2016; 17:881. [PMID: 27821073 PMCID: PMC5100198 DOI: 10.1186/s12864-016-3169-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2015] [Accepted: 10/18/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The widely used genetic pleiotropic analyses of multiple phenotypes are often designed for examining the relationship between common variants and a few phenotypes. They are not suited for both high dimensional phenotypes and high dimensional genotype (next-generation sequencing) data. To overcome limitations of the traditional genetic pleiotropic analysis of multiple phenotypes, we develop sparse structural equation models (SEMs) as a general framework for a new paradigm of genetic analysis of multiple phenotypes. To incorporate both common and rare variants into the analysis, we extend the traditional multivariate SEMs to sparse functional SEMs. To deal with high dimensional phenotype and genotype data, we employ functional data analysis and the alternative direction methods of multiplier (ADMM) techniques to reduce data dimension and improve computational efficiency. RESULTS Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods. Simulations also demonstrate that the gene-based pleiotropic analysis has higher power than the single variant-based pleiotropic analysis. The proposed method is applied to exome sequence data from the NHLBI's Exome Sequencing Project (ESP) with 11 phenotypes, which identifies a network with 137 genes connected to 11 phenotypes and 341 edges. Among them, 114 genes showed pleiotropic genetic effects and 45 genes were reported to be associated with phenotypes in the analysis or other cardiovascular disease (CVD) related phenotypes in the literature. CONCLUSIONS Our proposed sparse functional SEMs can incorporate both common and rare variants into the analysis and the ADMM algorithm can efficiently solve the penalized SEMs. Using this model we can jointly infer genetic architecture and casual phenotype network structure, and decompose the genetic effect into direct, indirect and total effect. Using large scale simulations we showed that the proposed methods have higher power to detect true causal genetic pleiotropic structure than other existing methods.
Collapse
Affiliation(s)
- Panpan Wang
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA.,State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China
| | - Mohammad Rahman
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA
| | - Li Jin
- State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, 200433, China.
| | - Momiao Xiong
- Human Genetics Center, Department of Biostatistics, University of Texas School of Public Health, Houston, TX, 77030, USA. .,Human Genetics Center, The University of Texas Health Science Center at Houston, P.O. Box 20186, Houston, TX, 77225, USA.
| |
Collapse
|
20
|
Svishcheva GR, Belonogova NM, Axenovich TI. Functional linear models for region-based association analysis. RUSS J GENET+ 2016. [DOI: 10.1134/s1022795416100124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
|
21
|
Fan R, Chiu CY, Jung J, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Chen Z, Mills JL, Xiong M. A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 2016; 40:702-721. [PMID: 27374056 DOI: 10.1002/gepi.21984] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2015] [Revised: 03/08/2016] [Accepted: 04/26/2016] [Indexed: 12/22/2022]
Abstract
In association studies of complex traits, fixed-effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance-component tests based on mixed models were developed for region-based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT-O), and a combined sum test of rare and common variant effect (SKAT-C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT-O, and SKAT-C, (ii) traditional fixed-effect additive models, and (iii) fixed-effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed-effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed-effect models performed better than or similar to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT-O/SKAT-C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). In major gene association studies, we argue that the fixed-effect models perform better or similarly to mixed models in most cases because some variants should affect the traits relatively large. In practice, it makes sense to perform analysis by both the fixed and mixed effect models and to make a comparison, and this can be readily done using our R codes and the SKAT packages.
Collapse
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Chi-Yang Chiu
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Jeesun Jung
- Laboratory of Epidemiology and Biometry, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Daniel E Weeks
- Departments of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America.,Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Alexander F Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Joan E Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Christopher I Amos
- Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America
| | - Zhen Chen
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - James L Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Momiao Xiong
- Human Genetics Center, University of Texas-Houston, Houston, Texas, United States of America
| |
Collapse
|
22
|
Reimherr M, Nicolae D. Estimating Variance Components in Functional Linear Models With Applications to Genetic Heritability. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1016224] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
|
23
|
Zhang F, Xie D, Liang M, Xiong M. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits. PLoS Genet 2016; 12:e1005965. [PMID: 27104857 PMCID: PMC4841563 DOI: 10.1371/journal.pgen.1005965] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2015] [Accepted: 03/08/2016] [Indexed: 12/02/2022] Open
Abstract
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes. The widely used statistical methods test interaction for single phenotype. However, we often observe pleotropic genetic interaction effects. The simultaneous gene-gene (GxG) interaction analysis of multiple complementary traits will increase statistical power to detect GxG interactions. Although GxG interactions play an important role in uncovering the genetic structure of complex traits, the statistical methods for detecting GxG interactions in multiple phenotypes remains less developed owing to its potential complexity. Therefore, we extend functional regression model from single variate to multivariate for simultaneous GxG interaction analysis of multiple correlated phenotypes. Large-scale simulations are conducted to evaluate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare power with traditional multivariate pair-wise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for interaction analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic GxG interactions. 267 pairs of genes that formed a genetic interaction network showed significant evidence of interactions influencing five traits.
Collapse
Affiliation(s)
- Futao Zhang
- Department of Computer Science, College of Internet of Things, Hohai University, Changzhou, China
| | - Dan Xie
- College of Information Engineering, Hubei University of Chinese Medicine, Hubei, China
| | - Meimei Liang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang, China
| | - Momiao Xiong
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas, United States of America
- * E-mail:
| |
Collapse
|
24
|
Svishcheva GR, Belonogova NM, Axenovich TI. Some pitfalls in application of functional data analysis approach to association studies. Sci Rep 2016; 6:23918. [PMID: 27041739 PMCID: PMC4819216 DOI: 10.1038/srep23918] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2015] [Accepted: 03/16/2016] [Indexed: 11/26/2022] Open
Abstract
One of the most effective methods for gene-based mapping employs functional data analysis, which smoothes data using standard basis functions. The full functional linear model includes a functional representation of genotypes and their effects, while the beta-smooth only model smoothes the genotype effects only. Benefits and limitations of the beta-smooth only model should be studied before using it in practice. Here we analytically compare the full and beta-smooth only models under various scenarios. We show that when the full model employs two sets of basis functions equal in type and number, genotypes smoothing is eliminated from the model and it becomes analytically equivalent to the beta-smooth only model. If the basis functions differ only in type, genotypes smoothing is also eliminated from the full model, but the type of basis functions used for smoothing genotype effects becomes redefined. This leads to misinterpretation of the results and may reduce statistical power. When basis functions differ in number, no analytical comparison of the full and beta-smooth only models is possible. However, we show that the numbers of basis functions set unequal can become equal during the analysis, and the full model becomes disadvantageous.
Collapse
Affiliation(s)
- G R Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.,Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
| | - N M Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - T I Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia.,Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|
25
|
Vsevolozhskaya OA, Zaykin DV, Barondess DA, Tong X, Jadhav S, Lu Q. Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models. Genet Epidemiol 2016; 40:210-221. [PMID: 27027515 PMCID: PMC4817279 DOI: 10.1002/gepi.21955] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2015] [Revised: 12/04/2015] [Accepted: 12/14/2015] [Indexed: 12/27/2022]
Abstract
Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study.
Collapse
Affiliation(s)
| | - Dmitri V. Zaykin
- National Institute of Environmental Health Sciences, National Institutes of Health, USA
| | - David A. Barondess
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, USA
| | - Xiaoren Tong
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, USA
| | - Sneha Jadhav
- Department of Statistics, Michigan State University, East Lansing, USA
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, USA
| |
Collapse
|
26
|
Fan R, Wang Y, Yan Q, Ding Y, Weeks DE, Lu Z, Ren H, Cook RJ, Xiong M, Swaroop A, Chew EY, Chen W. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions. Genet Epidemiol 2016; 40:133-43. [PMID: 26782979 DOI: 10.1002/gepi.21947] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2015] [Revised: 10/13/2015] [Accepted: 11/05/2015] [Indexed: 11/07/2022]
Abstract
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox RF LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example.
Collapse
Affiliation(s)
- Ruzong Fan
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Yifan Wang
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Qi Yan
- Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Ying Ding
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Daniel E Weeks
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Zhaohui Lu
- Division of Intramural Population Health Research, Biostatistics and Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health (NIH), Bethesda, Maryland, United States of America
| | - Haobo Ren
- Regeneron Pharmaceuticals, Inc, Basking Ridge, New Jersey, United States of America
| | - Richard J Cook
- Department of Statistics and Actuarial Science, Waterloo, ON, Canada
| | - Momiao Xiong
- Human Genetics Center, University of Texas, Houston, Texas, United States of America
| | - Anand Swaroop
- Neurobiology-Neurodegeneration and Repair Laboratory, National Eye Institute, NIH, Bethesda, Maryland, United States of America
| | - Emily Y Chew
- Division of Epidemiology and Clinical Applications, National Eye Institute, NIH, Bethesda, Maryland, United States of America
| | - Wei Chen
- Division of Pulmonary Medicine, Allergy and Immunology, Children's Hospital of Pittsburgh at The University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| |
Collapse
|
27
|
Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 2015; 202:457-70. [PMID: 26715663 DOI: 10.1534/genetics.115.180869] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022] Open
Abstract
We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao's efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao's efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao's efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao's efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10(-6)), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10(-5)), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX. GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.
Collapse
|
28
|
Genome-wide gene-gene interaction analysis for next-generation sequencing. Eur J Hum Genet 2015; 24:421-8. [PMID: 26173972 DOI: 10.1038/ejhg.2015.147] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2014] [Revised: 04/21/2015] [Accepted: 05/26/2015] [Indexed: 01/13/2023] Open
Abstract
The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study.
Collapse
|
29
|
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022] Open
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to the samples of related individuals. To this end, we additionally include a random polygene effects in functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Collapse
Affiliation(s)
- Gulnara R. Svishcheva
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Nadezhda M. Belonogova
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Tatiana I. Axenovich
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| |
Collapse
|
30
|
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models. Genetics 2015; 200:1089-104. [PMID: 26058849 DOI: 10.1534/genetics.115.178343] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2015] [Accepted: 06/05/2015] [Indexed: 11/18/2022] Open
Abstract
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.
Collapse
|
31
|
Wang Y, Liu A, Mills JL, Boehnke M, Wilson AF, Bailey-Wilson JE, Xiong M, Wu CO, Fan R. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models. Genet Epidemiol 2015; 39:259-75. [PMID: 25809955 PMCID: PMC4443751 DOI: 10.1002/gepi.21895] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2014] [Revised: 01/28/2015] [Accepted: 01/28/2015] [Indexed: 10/23/2022]
Abstract
In genetics, pleiotropy describes the genetic effect of a single gene on multiple phenotypic traits. A common approach is to analyze the phenotypic traits separately using univariate analyses and combine the test results through multiple comparisons. This approach may lead to low power. Multivariate functional linear models are developed to connect genetic variant data to multiple quantitative traits adjusting for covariates for a unified analysis. Three types of approximate F-distribution tests based on Pillai-Bartlett trace, Hotelling-Lawley trace, and Wilks's Lambda are introduced to test for association between multiple quantitative traits and multiple genetic variants in one genetic region. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and optimal sequence kernel association test (SKAT-O). Extensive simulations were performed to evaluate the false positive rates and power performance of the proposed models and tests. We show that the approximate F-distribution tests control the type I error rates very well. Overall, simultaneous analysis of multiple traits can increase power performance compared to an individual test of each trait. The proposed methods were applied to analyze (1) four lipid traits in eight European cohorts, and (2) three biochemical traits in the Trinity Students Study. The approximate F-distribution tests provide much more significant results than those of F-tests of univariate analysis and SKAT-O for the three biochemical traits. The approximate F-distribution tests of the proposed functional linear models are more sensitive than those of the traditional multivariate linear models that in turn are more sensitive than SKAT-O in the univariate case. The analysis of the four lipid traits and the three biochemical traits detects more association than SKAT-O in the univariate case.
Collapse
Affiliation(s)
- Yifan Wang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Aiyi Liu
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - James L. Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Michael Boehnke
- Department of Biostatistics, School of Public Health, The University of Michigan, Ann Arbor, Michigan, United States of America
| | - Alexander F. Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Joan E. Bailey-Wilson
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Momiao Xiong
- Human Genetics Center, University of Texas - Houston, Houston, Texas, United States of America
| | - Colin O. Wu
- Office of Biostatistics Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America
| |
Collapse
|
32
|
Fan R, Wang Y, Mills JL, Carter TC, Lobach I, Wilson AF, Bailey-Wilson JE, Weeks DE, Xiong M. Generalized functional linear models for gene-based case-control association studies. Genet Epidemiol 2014; 38:622-637. [PMID: 25203683 PMCID: PMC4189986 DOI: 10.1002/gepi.21840] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2014] [Revised: 04/29/2014] [Accepted: 05/28/2014] [Indexed: 01/23/2023]
Abstract
By using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative since they generate lower type I errors than nominal levels, and global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that the Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants are both rare and common. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), the Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants are both rare and common shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. The Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses.
Collapse
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research Eunice Kennedy Shriver National Institute of Child Health and Human Development National Institutes of Health, Rockville, MD 20852
| | - Yifan Wang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research Eunice Kennedy Shriver National Institute of Child Health and Human Development National Institutes of Health, Rockville, MD 20852
| | - James L. Mills
- Epidemiology Branch, Division of Intramural Population Health Research Eunice Kennedy Shriver National Institute of Child Health and Human Development National Institutes of Health, Rockville, MD 20852
| | - Tonia C. Carter
- Center for Human Genetics, Marshfield Clinic, Marshfield, WI 54449
| | - Iryna Lobach
- Department of Neurology, School of Medicine University of California, San Francisco, CA 94185
| | - Alexander F. Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch National Human Genome Research Institute National Institutes of Health, Bethesda, MD 20892
| | - Joan E. Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch National Human Genome Research Institute National Institutes of Health, Bethesda, MD 20892
| | - Daniel E. Weeks
- Departments of Human Genetics and Biostatistics, Graduate School of Public Health University of Pittsburgh, Pittsburgh, PA 15261
| | - Momiao Xiong
- Human Genetics Center, University of Texas - Houston P.O. Box 20334, Houston, Texas 77225
| |
Collapse
|
33
|
Vsevolozhskaya OA, Zaykin DV, Greenwood MC, Wei C, Lu Q. Functional analysis of variance for association studies. PLoS One 2014; 9:e105074. [PMID: 25244256 PMCID: PMC4171465 DOI: 10.1371/journal.pone.0105074] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2014] [Accepted: 07/18/2014] [Indexed: 12/21/2022] Open
Abstract
While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperform two popularly used methods - SKAT and a previously proposed method based on functional linear models (FLM), - especially if a sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL 4 and ANGPTL 3 associated with obesity, FANOVA was able to identify both genes associated with obesity.
Collapse
Affiliation(s)
- Olga A. Vsevolozhskaya
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
| | - Dmitri V. Zaykin
- National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina, United States of America
| | - Mark C. Greenwood
- Department of Mathematical Sciences, Montana State University, Bozeman, Montana, United States of America
| | - Changshuai Wei
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
| |
Collapse
|
34
|
Song C, Zhang H. TARV: tree-based analysis of rare variants identifying risk modifying variants in CTNNA2 and CNTNAP2 for alcohol addiction. Genet Epidemiol 2014; 38:552-9. [PMID: 25041903 PMCID: PMC4154634 DOI: 10.1002/gepi.21843] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2014] [Revised: 06/02/2014] [Accepted: 06/16/2014] [Indexed: 12/18/2022]
Abstract
Since the development of next generation sequencing (NGS) technology, researchers have been extending their efforts on genome-wide association studies (GWAS) from common variants to rare variants to find the missing inheritance. Although various statistical methods have been proposed to analyze rare variants data, they generally face difficulties for complex disease models involving multiple genes. In this paper, we propose a tree-based analysis of rare variants (TARV) that adopts a nonparametric disease model and is capable of exploring gene-gene interactions. We found that TARV outperforms the sequence kernel association test (SKAT) in most of our simulation scenarios, and by notable margins in some cases. By applying TARV to the study of addiction: genetics and environment (SAGE) data, we successfully detected gene CTNNA2 and its 43 specific variants that increase the risk of alcoholism in women, with an odds ratio (OR) of 1.94. This gene has not been detected in the SAGE data. Post hoc literature search also supports the role of CTNNA2 as a likely risk gene for alcohol addiction. In addition, we also detected a plausible protective gene CNTNAP2, whose 97 rare variants can reduce the risk of alcoholism in women, with an OR of 0.55. These findings suggest that TARV can be effective in dissecting genetic variants for complex diseases using rare variants data.
Collapse
Affiliation(s)
- Chi Song
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut 06520, USA
| | - Heping Zhang
- Department of Biostatistics, School of Public Health, Yale University, New Haven, Connecticut 06520, USA
| |
Collapse
|
35
|
Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, Xiong M. Functional linear models for association analysis of quantitative traits. Genet Epidemiol 2014; 37:726-42. [PMID: 24130119 DOI: 10.1002/gepi.21757] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2013] [Revised: 07/15/2013] [Accepted: 08/14/2013] [Indexed: 12/19/2022]
Abstract
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F-distributed tests of the proposed fixed effect functional linear models have higher power than that of sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to its optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT-O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.
Collapse
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, Maryland, United States of America
| | | | | | | | | | | |
Collapse
|
36
|
Zhang F, Boerwinkle E, Xiong M. Epistasis analysis for quantitative traits by functional regression model. Genome Res 2014; 24:989-98. [PMID: 24803592 PMCID: PMC4032862 DOI: 10.1101/gr.161760.113] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10−10) in the ESP, and 11 were replicated in the CHARGE-S study.
Collapse
Affiliation(s)
- Futao Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang 310058, China; Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
| | - Eric Boerwinkle
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
| | - Momiao Xiong
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
| |
Collapse
|