1
Zorkoltseva IV, Elgaeva EE, Belonogova NM, Kirichenko AV, Svishcheva GR, Freidin MB, Williams FMK, Suri P, Tsepilov YA, Axenovich TI. Multi-Trait Exome-Wide Association Study of Back Pain-Related Phenotypes. Genes (Basel) 2023; 14:1962. [PMID: 37895311 PMCID: PMC10606006 DOI: 10.3390/genes14101962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 10/16/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023] Open
Abstract
Back pain (BP) is a major contributor to disability worldwide, with heritability estimated at 40-60%. However, less than half of the heritability is explained by common genetic variants identified by genome-wide association studies. More powerful methods and rare and ultra-rare variant analysis may offer additional insight. This study utilized exome sequencing data from the UK Biobank to perform a multi-trait gene-based association analysis of three BP-related phenotypes: chronic back pain, dorsalgia, and intervertebral disc disorder. We identified the SLC13A1 gene as a contributor to chronic back pain via loss-of-function (LoF) and missense variants. This gene has been previously detected in two studies. A multi-trait approach uncovered the novel FSCN3 gene and its impact on back pain through LoF variants. This gene deserves attention because it is only the second gene shown to have an effect on back pain due to LoF variants and represents a promising drug target for back pain therapy.
2
Belonogova NM, Kirichenko AV, Freidin MB, Williams FMK, Suri P, Aulchenko YS, Axenovich TI, Tsepilov YA. Noncoding rare variants in PANX3 are associated with chronic back pain. Pain 2023; 164:864-869. [PMID: 36448979 PMCID: PMC10014492 DOI: 10.1097/j.pain.0000000000002781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Accepted: 08/31/2022] [Indexed: 12/05/2022]
Abstract
Back pain is the leading cause of years lived with disability worldwide, yet surprisingly, little is known regarding the biology underlying this condition. The impact of genetics is known for chronic back pain: its heritability is estimated to be at least 40%. Large genome-wide association studies have shown that common variation may account for up to 35% of chronic back pain heritability; rare variants may explain a portion of the heritability not explained by common variants. In this study, we performed the first gene-based association analysis of chronic back pain using UK Biobank imputed data including rare variants with moderate imputation quality. We discovered 2 genes, SOX5 and PANX3, influencing chronic back pain. The SOX5 gene is a well-known back pain gene. The PANX3 gene has not previously been described as having a role in chronic back pain. We showed that the association of PANX3 with chronic back pain is driven by rare noncoding intronic polymorphisms. This result was replicated in an independent sample from UK Biobank and validated using a similar phenotype, dorsalgia, from FinnGen Biobank. We also found that the PANX3 gene is associated with intervertebral disk disorders. We can speculate that a possible mechanism of action of PANX3 on back pain is due to its effect on the intervertebral disks.
3
Zlobin AS, Volkova NA, Zinovieva NA, Iolchiev BS, Bagirov VA, Borodin PM, Axenovich TI, Tsepilov YA. Loci Associated with Negative Heterosis for Viability and Meat Productivity in Interspecific Sheep Hybrids. Animals (Basel) 2023; 13:ani13010184. [PMID: 36611792 PMCID: PMC9817718 DOI: 10.3390/ani13010184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 12/15/2022] [Accepted: 12/19/2022] [Indexed: 01/05/2023] Open
Abstract
Negative heterosis can occur in different economically important traits, but the exact biological mechanisms of this phenomenon are still unknown. The present study focuses on determining the genetic factors associated with negative heterosis in interspecific hybrids between domestic sheep (Ovis aries) and argali (Ovis ammon). One locus (rs417431015) associated with viability and two loci (rs413302370, rs402808951) associated with meat productivity were identified. One gene (ARAP2) was prioritized for viability and three for meat productivity (PDE2A, ARAP1, and PCDH15). The loci associated with meat productivity were demonstrated to fit the overdominant inheritance model and could potentially be involved in negative heterosis mechanisms.
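The overdominant inheritance model mentioned in this abstract differs from the usual additive model only in how genotypes are coded before regression. The sketch below shows the standard textbook codings; the function name and the example genotypes are illustrative assumptions and are not taken from the study.

```python
def code_genotype(minor_allele_count, model):
    """Map a minor-allele count (0, 1 or 2) to a regression covariate."""
    codings = {
        "additive":     (0, 1, 2),  # effect grows with each copy of the allele
        "dominant":     (0, 1, 1),  # one copy is enough for the full effect
        "recessive":    (0, 0, 1),  # two copies are needed for any effect
        "overdominant": (0, 1, 0),  # heterozygotes differ from both homozygotes
    }
    return codings[model][minor_allele_count]


# Hypothetical genotypes for five animals at one locus.
genotypes = [0, 1, 2, 1, 0]
overdominant = [code_genotype(g, "overdominant") for g in genotypes]
additive = [code_genotype(g, "additive") for g in genotypes]
```

Under the overdominant coding only heterozygotes carry a non-zero covariate, which is why this model is a natural fit for heterosis-type effects.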
4
Svishcheva GR, Tiys ES, Elgaeva EE, Feoktistova SG, Timmers PRHJ, Sharapov SZ, Axenovich TI, Tsepilov YA. A Novel Framework for Analysis of the Shared Genetic Background of Correlated Traits. Genes (Basel) 2022; 13:genes13101694. [PMID: 36292579 PMCID: PMC9602050 DOI: 10.3390/genes13101694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/12/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022] Open
Abstract
We propose a novel effective framework for the analysis of the shared genetic background for a set of genetically correlated traits using SNP-level GWAS summary statistics. This framework called SHAHER is based on the construction of a linear combination of traits by maximizing the proportion of its genetic variance explained by the shared genetic factors. SHAHER requires only full GWAS summary statistics and matrices of genetic and phenotypic correlations between traits as inputs. Our framework allows both shared and unshared genetic factors to be effectively analyzed. We tested our framework using simulation studies, compared it with previous developments, and assessed its performance using three real datasets: anthropometric traits, psychiatric conditions and lipid concentrations. SHAHER is versatile and applicable to summary statistics from GWASs with arbitrary sample sizes and sample overlaps, allows for the incorporation of different GWAS models (Cox, linear and logistic), and is computationally fast.
5
Slavskii SA, Kuznetsov IA, Shashkova TI, Bazykin GA, Axenovich TI, Kondrashov FA, Aulchenko YS. The limits of normal approximation for adult height. Eur J Hum Genet 2021; 29:1082-1091. [PMID: 33664501 PMCID: PMC8298501 DOI: 10.1038/s41431-021-00836-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2020] [Revised: 01/05/2021] [Accepted: 02/11/2021] [Indexed: 11/14/2022] Open
Abstract
Adult height inspired the first biometrical and quantitative genetic studies and is a test-case trait for understanding heritability. The studies of height led to the formulation of the classical polygenic model, which has a profound influence on the way we view and analyse complex traits. An essential part of the classical model is an assumption of additivity of effects and normality of the distribution of the residuals. However, it may be expected that the normal approximation will become insufficient in bigger studies. Here, we demonstrate that when the height of hundreds of thousands of individuals is analysed, the model complexity needs to be increased to include non-additive interactions between sex, environment and genes. Alternatively, the use of log-normal approximation allowed us to still use the additive effects model. These findings are important for future genetic and methodologic studies that make use of adult height as an exemplar trait.
6
Belonogova NM, Zorkoltseva IV, Tsepilov YA, Axenovich TI. Gene-based association analysis identifies 190 genes affecting neuroticism. Sci Rep 2021; 11:2484. [PMID: 33510330 PMCID: PMC7844228 DOI: 10.1038/s41598-021-82123-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/22/2020] [Accepted: 01/15/2021] [Indexed: 11/25/2022] Open
Abstract
Neuroticism is a personality trait that is an important risk factor for psychiatric disorders. Recent genome-wide studies reported about 600 genes potentially influencing neuroticism. Little is known about the mechanisms of their action. Here, we aimed to conduct a more detailed analysis of genes that can regulate the level of neuroticism. Using UK Biobank-based GWAS summary statistics, we performed a gene-based association analysis using four sets of within-gene variants, each set possessing specific protein-coding properties. To guard against the influence of strong GWAS signals outside the gene, we used a specially designed procedure called “polygene pruning”. As a result, we identified 190 genes associated with neuroticism due to the effect of within-gene variants rather than strong GWAS signals outside the gene. Thirty-eight of these genes are new. Within all genes identified, we distinguished two slightly overlapping groups obtained from using protein-coding and non-coding variants. Many genes in the former group included potentially pathogenic variants. For some genes in the latter group, we found evidence of pleiotropy with gene expression. Using a bioinformatics analysis, we prioritized the neuroticism genes and showed that the genes that contribute to neuroticism through their within-gene variants are the most appropriate candidate genes.
7
Svishcheva GR, Belonogova NM, Zorkoltseva IV, Kirichenko AV, Axenovich TI. Gene-based association tests using GWAS summary statistics. Bioinformatics 2020; 35:3701-3708. [PMID: 30860568 DOI: 10.1093/bioinformatics/btz172] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/15/2018] [Revised: 02/12/2019] [Accepted: 03/11/2019] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: The huge number of genome-wide association study (GWAS) summary statistics freely available in databases provides new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data have been adapted for practical use with GWAS summary statistics as input. RESULTS: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. AVAILABILITY AND IMPLEMENTATION: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
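To illustrate how a gene-based test can run on summary statistics alone, the sketch below implements the widely used weighted burden statistic T = w'z / sqrt(w'Rw), where z holds per-variant z-scores and R is the correlation (LD) matrix between variants. This is a generic formulation written in Python for illustration; it is not the sumFREGAT code, and the function names and input values are assumptions.

```python
import math


def burden_z_from_summary(z_scores, weights, ld_matrix):
    """Combine per-variant z-scores into one gene-level z-score.

    Under the null hypothesis the result is standard normal, because the
    variance of the weighted sum of correlated z-scores is w' R w.
    """
    n = len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(
        weights[i] * ld_matrix[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical inputs: three variants in one gene, equal weights.
z_scores = [2.0, 1.5, 1.0]
weights = [1.0, 1.0, 1.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]

t_independent = burden_z_from_summary(z_scores, weights, independent)
t_correlated = burden_z_from_summary(z_scores, weights, correlated)
p_independent = two_sided_p(t_independent)
```

Positive LD between variants inflates the variance of the weighted sum, so the same z-scores give a smaller gene-level statistic when the variants are correlated. This is why an LD reference matrix is a required input for summary-level gene-based tests.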
8
Silva CT, Zorkoltseva IV, Niemeijer MN, van den Berg ME, Amin N, Demirkan A, van Leeuwen E, Iglesias AI, Piñeros-Hernández LB, Restrepo CM, Kors JA, Kirichenko AV, Willemsen R, Oostra BA, Stricker BH, Uitterlinden AG, Axenovich TI, van Duijn CM, Isaacs A. A combined linkage, microarray and exome analysis suggests MAP3K11 as a candidate gene for left ventricular hypertrophy. BMC Med Genomics 2018; 11:22. [PMID: 29506515 PMCID: PMC5838853 DOI: 10.1186/s12920-018-0339-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/09/2017] [Accepted: 02/21/2018] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND: Electrocardiographic measures of left ventricular hypertrophy (LVH) are used as predictors of cardiovascular risk. We combined linkage and association analyses to discover novel rare genetic variants involved in three such measures and two principal components derived from them. METHODS: The study was conducted among participants from the Erasmus Rucphen Family Study (ERF), a Dutch family-based sample from the southwestern Netherlands. Variance components linkage analyses were performed using Merlin. Regions of interest (LOD > 1.9) were fine-mapped using microarray and exome sequence data. RESULTS: We observed one significant LOD score for the second principal component on chromosome 15 (LOD score = 3.01) and 12 suggestive LOD scores. Several loci contained variants identified in GWAS for these traits; however, these did not explain the linkage peaks, nor did other common variants. Exome sequence data identified two associated variants after multiple testing corrections were applied. CONCLUSIONS: We did not find common SNPs explaining these linkage signals. Exome sequencing uncovered a relatively rare variant in MAP3K11 on chromosome 11 (MAF = 0.01) that helped account for the suggestive linkage peak observed for the first principal component. Conditional analysis revealed a drop in LOD from 2.01 to 0.88 for MAP3K11, suggesting that this variant may partially explain the linkage signal at this chromosomal location. MAP3K11 is related to the JNK pathway and is a pro-apoptotic kinase that plays an important role in the induction of cardiomyocyte apoptosis in various pathologies, including LVH.
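The LOD scores quoted in this abstract can be placed on a p-value scale. The sketch below assumes the usual asymptotic result for variance-components linkage, in which 2·ln(10)·LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with one degree of freedom. The function name is illustrative and is not part of Merlin.

```python
import math


def lod_to_p_value(lod):
    """Approximate pointwise p-value for a variance-components LOD score."""
    if lod <= 0.0:
        return 1.0
    # Convert the base-10 log-likelihood ratio to a likelihood-ratio chi-square.
    chi2 = 2.0 * math.log(10.0) * lod
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)); the
    # factor 0.5 accounts for the one-sided (mixture) null distribution.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


p_significant = lod_to_p_value(3.01)  # the chromosome 15 peak
p_threshold = lod_to_p_value(1.9)     # the region-of-interest threshold
p_before = lod_to_p_value(2.01)       # before conditioning on the variant
p_after = lod_to_p_value(0.88)        # after conditioning on the variant
```

On this scale a LOD of about 3 corresponds to a pointwise p-value near 10⁻⁴, which is the conventional reference point for significant linkage.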
9
Belonogova NM, Svishcheva GR, Wilson JF, Campbell H, Axenovich TI. Weighted functional linear regression models for gene-based association analysis. PLoS One 2018; 13:e0190486. [PMID: 29309409 PMCID: PMC5757938 DOI: 10.1371/journal.pone.0190486] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 04/19/2017] [Accepted: 12/17/2017] [Indexed: 11/19/2022] Open
Abstract
Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18 × 10⁻⁶), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.
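Weights "defined by allele frequencies via the beta distribution" are commonly computed as the beta density evaluated at each variant's minor allele frequency (MAF). The sketch below assumes shape parameters a = 1 and b = 25, a frequent default in kernel-based tests that strongly up-weights rare variants. The abstract does not state which parameters were used, so these values and the function name are assumptions.

```python
import math


def beta_density(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


# Hypothetical minor allele frequencies for four variants in one gene.
mafs = [0.001, 0.01, 0.05, 0.30]
weights = [beta_density(maf, 1.0, 25.0) for maf in mafs]
```

With these parameters the density reduces to 25·(1 − MAF)²⁴, so the weight falls steeply as the allele becomes more common and variants with a MAF near 0.3 contribute almost nothing.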
10
Amin N, Belonogova NM, Jovanova O, Brouwer RWW, van Rooij JGJ, van den Hout MCGN, Svishcheva GR, Kraaij R, Zorkoltseva IV, Kirichenko AV, Hofman A, Uitterlinden AG, van IJcken WFJ, Tiemeier H, Axenovich TI, van Duijn CM. Nonsynonymous Variation in NKPD1 Increases Depressive Symptoms in European Populations. Biol Psychiatry 2017; 81:702-707. [PMID: 27745872 DOI: 10.1016/j.biopsych.2016.08.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/06/2016] [Revised: 07/28/2016] [Accepted: 08/02/2016] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Despite high heritability, little success was achieved in mapping genetic determinants of depression-related traits by means of genome-wide association studies. METHODS: To identify genes associated with depressive symptomology, we performed a gene-based association analysis of nonsynonymous variation captured using exome-sequencing and exome-chip genotyping in a genetically isolated population from the Netherlands (n = 1999). Finally, we reproduced our significant findings in an independent population-based cohort (n = 1604). RESULTS: We detected significant association of depressive symptoms with a gene NKPD1 (p = 3.7 × 10⁻⁸). Nonsynonymous variants in the gene explained 0.9% of sex- and age-adjusted variance of depressive symptoms in the discovery study, which is translated into 3.8% of the total estimated heritability (h² = 0.24). Significant association of depressive symptoms with NKPD1 was also observed (n = 1604; p = 1.5 × 10⁻³) in the independent replication sample despite little overlap with the discovery cohort in the set of nonsynonymous genetic variants observed in the NKPD1 gene. Meta-analysis of the discovery and replication studies improved the association signal (p = 1.0 × 10⁻⁹). CONCLUSIONS: Our study suggests that nonsynonymous variation in the gene NKPD1 affects depressive symptoms in the general population. NKPD1 is predicted to be involved in the de novo synthesis of sphingolipids, which have been implicated in the pathogenesis of depression.
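The step from "0.9% of variance" to "3.8% of heritability" in this abstract is a single division: the share of heritability explained is the variance explained by the gene divided by the heritability estimate. A minimal check of that arithmetic, using only the figures quoted above (the variable names are illustrative):

```python
# Figures quoted in the abstract.
variance_explained = 0.009  # 0.9% of sex- and age-adjusted trait variance
heritability = 0.24         # estimated h² of depressive symptoms

# Share of the heritable variance attributable to NKPD1 variants.
share_of_heritability = variance_explained / heritability
```

The ratio is about 0.0375, or roughly 3.75%, which matches the 3.8% reported once rounding of the inputs is taken into account.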
11
Silva CT, Zorkoltseva IV, Amin N, Demirkan A, van Leeuwen EM, Kors JA, van den Berg M, Stricker BH, Uitterlinden AG, Kirichenko AV, Witteman JCM, Willemsen R, Oostra BA, Axenovich TI, van Duijn CM, Isaacs A. A Combined Linkage and Exome Sequencing Analysis for Electrocardiogram Parameters in the Erasmus Rucphen Family Study. Front Genet 2016; 7:190. [PMID: 27877193 PMCID: PMC5099142 DOI: 10.3389/fgene.2016.00190] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/02/2016] [Accepted: 10/11/2016] [Indexed: 12/30/2022] Open
Abstract
Electrocardiogram (ECG) measurements play a key role in the diagnosis and prediction of cardiac arrhythmias and sudden cardiac death. ECG parameters, such as the PR, QRS, and QT intervals, are known to be heritable and genome-wide association studies of these phenotypes have been successful in identifying common variants; however, a large proportion of the genetic variability of these traits remains to be elucidated. The aim of this study was to discover loci potentially harboring rare variants utilizing variance component linkage analysis in 1547 individuals from a large family-based study, the Erasmus Rucphen Family Study (ERF). Linked regions were further explored using exome sequencing. Five suggestive linkage peaks were identified: two for QT interval (1q24, LOD = 2.63; 2q34, LOD = 2.05), one for QRS interval (1p35, LOD = 2.52) and two for PR interval (9p22, LOD = 2.20; 14q11, LOD = 2.29). Fine-mapping using exome sequence data identified a C > G missense variant (c.713C > G, p.Ser238Cys) in the FCRL2 gene associated with QT (rs74608430; P = 2.8 × 10⁻⁴, minor allele frequency = 0.019). Heritability analysis demonstrated that the SNP explained 2.42% of the trait’s genetic variability in ERF (P = 0.02). Pathway analysis suggested that the gene is involved in cytosolic Ca²⁺ levels (P = 3.3 × 10⁻³) and AMPK stimulated fatty acid oxidation in muscle (P = 4.1 × 10⁻³). Look-ups in bioinformatics resources showed that expression of FCRL2 is associated with ARHGAP24 and SETBP1 expression. This finding was not replicated in the Rotterdam study. Combining the bioinformatics information with the association and linkage analyses, FCRL2 emerges as a strong candidate gene for QT interval.
12
Svishcheva GR, Belonogova NM, Axenovich TI. [Functional linear models for region-based association analysis]. GENETIKA 2016; 52:1202-1209. [PMID: 29369592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Regional association analysis is one of the most powerful tools for gene mapping because, instead of analyzing individual variants, it considers all variants in the region simultaneously. Recent models for regional association analysis use the functional data analysis approach. In this approach, the genotypes of variants within a region, as well as their effects, are described by continuous functions. This allows us to use information about both linkage and linkage disequilibrium and to reduce the influence of noise and/or observation errors. Here we define a functional linear mixed model to test association in independent and structured samples. We demonstrate how to test fixed and random effects of a set of genetic variants in the region on a quantitative trait. Evaluation of the statistical properties of the new methods shows that type I errors are in accordance with declared values and that power is high, especially for models with fixed effects of genotypes. We expect that the new functional linear regression models will facilitate identification of rare genetic variants controlling complex human and animal traits. The new methods are implemented in the FREGAT software, which is available for free download at http://mga.bionet.nsc.ru/soft/FREGAT/.
13
Svishcheva GR, Belonogova NM, Axenovich TI. Some pitfalls in application of functional data analysis approach to association studies. Sci Rep 2016; 6:23918. [PMID: 27041739 PMCID: PMC4819216 DOI: 10.1038/srep23918] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/01/2015] [Accepted: 03/16/2016] [Indexed: 11/26/2022] Open
Abstract
One of the most effective methods for gene-based mapping employs functional data analysis, which smoothes data using standard basis functions. The full functional linear model includes a functional representation of genotypes and their effects, while the beta-smooth only model smoothes the genotype effects only. Benefits and limitations of the beta-smooth only model should be studied before using it in practice. Here we analytically compare the full and beta-smooth only models under various scenarios. We show that when the full model employs two sets of basis functions equal in type and number, genotypes smoothing is eliminated from the model and it becomes analytically equivalent to the beta-smooth only model. If the basis functions differ only in type, genotypes smoothing is also eliminated from the full model, but the type of basis functions used for smoothing genotype effects becomes redefined. This leads to misinterpretation of the results and may reduce statistical power. When basis functions differ in number, no analytical comparison of the full and beta-smooth only models is possible. However, we show that the numbers of basis functions set unequal can become equal during the analysis, and the full model becomes disadvantageous.
14
Belonogova NM, Svishcheva GR, Axenovich TI. FREGAT: an R package for region-based association analysis. Bioinformatics 2016; 32:2392-3. [PMID: 27153598 DOI: 10.1093/bioinformatics/btw160] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/09/2015] [Accepted: 03/20/2016] [Indexed: 11/14/2022]
Abstract
Several approaches to the region-based association analysis of quantitative traits have recently been developed and successfully applied. However, no software package has been developed that implements all of these approaches for either independent or structured samples. Here we introduce FREGAT (Family REGional Association Tests), an R package that can handle family and population samples and implements a wide range of region-based association methods including burden tests, functional linear models, and kernel machine-based regression. FREGAT can be used in genome/exome-wide region-based association studies of quantitative traits and candidate gene analysis. FREGAT offers many useful options to empower its users and increase the effectiveness and applicability of region-based association analysis. AVAILABILITY AND IMPLEMENTATION: https://cran.r-project.org/web/packages/FREGAT/index.html SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: belon@bionet.nsc.ru.
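Of the method families named in this abstract, the burden test is the simplest to show in a few lines. It collapses all variants in a region into one weighted score per individual and then tests that score against the trait. The sketch below is a generic Python illustration for unrelated individuals with no covariates. It is not FREGAT code, it ignores the family structure that FREGAT handles, and all names and data values are assumptions.

```python
import math


def burden_scores(genotypes, weights):
    """Collapse each individual's variant counts into one weighted score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def burden_test(genotypes, weights, trait):
    """Score test: n * r^2 is asymptotically chi-square with 1 df."""
    r = pearson_r(burden_scores(genotypes, weights), trait)
    chi2 = len(trait) * r * r
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square (1 df) tail
    return chi2, p_value


# Hypothetical data: six individuals, three rare variants in one region.
genotypes = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
trait = [0.1, 1.2, 0.9, 1.1, 2.3, -0.2]
weights = [1, 1, 1]
chi2, p_value = burden_test(genotypes, weights, trait)
```

Collapsing works best when most variants in the region act in the same direction. Kernel-based and functional linear models, the other families listed above, relax that assumption at the cost of more computation.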
15
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022] Open
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to samples of related individuals. To this end, we additionally include random polygenic effects in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their basis functions. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under the GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
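The Fourier basis mentioned in this abstract can be made concrete with a short sketch. Variant positions are rescaled to the interval [0, 1] and each position is expanded into the standard Fourier basis 1, sin(2πt), cos(2πt), sin(4πt), and so on. A smooth genotype or effect function is then a linear combination of these columns. This is a generic Python illustration, not the 'famFLM' implementation, and the function names and positions are assumptions.

```python
import math


def normalize_positions(positions):
    """Rescale base-pair positions to the unit interval [0, 1]."""
    lowest, highest = min(positions), max(positions)
    return [(p - lowest) / (highest - lowest) for p in positions]


def fourier_basis(t, n_basis):
    """Evaluate the first n_basis Fourier basis functions at point t."""
    values = [1.0]  # constant term
    k = 1
    while len(values) < n_basis:
        values.append(math.sin(2.0 * math.pi * k * t))
        if len(values) < n_basis:
            values.append(math.cos(2.0 * math.pi * k * t))
        k += 1
    return values


# Hypothetical positions (in base pairs) of four variants in one region.
positions = [1000, 1500, 3000, 5000]
scaled = normalize_positions(positions)
# One row per variant, one column per basis function.
basis_matrix = [fourier_basis(t, 5) for t in scaled]
```

The number of basis functions controls how much smoothing is applied: fewer functions give a smoother and lower-dimensional representation of the region.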
16
Svishcheva GR, Belonogova NM, Axenovich TI. FFBSKAT: fast family-based sequence kernel association test. PLoS One 2014; 9:e99407. [PMID: 24905468 PMCID: PMC4048315 DOI: 10.1371/journal.pone.0099407] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/16/2014] [Accepted: 05/14/2014] [Indexed: 11/28/2022] Open
Abstract
Kernel machine-based regression is an efficient approach to region-based association analysis aimed at identification of rare genetic variants. However, this method is computationally complex. The running time of kernel-based association analysis becomes especially long for samples with genetic (sub)structures, thus increasing the need to develop new and effective methods, algorithms, and software packages. We have developed a new R package called fast family-based sequence kernel association test (FFBSKAT) for analysis of quantitative traits in samples of related individuals. This software implements a score-based variance component test to assess the association of a given set of single nucleotide polymorphisms with a continuous phenotype. We compared the performance of our software with that of two existing software packages for family-based sequence kernel association testing, namely, ASKAT and famSKAT, using the Genetic Analysis Workshop 17 family sample. Results demonstrate that FFBSKAT is several times faster than other available programs. In addition, the calculations of the three compared packages were similarly accurate. With respect to the available analysis modes, we combined the advantages of both ASKAT and famSKAT and added new options to empower FFBSKAT users. The FFBSKAT package is fast, user-friendly, and provides an easy-to-use method to perform whole-exome kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The FFBSKAT package, along with its manual, is available for free download at http://mga.bionet.nsc.ru/soft/FFBSKAT/.
17
Tsepilov YA, Ried JS, Strauch K, Grallert H, van Duijn CM, Axenovich TI, Aulchenko YS. Development and application of genomic control methods for genome-wide association studies using non-additive models. PLoS One 2013; 8:e81431. [PMID: 24358113 PMCID: PMC3864791 DOI: 10.1371/journal.pone.0081431] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 07/30/2013] [Accepted: 10/12/2013] [Indexed: 11/18/2022] Open
Abstract
Genome-wide association studies (GWAS) comprise a powerful tool for mapping genes of complex traits. However, an inflation of the test statistic can occur because of population substructure or cryptic relatedness, which could cause spurious associations. If information on a large number of genetic markers is available, adjusting the analysis results by using the method of genomic control (GC) is possible. GC was originally proposed to correct the Cochran-Armitage additive trend test. For non-additive models, correction has been shown to depend on allele frequencies. Therefore, usage of GC is limited to situations where allele frequencies of null markers and candidate markers are matched. In this work, we extended the capabilities of the GC method for non-additive models, which allows us to use null markers with arbitrary allele frequencies for GC. Analytical expressions for the inflation of a test statistic describing its dependency on allele frequency and several population parameters were obtained for recessive, dominant, and over-dominant models of inheritance. We proposed a method to estimate these required population parameters. Furthermore, we suggested a GC method based on approximation of the correction coefficient by a polynomial of allele frequency and described procedures to correct the genotypic (two degrees of freedom) test for cases when the model of inheritance is unknown. Statistical properties of the described methods were investigated using simulated and real data. We demonstrated that all considered methods were effective in controlling type 1 error in the presence of genetic substructure. The proposed GC methods can be applied to statistical tests for GWAS with various models of inheritance. All methods developed and tested in this work were implemented using R language as a part of the GenABEL package.
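For orientation, the classical genomic control (GC) correction that this work extends can be sketched as follows. The inflation factor λ is estimated as the median of the observed one-degree-of-freedom chi-square statistics at null markers divided by the theoretical median (about 0.455), and every statistic is then divided by λ. This is the standard additive-model procedure only, not the allele-frequency-dependent correction for non-additive models developed in the paper; the function names and data are assumptions.

```python
import statistics

# Median of the chi-square distribution with one degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Estimate the genomic inflation factor lambda from null markers."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Deflate test statistics by lambda (never inflate when lambda < 1)."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [stat / lam for stat in chi2_stats]


# Hypothetical chi-square statistics from five markers.
observed = [0.2, 0.5, 0.9098728, 3.0, 6.0]
lam = inflation_factor(observed)
corrected = genomic_control(observed)
```

A single λ is adequate for the additive trend test. As the abstract notes, for recessive, dominant and over-dominant models the inflation depends on allele frequency, so one constant no longer suffices.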
|
18
|
Belonogova NM, Svishcheva GR, van Duijn CM, Aulchenko YS, Axenovich TI. Region-based association analysis of human quantitative traits in related individuals. PLoS One 2013; 8:e65395. [PMID: 23799013 PMCID: PMC3684601 DOI: 10.1371/journal.pone.0065395] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 01/29/2013] [Accepted: 04/24/2013] [Indexed: 01/27/2023] Open
Abstract
Regional-based association analysis instead of individual testing of each SNP was introduced in genome-wide association studies to increase the power of gene mapping, especially for rare genetic variants. For regional association tests, the kernel machine-based regression approach was recently proposed as a more powerful alternative to collapsing-based methods. However, the vast majority of existing algorithms and software for the kernel machine-based regression are applicable only to unrelated samples. In this paper, we present a new method for the kernel machine-based regression association analysis of quantitative traits in samples of related individuals. The method is based on the GRAMMAR+ transformation of phenotypes of related individuals, followed by use of existing kernel machine-based regression software for unrelated samples. We compared the performance of kernel-based association analysis on the material of the Genetic Analysis Workshop 17 family sample and real human data by using our transformation, the original untransformed trait, and environmental residuals. We demonstrated that only the GRAMMAR+ transformation produced type I errors close to the nominal value and that this method had the highest empirical power. The new method can be applied to analysis of related samples by using existing software for kernel-based association analysis developed for unrelated samples.
|
19
|
Amin N, Hottenga JJ, Hansell NK, Janssens ACJW, de Moor MHM, Madden PAF, Zorkoltseva IV, Penninx BW, Terracciano A, Uda M, Tanaka T, Esko T, Realo A, Ferrucci L, Luciano M, Davies G, Metspalu A, Abecasis GR, Deary IJ, Raikkonen K, Bierut LJ, Costa PT, Saviouk V, Zhu G, Kirichenko AV, Isaacs A, Aulchenko YS, Willemsen G, Heath AC, Pergadia ML, Medland SE, Axenovich TI, de Geus E, Montgomery GW, Wright MJ, Oostra BA, Martin NG, Boomsma DI, van Duijn CM. Refining genome-wide linkage intervals using a meta-analysis of genome-wide association studies identifies loci influencing personality dimensions. Eur J Hum Genet 2012; 21:876-82. [PMID: 23211697 DOI: 10.1038/ejhg.2012.263] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/05/2011] [Revised: 09/21/2012] [Accepted: 10/26/2012] [Indexed: 11/10/2022] Open
Abstract
Personality traits are complex phenotypes related to psychosomatic health. Individually, various gene finding methods have not achieved much success in finding genetic variants associated with personality traits. We performed a meta-analysis of four genome-wide linkage scans (N=6149 subjects) of five basic personality traits assessed with the NEO Five-Factor Inventory. We compared the significant regions from the meta-analysis of linkage scans with the results of a meta-analysis of genome-wide association studies (GWAS) (N∼17 000). We found significant evidence of linkage of neuroticism to chromosome 3p14 (rs1490265, LOD=4.67) and to chromosome 19q13 (rs628604, LOD=3.55); of extraversion to 14q32 (ATGG002, LOD=3.3); and of agreeableness to 3p25 (rs709160, LOD=3.67) and to two adjacent regions on chromosome 15, including 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52) in the individual scans. In the meta-analysis, we found strong evidence of linkage of extraversion to 4q34, 9q34, 10q24 and 11q22, openness to 2p25, 3q26, 9p21, 11q24, 15q26 and 19q13 and agreeableness to 4q34 and 19p13. Significant evidence of association in the GWAS was detected between openness and rs677035 at 11q24 (P-value=2.6 × 10⁻⁶, KCNJ1). The findings of our linkage meta-analysis and those of the GWAS suggest that 11q24 is a susceptible locus for openness, with KCNJ1 as the possible candidate gene.
|
20
|
Amin N, Schuur M, Gusareva ES, Isaacs A, Aulchenko YS, Kirichenko AV, Zorkoltseva IV, Axenovich TI, Oostra BA, Janssens ACJW, van Duijn CM. A genome-wide linkage study of individuals with high scores on NEO personality traits. Mol Psychiatry 2012; 17:1031-41. [PMID: 21826060 DOI: 10.1038/mp.2011.97] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/25/2023]
Abstract
The NEO-Five-Factor Inventory divides human personality traits into five dimensions: neuroticism, extraversion, openness, conscientiousness and agreeableness. In this study, we sought to identify regions harboring genes with large effects on the five NEO personality traits by performing genome-wide linkage analysis of individuals scoring in the extremes of these traits (>90th percentile). Affected-only linkage analysis was performed using an Illumina 6K linkage array in a family-based study, the Erasmus Rucphen Family study. We subsequently determined whether distinct, segregating haplotypes found with linkage analysis were associated with the trait of interest in the population. Finally, a dense single-nucleotide polymorphism genotyping array (Illumina 318K) was used to search for copy number variations (CNVs) in the associated regions. In the families with extreme phenotype scores, we found significant evidence of linkage for conscientiousness to 20p13 (rs1434789, log of odds (LOD)=5.86) and suggestive evidence of linkage (LOD >2.8) for neuroticism to 19q, 21q and 22q, extraversion to 1p, 1q, 9p and 12q, openness to 12q and 19q, and agreeableness to 2p, 6q, 17q and 21q. Further analysis determined haplotypes in 21q22 for neuroticism (P-values = 0.009, 0.007), in 17q24 for agreeableness (marginal P-value = 0.018) and in 20p13 for conscientiousness (marginal P-values = 0.058, 0.038) segregating in families with large contributions to the LOD scores. No evidence for CNVs in any of the associated regions was found. Our findings imply that there may be genes with relatively large effects involved in personality traits, which may be identified with next-generation sequencing techniques.
|
21
|
Svishcheva GR, Axenovich TI, Belonogova NM, van Duijn CM, Aulchenko YS. Rapid variance components-based method for whole-genome association analysis. Nat Genet 2012; 44:1166-70. [PMID: 22983301 DOI: 10.1038/ng.2410] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Received: 11/16/2011] [Accepted: 08/16/2012] [Indexed: 11/09/2022]
Abstract
The variance component tests used in genome-wide association studies (GWAS) including large sample sizes become computationally exhaustive when the number of genetic markers is over a few hundred thousand. We present an extremely fast variance components-based two-step method, GRAMMAR-Gamma, developed as an analytical approximation within a framework of the score test approach. Using simulated and real human GWAS data sets, we show that this method provides unbiased estimates of the SNP effect and has a power close to that of the likelihood ratio test-based method. The computational complexity of our method is close to its theoretical minimum, that is, to the complexity of the analysis that ignores genetic structure. The running time of our method linearly depends on sample size, whereas this dependency is quadratic for other existing methods. Simulations suggest that GRAMMAR-Gamma may be used for association testing in whole-genome resequencing studies of large human cohorts.
|
22
|
Ibrahim-Verbaas CA, Zorkoltseva IV, Amin N, Schuur M, Coppus AMW, Isaacs A, Aulchenko YS, Breteler MMB, Ikram MA, Axenovich TI, Verbeek MM, van Swieten JC, Oostra BA, van Duijn CM. Linkage analysis for plasma amyloid beta levels in persons with hypertension implicates Aβ-40 levels to presenilin 2. Hum Genet 2012; 131:1869-76. [PMID: 22872014 DOI: 10.1007/s00439-012-1210-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/26/2012] [Accepted: 07/21/2012] [Indexed: 12/16/2022]
Abstract
Plasma concentrations of Aβ40 and Aβ42 rise with age and are increased in people with mutations that cause early-onset Alzheimer's disease (AD). Amyloid beta (Aβ) plasma levels were successfully used as an (endo)phenotype for gene discovery using a linkage approach in families with dominant forms of disease. Here, we searched for loci involved in Aβ plasma levels in a series of non-demented patients with hypertension in the Erasmus Rucphen Family study. Aβ40 and Aβ42 levels were determined in 125 subjects with severe hypertension. All patients were genotyped with a 6,000 single-nucleotide polymorphism (SNP) Illumina array designed for linkage analysis. We conducted linkage analysis of plasma Aβ levels. None of the linkage analyses yielded a genome-wide significant logarithm of odds (LOD) score over 3.3, but there was suggestive evidence for linkage (LOD > 1.9) for two regions: 1q41 (LOD = 2.07) and 11q14.3 (LOD = 2.97), both for Aβ40. These regions were followed up with association analysis in the study subjects and in 320 subjects from a population-based cohort. For the Aβ40 region on chromosome 1, association of several SNPs was observed at the presenilin 2 gene (PSEN2) (p = 2.58 × 10⁻⁴ for rs6703170). On chromosome 11q14-21, we found some association (p = 3.1 × 10⁻³ for rs2514299). This linkage study of plasma concentrations of Aβ40 and Aβ42 yielded two suggestive regions, of which one points toward a known locus for familial AD.
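To make the LOD thresholds quoted in the abstract above easier to interpret, here is a minimal sketch that converts a LOD score into an approximate pointwise p-value. It assumes the usual large-sample approximation for a test with one free parameter, in which 2·ln(10)·LOD follows a chi-square distribution with 1 degree of freedom under the null and the test is one-sided; the function name `lod_to_pointwise_p` is hypothetical, and this is not the procedure the authors used to set their thresholds.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumes 2*ln(10)*LOD ~ chi-square(1 df) under the null
    (an illustrative large-sample approximation).
    """
    if lod < 0:
        raise ValueError("LOD score must be non-negative for this approximation")
    chi2 = 2.0 * math.log(10.0) * lod
    # Survival function of chi-square(1 df) is erfc(sqrt(x / 2));
    # halve it because linkage is tested in one direction only.
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# Thresholds and peak LOD scores quoted in the abstract.
for lod in (1.9, 2.07, 2.97, 3.3):
    print(f"LOD = {lod:.2f} -> pointwise p ~ {lod_to_pointwise_p(lod):.2e}")
```

Under these assumptions a LOD of 3.3 corresponds to a pointwise p-value of roughly 5 × 10⁻⁵, which shows how much stricter the genome-wide threshold is than a nominal 0.05 level.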
|
23
|
Zorkoltseva IV, Aulchenko YS, van Duijn CM, Axenovich TI. Ped_Outlier software for automatic identification of within-family outliers. Comput Biol Chem 2010; 34:242-3. [PMID: 20884298 DOI: 10.1016/j.compbiolchem.2010.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2010] [Accepted: 08/27/2010] [Indexed: 11/26/2022]
Abstract
High-throughput resequencing technology has brought family-based studies back into the focus of genetic research. Within-family outliers (individuals whose phenotype is very much unlike the phenotype of their relatives) may carry rare variants of large effect, and thus resequencing them provides a highly powered strategy for rare variant detection. On the other hand, such outliers may complicate the search for common variants of smaller effect, because they may obscure a real linkage signal. We have developed a program, Ped_Outlier, that allows automatic detection of within-family outliers in a sample of pedigrees of arbitrary structure and size. We tested our program by identifying within-family outliers for adult height and intracranial volume in a large pedigree. Results of linkage analysis of these traits demonstrated that identification of within-family outliers is one of the important steps of pedigree analysis. The program Ped_Outlier is freely available at http://mga.bionet.nsc.ru/soft/index.html.
|
24
|
Schuur M, Hommel D, Ikram MA, Amin N, Zorkoltseva IV, Kirichenko A, Koning I, Janssens ACJ, Axenovich TI, Aulchenko YS, Hofman A, Breteler MM, Oostra BA, Swieten JC, Duijn CM. O2‐07‐01: Genome‐wide linkage screen of cognitive function identifies susceptible chromosomal regions. Alzheimers Dement 2010. [DOI: 10.1016/j.jalz.2010.05.346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
|
25
|
Axenovich TI, Aulchenko YS. MQScore_SNP software for multipoint parametric linkage analysis of quantitative traits in large pedigrees. Ann Hum Genet 2010; 74:286-9. [PMID: 20529018 DOI: 10.1111/j.1469-1809.2010.00576.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
Abstract
We describe software for multipoint parametric linkage analysis of quantitative traits using information about SNP genotypes. A mixed model of major gene and polygene inheritance is implemented in this software. Implementation of several algorithms to avoid computational underflow and decrease running time permits application of our software to the analysis of very large pedigrees collected in human genetically isolated populations. We tested our software by performing linkage analysis of adult height in a large pedigree from a Dutch isolated population. Three significant and four suggestive loci were identified with the help of our programs, whereas variance-component-based linkage analysis, which requires the pedigree fragmentation, demonstrated only three suggestive peaks. The software package MQScore_SNP is available at http://mga.bionet.nsc.ru/soft/index.html.
|