201
|
Hormozdiari F, Zhu A, Kichaev G, Ju CJT, Segrè AV, Joo JWJ, Won H, Sankararaman S, Pasaniuc B, Shifman S, Eskin E. Widespread Allelic Heterogeneity in Complex Traits. Am J Hum Genet 2017; 100:789-802. [PMID: 28475861 PMCID: PMC5420356 DOI: 10.1016/j.ajhg.2017.04.005] [Citation(s) in RCA: 51] [Impact Index Per Article: 7.3] [Received: 12/27/2016] [Accepted: 04/07/2017] [Indexed: 12/24/2022] Open
Abstract
Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R² = 0.85, p = 2.2 × 10⁻¹⁶), indicating that limited statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.
|
202
|
Host Genome Influence on Gut Microbial Composition and Microbial Prediction of Complex Traits in Pigs. Genetics 2017; 206:1637-1644. [PMID: 28468904 DOI: 10.1534/genetics.117.200782] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 02/02/2017] [Accepted: 04/27/2017] [Indexed: 01/07/2023] Open
Abstract
The aim of the present study was to analyze the interplay between gastrointestinal tract (GIT) microbiota, host genetics, and complex traits in pigs using extended quantitative-genetic methods. The study design consisted of 207 pigs that were housed and slaughtered under standardized conditions, and phenotyped for daily gain, feed intake, and feed conversion rate. The pigs were genotyped with a standard 60 K SNP chip. The GIT microbiota composition was analyzed by 16S rRNA gene amplicon sequencing technology. Eight of the 49 investigated bacterial genera showed a significant narrow-sense host heritability, ranging from 0.32 to 0.57. Microbial mixed linear models were applied to estimate the microbiota variance for each complex trait. The fraction of phenotypic variance explained by the microbial variance was 0.28, 0.21, and 0.16 for daily gain, feed conversion, and feed intake, respectively. The SNP data and the microbiota composition were used to predict the complex traits using genomic best linear unbiased prediction (G-BLUP) and microbial best linear unbiased prediction (M-BLUP) methods, respectively. The prediction accuracies of G-BLUP were 0.35, 0.23, and 0.20 for daily gain, feed conversion, and feed intake, respectively. The corresponding prediction accuracies of M-BLUP were 0.41, 0.33, and 0.33. Thus, in addition to SNP data, microbiota abundances are an informative source of complex trait predictions. Since the pig is a well-suited animal for modeling the human digestive tract, M-BLUP, in addition to G-BLUP, might be beneficial for predicting human predispositions to some diseases, and, consequently, for preventative and personalized medicine.
|
203
|
Polimanti R, Zhang H, Smith AH, Zhao H, Farrer LA, Kranzler HR, Gelernter J. Genome-wide association study of body mass index in subjects with alcohol dependence. Addict Biol 2017; 22:535-549. [PMID: 26458734 PMCID: PMC5102811 DOI: 10.1111/adb.12317] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/12/2015] [Revised: 08/27/2015] [Accepted: 09/03/2015] [Indexed: 12/22/2022]
Abstract
Outcomes related to disordered metabolism are common in alcohol dependence (AD). To investigate alterations in the regulation of body mass that occur in the context of AD, we performed a genome-wide association study (GWAS) of body mass index (BMI) in African Americans (AAs) and European Americans (EAs) with AD. Subjects were recruited for genetic studies of AD or drug dependence and evaluated using the Semi-structured Assessment for Drug Dependence and Alcoholism. We investigated a total of 2587 AAs and 2959 EAs with DSM-IV AD diagnosis. In the stage 1 sample (N = 4137), we observed three genome-wide significant (GWS) single-nucleotide polymorphism associations: rs200889048 (P = 8.98 × 10⁻¹²) and rs12490016 (P = 1.44 × 10⁻⁸) in EAs and rs1630623 (P = 5.14 × 10⁻⁹) in AAs and EAs meta-analyzed. In the stage 2 sample (N = 1409), we replicated 278, 253 and 168 of the stage 1 suggestive loci (P < 5 × 10⁻⁴) in AAs, EAs, and AAs and EAs meta-analyzed, respectively. A meta-analysis of stage 1 and stage 2 samples (N = 5546) identified two additional GWS signals: rs28562191 in EAs (P = 4.46 × 10⁻⁸) and rs56950471 in AAs (P = 1.57 × 10⁻⁹). Three of the GWS loci identified (rs200889048, rs12490016 and rs1630623) were not previously reported by GWAS of BMI in the general population, and two of them raise interesting hypotheses: rs12490016, a regulatory variant located within LINC00880, where there are other GWAS-identified variants associated with birth size, adiposity in newborns and bulimia symptoms, which also interact with social stress in relation to birth size; and rs1630623, a regulatory variant related to ALDH1A1, a gene involved in alcohol metabolism and adipocyte plasticity. These loci offer molecular insights regarding the regulatory mechanisms of body mass in the context of AD.
|
204
|
Bhat JA, Ali S, Salgotra RK, Mir ZA, Dutta S, Jadon V, Tyagi A, Mushtaq M, Jain N, Singh PK, Singh GP, Prabhu KV. Genomic Selection in the Era of Next Generation Sequencing for Complex Traits in Plant Breeding. Front Genet 2016; 7:221. [PMID: 28083016 PMCID: PMC5186759 DOI: 10.3389/fgene.2016.00221] [Citation(s) in RCA: 128] [Impact Index Per Article: 16.0] [Received: 08/10/2016] [Accepted: 12/12/2016] [Indexed: 12/31/2022] Open
Abstract
Genomic selection (GS) is a promising approach that exploits molecular genetic markers to design novel breeding programs and to develop new marker-based models for genetic evaluation. In plant breeding, it provides opportunities to increase the genetic gain of complex traits per unit time and cost. The cost-benefit balance has been an important consideration for GS to work in crop plants. The most important factor for its successful and effective implementation in crop species has been the availability of genome-wide, high-throughput, cost-effective, and flexible markers that have low ascertainment bias and are suitable for large populations as well as for both model and non-model crop species, with or without a reference genome sequence. These requirements were the major limitations of earlier marker systems, viz. SSR and array-based markers, and GS was unimaginable before the availability of next-generation sequencing (NGS) technologies, which have provided novel SNP genotyping platforms, especially genotyping by sequencing. These marker technologies have changed the entire scenario of marker applications and made the use of GS routine for crop improvement in both model and non-model crop species. NGS-based genotyping has increased the prediction accuracy of genomic-estimated breeding values over other established marker platforms in cereals and other crop species, and has made GS a reality in crop breeding. But to harness the true benefits of GS, these marker technologies will need to be combined with high-throughput phenotyping to achieve valuable genetic gain for complex traits. Moreover, the continuous decline in sequencing cost will make whole-genome sequencing (WGS) feasible and cost-effective for GS in the near future. Until then, targeted sequencing seems to be the more cost-effective option for large-scale marker discovery and GS, particularly for large and undecoded genomes.
|
205
|
Pulit SL, de With SAJ, de Bakker PIW. Resetting the bar: Statistical significance in whole-genome sequencing-based association studies of global populations. Genet Epidemiol 2016; 41:145-151. [PMID: 27990689 DOI: 10.1002/gepi.22032] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Received: 05/13/2016] [Revised: 10/31/2016] [Accepted: 10/31/2016] [Indexed: 12/29/2022]
Abstract
Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance thresholds for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome-wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome-wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
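As a rough illustration of what such thresholds imply, the sketch below reads a significance threshold as a Bonferroni correction at a family-wise error rate of 0.05. This reading is an assumption made here for illustration: the abstract describes thresholds determined empirically by simulation, not by a Bonferroni calculation, and the function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold for n_tests independent tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_independent_tests(threshold, alpha=0.05):
    """Number of independent tests that a given threshold would correspond to
    under a Bonferroni correction (an illustrative reading only)."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    return alpha / threshold


if __name__ == "__main__":
    # Thresholds quoted in the abstract, plus the conventional array-based 5e-8.
    for label, p in [("conventional GWAS", 5e-8),
                     ("European/Asian/admixed WGS", 5e-9),
                     ("African WGS", 1e-9)]:
        print(f"{label}: about {implied_independent_tests(p):.0f} tests")
```

Under this reading, moving from 5 × 10⁻⁸ to 5 × 10⁻⁹ corresponds to a tenfold increase in the effective number of independent tests.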
|
206
|
Human Facial Shape and Size Heritability and Genetic Correlations. Genetics 2016; 205:967-978. [PMID: 27974501 DOI: 10.1534/genetics.116.193185] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 06/27/2016] [Accepted: 12/08/2016] [Indexed: 01/24/2023] Open
Abstract
The human face is an array of variable physical features that together make each of us unique and distinguishable. Striking familial facial similarities underscore a genetic component, but little is known of the genes that underlie facial shape differences. Numerous studies have estimated facial shape heritability using various methods. Here, we used advanced three-dimensional imaging technology and quantitative human genetics analysis to estimate narrow-sense heritability, heritability explained by common genetic variation, and pairwise genetic correlations of 38 measures of facial shape and size in normal African Bantu children from Tanzania. Specifically, we fit a linear mixed model of genetic relatedness between close and distant relatives to jointly estimate variance components that correspond to heritability explained by genome-wide common genetic variation and variance explained by uncaptured genetic variation, the sum representing total narrow-sense heritability. Our significant estimates for narrow-sense heritability of specific facial traits range from 28 to 67%, with horizontal measures being slightly more heritable than vertical or depth measures. Furthermore, for over half of facial traits, >90% of narrow-sense heritability can be explained by common genetic variation. We also find high absolute genetic correlation between most traits, indicating large overlap in underlying genetic loci. Not surprisingly, traits measured in the same physical orientation (i.e., both horizontal or both vertical) have high positive genetic correlations, whereas traits in opposite orientations have high negative correlations. The complex genetic architecture of facial shape informs our understanding of the intricate relationships among different facial features as well as overall facial development.
|
207
|
Chiu CY, Jung J, Wang Y, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Mills JL, Boehnke M, Xiong M, Fan R. A comparison study of multivariate fixed models and Gene Association with Multiple Traits (GAMuT) for next-generation sequencing. Genet Epidemiol 2016; 41:18-34. [PMID: 27917525 DOI: 10.1002/gepi.22014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/19/2016] [Revised: 09/01/2016] [Accepted: 09/19/2016] [Indexed: 01/23/2023]
Abstract
In this paper, extensive simulations are performed to compare two statistical methods to analyze multiple correlated quantitative phenotypes: (1) approximate F-distributed tests of multivariate functional linear models (MFLM) and additive models of multivariate analysis of variance (MANOVA), and (2) Gene Association with Multiple Traits (GAMuT) for association testing of high-dimensional genotype data. It is shown that approximate F-distributed tests of MFLM and MANOVA have higher power and are more appropriate for major gene association analysis (i.e., scenarios in which some genetic variants have relatively large effects on the phenotypes); GAMuT has higher power and is more appropriate for analyzing polygenic effects (i.e., effects from a large number of genetic variants each of which contributes a small amount to the phenotypes). MFLM and MANOVA are very flexible and can be used to perform association analysis for (i) rare variants, (ii) common variants, and (iii) a combination of rare and common variants. Although GAMuT was designed to analyze rare variants, it can be applied to analyze a combination of rare and common variants and it performs well when (1) the number of genetic variants is large and (2) each variant contributes a small amount to the phenotypes (i.e., polygenes). MFLM and MANOVA are fixed effect models that perform well for major gene association analysis. GAMuT can be viewed as an extension of sequence kernel association tests (SKAT). Both GAMuT and SKAT are more appropriate for analyzing polygenic effects and they perform well not only in the rare variant case, but also in the case of a combination of rare and common variants. Data analyses of European cohorts and the Trinity Students Study are presented to compare the performance of the two methods.
|
208
|
Abstract
Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationships (breeding value selection) or molecular markers (marker-assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we propose an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem, and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that the genomic mating approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.
|
209
|
Abstract
The generation of genome-wide variation data has become commonplace. However, the potential for interpretation and application of these data for clinical assessment of outcomes of interest, and prediction of disease risk, is currently not fully realized. Many common, complex diseases now have numerous, well-established "risk" loci, and likely harbor many genetic determinants with effects too small to be detected at genome-wide levels of statistical significance. A simple and intuitive approach for converting genetic data to a predictive measure of disease susceptibility is to aggregate the risk effects of these loci into a single genetic risk score. Here, some common methods and software packages for calculating genetic risk scores, with a focus on studies of common, complex diseases, are described. The basic information needed, as well as important considerations for constructing genetic risk scores, including specific requirements for phenotypic and genetic data and limitations in their application, are reviewed. © 2016 by John Wiley & Sons, Inc.
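The aggregation step described in this abstract can be sketched as follows. This is a minimal illustration, not the procedure of any particular package reviewed in the article: it assumes risk-allele dosages coded 0, 1 or 2 and, for the weighted form, per-allele weights equal to the natural log of a reported odds ratio. The SNP identifiers and odds ratios are hypothetical.

```python
import math


def genetic_risk_score(dosages, weights=None):
    """Sum risk-allele dosages (0, 1 or 2 per SNP), optionally weighted.

    dosages: dict mapping SNP id -> number of risk alleles carried.
    weights: dict mapping SNP id -> per-allele weight (e.g. log odds ratio);
             if None, an unweighted risk-allele count is returned.
    """
    score = 0.0
    for snp, dosage in dosages.items():
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2")
        # A SNP without a weight raises KeyError rather than being skipped.
        score += dosage * (1.0 if weights is None else weights[snp])
    return score


if __name__ == "__main__":
    # Hypothetical individual and hypothetical per-allele odds ratios.
    person = {"snpA": 2, "snpB": 1, "snpC": 0}
    log_or = {"snpA": math.log(1.2), "snpB": math.log(1.5), "snpC": math.log(1.1)}
    print("unweighted:", genetic_risk_score(person))
    print("weighted:", genetic_risk_score(person, log_or))
```

Weighting by log odds ratios makes the score additive on the log-odds scale, so exponentiating the weighted score gives the combined odds ratio under an assumption of independent, multiplicative effects.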
|
210
|
Fan R, Chiu CY, Jung J, Weeks DE, Wilson AF, Bailey-Wilson JE, Amos CI, Chen Z, Mills JL, Xiong M. A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits. Genet Epidemiol 2016; 40:702-721. [PMID: 27374056 DOI: 10.1002/gepi.21984] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/01/2015] [Revised: 03/08/2016] [Accepted: 04/26/2016] [Indexed: 12/22/2022]
Abstract
In association studies of complex traits, fixed-effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance-component tests based on mixed models were developed for region-based genetic variant association tests. In the mixed models, the association is tested by a null hypothesis of zero variance via a sequence kernel association test (SKAT), its optimal unified test (SKAT-O), and a combined sum test of rare and common variant effect (SKAT-C). Although there are some comparison studies to evaluate the performance of mixed and fixed models, there is no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT-O, and SKAT-C, (ii) traditional fixed-effect additive models, and (iii) fixed-effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed-effect tests accurately control or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed-effect models performed better than or similar to the mixed models except when (1) the region sizes are 12 and 15 kb and (2) effect sizes are small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT-O/SKAT-C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes). 
In major gene association studies, we argue that the fixed-effect models perform better than or similarly to mixed models in most cases because some variants should have relatively large effects on the traits. In practice, it makes sense to perform the analysis with both the fixed and mixed effect models and to compare the results, and this can be readily done using our R code and the SKAT packages.
|
211
|
Mitra AK, Stessman HAF, Schaefer RJ, Wang W, Myers CL, Van Ness BG, Beiraghi S. Fine-Mapping of 18q21.1 Locus Identifies Single Nucleotide Polymorphisms Associated with Nonsyndromic Cleft Lip with or without Cleft Palate. Front Genet 2016; 7:88. [PMID: 27242896 PMCID: PMC4876112 DOI: 10.3389/fgene.2016.00088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/08/2016] [Accepted: 05/01/2016] [Indexed: 12/26/2022] Open
Abstract
Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is one of the most common congenital birth defects. NSCL/P is a complex multifactorial disease caused by interactions between multiple environmental and genetic factors. However, the causal single nucleotide polymorphism (SNP) signature profile underlying the risk of familial NSCL/P still remains unknown. We previously reported a 5.7-Mb genomic region on chromosome 18q21.1 locus that potentially contributes to autosomal dominant, low-penetrance inheritance of NSCL/P. In the current study, we performed exome sequencing on 12 familial genomes (six affected individuals, two obligate carriers, and four seemingly unaffected individuals) of a six-generation family to identify candidate SNPs associated with NSCL/P risk. Subsequently, targeted bidirectional DNA re-sequencing of polymerase chain reaction (PCR)-amplified high-risk regions of the MYO5B gene and Sequenom iPLEX genotyping of 29 candidate SNPs were performed on a larger set of 33 members of this NSCL/P family (10 affected + 4 obligate carriers + 19 unaffected relatives) to find SNPs significantly associated with the NSCL/P trait. SNP vs. NSCL/P association analysis showed that the MYO5B SNP rs183559995 GA genotype had an odds ratio of 18.09 (95% Confidence Interval = 1.86–176.34; gender-adjusted P = 0.0019) compared to the reference GG genotype. Additionally, the following SNPs were also found to be significantly associated with NSCL/P risk: rs1450425 (LOXHD1), rs6507992 (SKA1), rs78950893 (SMAD7), rs8097060, rs17713847 (SCARNA17), rs6507872 (CTIF), rs8091995 (CTIF), and rs17715416 (MYO5B). We could thus identify mutations in several genes as key candidate SNPs associated with the risk of NSCL/P in this large multi-generation family.
|
212
|
Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Subramanian V, Bartell E, Nguyen XX, Ye J, Leyton-Mange J, Dolmatova EV, van der Harst P, de Laat W, Ellinor PT, Newton-Cheh C, Milan DJ, Kellis M, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 2016. [PMID: 27162171 DOI: 10.7554/elife.10557] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022] Open
Abstract
Genetic variants identified by genome-wide association studies explain only a modest proportion of heritability, suggesting that meaningful associations lie 'hidden' below current thresholds. Here, we integrate information from association studies with epigenomic maps to demonstrate that enhancers significantly overlap known loci associated with the cardiac QT interval and QRS duration. We apply functional criteria to identify loci associated with QT interval that do not meet genome-wide significance and are missed by existing studies. We demonstrate that these 'sub-threshold' signals represent novel loci, and that epigenomic maps are effective at discriminating true biological signals from noise. We experimentally validate the molecular, gene-regulatory, cellular and organismal phenotypes of these sub-threshold loci, demonstrating that most sub-threshold loci have regulatory consequences and that genetic perturbation of nearby genes causes cardiac phenotypes in mouse. Our work provides a general approach for improving the detection of novel loci associated with complex human traits.
|
213
|
Wang X, Tucker NR, Rizki G, Mills R, Krijger PH, de Wit E, Subramanian V, Bartell E, Nguyen XX, Ye J, Leyton-Mange J, Dolmatova EV, van der Harst P, de Laat W, Ellinor PT, Newton-Cheh C, Milan DJ, Kellis M, Boyer LA. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 2016; 5. [PMID: 27162171 PMCID: PMC4862755 DOI: 10.7554/elife.10557] [Citation(s) in RCA: 81] [Impact Index Per Article: 10.1] [Received: 10/05/2015] [Accepted: 04/04/2016] [Indexed: 12/31/2022] Open
Abstract
Genetic variants identified by genome-wide association studies explain only a modest proportion of heritability, suggesting that meaningful associations lie 'hidden' below current thresholds. Here, we integrate information from association studies with epigenomic maps to demonstrate that enhancers significantly overlap known loci associated with the cardiac QT interval and QRS duration. We apply functional criteria to identify loci associated with QT interval that do not meet genome-wide significance and are missed by existing studies. We demonstrate that these 'sub-threshold' signals represent novel loci, and that epigenomic maps are effective at discriminating true biological signals from noise. We experimentally validate the molecular, gene-regulatory, cellular and organismal phenotypes of these sub-threshold loci, demonstrating that most sub-threshold loci have regulatory consequences and that genetic perturbation of nearby genes causes cardiac phenotypes in mouse. Our work provides a general approach for improving the detection of novel loci associated with complex human traits. DOI:http://dx.doi.org/10.7554/eLife.10557.001
Most complex traits are governed by a large number of genetic contributors, each having only a modest effect. This makes it difficult to identify the genetic variants that increase disease risk, hindering the discovery of new drug targets and the development of new therapeutics. To overcome this limitation in discovery power, the field of human genetics has traditionally sought increasingly large groups, or cohorts, of afflicted and non-afflicted individuals. Studies of large cohorts are a powerful approach for discovering new disease genes, but such groups are often impractical and sometimes impossible to obtain. However, it has become possible to complement the genetic evidence found in disease association studies with biological evidence of the effects of disease-associated genetic variants. Wang et al. focus specifically on genetic sites, or loci, that do not affect protein sequence but instead affect the non-coding control regions. These are known as enhancer elements, as they can enhance the expression of nearby genes. These loci constitute the majority of disease regions, and thus are extremely important, but their discovery has been hindered by our relatively poor understanding of the human genome. Chemical modifications known as epigenomic marks are indicative of enhancer regions. By studying the factors that affect heart rhythm, Wang et al. show that specific combinations of epigenomic marks are enriched in known trait-associated regions. This knowledge was then used to prioritize the further investigation of genetic regions that genome-wide association studies had only weakly linked to heart rhythm alterations. Wang et al. directly confirmed that genetic differences in “sub-threshold” regions indeed alter the activity of these regulatory regions in human heart cells. Furthermore, mutating or perturbing the predicted target genes of the sub-threshold enhancers caused heart defects in mouse and zebrafish. Wang et al. have demonstrated that epigenome maps can help to distinguish which sub-threshold regions from genome-wide association studies are more likely to contribute to a disease. This allows for the discovery of new disease genes with much smaller cohorts than would be needed otherwise, thus speeding up the development of new therapeutics by many years. DOI:http://dx.doi.org/10.7554/eLife.10557.002
|
214
|
Nazarian A, Gezan SA. GenoMatrix: A Software Package for Pedigree-Based and Genomic Prediction Analyses on Complex Traits. J Hered 2016; 107:372-9. [PMID: 27025440 DOI: 10.1093/jhered/esw020] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/22/2015] [Accepted: 03/22/2016] [Indexed: 12/19/2022] Open
Abstract
Genomic and pedigree-based best linear unbiased prediction methodologies (G-BLUP and P-BLUP) have proven efficient for partitioning the phenotypic variance of complex traits into its components, estimating individuals' genetic merits, and predicting unobserved (or yet-to-be observed) phenotypes in many species and fields of study. The GenoMatrix software, presented here, is a user-friendly package that facilitates the use of genome-wide marker data and parentage information for G-BLUP and P-BLUP analyses on complex traits. It provides users with a collection of applications that help with a range of tasks, from performing quality control on data to constructing and manipulating the genomic and pedigree-based relationship matrices and obtaining their inverses. Such matrices can then be used in downstream analyses by other statistical packages. The package also enables users to obtain predicted values for unobserved individuals based on the genetic values of observed related individuals. GenoMatrix is available to the research community as a Windows 64-bit executable and can be downloaded free of charge at: http://compbio.ufl.edu/software/genomatrix/.
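For readers unfamiliar with genomic relationship matrices, the sketch below shows one widely used construction, VanRaden's first method, G = ZZ' / (2 Σ p_j(1 − p_j)), where Z holds genotypes coded 0/1/2 and centered by twice the allele frequency p_j. The abstract does not state which construction GenoMatrix uses, so this is an illustrative assumption; the function name and toy genotypes are hypothetical, and allele frequencies are estimated from the sample itself.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from 0/1/2 genotype codes.

    genotypes: list of rows, one per individual; each row lists the count of
    a reference allele at every marker. Returns an n x n list of lists.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Center each genotype by its expected value, 2p.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale
         for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2], [1, 1, 0], [2, 1, 1]]  # 3 individuals, 3 markers
    for row in genomic_relationship_matrix(toy):
        print([round(value, 3) for value in row])
```

In a G-BLUP analysis this matrix, or its inverse, would then be passed to mixed-model software in place of the pedigree-based relationship matrix.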
|
215
|
Nasibullin T, Yagafarova L, Yagafarov I, Timasheva Y, Erdman V, Tuktarova I, Mustafina O. Combinations of Polymorphic Markers of Chemokine Genes, Their Receptors and Acute Phase Protein Genes As Potential Predictors of Coronary Heart Diseases. Acta Naturae 2016; 8:111-6. [PMID: 27099791 PMCID: PMC4837578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022] Open
Abstract
Atherosclerosis, the main factor in the development of coronary heart diseases (CHD), is an inflammatory response to endothelial layer damage in the arterial bed. We have analyzed the association between CHD and the polymorphic markers of genes that control the synthesis of proteins involved in the processes of adhesion and chemotaxis of immunocompetent cells: rs1024611 (-2518A>G, CCL2 gene), rs1799864 (V64I, CCR2 gene), rs3732378 (T280M, CX3CR1 gene), rs1136743 (A70V, SAA1 gene), and rs1205 (2042C>T, CRP gene) in 217 patients with CHD and 250 controls. Using the Monte Carlo method and Markov chains (APSampler), we revealed combinations of alleles/genotypes associated with both a reduced and an increased risk of CHD. The most significant alleles/genotypes are SAA1*T/T+CRP*C+CX3CR1*G/A (P perm = 0.0056, OR = 0.07, 95% CI 0.009-0.55), SAA1*T+CRP*T+CCR2*G/A+CX3CR1*G (P perm = 0.0063, OR = 14.58, 95% CI 1.88-113.04), and SAA1*T+CCR2*A+CCL2*G/G (P perm = 0.0351, OR = 10.77, 95% CI 1.35-85.74).
|
216
|
Meta-analysis of Complex Diseases at Gene Level with Generalized Functional Linear Models. Genetics 2015; 202:457-70. [PMID: 26715663 DOI: 10.1534/genetics.115.180869] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/18/2015] [Accepted: 12/09/2015] [Indexed: 11/18/2022] Open
Abstract
We developed generalized functional linear models (GFLMs) to perform a meta-analysis of multiple case-control studies to evaluate the relationship of genetic data to dichotomous traits adjusting for covariates. Unlike the previously developed meta-analysis for sequence kernel association tests (MetaSKATs), which are based on mixed-effect models to make the contributions of major gene loci random, GFLMs are fixed models; i.e., genetic effects of multiple genetic variants are fixed. Based on GFLMs, we developed chi-squared-distributed Rao's efficient score test and likelihood-ratio test (LRT) statistics to test for an association between a complex dichotomous trait and multiple genetic variants. We then performed extensive simulations to evaluate the empirical type I error rates and power performance of the proposed tests. The Rao's efficient score test statistics of GFLMs are very conservative and have higher power than MetaSKATs when some causal variants are rare and some are common. When the causal variants are all rare [i.e., minor allele frequencies (MAF) < 0.03], the Rao's efficient score test statistics have similar or slightly lower power than MetaSKATs. The LRT statistics generate accurate type I error rates for homogeneous genetic-effect models and may inflate type I error rates for heterogeneous genetic-effect models owing to the large numbers of degrees of freedom and have similar or slightly higher power than the Rao's efficient score test statistics. GFLMs were applied to analyze genetic data of 22 gene regions of type 2 diabetes data from a meta-analysis of eight European studies and detected significant association for 18 genes (P < 3.10 × 10^-6), tentative association for 2 genes (HHEX and HMGA2; P ≈ 10^-5), and no association for 2 genes, while MetaSKATs detected none. In addition, the traditional additive-effect model detects association at gene HHEX.
GFLMs and related tests can analyze rare or common variants or a combination of the two and can be useful in whole-genome and whole-exome association studies.
|
217
|
Nazarian A, Gezan SA. Integrating Nonadditive Genomic Relationship Matrices into the Study of Genetic Architecture of Complex Traits. J Hered 2015; 107:153-62. [PMID: 26712858 DOI: 10.1093/jhered/esv096] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/08/2015] [Accepted: 11/05/2015] [Indexed: 01/22/2023]
Abstract
The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects.
|
218
|
Signatures of Dobzhansky-Muller Incompatibilities in the Genomes of Recombinant Inbred Lines. Genetics 2015; 202:825-41. [PMID: 26680662 DOI: 10.1534/genetics.115.179473] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/15/2015] [Accepted: 12/14/2015] [Indexed: 12/23/2022]
Abstract
In the construction of recombinant inbred lines (RILs) from two divergent inbred parents, certain genotype (or epigenotype) combinations may be functionally "incompatible" when brought together in the genomes of the progeny, thus resulting in sterility or lower fertility. Natural selection against these epistatic combinations during inbreeding can change haplotype frequencies and distort linkage disequilibrium (LD) relations between loci on the same or on different chromosomes. These LD distortions have received increased experimental attention, because they point to genomic regions that may drive a Dobzhansky-Muller type of reproductive isolation and, ultimately, speciation in the wild. Here we study the selection signatures of two-locus epistatic incompatibility models and quantify their impact on the genetic composition of the genomes of two-way RILs obtained by selfing. We also consider the biases introduced by breeders when trying to counteract the loss of lines by selectively propagating only viable seeds. Building on our theoretical results, we develop model-based maximum-likelihood (ML) tests that can be applied to multilocus RIL genotype data to infer the precise mode of incompatibility as well as the relative fitness of incompatible loci. We illustrate this ML approach in the context of two published Arabidopsis thaliana RIL panels. Our work lays the theoretical foundation for studying more complex systems such as RILs obtained by sibling mating and/or from multiparental crosses.
|
219
|
Jha AR, Zhou D, Brown CD, Kreitman M, Haddad GG, White KP. Shared Genetic Signals of Hypoxia Adaptation in Drosophila and in High-Altitude Human Populations. Mol Biol Evol 2015; 33:501-17. [PMID: 26576852 PMCID: PMC4866538 DOI: 10.1093/molbev/msv248] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 02/06/2023]
Abstract
The ability to withstand low oxygen (hypoxia tolerance) is a polygenic and mechanistically conserved trait that has important implications for both human health and evolution. However, little is known about the diversity of genetic mechanisms involved in hypoxia adaptation in evolving populations. We used experimental evolution and whole-genome sequencing in Drosophila melanogaster to investigate the role of natural variation in adaptation to hypoxia. Using a generalized linear mixed model we identified significant allele frequency differences between three independently evolved hypoxia-tolerant populations and normoxic control populations for approximately 3,800 single nucleotide polymorphisms. Around 50% of these variants are clustered in 66 distinct genomic regions. These regions contain genes that are differentially expressed between hypoxia-tolerant and normoxic populations and several of the differentially expressed genes are associated with metabolic processes. Additional genes associated with respiratory and open tracheal system development also show evidence of directional selection. RNAi-mediated knockdown of several candidate genes’ expression significantly enhanced survival in severe hypoxia. Using genome-wide single nucleotide polymorphism data from four high-altitude human populations (Sherpas, Tibetans, Ethiopians, and Andeans), we found that several human orthologs of the genes under selection in flies are also likely under positive selection in all four high-altitude human populations. Thus, our results indicate that selection for hypoxia tolerance can act on standing genetic variation in similar genes and pathways present in organisms diverged by hundreds of millions of years.
|
220
|
Anacleto O, Garcia-Cortés LA, Lipschutz-Powell D, Woolliams JA, Doeschl-Wilson AB. A Novel Statistical Model to Estimate Host Genetic Effects Affecting Disease Transmission. Genetics 2015; 201:871-84. [PMID: 26405030 PMCID: PMC4649657 DOI: 10.1534/genetics.115.179853] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 06/24/2015] [Accepted: 09/17/2015] [Indexed: 11/18/2022]
Abstract
There is increasing recognition that genetic diversity can affect the spread of diseases, potentially affecting plant and livestock disease control as well as the emergence of human disease outbreaks. Nevertheless, even though computational tools can guide the control of infectious diseases, few epidemiological models can simultaneously accommodate the inherent individual heterogeneity in multiple infectious disease traits influencing disease transmission, such as the frequently modeled propensity to become infected and infectivity, which describes the host ability to transmit the infection to susceptible individuals. Furthermore, current quantitative genetic models fail to fully capture the heritable variation in host infectivity, mainly because they cannot accommodate the nonlinear infection dynamics underlying epidemiological data. We present in this article a novel statistical model and an inference method to estimate genetic parameters associated with both host susceptibility and infectivity. Our methodology combines quantitative genetic models of social interactions with stochastic processes to model the random, nonlinear, and dynamic nature of infections and uses adaptive Bayesian computational techniques to estimate the model parameters. Results using simulated epidemic data show that our model can accurately estimate heritabilities and genetic risks not only of susceptibility but also of infectivity, therefore exploring a trait whose heritable variation is currently ignored in disease genetics and can greatly influence the spread of infectious diseases. Our proposed methodology offers potential impacts in areas such as livestock disease control through selective breeding and also in predicting and controlling the emergence of disease outbreaks in human populations.
|
221
|
Shriner D, Bentley AR, Doumatey AP, Chen G, Zhou J, Adeyemo A, Rotimi CN. Phenotypic variance explained by local ancestry in admixed African Americans. Front Genet 2015; 6:324. [PMID: 26579196 PMCID: PMC4625172 DOI: 10.3389/fgene.2015.00324] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/21/2015] [Accepted: 10/13/2015] [Indexed: 01/11/2023]
Abstract
We surveyed 26 quantitative traits and disease outcomes to understand the proportion of phenotypic variance explained by local ancestry in admixed African Americans. After inferring local ancestry as the number of African-ancestry chromosomes at hundreds of thousands of genotyped loci across all autosomes, we used a linear mixed effects model to estimate the variance explained by local ancestry in two large independent samples of unrelated African Americans. We found that local ancestry at major and polygenic effect genes can explain up to 20% and 8% of phenotypic variance, respectively. These findings provide evidence that most but not all additive genetic variance is explained by genetic markers undifferentiated by ancestry. These results also inform the proportion of health disparities due to genetic risk factors and the magnitude of error in association studies not controlling for local ancestry.
|
222
|
The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses. Genetics 2015; 201:1601-13. [PMID: 26482794 DOI: 10.1534/genetics.115.177220] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 04/10/2015] [Accepted: 10/09/2015] [Indexed: 02/08/2023]
Abstract
We use computer simulations to investigate the amount of genetic variation for complex traits that can be revealed by single-SNP genome-wide association studies (GWAS) or regional heritability mapping (RHM) analyses based on full genome sequence data or SNP chips. We model a large population subject to mutation, recombination, selection, and drift, assuming a pleiotropic model of mutations sampled from a bivariate distribution of effects of mutations on a quantitative trait and fitness. The pleiotropic model investigated, in contrast to previous models, implies that common mutations of large effect are responsible for most of the genetic variation for quantitative traits, except when the trait is fitness itself. We show that GWAS applied to the full sequence increases the number of QTL detected by as much as 50% compared to the number found with SNP chips but only modestly increases the amount of additive genetic variance explained. Even with full sequence data, the total amount of additive variance explained is generally below 50%. Using RHM on the full sequence data, a slightly larger number of QTL are detected than by GWAS if the same probability threshold is assumed, but these QTL explain a slightly smaller amount of genetic variance. Our results also suggest that most of the missing heritability is due to the inability to detect variants of moderate effect (∼0.03-0.3 phenotypic SDs) segregating at substantial frequencies. Very rare variants, which are more difficult to detect by GWAS, are expected to contribute little genetic variation, so their eventual detection is less relevant for resolving the missing heritability problem.
|
223
|
Mooney MA, Wilmot B. Gene set analysis: A step-by-step guide. Am J Med Genet B Neuropsychiatr Genet 2015; 168:517-27. [PMID: 26059482 PMCID: PMC4638147 DOI: 10.1002/ajmg.b.32328] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 03/16/2015] [Accepted: 05/20/2015] [Indexed: 12/21/2022]
Abstract
To maximize the potential of genome-wide association studies, many researchers are performing secondary analyses to identify sets of genes jointly associated with the trait of interest. Although methods for gene-set analyses (GSA), also called pathway analyses, have been around for more than a decade, the field is still evolving. There are numerous algorithms available for testing the cumulative effect of multiple SNPs, yet no real consensus in the field about the best way to perform a GSA. This paper provides an overview of the factors that can affect the results of a GSA, the lessons learned from past studies, and suggestions for how to make analysis choices that are most appropriate for different types of data. © 2015 Wiley Periodicals, Inc.
|
224
|
Peñagaricano F, Valente BD, Steibel JP, Bates RO, Ernst CW, Khatib H, Rosa GJM. Exploring causal networks underlying fat deposition and muscularity in pigs through the integration of phenotypic, genotypic and transcriptomic data. BMC Syst Biol 2015; 9:58. [PMID: 26376630 PMCID: PMC4574162 DOI: 10.1186/s12918-015-0207-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/04/2015] [Accepted: 09/04/2015] [Indexed: 12/23/2022]
Abstract
BACKGROUND Joint modeling and analysis of phenotypic, genotypic and transcriptomic data have the potential to uncover the genetic control of gene activity and phenotypic variation, as well as shed light on the manner and extent of connectedness among these variables. Current studies mainly report associations, i.e. undirected connections among variables without causal interpretation. Knowledge regarding causal relationships among genes and phenotypes can be used to predict the behavior of complex systems, as well as to optimize management practices and selection strategies. Here, we performed a multistep procedure for inferring causal networks underlying carcass fat deposition and muscularity in pigs using multi-omics data obtained from an F2 Duroc x Pietrain resource pig population. RESULTS We initially explored marginal associations between genotypes and phenotypic and expression traits through whole-genome scans, and then, in genomic regions with multiple significant hits, we assessed gene-phenotype network reconstruction using causal structural learning algorithms. One genomic region on SSC6 showed significant associations with three relevant phenotypes, off-midline 10th-rib backfat thickness, loin muscle weight, and average intramuscular fat percentage, and also with the expression of seven genes, including ZNF24, SSX2IP, and AKR7A2. The inferred network indicated that the genotype affects the three phenotypes mainly through the expression of several genes. Among the phenotypes, fat deposition traits negatively affected loin muscle weight. CONCLUSIONS Our findings shed light on the antagonistic relationship between carcass fat deposition and lean meat content in pigs. In addition, the procedure described in this study has the potential to unravel gene-phenotype networks underlying complex phenotypes.
|
225
|
Norton HL, Edwards M, Krithika S, Johnson M, Werren EA, Parra EJ. Quantitative assessment of skin, hair, and iris variation in a diverse sample of individuals and associated genetic variation. Am J Phys Anthropol 2015; 160:570-81. [PMID: 27435525 DOI: 10.1002/ajpa.22861] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/28/2015] [Revised: 07/27/2015] [Accepted: 08/25/2015] [Indexed: 11/06/2022]
Abstract
OBJECTIVES The main goals of this study are to 1) quantitatively measure skin, hair, and iris pigmentation in a diverse sample of individuals, 2) describe variation within and between these samples, and 3) demonstrate how quantitative measures can facilitate genotype-phenotype association tests. MATERIALS AND METHODS We quantitatively characterize skin, hair, and iris pigmentation using the Melanin (M) Index (skin) and CIELab values (hair) in 1,450 individuals who self-identify as African American, East Asian, European, Hispanic, or South Asian. We also quantify iris pigmentation in a subset of these individuals using CIELab values from high-resolution iris photographs. We compare mean skin M index and hair and iris CIELab values among populations using ANOVA and MANOVA, respectively, and test for genotype-phenotype associations in the European sample. RESULTS All five populations are significantly different for skin (P < 2 × 10^-16) and hair color (P < 2 × 10^-16). Our quantitative analysis of iris and hair pigmentation reinforces the continuous, rather than discrete, nature of these traits. We confirm the association of three loci (rs16891982, rs12203592, and rs12913832) with skin pigmentation and four loci (rs12913832, rs12203592, rs12896399, and rs16891982) with hair pigmentation. Interestingly, the derived rs12203592 T allele located within the IRF4 gene is associated with lighter skin but darker hair color. DISCUSSION The quantitative methods used here provide a fine-scale assessment of pigmentation phenotype and facilitate genotype-phenotype associations, even with relatively small sample sizes. This represents an important expansion of current investigations into pigmentation phenotype and associated genetic variation by including non-European and admixed populations. Am J Phys Anthropol 160:570-581, 2016. © 2015 Wiley Periodicals, Inc.
|