151
De La Torre AR, Puiu D, Crepeau MW, Stevens K, Salzberg SL, Langley CH, Neale DB. Genomic architecture of complex traits in loblolly pine. THE NEW PHYTOLOGIST 2019; 221:1789-1801. [PMID: 30318590 DOI: 10.1111/nph.15535] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/25/2018] [Accepted: 10/06/2018] [Indexed: 05/02/2023]
Abstract
Dissecting the genetic and genomic architecture of complex traits is essential to understand the forces maintaining the variation in phenotypic traits of ecological and economic importance. Whole-genome resequencing data were used to generate high-resolution single nucleotide polymorphism (SNP) markers and genotype individuals from common gardens across the loblolly pine (Pinus taeda) natural range. Genome-wide associations were tested with a large phenotypic dataset comprising 409 variables including morphological traits (height, diameter, carbon isotope discrimination, pitch canker resistance), and molecular traits such as metabolites and expression of xylem development genes. Our study identified 2335 new SNP × trait associations for the species, with many SNPs located in physical clusters in the genome of the species; and the genomic location of hotspots for metabolic × genotype associations. We found a highly polygenic basis of quantitative inheritance, with significant differences in number, effect size, genomic location and frequency of alleles contributing to variation in phenotypes among the different traits. While mutation-selection balance might be shaping the genetic variation in metabolic traits, balancing selection is more likely to shape the variation in expression of xylem development genes. Our work contributes to the study of complex traits in nonmodel plant species by identifying associations at a whole-genome level.
152
Layers of Cryptic Genetic Variation Underlie a Yeast Complex Trait. Genetics 2019; 211:1469-1482. [PMID: 30787041 PMCID: PMC6456305 DOI: 10.1534/genetics.119.301907] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/02/2019] [Accepted: 02/14/2019] [Indexed: 01/13/2023] Open
Abstract
To better understand cryptic genetic variation, Lee et al. comprehensively map the genetic basis of a trait that is typically suppressed in a yeast cross. By determining how three different genetic perturbations give rise... Cryptic genetic variation may be an important contributor to heritable traits, but its extent and regulation are not fully understood. Here, we investigate the cryptic genetic variation underlying a Saccharomyces cerevisiae colony phenotype that is typically suppressed in a cross of the laboratory strain BY4716 (BY) and a derivative of the clinical isolate 322134S (3S). To do this, we comprehensively dissect the trait’s genetic basis in the BYx3S cross in the presence of three different genetic perturbations that enable its expression. This allows us to detect and compare the specific loci that interact with each perturbation to produce the trait. In total, we identify 21 loci, all but one of which interact with just a subset of the perturbations. Beyond impacting which loci contribute to the trait, the genetic perturbations also alter the extent of additivity, epistasis, and genotype–environment interaction among the detected loci. Additionally, we show that the single locus interacting with all three perturbations corresponds to the coding region of the cell surface gene FLO11. While nearly all of the other remaining loci influence FLO11 transcription in cis or trans, the perturbations tend to interact with loci in different pathways and subpathways. Our work shows how layers of cryptic genetic variation can influence complex traits. Here, these layers mainly represent different regulatory inputs into the transcription of a single key gene.
153
A Very Oil Yellow1 Modifier of the Oil Yellow1-N1989 Allele Uncovers a Cryptic Phenotypic Impact of Cis-regulatory Variation in Maize. G3-GENES GENOMES GENETICS 2019; 9:375-390. [PMID: 30518539 PMCID: PMC6385977 DOI: 10.1534/g3.118.200798] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/07/2023]
Abstract
Forward genetics determines the function of genes underlying trait variation by identifying the change in DNA responsible for changes in phenotype. Detecting phenotypically relevant variation outside protein coding sequences and distinguishing this from neutral variants is not trivial, partly because the mechanisms by which DNA polymorphisms in the intergenic regions affect gene regulation are poorly understood. Here we utilized a dominant genetic reporter to investigate the effect of cis- and trans-acting regulatory variation. We performed a forward genetic screen for natural variation that suppressed or enhanced the semi-dominant mutant allele Oy1-N1989, encoding the magnesium chelatase subunit I of maize. This mutant permits rapid phenotyping of leaf color as a reporter for chlorophyll accumulation, and mapping of natural variation in maize affecting chlorophyll metabolism. We identified a single modifier locus segregating between B73 and Mo17 that was linked to the reporter gene itself, which we call very oil yellow1 (vey1). Based on the variation in OY1 transcript abundance and genome-wide association data, vey1 is predicted to consist of multiple cis-acting regulatory sequence polymorphisms encoded at the wild-type oy1 alleles. The vey1 locus appears to be a common polymorphism in the maize germplasm that alters the expression level of a key gene in chlorophyll biosynthesis. These vey1 alleles have no discernible impact on leaf chlorophyll in the absence of the Oy1-N1989 reporter. Thus, the use of a mutant as a reporter for magnesium chelatase activity resulted in the detection of expression-level polymorphisms not readily visible in the laboratory.
154
Johannes F, Schmitz RJ. Spontaneous epimutations in plants. THE NEW PHYTOLOGIST 2019; 221:1253-1259. [PMID: 30216456 DOI: 10.1111/nph.15434] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 07/02/2018] [Accepted: 08/01/2018] [Indexed: 05/22/2023]
Abstract
Contents: Summary; I. Introduction; II. What is the rate and molecular spectrum of spontaneous epimutations?; III. Do spontaneous epimutations have phenotypic consequences?; IV. Conclusion and discussion; Acknowledgements; References.
Summary: Heritable gains or losses of cytosine methylation can arise stochastically in plant genomes independently of DNA sequence changes. These so-called 'spontaneous epimutations' appear to be a byproduct of imperfect DNA methylation maintenance and epigenome reinforcement events that occur in specialized cell types. There is continued interest in the plant epigenetics community in trying to understand the broader implications of these stochastic events, as some have been shown to induce heritable gene expression changes, shape patterns of methylation diversity within and among plant populations, and appear to be responsive to multi-generational environmental stressors. In this paper we synthesize our current knowledge of the molecular basis and functional consequences of spontaneous epimutations in plants, discuss technical and conceptual challenges, and highlight emerging research directions.
155
Huang M, Liu X, Zhou Y, Summers RM, Zhang Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. Gigascience 2019; 8:5238723. [PMID: 30535326 PMCID: PMC6365300 DOI: 10.1093/gigascience/giy154] [Citation(s) in RCA: 220] [Impact Index Per Article: 44.0] [Received: 01/24/2018] [Revised: 06/18/2018] [Accepted: 11/27/2018] [Indexed: 12/15/2022] Open
Abstract
Big datasets, accumulated from biomedical and agronomic studies, provide the potential to identify genes that control complex human diseases and agriculturally important traits through genome-wide association studies (GWAS). However, big datasets also lead to extreme computational challenges, especially when sophisticated statistical models are employed to simultaneously reduce false positives and false negatives. The newly developed fixed and random model circulating probability unification (FarmCPU) method uses a bin method under the assumption that quantitative trait nucleotides (QTNs) are evenly distributed throughout the genome. The estimated QTNs are used to separate a mixed linear model into a computationally efficient fixed effect model (FEM) and a computationally expensive random effect model (REM), which are then used iteratively. To completely eliminate the computationally expensive REM, we replaced REM with FEM by using Bayesian information criteria. To eliminate the requirement that QTNs be evenly distributed throughout the genome, we replaced the bin method with linkage disequilibrium information. The new method is called Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK). Both real and simulated data analyses demonstrated that BLINK improves statistical power compared to FarmCPU, in addition to remarkably reducing computing time. Now, a dataset with one million individuals and one-half million markers can be analyzed within three hours, instead of one week using FarmCPU.
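The idea of choosing fixed-effect covariates with a Bayesian information criterion can be illustrated with a toy example. The sketch below is not BLINK's algorithm or code: it is a plain greedy forward selection over markers using the Gaussian BIC of an ordinary least-squares fit, on simulated genotypes. The function names, the simulated data, and the use of NumPy are all assumptions made for illustration.

```python
import numpy as np


def bic(y, X):
    """Gaussian BIC (up to a constant) of an OLS fit of y on the columns of X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)


def select_markers_by_bic(y, genotypes):
    """Greedy forward selection: add the marker that lowers BIC most; stop when none does."""
    n, m = genotypes.shape
    selected = []
    design = np.ones((n, 1))  # start from an intercept-only model
    best = bic(y, design)
    while len(selected) < m:
        scores = {
            j: bic(y, np.column_stack([design, genotypes[:, j]]))
            for j in range(m)
            if j not in selected
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining marker improves the criterion
        best = scores[j_best]
        selected.append(j_best)
        design = np.column_stack([design, genotypes[:, j_best]])
    return selected


# Hypothetical toy data: 200 individuals, 6 markers coded 0/1/2,
# with markers 0 and 3 simulated as the only causal ones.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(200, 6)).astype(float)
y = 2.0 * genotypes[:, 0] - 2.0 * genotypes[:, 3] + rng.normal(0.0, 0.5, size=200)

selected = select_markers_by_bic(y, genotypes)
print("markers kept as covariates:", selected)
```

Because every candidate model here is a fixed-effect regression, no random-effect model has to be fitted, which is the computational point the abstract makes; the real method additionally uses linkage disequilibrium information, which this sketch omits.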
156
Kichaev G, Bhatia G, Loh PR, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am J Hum Genet 2019; 104:65-75. [PMID: 30595370 DOI: 10.1101/222265] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/24/2018] [Accepted: 11/14/2018] [Indexed: 05/28/2023] Open
Abstract
Functional genomics data has the potential to increase GWAS power by identifying SNPs that have a higher prior probability of association. Here, we introduce a method that leverages polygenic functional enrichment to incorporate coding, conserved, regulatory, and LD-related genomic annotations into association analyses. We show via simulations with real genotypes that the method, functionally informed novel discovery of risk loci (FINDOR), correctly controls the false-positive rate at null loci and attains a 9%-38% increase in the number of independent associations detected at causal loci, depending on trait polygenicity and sample size. We applied FINDOR to 27 independent complex traits and diseases from the interim UK Biobank release (average N = 130K). Averaged across traits, we attained a 13% increase in genome-wide significant loci detected (including a 20% increase for disease traits) compared to unweighted raw p values that do not use functional data. We replicated the additional loci in independent UK Biobank and non-UK Biobank data, yielding a highly statistically significant replication slope (0.66-0.69) in each case. Finally, we applied FINDOR to the full UK Biobank release (average N = 416K), attaining smaller relative improvements (consistent with simulations) but larger absolute improvements, detecting an additional 583 GWAS loci. In conclusion, leveraging functional enrichment using our method robustly increases GWAS power.
157
Kichaev G, Bhatia G, Loh PR, Gazal S, Burch K, Freund MK, Schoech A, Pasaniuc B, Price AL. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am J Hum Genet 2019; 104:65-75. [PMID: 30595370 PMCID: PMC6323418 DOI: 10.1016/j.ajhg.2018.11.008] [Citation(s) in RCA: 543] [Impact Index Per Article: 108.6] [Received: 04/24/2018] [Accepted: 11/14/2018] [Indexed: 12/24/2022] Open
158
Sanchez-Roige S, Fontanillas P, Elson SL, Gray JC, de Wit H, Davis LK, MacKillop J, Palmer AA. Genome-wide association study of alcohol use disorder identification test (AUDIT) scores in 20 328 research participants of European ancestry. Addict Biol 2019; 24:121-131. [PMID: 29058377 PMCID: PMC6988186 DOI: 10.1111/adb.12574] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 07/28/2017] [Revised: 09/11/2017] [Accepted: 09/25/2017] [Indexed: 12/26/2022]
Abstract
Genetic factors contribute to the risk for developing alcohol use disorder (AUD). In collaboration with the genetics company 23andMe, Inc., we performed a genome-wide association study of the alcohol use disorder identification test (AUDIT), an instrument designed to screen for alcohol misuse over the past year. Our final sample consisted of 20 328 research participants of European ancestry (55.3% females; mean age = 53.8, SD = 16.1) who reported ever using alcohol. Our results showed that the 'chip-heritability' of AUDIT score, when treated as a continuous phenotype, was 12%. No loci reached genome-wide significance. The gene ADH1C, which has been previously implicated in AUD, was among our most significant associations (4.4 × 10^-7; rs141973904). We also detected a suggestive association on chromosome 1 (2.1 × 10^-7; rs182344113) near the gene KCNJ9, which has been implicated in mouse models of high ethanol drinking. Using linkage disequilibrium score regression, we identified positive genetic correlations between AUDIT score, high alcohol consumption and cigarette smoking. We also observed an unexpected positive genetic correlation between AUDIT and educational attainment and additional unexpected negative correlations with body mass index/obesity and attention-deficit/hyperactivity disorder. We conclude that conducting a genetic study using responses to an online questionnaire in a population not ascertained for AUD may represent a cost-effective strategy for elucidating aspects of the etiology of AUD.
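The statement that no locus reached genome-wide significance can be checked against the two p-values quoted in the abstract. The sketch below assumes the conventional cutoff of 5 × 10^-8, which the abstract does not state; the variable and function names are hypothetical.

```python
# Conventional genome-wide significance cutoff; an assumption, not quoted in the abstract.
GENOME_WIDE_THRESHOLD = 5e-8

# P-values and SNP IDs as reported in the abstract.
top_hits = {
    "rs141973904 (ADH1C)": 4.4e-7,
    "rs182344113 (near KCNJ9)": 2.1e-7,
}


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """Return True when the p-value falls below the chosen threshold."""
    return p_value < threshold


results = {snp: is_genome_wide_significant(p) for snp, p in top_hits.items()}
for snp, significant in results.items():
    label = "genome-wide significant" if significant else "suggestive only"
    print(f"{snp}: p = {top_hits[snp]:.1e} -> {label}")
```

Both reported associations sit roughly an order of magnitude above the assumed cutoff, consistent with the abstract describing them as suggestive.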
159
Gagliano SA, Sengupta S, Sidore C, Maschio A, Cucca F, Schlessinger D, Abecasis GR. Relative impact of indels versus SNPs on complex disease. Genet Epidemiol 2018; 43:112-117. [PMID: 30565766 PMCID: PMC6330128 DOI: 10.1002/gepi.22175] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/25/2018] [Revised: 09/07/2018] [Accepted: 10/29/2018] [Indexed: 11/30/2022]
Abstract
It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single-nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well-phenotyped individuals from an isolated valley in Sardinia. We assessed association between 120 traits, resulting in 89 nonoverlapping associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with causality and mechanism previously demonstrated (rs200748895:TGCTG/T) had a 0.999 posterior probability. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci.
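The "very modest" enrichment can be made concrete with the counts given in the abstract. The sketch below only compares the share of indels among all characterized common variants with their share in the credible sets; it is a naive ratio, not the enrichment test the authors ran, and the variable names are hypothetical.

```python
# Counts quoted in the abstract.
common_indels = 478_876
common_snps = 8_246_244
credible_indels = 88
credible_snps = 1_319

# Share of indels among all characterized common variants.
background_share = common_indels / (common_indels + common_snps)

# Share of indels among variants in the credible sets.
credible_share = credible_indels / (credible_indels + credible_snps)

# Naive fold enrichment; ignores LD, allele frequency, and sampling error.
fold_enrichment = credible_share / background_share

print(f"indel share, all common variants: {background_share:.2%}")
print(f"indel share, credible sets:       {credible_share:.2%}")
print(f"naive fold enrichment:            {fold_enrichment:.2f}")
```

Indels make up about 5.5% of the common variants and about 6.3% of the credible-set variants, so the naive ratio is only slightly above 1, in line with the abstract's wording.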
160
Abstract
Mutations are the root source of genetic variation and underlie the process of evolution. Although the rates at which mutations occur vary considerably between species, little is known about differences within species, or the genetic and molecular basis of these differences. Here, we leveraged the power of the yeast Saccharomyces cerevisiae as a model system to uncover natural genetic variants that underlie variation in mutation rate. We developed a high-throughput fluctuation assay and used it to quantify mutation rates in seven natural yeast isolates and in 1040 segregant progeny from a cross between BY, a laboratory strain, and RM, a wine strain. We observed that mutation rate varies among yeast strains and is heritable (H^2 = 0.49). We performed linkage mapping in the segregants and identified four quantitative trait loci underlying mutation rate variation in the cross. We fine-mapped two quantitative trait loci to the underlying causal genes, RAD5 and MKT1, that contribute to mutation rate variation. These genes also underlie sensitivity to the DNA-damaging agents 4NQO and MMS, suggesting a connection between spontaneous mutation rate and mutagen sensitivity.
161
Bellot P, de Los Campos G, Pérez-Enciso M. Can Deep Learning Improve Genomic Prediction of Complex Human Traits? Genetics 2018; 210:809-819. [PMID: 30171033 PMCID: PMC6218236 DOI: 10.1534/genetics.118.301298] [Citation(s) in RCA: 91] [Impact Index Per Article: 15.2] [Received: 06/26/2018] [Accepted: 08/24/2018] [Indexed: 11/18/2022] Open
Abstract
The genetic analysis of complex traits does not escape the current excitement around artificial intelligence, including a renewed interest in "deep learning" (DL) techniques such as Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs). However, the performance of DL for genomic prediction of complex human traits has not been comprehensively tested. To provide an evaluation of MLPs and CNNs, we used data from distantly related white Caucasian individuals (n ∼100k individuals, m ∼500k SNPs, and k = 1000) of the interim release of the UK Biobank. We analyzed a total of five phenotypes: height, bone heel mineral density, body mass index, systolic blood pressure, and waist-hip ratio, with genomic heritabilities ranging from ∼0.20 to 0.70. After hyperparameter optimization using a genetic algorithm, we considered several configurations, from shallow to deep learners, and compared the predictive performance of MLPs and CNNs with that of Bayesian linear regressions across sets of SNPs (from 10k to 50k) that were preselected using single-marker regression analyses. For height, a highly heritable phenotype, all methods performed similarly, although CNNs were slightly but consistently worse. For the rest of the phenotypes, the performance of some CNNs was comparable to or slightly better than that of linear methods. Performance of MLPs was highly dependent on SNP set and phenotype. In all, over the range of traits evaluated in this study, CNN performance was competitive with linear models, but we did not find any case where DL outperformed the linear model by a sizable margin. We suggest that more research is needed to adapt CNN methodology, originally motivated by image analysis, to genetic-based problems in order for CNNs to be competitive with linear models.
162
Hannon E, Gorrie-Stone TJ, Smart MC, Burrage J, Hughes A, Bao Y, Kumari M, Schalkwyk LC, Mill J. Leveraging DNA-Methylation Quantitative-Trait Loci to Characterize the Relationship between Methylomic Variation, Gene Expression, and Complex Traits. Am J Hum Genet 2018; 103:654-665. [PMID: 30401456 PMCID: PMC6217758 DOI: 10.1016/j.ajhg.2018.09.007] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 06/15/2018] [Accepted: 09/14/2018] [Indexed: 11/23/2022] Open
Abstract
Characterizing the complex relationship between genetic, epigenetic, and transcriptomic variation has the potential to increase understanding about the mechanisms underpinning health and disease phenotypes. We undertook a comprehensive analysis of common genetic variation on DNA methylation (DNAm) by using the Illumina EPIC array to profile samples from the UK Household Longitudinal study. We identified 12,689,548 significant DNA methylation quantitative trait loci (mQTL) associations (p < 6.52 × 10^-14) occurring between 2,907,234 genetic variants and 93,268 DNAm sites, including a large number not identified by previous DNAm-profiling methods. We demonstrate the utility of these data for interpreting the functional consequences of common genetic variation associated with > 60 human traits by using summary-data-based Mendelian randomization (SMR) to identify 1,662 pleiotropic associations between 36 complex traits and 1,246 DNAm sites. We also use SMR to characterize the relationship between DNAm and gene expression and thereby identify 6,798 pleiotropic associations between 5,420 DNAm sites and the transcription of 1,702 genes. Our mQTL database and SMR results are available via a searchable online database as a resource to the research community.
163
Phenotype-Specific Enrichment of Mendelian Disorder Genes near GWAS Regions across 62 Complex Traits. Am J Hum Genet 2018; 103:535-552. [PMID: 30290150 DOI: 10.1016/j.ajhg.2018.08.017] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 05/16/2018] [Accepted: 08/28/2018] [Indexed: 01/29/2023] Open
Abstract
Although recent studies provide evidence for a common genetic basis between complex traits and Mendelian disorders, a thorough quantification of their overlap in a phenotype-specific manner remains elusive. Here, we have quantified the overlap of genes identified through large-scale genome-wide association studies (GWASs) for 62 complex traits and diseases with genes containing mutations known to cause 20 broad categories of Mendelian disorders. We identified a significant enrichment of genes linked to phenotypically matched Mendelian disorders in GWAS gene sets; of the total 1,240 comparisons, a higher proportion of phenotypically matched or related pairs (n = 50 of 92 [54%]) than phenotypically unmatched pairs (n = 27 of 1,148 [2%]) demonstrated significant overlap, confirming a phenotype-specific enrichment pattern. Further, we observed elevated GWAS effect sizes near genes linked to phenotypically matched Mendelian disorders. Finally, we report examples of GWAS variants localized at the transcription start site or physically interacting with the promoters of genes linked to phenotypically matched Mendelian disorders. Our results are consistent with the hypothesis that genes that are disrupted in Mendelian disorders are dysregulated by non-coding variants in complex traits and demonstrate how leveraging findings from related Mendelian disorders and functional genomic datasets can prioritize genes that are putatively dysregulated by local and distal non-coding GWAS variants.
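The contrast between matched and unmatched pairs (50 of 92 versus 27 of 1,148) can be turned into a simple 2 × 2 calculation. The sketch below computes the two proportions, the odds ratio, and a one-sided hypergeometric (Fisher-type) tail probability using only the standard library. It treats the 1,240 comparisons as independent, which is an assumption, and it is not the statistical procedure used in the paper; the function name is hypothetical.

```python
from math import comb

# Counts quoted in the abstract.
matched_sig, matched_total = 50, 92
unmatched_sig, unmatched_total = 27, 1_148

total_pairs = matched_total + unmatched_total  # 1,240 comparisons
total_sig = matched_sig + unmatched_sig        # 77 pairs with significant overlap

matched_rate = matched_sig / matched_total
unmatched_rate = unmatched_sig / unmatched_total

# Odds ratio of the 2 x 2 table (significant vs not, matched vs unmatched).
odds_ratio = (matched_sig * (unmatched_total - unmatched_sig)) / (
    (matched_total - matched_sig) * unmatched_sig
)


def hypergeom_upper_tail(k_obs, n_sig, n_matched, n_total):
    """P(X >= k_obs): chance that at least k_obs of the n_sig significant pairs
    are matched if significance were unrelated to matching."""
    upper = min(n_sig, n_matched)
    numerator = sum(
        comb(n_matched, k) * comb(n_total - n_matched, n_sig - k)
        for k in range(k_obs, upper + 1)
    )
    return numerator / comb(n_total, n_sig)


p_value = hypergeom_upper_tail(matched_sig, total_sig, matched_total, total_pairs)

print(f"matched: {matched_rate:.0%}, unmatched: {unmatched_rate:.0%}")
print(f"odds ratio: {odds_ratio:.1f}")
print(f"one-sided hypergeometric p-value: {p_value:.2e}")
```

The proportions reproduce the 54% and 2% quoted in the abstract, and the counts sum to the stated 1,240 comparisons.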
164
Bhatta M, Morgounov A, Belamkar V, Baenziger PS. Genome-Wide Association Study Reveals Novel Genomic Regions for Grain Yield and Yield-Related Traits in Drought-Stressed Synthetic Hexaploid Wheat. Int J Mol Sci 2018; 19:E3011. [PMID: 30279375 PMCID: PMC6212811 DOI: 10.3390/ijms19103011] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 08/20/2018] [Revised: 09/27/2018] [Accepted: 09/29/2018] [Indexed: 01/09/2023] Open
Abstract
Synthetic hexaploid wheat (SHW; 2n = 6x = 42, AABBDD, Triticum aestivum L.) is produced from an interspecific cross between durum wheat (2n = 4x = 28, AABB, T. turgidum L.) and goat grass (2n = 2x = 14, DD, Aegilops tauschii Coss.) and is reported to have significant novel alleles controlling resistance to biotic and abiotic stresses. A genome-wide association study (GWAS) was conducted to unravel these loci [marker-trait associations (MTAs)] using 35,648 genotyping-by-sequencing-derived single nucleotide polymorphisms in 123 SHWs. We identified 90 novel MTAs (45, 11, and 34 on the A, B, and D genomes, respectively) and haplotype blocks associated with grain yield and yield-related traits including root traits under drought stress. The phenotypic variance explained by the MTAs ranged from 1.1% to 32.3%. Most of the MTAs (120 out of 194) identified were found in genes, and of these 45 MTAs were in genes annotated as having a potential role in drought stress. This result provides further evidence for the reliability of the MTAs identified. The large number of MTAs (53) identified on the D genome in particular demonstrates the potential of SHWs for elucidating the genetic architecture of complex traits and provides an opportunity for further improvement of wheat under rapidly changing climatic conditions.
165
de Los Campos G, Vazquez AI, Hsu S, Lello L. Complex-Trait Prediction in the Era of Big Data. Trends Genet 2018; 34:746-754. [PMID: 30139641 PMCID: PMC6150788 DOI: 10.1016/j.tig.2018.07.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 04/17/2018] [Revised: 07/09/2018] [Accepted: 07/16/2018] [Indexed: 01/18/2023]
Abstract
Accurate prediction of complex traits requires using a large number of DNA variants. Advances in statistical and machine learning methodology enable the identification of complex patterns in high-dimensional settings. However, training these highly parameterized methods requires very large data sets. Until recently, such data sets were not available. But the situation is changing rapidly as very large biomedical data sets comprising individual genotype-phenotype data for hundreds of thousands of individuals become available in public and private domains. We argue that the convergence of advances in methodology and the advent of Big Genomic Data will enable unprecedented improvements in complex-trait prediction; we review theory and evidence supporting our claim and discuss challenges and opportunities that Big Data will bring to complex-trait prediction.
166
Lello L, Avery SG, Tellier L, Vazquez AI, de Los Campos G, Hsu SDH. Accurate Genomic Prediction of Human Height. Genetics 2018; 210:477-497. [PMID: 30150289 PMCID: PMC6216598 DOI: 10.1534/genetics.118.301267] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Received: 03/19/2018] [Accepted: 08/01/2018] [Indexed: 01/08/2023] Open
Abstract
We construct genomic predictors for heritable but extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). The constructed predictors explain, respectively, ∼40, 20, and 9% of total variance for the three traits, in data not used for training. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few centimeters of the prediction. The proportion of variance explained for height is comparable to the estimated common SNP heritability from genome-wide complex trait analysis (GCTA), and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for SNPs. Thus, our results close the gap between prediction R-squared and common SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common variants. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier genome-wide association studies (GWAS) for out-of-sample validation of our results.
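The two ways the abstract reports height accuracy (about 40% of variance explained and a correlation of about 0.65) are linked by the fact that the squared Pearson correlation between predicted and observed values equals the proportion of variance explained by a linear fit of one on the other. A short worked check, using only the rounded figures quoted in the abstract (the variable names are hypothetical):

```python
import math

# Correlation between predicted and actual height, as quoted (approximate).
r_height = 0.65
variance_explained_height = r_height ** 2  # squared correlation

# Approximate variance explained for the three traits, as quoted.
variance_explained = {
    "height": 0.40,
    "heel bone density": 0.20,
    "educational attainment": 0.09,
}

# Correlation implied by each variance-explained figure.
implied_correlation = {
    trait: math.sqrt(r2) for trait, r2 in variance_explained.items()
}

print(f"r = {r_height} implies variance explained of {variance_explained_height:.4f}")
for trait, r in implied_correlation.items():
    print(f"{trait}: variance explained {variance_explained[trait]:.2f} implies r of {r:.2f}")
```

A correlation of 0.65 corresponds to roughly 42% of variance, which agrees with the "∼40%" in the abstract once rounding is taken into account.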
167
Galván-Femenía I, Obón-Santacana M, Piñeyro D, Guindo-Martinez M, Duran X, Carreras A, Pluvinet R, Velasco J, Ramos L, Aussó S, Mercader JM, Puig L, Perucho M, Torrents D, Moreno V, Sumoy L, de Cid R. Multitrait genome association analysis identifies new susceptibility genes for human anthropometric variation in the GCAT cohort. J Med Genet 2018; 55:765-778. [PMID: 30166351 PMCID: PMC6252362 DOI: 10.1136/jmedgenet-2018-105437] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 04/24/2018] [Revised: 07/19/2018] [Accepted: 07/21/2018] [Indexed: 12/22/2022]
Abstract
Background: Heritability estimates have revealed an important contribution of SNP variants for most common traits; however, SNP analysis by single-trait genome-wide association studies (GWAS) has failed to uncover their impact. In this study, we applied a multitrait GWAS approach to uncover additional factors contributing to the missing heritability of human anthropometric variation.
Methods: We analysed 205 traits, including diseases identified at baseline in the GCAT cohort (Genomes For Life - Cohort study of the Genomes of Catalonia) (n=4988), a Mediterranean adult population-based cohort study from the south of Europe. We estimated SNP heritability contribution and single-trait GWAS for all traits from 15 million SNP variants. Then, we applied a multitrait-related approach to study genome-wide association to anthropometric measures in a two-stage meta-analysis with the UK Biobank cohort (n=336 107).
Results: Heritability estimates (eg, skin colour, alcohol consumption, smoking habit, body mass index, educational level or height) revealed an important contribution of SNP variants, ranging from 18% to 77%. Single-trait analysis identified 1785 SNPs at the genome-wide significance threshold. From these, several previously reported single-trait hits were confirmed in our sample with LINC01432 (p=1.9×10^-9) variants associated with male baldness, LDLR variants with hyperlipidaemia (ICD-9:272) (p=9.4×10^-10) and variants in IRF4 (p=2.8×10^-57), SLC45A2 (p=2.2×10^-130), HERC2 (p=2.8×10^-176), OCA2 (p=2.4×10^-121) and MC1R (p=7.7×10^-22) associated with hair, eye and skin colour, freckling, tanning capacity and sun burning sensitivity and the Fitzpatrick phototype score, all highly correlated cross-phenotypes. Multitrait meta-analysis of anthropometric variation validated 27 loci in a two-stage meta-analysis with a large British ancestry cohort, six of which are newly reported here (p value threshold <5×10^-9) at ZRANB2-AS2, PIK3R1, EPHA7, MAD1L1, CACUL1 and MAP3K9.
Conclusion: Considering multiple related genetic phenotypes improves detection of associated genomic signals. These results indicate the potential value of data-driven multivariate phenotyping for genetic studies in large population-based cohorts to contribute to knowledge of complex traits.
|
168
|
Paré G, Mao S, Deng WQ. A robust method to estimate regional polygenic correlation under misspecified linkage disequilibrium structure. Genet Epidemiol 2018; 42:636-647. [PMID: 30156736 DOI: 10.1002/gepi.22149] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/07/2018] [Revised: 06/04/2018] [Accepted: 06/18/2018] [Indexed: 01/20/2023]
Abstract
Complex traits can share a substantial proportion of their polygenic heritability. However, genome-wide polygenic correlations between pairs of traits can mask heterogeneity in their shared polygenic effects across loci. We propose a novel method (weighted maximum likelihood-regional polygenic correlation [RPC]) to evaluate polygenic correlation between two complex traits in small genomic regions using summary association statistics. Our method tests for evidence that the polygenic effect at a given region affects two traits concurrently. We show through simulations that our method is well calibrated, powerful, and more robust to misspecification of linkage disequilibrium than other methods under a polygenic model. As small genomic regions are more likely to harbor specific genetic effects, our method is ideal to identify heterogeneity in shared polygenic correlation across regions. We illustrate the usefulness of our method by addressing two questions related to cardiometabolic traits. First, we explored how RPC can inform on the strong epidemiological association between high-density lipoprotein cholesterol and coronary artery disease (CAD), suggesting a key role for triglycerides metabolism. Second, we investigated the potential role of PPARγ activators in the prevention of CAD. Our results provide a compelling argument that shared heritability between complex traits is highly heterogeneous across loci.
|
169
|
Blue EE, Yu CE, Thornton TA, Chapman NH, Kernfeld E, Jiang N, Shively KM, Buckingham KJ, Marvin CT, Bamshad MJ, Bird TD, Wijsman EM. Variants regulating ZBTB4 are associated with age-at-onset of Alzheimer's disease. GENES, BRAIN, AND BEHAVIOR 2018; 17:e12429. [PMID: 29045054 PMCID: PMC5902667 DOI: 10.1111/gbb.12429] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/04/2017] [Revised: 10/11/2017] [Accepted: 10/12/2017] [Indexed: 01/01/2023]
Abstract
The identification of novel genetic modifiers of age-at-onset (AAO) of Alzheimer's disease (AD) could advance our understanding of AD and provide novel therapeutic targets. A previous genome scan for modifiers of AAO among families affected by early-onset AD caused by the PSEN2 N141I variant identified 2 loci with significant evidence for linkage: 1q23.3 and 17p13.2. Here, we describe the fine-mapping of these 2 linkage regions, and test for replication in 6 independent datasets. By fine-mapping these linkage signals in a single large family, we reduced the linkage regions to 11% of their original size and nominated 54 candidate variants. Among the 11 variants associated with AAO of AD in a larger sample of Germans from Russia, the strongest evidence implicated promoter variants influencing NCSTN on 1q23.3 and ZBTB4 on 17p13.2. The association between ZBTB4 and AAO of AD was replicated by multiple variants in independent, trans-ethnic datasets. Our results show association between AAO of AD and both ZBTB4 and NCSTN. ZBTB4 is a transcriptional repressor that regulates the cell cycle, including the apoptotic response to amyloid beta, while NCSTN is part of the gamma secretase complex, known to influence amyloid beta production. These genes therefore suggest important roles for amyloid beta and cell cycle pathways in AAO of AD.
|
170
|
Gianola D, Cecchinato A, Naya H, Schön CC. Prediction of Complex Traits: Robust Alternatives to Best Linear Unbiased Prediction. Front Genet 2018; 9:195. [PMID: 29951082 PMCID: PMC6008589 DOI: 10.3389/fgene.2018.00195] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/03/2018] [Accepted: 05/14/2018] [Indexed: 12/05/2022] Open
Abstract
A widely used method for prediction of complex traits in animal and plant breeding is “genomic best linear unbiased prediction” (GBLUP). In a quantitative genetics setting, BLUP is a linear regression of phenotypes on a pedigree or on a genomic relationship matrix, depending on the type of input information available. Normality of the distributions of random effects and of model residuals is not required for BLUP but a Gaussian assumption is made implicitly. A potential downside is that Gaussian linear regressions are sensitive to outliers, genetic or environmental in origin. We present robust alternatives to BLUP that are simple to implement (relative to a fully Bayesian analysis), using a linear model with residual t or Laplace distributions instead of a Gaussian one, and evaluate the methods with milk yield records on Italian Brown Swiss cattle, grain yield data in inbred wheat lines, and three traits measured on accessions of Arabidopsis thaliana. The methods do not use Markov chain Monte Carlo sampling, and model hyper-parameters, viewed here as regularization “knobs,” are tuned via cross-validation. Uncertainty of predictions is evaluated by employing bootstrapping or by random reconstruction of training and testing sets. It was found (e.g., test-day milk yield in cows, flowering time and FRIGIDA expression in Arabidopsis) that the best predictions were often those obtained with the robust methods. The results obtained are encouraging and stimulate further investigation and generalization.
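To make the contrast in this abstract concrete, the following is a minimal sketch (not the authors' implementation) of Gaussian GBLUP next to one possible robust variant. It assumes phenotypes are already centered, a known genomic relationship matrix G, and a ridge-type penalty lam; the function names, the iteratively reweighted least squares (IRLS) scheme, the Student-t weights and the MAD-based scale estimate are all illustrative assumptions.

```python
import numpy as np

def gblup(y_centered, G, lam):
    """Gaussian GBLUP of genomic values: g_hat = G (G + lam*I)^-1 y."""
    y = np.asarray(y_centered, dtype=float)
    n = len(y)
    return G @ np.linalg.solve(G + lam * np.eye(n), y)

def robust_blup_t(y_centered, G, lam, nu=4.0, n_iter=50):
    """Hypothetical robust variant: IRLS with Student-t residual weights.

    Each pass solves the weighted problem
        min_g  sum_i w_i (y_i - g_i)^2 + lam * g' G^-1 g,
    whose solution is g = (G W + lam*I)^-1 G W y, then refreshes the weights.
    """
    y = np.asarray(y_centered, dtype=float)
    n = len(y)
    g = np.zeros(n)
    for _ in range(n_iter):
        r = y - g
        # Robust residual variance from the median absolute residual (assumption).
        scale2 = (np.median(np.abs(r)) / 0.6745) ** 2 + 1e-12
        # Student-t weights: large residuals receive small weights.
        w = (nu + 1.0) / (nu + r ** 2 / scale2)
        GW = G * w  # same as G @ diag(w)
        g = np.linalg.solve(GW + lam * np.eye(n), GW @ y)
    return g

# Toy data: five ordinary records and one gross outlier, unrelated individuals.
y = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 20.0])
G = np.eye(6)
g_gauss = gblup(y, G, lam=1.0)
g_robust = robust_blup_t(y, G, lam=1.0)
```

In this toy setting the Gaussian predictor shrinks every record by the same factor, so the outlier still yields a large predicted value, whereas the t-weighted version down-weights it. With a very large nu the weights approach one and the two predictors coincide.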
|
171
|
Racimo F, Berg JJ, Pickrell JK. Detecting Polygenic Adaptation in Admixture Graphs. Genetics 2018; 208:1565-1584. [PMID: 29348143 PMCID: PMC5887149 DOI: 10.1534/genetics.117.300489] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Received: 11/08/2017] [Accepted: 01/16/2018] [Indexed: 01/09/2023] Open
Abstract
An open question in human evolution is the importance of polygenic adaptation: adaptive changes in the mean of a multifactorial trait due to shifts in allele frequencies across many loci. In recent years, several methods have been developed to detect polygenic adaptation using loci identified in genome-wide association studies (GWAS). Though powerful, these methods suffer from limited interpretability: they can detect which sets of populations have evidence for polygenic adaptation, but are unable to reveal where in the history of multiple populations these processes occurred. To address this, we created a method to detect polygenic adaptation in an admixture graph, which is a representation of the historical divergences and admixture events relating different populations through time. We developed a Markov chain Monte Carlo (MCMC) algorithm to infer branch-specific parameters reflecting the strength of selection in each branch of a graph. Additionally, we developed a set of summary statistics that are fast to compute and can indicate which branches are most likely to have experienced polygenic adaptation. We show via simulations that this method, which we call PolyGraph, has good power to detect polygenic adaptation, and applied it to human population genomic data from around the world. We also provide evidence that variants associated with several traits, including height, educational attainment, and self-reported unibrow, have been influenced by polygenic adaptation in different populations during human evolution.
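As a small illustration of the quantity that polygenic adaptation acts on, the sketch below computes the mean additive genetic value of each population as twice the sum of effect size times allele frequency over loci. It is not the PolyGraph method itself; the function name, effect sizes and frequencies are made up for the example.

```python
import numpy as np

def mean_genetic_value(effects, freqs):
    """Mean additive genetic value per population: 2 * sum_l beta_l * p_l.

    effects: per-locus additive effects (length L).
    freqs:   allele frequencies, shape (populations, L).
    """
    effects = np.asarray(effects, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return 2.0 * freqs @ effects

# Hypothetical effects at three loci and frequencies in two populations.
effects = np.array([0.1, -0.2, 0.05])
freqs = np.array([
    [0.5, 0.5, 0.5],   # population A
    [0.6, 0.4, 0.6],   # population B: small shifts, all in the trait-increasing direction
])
scores = mean_genetic_value(effects, freqs)
```

Here each locus shifts by only 0.1 in frequency, yet because every shift favors the trait-increasing allele the population means differ (-0.05 versus 0.02). Coordinated small shifts of this kind are the signal that methods for detecting polygenic adaptation look for.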
|
172
|
A Simple Test Identifies Selection on Complex Traits. Genetics 2018; 209:321-333. [PMID: 29545467 DOI: 10.1534/genetics.118.300857] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/21/2017] [Accepted: 03/10/2018] [Indexed: 11/18/2022] Open
Abstract
Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome-wide association studies and selection-mapping protocols, are designed to identify individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique uses additive-effects estimates from all available markers, and relates these estimates to allele-frequency change over time. Using this information, we generate a composite statistic, denoted [Formula: see text] which can be used to test for significant evidence of selection on a trait. Our test requires pre- and postselection genotypic data but only a single time point with phenotypic information. Simulations demonstrate that [Formula: see text] is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection.
|
173
|
Ashbrook DG, Mulligan MK, Williams RW. Post-genomic behavioral genetics: From revolution to routine. GENES, BRAIN, AND BEHAVIOR 2018; 17:e12441. [PMID: 29193773 PMCID: PMC5876106 DOI: 10.1111/gbb.12441] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 10/04/2017] [Revised: 11/02/2017] [Accepted: 11/20/2017] [Indexed: 12/16/2022]
Abstract
What was once expensive and revolutionary, full-genome sequence, is now affordable and routine. Costs will continue to drop, opening up new frontiers in behavioral genetics. This shift in costs from the genome to the phenome is most notable in large clinical studies of behavior and associated diseases in cohorts that exceed hundreds of thousands of subjects. Examples include the Women's Health Initiative (www.whi.org), the Million Veterans Program (www.research.va.gov/MVP), the 100 000 Genomes Project (genomicsengland.co.uk) and commercial efforts such as those by deCode (www.decode.com) and 23andme (www.23andme.com). The same transition is happening in experimental neuro- and behavioral genetics, and sample sizes of many hundreds of cases are becoming routine (www.genenetwork.org, www.mousephenotyping.org). There are two major consequences of this new affordability of massive omics datasets: (1) It is now far more practical to explore genetic modulation of behavioral differences and the key role of gene-by-environment interactions. Researchers are already doing the hard part: the quantitative analysis of behavior. Adding the omics component can provide powerful links to molecules, cells, circuits and even better treatment. (2) There is an acute need to highlight and train behavioral scientists in how best to exploit new omics approaches. This review addresses this second issue and highlights several new trends and opportunities that will be of interest to experts in animal and human behaviors.
|
174
|
Polimanti R, Gelernter J. ADH1B: From alcoholism, natural selection, and cancer to the human phenome. Am J Med Genet B Neuropsychiatr Genet 2018; 177:113-125. [PMID: 28349588 PMCID: PMC5617762 DOI: 10.1002/ajmg.b.32523] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Received: 08/24/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022]
Abstract
The ADH1B (Alcohol Dehydrogenase 1B (class I), Beta Polypeptide) gene and its best-known functional alleles, Arg48His (rs1229984, ADH1B*2) and Arg370Cys (rs2066702, ADH1B*3), have been investigated in relation to many phenotypic traits; most frequently including alcohol metabolism and alcohol drinking behaviors, but also human evolution, liver function, cancer, and, recently, the comprehensive human phenome. To understand ADH1B functions and consequences, we provide here a bioinformatic analysis of its gene regulation and molecular functions, literature review of studies focused on this gene, and a discussion regarding future research perspectives. Certain ADH1B alleles have large effects on alcohol metabolism, and this relationship particularly encourages further investigations in relation to alcoholism and alcohol-associated cancer to understand better the mechanisms by which alcohol metabolism contributes to alcohol abuse and carcinogenesis. We also observed that ADH1B has complex mechanisms that regulate its expression across multiple human tissues, and these may be involved in cardiac and metabolic traits. Evolutionary data strongly suggest that the selection signatures at the ADH1B locus are primarily related to effects other than those on alcohol metabolism. This is also supported by the involvement of ADH1B in multiple molecular pathways and by the findings of our recent phenome-wide association study. Accordingly, future studies should also investigate other functions of ADH1B potentially relevant for the human phenome. © 2017 Wiley Periodicals, Inc.
|
175
|
Yadav A, Sinha H. Gene-gene and gene-environment interactions in complex traits in yeast. Yeast 2018; 35:403-416. [PMID: 29322552 DOI: 10.1002/yea.3304] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 10/04/2017] [Revised: 12/11/2017] [Accepted: 12/23/2017] [Indexed: 01/05/2023] Open
Abstract
One of the fundamental questions in biology is how the genotype regulates the phenotype. An increasing number of studies indicate that, in most cases, the effect of a genetic locus on the phenotype is context-dependent, i.e. it is influenced by the genetic background and the environment in which the phenotype is measured. Still, the majority of the studies, in both model organisms and humans, that map the genetic regulation of phenotypic variation in complex traits primarily identify additive loci with independent effects. This does not reflect an absence of the contribution of genetic interactions to phenotypic variation, but instead is a consequence of the technical limitations in mapping gene-gene interactions (GGI) and gene-environment interactions (GEI). Yeast, with its detailed molecular understanding, diverse population genomics and ease of genetic manipulation, is a unique and powerful resource to study the contributions of GGI and GEI in the regulation of phenotypic variation. Here we review studies in yeast that have identified GGI and GEI that regulate phenotypic variation, and discuss the contribution of these findings in explaining missing heritability of complex traits, and how observations from these GGI and GEI studies enhance our understanding of the mechanisms underlying genetic robustness and adaptability that shape the architecture of the genotype-phenotype map.
|