1. Tsouris A, Brach G, Friedrich A, Hou J, Schacherer J. Diallel panel reveals a significant impact of low-frequency genetic variants on gene expression variation in yeast. Mol Syst Biol 2024; 20:362-373. [PMID: 38355920 PMCID: PMC10987670 DOI: 10.1038/s44320-024-00021-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 01/29/2024] [Accepted: 01/30/2024] [Indexed: 02/16/2024]
Abstract
Unraveling the genetic sources of gene expression variation is essential to better understand the origins of phenotypic diversity in natural populations. Genome-wide association studies have identified thousands of variants involved in gene expression variation; however, the variants detected explain only part of the heritability. In fact, variants such as low-frequency and structural variants (SVs) are poorly captured in association studies. To assess the impact of these variants on gene expression variation, we explored a half-diallel panel composed of 323 hybrids originating from pairwise crosses of 26 natural Saccharomyces cerevisiae isolates. Using short- and long-read sequencing strategies, we established an exhaustive catalog of single nucleotide polymorphisms (SNPs) and SVs for this panel. Combining this dataset with the transcriptomes of all hybrids, we comprehensively mapped SNPs and SVs associated with gene expression variation. While SVs impact gene expression variation, SNPs exhibit a higher effect size with an overrepresentation of low-frequency variants compared to common ones. These results reinforce the importance of dissecting the heritability of complex traits with a comprehensive catalog of genetic variants at the population level.
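The panel size quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes that a half-diallel means one hybrid per unordered pair of distinct parents, with no selfed or reciprocal crosses; the abstract does not spell this out, and the function name is purely illustrative.

```python
from math import comb


def half_diallel_size(n_parents: int) -> int:
    """Hybrids in a complete half-diallel: one per unordered pair of distinct parents (assumed design)."""
    return comb(n_parents, 2)


expected = half_diallel_size(26)  # 26 natural isolates, from the abstract
reported = 323                    # hybrids in the panel, from the abstract
print(expected, expected - reported)
```

Under that assumption, 26 parents allow at most 325 pairwise crosses, so the 323 reported hybrids cover all but two of them. The abstract does not say which combinations are absent or why.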
Affiliation(s)
- Andreas Tsouris
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Gauthier Brach
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Jing Hou
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
2. Jablonszky M, Canal D, Hegyi G, Herényi M, Laczi M, Markó G, Nagy G, Rosivall B, Szöllősi E, Török J, Garamszegi LZ. The estimation of additive genetic variance of body size in a wild passerine is sensitive to the method used to estimate relatedness among the individuals. Ecol Evol 2024; 14:e10981. [PMID: 38352200 PMCID: PMC10862163 DOI: 10.1002/ece3.10981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 01/17/2024] [Accepted: 01/23/2024] [Indexed: 02/16/2024]
Abstract
Assessing additive genetic variance is a crucial step in predicting the evolutionary response of a target trait. However, the estimated genetic variance may be sensitive to the methodology used, e.g., the way relatedness is assessed among the individuals, especially in wild populations where social pedigrees can be inaccurate. To investigate this possibility, we estimated the additive genetic variance in tarsus length, a major proxy of skeletal body size in birds. The model species was the collared flycatcher (Ficedula albicollis), a socially monogamous but genetically polygamous migratory passerine. We used two relatedness matrices to estimate the genetic variance: (1) a matrix based solely on social links and (2) a genetic similarity matrix based on a large array of single-nucleotide polymorphisms (SNPs). Depending on the relatedness matrix considered, we found moderate to high additive genetic variance and heritability estimates for tarsus length. In particular, the heritability estimates were higher when obtained with the genetic similarity matrix instead of the social pedigree. Our results confirm the potential for this crucial trait to respond to selection and highlight methodological concerns when calculating additive genetic variance and heritability for phenotypic traits. We conclude that using a social pedigree instead of a genetic similarity matrix to estimate relatedness among individuals in a genetically polygamous wild population may significantly deflate the estimates of additive genetic variation.
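For orientation, narrow-sense heritability is conventionally defined as the share of phenotypic variance that is additive genetic. The symbols below are standard quantitative-genetics notation rather than anything taken from the abstract, and the residual term $V_R$ is an assumption standing in for whatever other variance components the authors' models included.

```latex
h^2 = \frac{V_A}{V_P}, \qquad V_P = V_A + V_R
```

Because $V_A$ sits in the numerator, a relatedness matrix that yields a lower estimate of $V_A$ will, all else being equal, also yield a lower $h^2$. That is the direction the abstract reports for the social pedigree relative to the SNP-based matrix.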
Affiliation(s)
- Mónika Jablonszky
  - Evolutionary Ecology Research Group, Institute of Ecology and Botany, HUN-REN Centre for Ecological Research, Vácrátot, Hungary
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- David Canal
  - Department of Evolutionary Ecology, National Museum of Natural Sciences (MNCN-CSIC), Madrid, Spain
- Gergely Hegyi
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- Márton Herényi
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
  - Department of Zoology and Ecology, Hungarian University of Agriculture and Life Sciences, Godollo, Hungary
- Miklós Laczi
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
  - HUN-REN-ELTE-MTM Integrative Ecology Research Group, Budapest, Hungary
- Gábor Markó
  - Department of Plant Pathology, Institute of Plant Protection, Hungarian University of Agriculture and Life Sciences, Budapest, Hungary
- Gergely Nagy
  - Evolutionary Ecology Research Group, Institute of Ecology and Botany, HUN-REN Centre for Ecological Research, Vácrátot, Hungary
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- Balázs Rosivall
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- Eszter Szöllősi
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- János Török
  - Behavioural Ecology Group, Department of Systematic Zoology and Ecology, ELTE Eötvös Loránd University, Budapest, Hungary
- László Zsolt Garamszegi
  - Evolutionary Ecology Research Group, Institute of Ecology and Botany, HUN-REN Centre for Ecological Research, Vácrátot, Hungary
3. Tsouris A, Brach G, Friedrich A, Hou J, Schacherer J. Diallel panel reveals a significant impact of low-frequency genetic variants on gene expression variation in yeast. bioRxiv 2023:2023.07.21.550015. [PMID: 37503053 PMCID: PMC10370210 DOI: 10.1101/2023.07.21.550015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/29/2023]
Abstract
Unraveling the genetic sources of gene expression variation is essential to better understand the origins of phenotypic diversity in natural populations. Genome-wide association studies have identified thousands of variants involved in gene expression variation; however, the variants detected explain only part of the heritability. In fact, variants such as low-frequency and structural variants (SVs) are poorly captured in association studies. To assess the impact of these variants on gene expression variation, we explored a half-diallel panel composed of 323 hybrids originating from pairwise crosses of 26 natural Saccharomyces cerevisiae isolates. Using short- and long-read sequencing strategies, we established an exhaustive catalog of single nucleotide polymorphisms (SNPs) and SVs for this panel. Combining this dataset with the transcriptomes of all hybrids, we comprehensively mapped SNPs and SVs associated with gene expression variation. While SVs impact gene expression variation, SNPs exhibit a higher effect size with an overrepresentation of low-frequency variants compared to common ones. These results reinforce the importance of dissecting the heritability of complex traits with a comprehensive catalog of genetic variants at the population level.
Affiliation(s)
- Andreas Tsouris
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Gauthier Brach
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Jing Hou
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
4. Li L, Tian Z, Chen J, Tan Z, Zhang Y, Zhao H, Wu X, Yao X, Wen W, Chen W, Guo L. Characterization of novel loci controlling seed oil content in Brassica napus by marker metabolite-based multi-omics analysis. Genome Biol 2023; 24:141. [PMID: 37337206 DOI: 10.1186/s13059-023-02984-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Accepted: 06/08/2023] [Indexed: 06/21/2023]
Abstract
BACKGROUND: Seed oil content is an important agronomic trait of Brassica napus (B. napus), and metabolites are considered the bridge between genotype and phenotype for physical traits. RESULTS: Using a widely targeted metabolomics analysis in a natural population of 388 B. napus inbred lines, we quantify 2172 metabolites in mature seeds by liquid chromatography mass spectrometry, of which 131 marker metabolites are identified as correlated with seed oil content. These metabolites are then selected for further metabolite genome-wide association study and metabolite transcriptome-wide association study. Combined with weighted correlation network analysis, we construct a triple relationship network, which includes 21,000 edges and 4384 nodes among metabolites, metabolite quantitative trait loci, genes, and co-expression modules. We validate the function of BnaA03.TT4, BnaC02.TT4, and BnaC05.UK, three candidate genes predicted by multi-omics analysis, which show significant impacts on seed oil content through regulating flavonoid metabolism in B. napus. CONCLUSIONS: This study demonstrates the advantage of utilizing marker metabolites integrated with multi-omics analysis to dissect the genetic basis of agronomic traits in crops.
Affiliation(s)
- Long Li
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Zhitao Tian
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Jie Chen
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Zengdong Tan
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Yuting Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Hu Zhao
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Xiaowei Wu
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Xuan Yao
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Weiwei Wen
  - Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, China
  - Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Wei Chen
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
  - Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Liang Guo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
  - Shenzhen Institute of Nutrition and Health, Huazhong Agricultural University, Wuhan, China
  - Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
5. Evans LM, Arehart CH, Grotzinger AD, Mize TJ, Brasher MS, Stitzel JA, Ehringer MA, Hoeffer CA. Transcriptome-wide gene-gene interaction associations elucidate pathways and functional enrichment of complex traits. PLoS Genet 2023; 19:e1010693. [PMID: 37216417 PMCID: PMC10237671 DOI: 10.1371/journal.pgen.1010693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 06/02/2023] [Accepted: 03/06/2023] [Indexed: 05/24/2023]
Abstract
It remains unknown to what extent gene-gene interactions contribute to complex traits. Here, we introduce a new approach using predicted gene expression to perform exhaustive transcriptome-wide interaction studies (TWISs) for multiple traits across all pairs of genes expressed in several tissue types. Using imputed transcriptomes, we simultaneously reduce the computational challenge and improve interpretability and statistical power. We discover (in the UK Biobank) and replicate (in independent cohorts) several interaction associations, and find several hub genes with numerous interactions. We also demonstrate that TWIS can identify novel associated genes because genes with many or strong interactions have smaller single-locus model effect sizes. Finally, we develop a method to test gene set enrichment of TWIS associations (E-TWIS), finding numerous pathways and networks enriched in interaction associations. Epistasis may be widespread, and our procedure represents a tractable framework for beginning to explore gene interactions and identify novel genomic targets.
Affiliation(s)
- Luke M. Evans
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Christopher H. Arehart
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Andrew D. Grotzinger
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Psychology & Neuroscience, University of Colorado Boulder, Boulder, Colorado, United States of America
- Travis J. Mize
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Maizy S. Brasher
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Ecology & Evolutionary Biology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Jerry A. Stitzel
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Marissa A. Ehringer
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, United States of America
- Charles A. Hoeffer
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
  - Department of Integrative Physiology, University of Colorado Boulder, Boulder, Colorado, United States of America
6. Chen J, Zhang Y, Yin H, Liu W, Hu X, Li D, Lan C, Gao L, He Z, Cui F, Fernie AR, Chen W. The pathway of melatonin biosynthesis in common wheat (Triticum aestivum). J Pineal Res 2023; 74:e12841. [PMID: 36396897 PMCID: PMC10078269 DOI: 10.1111/jpi.12841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 10/25/2022] [Accepted: 11/12/2022] [Indexed: 11/19/2022]
Abstract
Melatonin (Mel) is a multifunctional biomolecule found in both animals and plants. In plants, the biosynthesis of Mel from tryptophan (Trp) has been delineated to comprise four consecutive reactions. However, while the genes encoding these enzymes in rice are well characterized, no systematic evaluation of the overall pathway has, as yet, been published for wheat. In the current study, the relative contents of six Mel-pathway intermediates, including Trp, tryptamine (Trm), serotonin (Ser), 5-methoxy tryptamine (5M-Trm), N-acetyl serotonin (NAS) and Mel, were determined in 24 independent tissues spanning the lifetime of wheat. These studies indicated that Trp was the most abundant among the six metabolites, followed by Trm and Ser. Next, the candidate genes expressing key enzymes involved in the Mel pathway were explored by means of metabolite-based genome-wide association study (mGWAS), wherein two TDC genes, a T5H gene and one SNAT gene were identified as being important for the accumulation of Mel pathway metabolites. Moreover, a 463-bp insertion within the T5H gene was discovered that may be responsible for variation in Ser content. Finally, an ASMT gene was found via sequence alignment against its rice homolog. Validations of these candidate genes were performed by in vitro enzymatic reactions using proteins purified following recombinant expression in Escherichia coli, transient gene expression in tobacco, and transgenic approaches in wheat. Our results thus provide the first comprehensive investigation into the Mel pathway metabolites, and a swift candidate gene identification via forward-genetics strategies, in common wheat.
Affiliation(s)
- Jie Chen
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Yueqi Zhang
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Huanran Yin
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Wei Liu
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Xin Hu
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
- Dongqin Li
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- Lifeng Gao
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhonghu He
  - Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
- Fa Cui
  - Wheat Molecular Breeding Innovation Research Group, Key Laboratory of Molecular Module-Based Breeding of High Yield and Abiotic Resistant Plants in Universities of Shandong, School of Agriculture, Ludong University, Yantai, China
- Alisdair R Fernie
  - Max-Planck-Institute of Molecular Plant Physiology, Potsdam-Golm, Germany
- Wei Chen
  - National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
  - Hubei Hongshan Laboratory, Wuhan, China
7. Jablonszky M, Canal D, Hegyi G, Herényi M, Laczi M, Lao O, Markó G, Nagy G, Rosivall B, Szász E, Török J, Zsebők S, Garamszegi LZ. Estimating heritability of song considering within-individual variance in a wild songbird: The collared flycatcher. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.975687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Heritable genetic variation is a prerequisite for adaptive evolution; however, our knowledge about the heritability of plastic traits, such as behaviors, is scarce, especially in wild populations. In this study, we investigated the heritability of song traits in the collared flycatcher (Ficedula albicollis), a small oscine passerine with complex songs involved in sexual selection. We recorded the songs of 81 males in a natural population and obtained various measures describing the frequency, temporal organization, and complexity of each song. As we had multiple songs from each individual, we were able, for the first time, to statistically account for the effect of within-individual variance on the heritability of song. Heritability was calculated from the variance estimates of animal models relying on a genetic similarity matrix based on single nucleotide polymorphism screening. Overall, we found small additive genetic variance and heritability values in all song traits, highlighting the role of environmental factors in shaping bird song.
8. Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022; 54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]
Abstract
Background: Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage. Methods: We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests. Results: The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected. Conclusions: Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.
9. Rare and population-specific functional variation across pig lines. Genet Sel Evol 2022; 54:39. [PMID: 35659233 PMCID: PMC9164375 DOI: 10.1186/s12711-022-00732-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Accepted: 05/17/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND: It is expected that functional, mainly missense and loss-of-function (LOF), and regulatory variants are responsible for most phenotypic differences between breeds and genetic lines of livestock species that have undergone diverse selection histories. However, there is still limited knowledge about the existing missense and LOF variation in commercial livestock populations, in particular regarding population-specific variation and how it can affect applications such as across-breed genomic prediction. METHODS: We re-sequenced the whole genome of 7848 individuals from nine commercial pig lines (average sequencing coverage: 4.1×) and imputed whole-genome genotypes for 440,610 pedigree-related individuals. The called variants were categorized according to predicted functional annotation (from LOF to intergenic) and prevalence level (number of lines in which the variant segregated; from private to widespread). Variants in each category were examined in terms of their distribution along the genome, alternative allele frequency, per-site Wright's fixation index (FST), individual load, and association to production traits. RESULTS: Of the 46 million called variants, 28% were private (called in only one line) and 21% were widespread (called in all nine lines). Genomic regions with a low recombination rate were enriched with private variants. Low-prevalence variants (called in one or a few lines only) were enriched for lower allele frequencies, lower FST, and putatively functional and regulatory roles (including LOF and deleterious missense variants). On average, individuals carried fewer private deleterious missense alleles than expected compared to alleles with other predicted consequences. Only a small subset of the low-prevalence variants had intermediate allele frequencies and explained small fractions of phenotypic variance (up to 3.2%) of production traits. The significant low-prevalence variants had higher per-site FST than the non-significant ones. These associated low-prevalence variants were tagged by other more widespread variants in high linkage disequilibrium, including intergenic variants. CONCLUSIONS: Most low-prevalence variants have low minor allele frequencies and only a small subset of low-prevalence variants contributed detectable fractions of phenotypic variance of production traits. Accounting for low-prevalence variants is therefore unlikely to noticeably benefit across-breed analyses, such as the prediction of genomic breeding values in a population using reference populations of a different genetic background.
10. Matsui T, Mullis MN, Roy KR, Hale JJ, Schell R, Levy SF, Ehrenreich IM. The interplay of additivity, dominance, and epistasis on fitness in a diploid yeast cross. Nat Commun 2022; 13:1463. [PMID: 35304450 PMCID: PMC8933436 DOI: 10.1038/s41467-022-29111-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/09/2021] [Accepted: 02/22/2022] [Indexed: 12/27/2022]
Abstract
In diploid species, genetic loci can show additive, dominance, and epistatic effects. To characterize the contributions of these different types of genetic effects to heritable traits, we use a double barcoding system to generate and phenotype a panel of ~200,000 diploid yeast strains that can be partitioned into hundreds of interrelated families. This experiment enables the detection of thousands of epistatic loci, many of whose effects vary across families. Here, we show traits are largely specified by a small number of hub loci with major additive and dominance effects, and pervasive epistasis. Genetic background commonly influences both the additive and dominance effects of loci, with multiple modifiers typically involved. The most prominent dominance modifier in our data is the mating locus, which has no effect on its own. Our findings show that the interplay between additivity, dominance, and epistasis underlies a complex genotype-to-phenotype map in diploids.
Affiliation(s)
- Takeshi Matsui
- Joint Initiative for Metrology in Biology, Stanford, CA, 94305, USA
- SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
| | - Martin N Mullis
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
- Twist Bioscience, 681 Gateway Blvd, South San Francisco, CA, 94080, USA
| | - Kevin R Roy
- Joint Initiative for Metrology in Biology, Stanford, CA, 94305, USA
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
| | - Joseph J Hale
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
| | - Rachel Schell
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA
| | - Sasha F Levy
- Joint Initiative for Metrology in Biology, Stanford, CA, 94305, USA.
- SLAC National Accelerator Laboratory, Menlo Park, CA, 94025, USA.
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA.
| | - Ian M Ehrenreich
- Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA, 90089, USA.
|
11
|
Towards the Genetic Architecture of Complex Gene Expression Traits: Challenges and Prospects for eQTL Mapping in Humans. Genes (Basel) 2022; 13:genes13020235. [PMID: 35205280 PMCID: PMC8871770 DOI: 10.3390/genes13020235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Revised: 01/21/2022] [Accepted: 01/25/2022] [Indexed: 12/10/2022]
Abstract
The discovery of expression quantitative trait loci (eQTLs) and their target genes (eGenes) has not only compensated for the limitations of genome-wide association studies for complex phenotypes but has also provided a basis for predicting gene expression. Efforts have been made to develop analytical methods in statistical genetics, a key discipline in eQTL analysis. In particular, mixed model– and deep learning–based analytical methods have been extremely beneficial in mapping eQTLs and predicting gene expression. Nevertheless, we still face many challenges associated with eQTL discovery. Here, we discuss two key aspects of these challenges: (1) the complexity of eTraits, which arises from factors such as polygenicity and epistasis, and (2) the voluminous work required for various types of eQTL profiles. The properties and prospects of statistical methods, including the mixed model method, Bayesian inference, the deep learning method, and the integration method, are presented as future directions for eQTL discovery. This review will help expedite the design and use of efficient methods for eQTL discovery and eTrait prediction.
|
12
|
Berhe M, Dossa K, You J, Mboup PA, Diallo IN, Diouf D, Zhang X, Wang L. Genome-wide association study and its applications in the non-model crop Sesamum indicum. BMC PLANT BIOLOGY 2021; 21:283. [PMID: 34157965 PMCID: PMC8218510 DOI: 10.1186/s12870-021-03046-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/25/2020] [Accepted: 05/17/2021] [Indexed: 05/05/2023]
Abstract
BACKGROUND Sesame is a rare example of a non-model, minor crop for which numerous genetic loci and candidate genes underlying features of interest have been disclosed at relatively high resolution. This progress has been achieved thanks to the application of the genome-wide association study (GWAS) approach. GWAS has benefited from the availability of high-quality genomes, re-sequencing data from thousands of genotypes, extensive transcriptome sequencing, and the development of a haplotype map and web-based functional databases in sesame. RESULTS In this paper, we reviewed the GWAS methods, the underlying statistical models and the applications for genetic discovery of important traits in sesame. A novel online database, SiGeDiD (http://sigedid.ucad.sn/), has been developed to provide access to all genetic and genomic discoveries made through GWAS in sesame. We also tested, for the first time, applications of various new multi-locus GWAS models in sesame. CONCLUSIONS Collectively, this work portrays steps and provides guidelines for efficient GWAS implementation in sesame, a non-model crop.
Affiliation(s)
- Muez Berhe
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, and Rural Affairs, No.2 Xudong 2nd Road, Wuhan, 430062, China
- Humera Agricultural Research Center of Tigray Agricultural Research Institute, Humera, Tigray, Ethiopia
| | - Komivi Dossa
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, and Rural Affairs, No.2 Xudong 2nd Road, Wuhan, 430062, China.
- Laboratoire Campus de Biotechnologies Végétales, Département de Biologie Végétale, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, 10700, Dakar, Senegal.
- Laboratory of Genetics, Horticulture and Seed Sciences, Faculty of Agronomic Sciences, University of Abomey-Calavi, 01 BP 526, Cotonou, Republic of Benin.
| | - Jun You
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, and Rural Affairs, No.2 Xudong 2nd Road, Wuhan, 430062, China
| | - Pape Adama Mboup
- Département de Mathématiques et Informatique, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, 10700, Dakar, Senegal
| | - Idrissa Navel Diallo
- Laboratoire Campus de Biotechnologies Végétales, Département de Biologie Végétale, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, 10700, Dakar, Senegal
- Département de Mathématiques et Informatique, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, 10700, Dakar, Senegal
| | - Diaga Diouf
- Laboratoire Campus de Biotechnologies Végétales, Département de Biologie Végétale, Faculté des Sciences et Techniques, Université Cheikh Anta Diop, BP 5005 Dakar-Fann, 10700, Dakar, Senegal
| | - Xiurong Zhang
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, and Rural Affairs, No.2 Xudong 2nd Road, Wuhan, 430062, China
| | - Linhai Wang
- Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, and Rural Affairs, No.2 Xudong 2nd Road, Wuhan, 430062, China.
|
13
|
Tang S, Zhao H, Lu S, Yu L, Zhang G, Zhang Y, Yang QY, Zhou Y, Wang X, Ma W, Xie W, Guo L. Genome- and transcriptome-wide association studies provide insights into the genetic basis of natural variation of seed oil content in Brassica napus. MOLECULAR PLANT 2021; 14:470-487. [PMID: 33309900 DOI: 10.1016/j.molp.2020.12.003] [Citation(s) in RCA: 78] [Impact Index Per Article: 26.0] [Received: 08/07/2020] [Revised: 11/01/2020] [Accepted: 12/04/2020] [Indexed: 05/25/2023]
Abstract
Seed oil content (SOC) is a highly important and complex trait in oil crops. Here, we decipher the genetic basis of natural variation in SOC of Brassica napus by genome- and transcriptome-wide association studies using 505 inbred lines. We mapped reliable quantitative trait loci (QTLs) that control SOC in eight environments, evaluated the effect of each QTL on SOC, and analyzed selection in QTL regions during breeding. Six hundred and ninety-two genes and four gene modules significantly associated with SOC were identified by analyzing population transcriptomes from seeds. A gene prioritization framework, POCKET (prioritizing the candidate genes by incorporating information on knowledge-based gene sets, effects of variants, genome-wide association studies, and transcriptome-wide association studies), was implemented to determine the causal genes in the QTL regions based on multi-omic datasets. A pair of homologous genes, BnPMT6s, in two QTLs was identified and experimentally demonstrated to negatively regulate SOC. This study provides rich genetic resources for improving SOC and valuable insights toward understanding the complex machinery that directs oil accumulation in the seeds of B. napus and other oil crops.
Affiliation(s)
- Shan Tang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Hu Zhao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Shaoping Lu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Liangqian Yu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Guofang Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Yuting Zhang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Qing-Yong Yang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Yongming Zhou
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
| | - Xuemin Wang
- Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA; Donald Danforth Plant Science Center, St. Louis, MO 63132, USA
| | - Wei Ma
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Weibo Xie
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China; Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China.
| | - Liang Guo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China.
|
14
|
Genome-Wide Association Mapping Unravels the Genetic Control of Seed Vigor under Low-Temperature Conditions in Rapeseed (Brassica napus L.). PLANTS 2021; 10:plants10030426. [PMID: 33668258 PMCID: PMC7996214 DOI: 10.3390/plants10030426] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/20/2021] [Revised: 02/15/2021] [Accepted: 02/15/2021] [Indexed: 11/16/2022]
Abstract
Low temperature inhibits rapid germination and successful seedling establishment of rapeseed (Brassica napus L.), leading to significant productivity losses. Little is known about the genetic diversity for seed vigor under low-temperature conditions in rapeseed, which motivated our investigation of 13 seed germination- and emergence-related traits under normal and low-temperature conditions for 442 diverse rapeseed accessions. The stress tolerance index was calculated for each trait based on performance under non-stress and low-temperature stress conditions. Principal component analysis of the low-temperature stress tolerance indices identified five principal components that captured 100% of the seedling response to low temperature. A genome-wide association study using ~8 million SNP (single-nucleotide polymorphism) markers identified from genome resequencing was undertaken to uncover the genetic basis of seed vigor related traits in rapeseed. We detected 22 quantitative trait loci (QTLs) significantly associated with stress tolerance indices regarding seed vigor under low-temperature stress. Scrutiny of the genes in these QTL regions identified 62 candidate genes related to specific stress tolerance indices of seed vigor, and the majority were involved in DNA repair, RNA translation, mitochondrial activation and energy generation, ubiquitination and degradation of protein reserve, antioxidant system, and plant hormone and signal transduction. The high effect variation and haplotype-based effect of these candidate genes were evaluated, and high priority could be given to the candidate genes BnaA03g40290D, BnaA06g07530D, BnaA09g06240D, BnaA09g06250D, and BnaC02g10720D in further study. These findings should be useful for marker-assisted breeding and genomic selection of rapeseed to increase seed vigor under low-temperature stress.
|
15
|
Erickson PA, Weller CA, Song DY, Bangerter AS, Schmidt P, Bergland AO. Unique genetic signatures of local adaptation over space and time for diapause, an ecologically relevant complex trait, in Drosophila melanogaster. PLoS Genet 2020; 16:e1009110. [PMID: 33216740 PMCID: PMC7717581 DOI: 10.1371/journal.pgen.1009110] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 07/05/2019] [Revised: 12/04/2020] [Accepted: 09/10/2020] [Indexed: 02/07/2023]
Abstract
Organisms living in seasonally variable environments utilize cues such as light and temperature to induce plastic responses, enabling them to exploit favorable seasons and avoid unfavorable ones. Local adaptation can result in variation in seasonal responses, but the genetic basis and evolutionary history of this variation remain elusive. Many insects, including Drosophila melanogaster, are able to undergo an arrest of reproductive development (diapause) in response to unfavorable conditions. In D. melanogaster, the ability to diapause is more common in high latitude populations, where flies endure harsher winters, and in the spring, reflecting differential survivorship of overwintering populations. Using a novel hybrid swarm-based genome wide association study, we examined the genetic basis and evolutionary history of ovarian diapause. We exposed outbred females to different temperatures and day lengths, characterized ovarian development for over 2800 flies, and reconstructed their complete, phased genomes. We found that diapause, scored at two different developmental cutoffs, has modest heritability, and we identified hundreds of SNPs associated with each of the two phenotypes. Alleles associated with one of the diapause phenotypes tend to be more common at higher latitudes, but these alleles do not show predictable seasonal variation. The collective signal of many small-effect, clinally varying SNPs can plausibly explain latitudinal variation in diapause seen in North America. Alleles associated with diapause are segregating in Zambia, suggesting that variation in diapause relies on ancestral polymorphisms, and both pro- and anti-diapause alleles have experienced selection in North America. Finally, we utilized outdoor mesocosms to track diapause under natural conditions. We found that hybrid swarms reared outdoors evolved increased propensity for diapause in late fall, whereas indoor control populations experienced no such change. Our results indicate that diapause is a complex, quantitative trait with different evolutionary patterns across time and space.
Affiliation(s)
- Priscilla A. Erickson
- Department of Biology, University of Virginia, Charlottesville, Virginia, United States of America
| | - Cory A. Weller
- Department of Biology, University of Virginia, Charlottesville, Virginia, United States of America
| | - Daniel Y. Song
- Department of Biology, University of Virginia, Charlottesville, Virginia, United States of America
| | - Alyssa S. Bangerter
- Department of Biology, University of Virginia, Charlottesville, Virginia, United States of America
| | - Paul Schmidt
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Alan O. Bergland
- Department of Biology, University of Virginia, Charlottesville, Virginia, United States of America
|
16
|
Chen J, Hu X, Shi T, Yin H, Sun D, Hao Y, Xia X, Luo J, Fernie AR, He Z, Chen W. Metabolite-based genome-wide association study enables dissection of the flavonoid decoration pathway of wheat kernels. PLANT BIOTECHNOLOGY JOURNAL 2020; 18:1722-1735. [PMID: 31930656 PMCID: PMC7336285 DOI: 10.1111/pbi.13335] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 09/10/2019] [Accepted: 12/29/2019] [Indexed: 05/02/2023]
Abstract
The marriage of metabolomic approaches with genetic design has proven a powerful tool in dissecting diversity in the metabolome and has additionally enhanced our understanding of complex traits. That said, such studies have rarely been carried out in wheat. In this study, we detected 805 metabolites from wheat kernels and profiled their relative contents among 182 wheat accessions, conducting a metabolite-based genome-wide association study (mGWAS) utilizing 14 646 previously described polymorphic SNP markers. A total of 1098 mGWAS associations were detected with large effects, within which 26 candidate genes were tentatively designated for 42 loci. Enzymatic assay of two candidates indicated they could catalyse glucosylation and subsequent malonylation of various flavonoids and thereby the major flavonoid decoration pathway of wheat kernel was dissected. Moreover, numerous high-confidence genes associated with metabolite contents have been provided, as well as more subdivided metabolite networks which are yet to be explored within our data. These combined efforts presented the first step towards realizing metabolomics-associated breeding of wheat.
Affiliation(s)
- Jie Chen
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Xin Hu
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Taotao Shi
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Huanran Yin
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Dongfa Sun
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
| | - Yuanfeng Hao
- National Wheat Improvement Center, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Xianchun Xia
- National Wheat Improvement Center, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Jie Luo
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
| | | | - Zhonghu He
- National Wheat Improvement Center, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China
| | - Wei Chen
- National Key Laboratory of Crop Genetic Improvement and National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, Wuhan, China
- College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, China
|
17
|
Pralle RS, Schultz NE, White HM, Weigel KA. Hyperketonemia GWAS and parity-dependent SNP associations in Holstein dairy cows intensively sampled for blood β-hydroxybutyrate concentration. Physiol Genomics 2020; 52:347-357. [PMID: 32628084 DOI: 10.1152/physiolgenomics.00016.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/17/2022]
Abstract
Hyperketonemia (HYK) is a metabolic disorder that affects early postpartum dairy cows; however, there has been limited success in identifying genomic variants contributing to HYK susceptibility. We conducted a genome-wide association study (GWAS) using HYK phenotypes based on an intensive screening protocol, interrogated genotype interactions with parity group (GWIS), and evaluated the enrichment of annotated metabolic pathways. Holstein cows were enrolled into the experiment after parturition, and blood samples were collected at four timepoints between 5 and 18 days postpartum. Concentration of blood β-hydroxybutyrate (BHB) was quantified cow-side via a handheld BHB meter. Cows were labeled as a HYK case when at least one blood sample had BHB ≥ 1.2 mmol/L, and all other cows were considered non-HYK controls. After quality control procedures, 1,710 cows and 58,699 genotypes were available for further analysis. The GWAS and GWIS were performed using the forward feature select linear mixed model method. There was evidence for an association between ARS-BFGL-NGS-91238 and HYK susceptibility, as well as parity-dependent associations to HYK for BovineHD0600024247 and BovineHD1400023753. Candidate genes annotated to these single nucleotide polymorphism associations have been previously associated with obesity, diabetes, insulin resistance, and fatty liver in humans and rodent models. Enrichment analysis revealed focal adhesion and axon guidance as metabolic pathways contributing to HYK etiology, while genetic variation in pathways related to insulin secretion and sensitivity may affect HYK susceptibility in a parity-dependent manner. In conclusion, the present work proposes several novel marker associations and metabolic pathways contributing to genetic risk for HYK susceptibility.
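The case definition in this abstract (a cow is a HYK case if at least one blood sample has BHB ≥ 1.2 mmol/L) can be illustrated with a minimal Python sketch. The function name, the constant name, and the example readings are hypothetical and are not taken from the study; only the 1.2 mmol/L threshold and the "at least one sample" rule come from the abstract.

```python
from typing import Sequence

# Threshold stated in the abstract; the constant name is an assumption.
HYK_THRESHOLD_MMOL_L = 1.2


def is_hyk_case(bhb_samples: Sequence[float],
                threshold: float = HYK_THRESHOLD_MMOL_L) -> bool:
    """Return True if any BHB reading (mmol/L) meets or exceeds the threshold."""
    return any(value >= threshold for value in bhb_samples)


# Hypothetical readings for two cows, four timepoints each.
cow_a = [0.6, 1.2, 0.8, 0.9]  # one reading reaches the threshold
cow_b = [0.6, 1.1, 0.8, 0.9]  # no reading reaches the threshold
labels = {"cow_a": is_hyk_case(cow_a), "cow_b": is_hyk_case(cow_b)}
```

Under this rule a single elevated reading is enough to label a case, so the sketch uses `any` rather than an average over timepoints.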
Affiliation(s)
- Ryan S Pralle
- Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
| | - Nichol E Schultz
- Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
| | - Heather M White
- Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
| | - Kent A Weigel
- Department of Dairy Science, University of Wisconsin-Madison, Madison, Wisconsin
|
18
|
Fournier T, Abou Saada O, Hou J, Peter J, Caudal E, Schacherer J. Extensive impact of low-frequency variants on the phenotypic landscape at population-scale. eLife 2019; 8:49258. [PMID: 31647416 PMCID: PMC6892612 DOI: 10.7554/elife.49258] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/12/2019] [Accepted: 10/23/2019] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies (GWAS) make it possible to dissect complex traits and map genetic variants, but the mapped variants often explain relatively little of the heritability. One potential reason is the preponderance of undetected low-frequency variants. To increase their allele frequency and assess their phenotypic impact in a population, we generated a diallel panel of 3025 yeast hybrids, derived from pairwise crosses between natural isolates, and examined a large number of traits. Parental versus hybrid regression analysis showed that while most phenotypic variance is explained by additivity, a third is governed by non-additive effects, with complete dominance having a key role. By performing GWAS on the diallel panel, we found that associated variants with low frequency in the initial population are overrepresented, and that they explain a fraction of the phenotypic variance and show an effect size similar to those of common variants. Overall, we highlighted the relevance of low-frequency variants to phenotypic variation.
Affiliation(s)
- Téo Fournier
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Jing Hou
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Jackson Peter
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
| | - Elodie Caudal
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
|
19
|
He T, Hill CB, Angessa TT, Zhang XQ, Chen K, Moody D, Telfer P, Westcott S, Li C. Gene-set association and epistatic analyses reveal complex gene interaction networks affecting flowering time in a worldwide barley collection. JOURNAL OF EXPERIMENTAL BOTANY 2019; 70:5603-5616. [PMID: 31504706 PMCID: PMC6812734 DOI: 10.1093/jxb/erz332] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 03/05/2019] [Accepted: 08/13/2019] [Indexed: 05/10/2023]
Abstract
Single-marker genome-wide association studies (GWAS) have successfully detected associations between single nucleotide polymorphisms (SNPs) and agronomic traits such as flowering time and grain yield in barley. However, the analysis of individual SNPs can only account for a small proportion of genetic variation, and can only provide limited knowledge on gene network interactions. Gene-based GWAS approaches provide enormous opportunity both to combine genetic information and to examine interactions among genetic variants. Here, we revisited a previously published phenotypic and genotypic data set of 895 barley varieties grown in two years at four different field locations in Australia. We employed statistical models to examine gene-phenotype associations, as well as two-way epistasis analyses to increase the capability to find novel genes that have significant roles in controlling flowering time in barley. Genetic associations were tested between flowering time and corresponding genotypes of 174 putative flowering time-related genes. Gene-phenotype association analysis detected 113 genes associated with flowering time in barley, demonstrating the unprecedented power of gene-based analysis. Subsequent two-way epistasis analysis revealed 19 pairs of gene×gene interactions involved in controlling flowering time. Our study demonstrates that gene-based association approaches can provide higher capacity for future crop improvement to increase crop performance and adaptation to different environments.
Affiliation(s)
- Tianhua He
- Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, Australia
| | - Camilla Beate Hill
- Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, Australia
| | - Tefera Tolera Angessa
- Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, Australia
| | - Xiao-Qi Zhang
- Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, Australia
| | - Kefei Chen
- SAGI-WEST, Faculty of Science and Engineering, Curtin University, Bentley, WA, Australia
| | | | - Paul Telfer
- Australian Grain Technologies Pty Ltd (AGT), SA, Australia
| | - Sharon Westcott
- Agriculture and Food, Department of Primary Industries and Regional Development, South Perth, WA, Australia
| | - Chengdao Li
- Western Barley Genetics Alliance, Western Australian State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, WA, Australia
- Agriculture and Food, Department of Primary Industries and Regional Development, South Perth, WA, Australia
- Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Hubei Jingzhou, China
|
20
|
Grinberg NF, Orhobor OI, King RD. An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Mach Learn 2019; 109:251-277. [PMID: 32174648 PMCID: PMC7048706 DOI: 10.1007/s10994-019-05848-5] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 10/28/2015] [Revised: 09/17/2019] [Accepted: 09/19/2019] [Indexed: 11/01/2022]
Abstract
In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are central to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods (elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM)) with two state-of-the-art classical statistical genetics methods (genomic BLUP and a two-step sequential method based on linear regression). Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.
Affiliation(s)
- Nastasiya F. Grinberg
- School of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL UK
- Present Address: Department of Medicine, Cambridge Institute of Therapeutic Immunology & Infectious Disease, Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Cambridge, CB2 0AW UK
| | | | - Ross D. King
- Department of Biology and Biological Engineering, Division of Systems and Synthetic Biology, Chalmers University of Technology, Kemivägen 10, SE-412 96 Gothenburg, Sweden
|
21
|
Lee C. Bayesian Inference for Mixed Model-Based Genome-Wide Analysis of Expression Quantitative Trait Loci by Gibbs Sampling. Front Genet 2019; 10:199. [PMID: 30967893 PMCID: PMC6438854 DOI: 10.3389/fgene.2019.00199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/05/2018] [Accepted: 02/25/2019] [Indexed: 11/13/2022]
Abstract
The importance of expression quantitative trait loci (eQTLs) has been emphasized in understanding the genetic basis of cellular activities and complex phenotypes. Mixed models can be employed to effectively identify eQTLs by accounting for polygenic effects. In these mixed models, the polygenic effects are treated as random variables, and their variability is described by the polygenic variance component. Within the frequentist mixed model framework, the polygenic and residual variance components are estimated first, and eQTL effects are then estimated conditional on those variance component estimates. The Bayesian approach to mixed model-based genome-wide eQTL analysis can also be applied to estimate the parameters, and it offers various benefits. Bayesian inferences on unknown parameters are based on their marginal posterior distributions, and the marginalization of the joint posterior distribution is a challenging task. This problem can be solved with Gibbs sampling, a Markov chain Monte Carlo algorithm for numerical integration. This article reviews mixed model-based Bayesian eQTL analysis by Gibbs sampling. Theoretical and practical issues of Bayesian inference are discussed using a concise description of Bayesian modeling and the corresponding Gibbs sampling. The strengths of Bayesian inference are also discussed. The posterior probability distribution in Bayesian inference reflects uncertainty in unknown parameters. This property is useful in the context of eQTL analysis where the sample size is too small to apply the frequentist approach. Bayesian inference based on a posterior that reflects prior knowledge will be increasingly preferred as eQTL data accumulate. Extensive use of mixed model-based Bayesian eQTL analysis will accelerate understanding of eQTLs exhibiting various regulatory functions.
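The idea of using Gibbs sampling to obtain marginal posteriors can be sketched on a deliberately small toy model. This is not the mixed model reviewed in the article: it assumes observations y_i ~ Normal(mu, sigma2), a flat prior on mu, and a prior proportional to 1/sigma2, and all function names and simulated values are hypothetical. The sampler alternates draws from the two full conditional distributions, and the retained draws approximate the marginal posterior of each parameter.

```python
import random
from statistics import mean


def simulate_data(n=200, true_mu=5.0, true_sd=2.0, seed=1):
    """Simulate toy observations (hypothetical values, for illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(true_mu, true_sd) for _ in range(n)]


def gibbs_normal(y, n_iter=2000, burn_in=500, seed=0):
    """Gibbs sampler for y_i ~ Normal(mu, sigma2).

    Assumed priors: flat on mu, proportional to 1/sigma2 on sigma2.
    Returns the post-burn-in draws of mu and sigma2.
    """
    rng = random.Random(seed)
    n = len(y)
    y_bar = mean(y)
    sigma2 = 1.0  # arbitrary starting value
    mu_draws, sigma2_draws = [], []
    for it in range(n_iter):
        # Full conditional: mu | sigma2, y ~ Normal(y_bar, sigma2 / n)
        mu = rng.gauss(y_bar, (sigma2 / n) ** 0.5)
        # Full conditional: sigma2 | mu, y ~ Inverse-Gamma(n / 2, ss / 2),
        # drawn as the reciprocal of a Gamma(shape=n/2, scale=2/ss) variate.
        ss = sum((v - mu) ** 2 for v in y)
        sigma2 = 1.0 / rng.gammavariate(n / 2, 2.0 / ss)
        if it >= burn_in:
            mu_draws.append(mu)
            sigma2_draws.append(sigma2)
    return mu_draws, sigma2_draws


y = simulate_data()
mu_draws, sigma2_draws = gibbs_normal(y)
posterior_mean_mu = mean(mu_draws)
posterior_mean_sigma2 = mean(sigma2_draws)
```

Averaging the retained draws of one parameter integrates the other parameter out numerically, which is the marginalization step the abstract describes as difficult to do analytically.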
Affiliation(s)
- Chaeyoung Lee
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul, South Korea
|
22
|
Ward BP, Brown-Guedira G, Kolb FL, Van Sanford DA, Tyagi P, Sneller CH, Griffey CA. Genome-wide association studies for yield-related traits in soft red winter wheat grown in Virginia. PLoS One 2019; 14:e0208217. [PMID: 30794545 PMCID: PMC6386437 DOI: 10.1371/journal.pone.0208217] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 11/08/2018] [Accepted: 02/05/2019] [Indexed: 01/19/2023] Open
Abstract
Grain yield is a trait of paramount importance in the breeding of all cereals. In wheat (Triticum aestivum L.), yield has steadily increased since the Green Revolution, though the current rate of increase is not forecasted to keep pace with demand due to growing world population and increasing affluence. While several genome-wide association studies (GWAS) on yield and related component traits have been performed in wheat, the previous lack of a reference genome has made comparisons between studies difficult. In this study, a GWAS for yield and yield-related traits was carried out on a population of 322 soft red winter wheat lines across a total of four rain-fed environments in the state of Virginia using single-nucleotide polymorphism (SNP) marker data generated by a genotyping-by-sequencing (GBS) protocol. Two separate mixed linear models were used to identify significant marker-trait associations (MTAs). The first was a single-locus model utilizing a leave-one-chromosome-out approach to estimating kinship. The second was a sub-setting kinship estimation multi-locus method (FarmCPU). The single-locus model identified nine significant MTAs for various yield-related traits, while the FarmCPU model identified 74 significant MTAs. The availability of the wheat reference genome allowed for the description of MTAs in terms of both genetic and physical positions, and enabled more extensive post-GWAS characterization of significant MTAs. The results indicate a number of promising candidate genes contributing to grain yield, including an ortholog of the rice aberrant panicle organization (APO1) protein and a gibberellin oxidase protein (GA2ox-A1) affecting the trait grains per square meter, an ortholog of the Arabidopsis thaliana mother of flowering time and terminal flowering 1 (MFT) gene affecting the trait seeds per square meter, and a B2 heat stress response protein affecting the trait seeds per head.
Affiliation(s)
- Brian P. Ward
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Gina Brown-Guedira
  - Eastern Regional Small Grains Genotyping Laboratory, USDA-ARS, Raleigh, North Carolina, United States of America
- Frederic L. Kolb
  - Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- David A. Van Sanford
  - Department of Plant and Soil Sciences, University of Kentucky, Lexington, Kentucky, United States of America
- Priyanka Tyagi
  - Department of Crop and Soil Sciences, North Carolina State University, Raleigh, North Carolina, United States of America
- Clay H. Sneller
  - Ohio Agricultural Research and Development Center, The Ohio State University, Wooster, Ohio, United States of America
- Carl A. Griffey
  - Department of Crop and Soil Environmental Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
|
23
|
Saad M, Wijsman EM. Association score testing for rare variants and binary traits in family data with shared controls. Brief Bioinform 2019; 20:245-253. [PMID: 28968627 DOI: 10.1093/bib/bbx107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2017] [Indexed: 11/12/2022] Open
Abstract
Genome-wide association studies have been an important approach used to localize trait loci, with primary focus on common variants. The multiple rare variant-common disease hypothesis may explain the missing heritability remaining after accounting for identified common variants. Advances in sequencing technologies with their decreasing costs, coupled with methodological advances in the context of association studies in large samples, now make the study of rare variants at a genome-wide scale feasible. The resurgence of family-based association designs because of their advantage in studying rare variants has also stimulated more methods development, mainly based on linear mixed models (LMMs). Other tests such as score tests can have advantages over the LMMs, but to date have mainly been proposed for single-marker association tests. In this article, we extend several score tests ($\chi^2_{\text{corrected}}$, WQLS, and SKAT) to the multiple variant association framework. We evaluate and compare their statistical performance relative to the LMM. Moreover, we show that three tests can be cast as the difference between marker allele frequencies (AFs) estimated in each of the groups of affected and unaffected subjects. We show that these tests are flexible, as they can be based on related, unrelated or both related and unrelated subjects. They also make feasible an increasingly common design that only sequences a subset of affected subjects (related or unrelated) and uses for comparison publicly available AFs estimated in a group of healthy subjects. Finally, we show the great impact of linkage disequilibrium on the performance of all these tests.
Affiliation(s)
- Mohamad Saad
  - Department of Biostatistics, University of Washington, Seattle, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
  - Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- Ellen M Wijsman
  - Department of Biostatistics, University of Washington, Seattle, USA
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, USA
|
24
|
Beilsmith K, Thoen MPM, Brachi B, Gloss AD, Khan MH, Bergelson J. Genome-wide association studies on the phyllosphere microbiome: Embracing complexity in host-microbe interactions. Plant J 2019; 97:164-181. [PMID: 30466152 DOI: 10.1111/tpj.14170] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 07/25/2018] [Revised: 11/08/2018] [Accepted: 11/16/2018] [Indexed: 05/18/2023]
Abstract
Environmental sequencing shows that plants harbor complex communities of microbes that vary across environments. However, many approaches for mapping plant genetic variation to microbe-related traits were developed in the relatively simple context of binary host-microbe interactions under controlled conditions. Recent advances in sequencing and statistics make genome-wide association studies (GWAS) an increasingly promising approach for identifying the plant genetic variation associated with microbes in a community context. This review discusses early efforts on GWAS of the plant phyllosphere microbiome and the outlook for future studies based on human microbiome GWAS. A workflow for GWAS of the phyllosphere microbiome is then presented, with particular attention to how perspectives on the mechanisms, evolution and environmental dependence of plant-microbe interactions will influence the choice of traits to be mapped.
Affiliation(s)
- Kathleen Beilsmith
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Manus P M Thoen
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Benjamin Brachi
  - BIOGECO, INRA, University of Bordeaux, 33610, Cestas, France
- Andrew D Gloss
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Mohammad H Khan
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
- Joy Bergelson
  - Department of Ecology and Evolution, University of Chicago, 1101 E 57th St, Chicago, IL, 60637, USA
|
25
|
|
26
|
Jaillard M, Lima L, Tournoud M, Mahé P, van Belkum A, Lacroix V, Jacob L. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLoS Genet 2018; 14:e1007758. [PMID: 30419019 PMCID: PMC6258240 DOI: 10.1371/journal.pgen.1007758] [Citation(s) in RCA: 93] [Impact Index Per Article: 15.5] [Received: 06/04/2018] [Revised: 11/26/2018] [Accepted: 10/12/2018] [Indexed: 11/21/2022] Open
Abstract
Genome-wide association study (GWAS) methods applied to bacterial genomes have shown promising results for genetic marker discovery or detailed assessment of marker effect. Recently, alignment-free methods based on k-mer composition have proven their ability to explore the accessory genome. However, they lead to redundant descriptions and results which are sometimes hard to interpret. Here we introduce DBGWAS, an extended k-mer-based GWAS method producing interpretable genetic variants associated with distinct phenotypes. Relying on compacted De Bruijn graphs (cDBG), our method gathers cDBG nodes, identified by the association model, into subgraphs defined from their neighbourhood in the initial cDBG. DBGWAS is alignment-free and only requires a set of contigs and phenotypes. In particular, it does not require prior annotation or reference genomes. It produces subgraphs representing phenotype-associated genetic variants such as local polymorphisms and mobile genetic elements (MGE). It offers a graphical framework which helps interpret GWAS results. Importantly, it is also computationally efficient: experiments took one hour and a half on average. We validated our method using antibiotic resistance phenotypes for three bacterial species. DBGWAS recovered known resistance determinants such as mutations in core genes in Mycobacterium tuberculosis, and genes acquired by horizontal transfer in Staphylococcus aureus and Pseudomonas aeruginosa, along with their MGE context. It also enabled us to formulate new hypotheses involving genetic variants not yet described in the antibiotic resistance literature. An open-source tool implementing DBGWAS is available at https://gitlab.com/leoisl/dbgwas.
Affiliation(s)
- Magali Jaillard
  - bioMérieux, Marcy l’Étoile, France
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France
- Leandro Lima
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France
  - EPI ERABLE - Inria Grenoble, Rhône-Alpes, France
- Vincent Lacroix
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France
  - EPI ERABLE - Inria Grenoble, Rhône-Alpes, France
- Laurent Jacob
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR5558 F-69622 Villeurbanne, France
|
27
|
Lee C. Genome-Wide Expression Quantitative Trait Loci Analysis Using Mixed Models. Front Genet 2018; 9:341. [PMID: 30186313 PMCID: PMC6110903 DOI: 10.3389/fgene.2018.00341] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/09/2018] [Accepted: 08/09/2018] [Indexed: 01/22/2023] Open
Abstract
Expression quantitative trait loci (eQTLs) are important for understanding the genetic basis of cellular activities and complex phenotypes. Genome-wide eQTL analyses can be effectively conducted by employing a mixed model. The mixed model includes random polygenic effects with variability, which can be estimated by the covariance structure of pairwise genomic similarity among individuals based on genotype information for nucleotide sequence variants. This increases the accuracy of identifying eQTLs by avoiding population stratification. Its extensive use will accelerate our understanding of the genetics of gene expression and complex phenotypes. An overview of genome-wide eQTL analyses using mixed model methodology is provided, including discussions of both theoretical and practical issues. The advantages of employing mixed models are also discussed in this review.
Affiliation(s)
- Chaeyoung Lee
  - Department of Bioinformatics and Life Science, Soongsil University, Seoul, South Korea
|
28
|
Ganjgahi H, Winkler AM, Glahn DC, Blangero J, Donohue B, Kochunov P, Nichols TE. Fast and powerful genome wide association of dense genetic data with high dimensional imaging phenotypes. Nat Commun 2018; 9:3254. [PMID: 30108209 PMCID: PMC6092439 DOI: 10.1038/s41467-018-05444-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2017] [Accepted: 07/09/2018] [Indexed: 01/05/2023] Open
Abstract
Genome wide association (GWA) analysis of brain imaging phenotypes can advance our understanding of the genetic basis of normal and disorder-related variation in the brain. GWA approaches typically use linear mixed effect models to account for non-independence amongst subjects due to factors such as family relatedness and population structure. The use of these models with high-dimensional imaging phenotypes presents enormous challenges in terms of computational intensity and the need to account for multiple testing in both the imaging and genetic domains. Here we present a method that makes mixed models practical with high-dimensional traits by a combination of a transformation applied to the data and model, and the use of a non-iterative variance component estimator. With such speed enhancements, permutation tests are feasible, which allows inference on powerful spatial tests like the cluster size statistic.
Affiliation(s)
- Habib Ganjgahi
  - Department of Statistics, University of Oxford, Oxford, UK
  - Medical Research Council Harwell Institute, Harwell, UK
- Anderson M Winkler
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
  - Big Data Analytics Group, Hospital Israelita Albert Einstein, São Paulo, SP, Brazil
- David C Glahn
  - Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- John Blangero
  - South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
- Brian Donohue
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
- Peter Kochunov
  - Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
- Thomas E Nichols
  - Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
  - Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK
  - Department of Statistics, University of Warwick, Coventry, UK
|
29
|
Sun S, Wang T, Wang L, Li X, Jia Y, Liu C, Huang X, Xie W, Wang X. Natural selection of a GSK3 determines rice mesocotyl domestication by coordinating strigolactone and brassinosteroid signaling. Nat Commun 2018; 9:2523. [PMID: 29955063 PMCID: PMC6023860 DOI: 10.1038/s41467-018-04952-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 10/29/2017] [Accepted: 06/01/2018] [Indexed: 11/09/2022] Open
Abstract
The mesocotyl is the crucial organ for pushing buds out of deep water or soil after germination in monocots. Deep direct seeding and mechanized dry seeding cultivation practices require rice cultivars with long mesocotyls. However, the mechanisms of mesocotyl elongation and domestication remain unknown. Here, our genome-wide association study (GWAS) reveals that natural variations of OsGSK2, a conserved GSK3-like kinase involved in brassinosteroid signaling, determine rice mesocotyl length variation. Variations in the coding region of OsGSK2 alter its kinase activity. It is selected for mesocotyl length variation during domestication. Molecular analyses show that brassinosteroid-promoted mesocotyl elongation functions by suppressing the phosphorylation of a U-type cyclin, CYC U2, by OsGSK2. Importantly, the F-box protein D3, a major positive component in strigolactone signaling, can degrade the OsGSK2-phosphorylated CYC U2 to inhibit mesocotyl elongation. Together, these results suggest that OsGSK2 is selected to regulate mesocotyl length by coordinating strigolactone and brassinosteroid signaling during domestication.
Affiliation(s)
- Shiyong Sun
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Tao Wang
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Linlin Wang
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xiaoming Li
  - Department of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Yancui Jia
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Chang Liu
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
- Xuehui Huang
  - College of Life and Environment Sciences, Shanghai Normal University, Shanghai, China
- Weibo Xie
  - National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research, Huazhong Agricultural University, Wuhan, China
- Xuelu Wang
  - National Key Laboratory of Crop Genetic Improvement, Center of Integrative Biology, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, China
|
30
|
Weissbrod O, Rahmani E, Schweiger R, Rosset S, Halperin E. Association testing of bisulfite-sequencing methylation data via a Laplace approximation. Bioinformatics 2018; 33:i325-i332. [PMID: 28881982 PMCID: PMC5870555 DOI: 10.1093/bioinformatics/btx248] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022] Open
Abstract
Motivation: Epigenome-wide association studies can provide novel insights into the regulation of genes involved in traits and diseases. The rapid emergence of bisulfite-sequencing technologies enables performing such genome-wide studies at the resolution of single nucleotides. However, analysis of data produced by bisulfite-sequencing poses statistical challenges owing to low and uneven sequencing depth, as well as the presence of confounding factors. The recently introduced Mixed model Association for Count data via data AUgmentation (MACAU) can address these challenges via a generalized linear mixed model when confounding can be encoded via a single variance component. However, MACAU cannot be used in the presence of multiple variance components. Additionally, MACAU uses a computationally expensive Markov Chain Monte Carlo (MCMC) procedure, which cannot directly approximate the model likelihood.
Results: We present a new method, Mixed model Association via a Laplace ApproXimation (MALAX), that is more computationally efficient than MACAU and allows multiple variance components to be modeled. MALAX uses a Laplace approximation rather than MCMC-based approximations, which makes it possible to approximate the model likelihood directly. Through an extensive analysis of simulated and real data, we demonstrate that MALAX successfully addresses statistical challenges introduced by bisulfite-sequencing while controlling for complex sources of confounding, and can be over 50% faster than the state of the art.
Availability and Implementation: The full source code of MALAX is available at https://github.com/omerwe/MALAX.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Omer Weissbrod
  - Statistics Department, Tel Aviv University, Tel Aviv, Israel
  - Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
- Elior Rahmani
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Regev Schweiger
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Saharon Rosset
  - Statistics Department, Tel Aviv University, Tel Aviv, Israel
- Eran Halperin
  - Computer Science Department, University of California Los Angeles, Los Angeles, CA, USA
  - Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
|
31
|
Environment dominates over host genetics in shaping human gut microbiota. Nature 2018; 555:210-215. [PMID: 29489753 DOI: 10.1038/nature25973] [Citation(s) in RCA: 1576] [Impact Index Per Article: 262.7] [Received: 06/07/2017] [Accepted: 01/16/2018] [Indexed: 02/06/2023]
Abstract
Human gut microbiome composition is shaped by multiple factors but the relative contribution of host genetics remains elusive. Here we examine genotype and microbiome data from 1,046 healthy individuals with several distinct ancestral origins who share a relatively common environment, and demonstrate that the gut microbiome is not significantly associated with genetic ancestry, and that host genetics have a minor role in determining microbiome composition. We show that, by contrast, there are significant similarities in the compositions of the microbiomes of genetically unrelated individuals who share a household, and that over 20% of the inter-person microbiome variability is associated with factors related to diet, drugs and anthropometric measurements. We further demonstrate that microbiome data significantly improve the prediction accuracy for many human traits, such as glucose and obesity measures, compared to models that use only host genetic and environmental data. These results suggest that microbiome alterations aimed at improving clinical outcomes may be carried out across diverse genetic backgrounds.
|
32
|
Gumpinger AC, Roqueiro D, Grimm DG, Borgwardt KM. Methods and Tools in Genome-wide Association Studies. Methods Mol Biol 2018; 1819:93-136. [PMID: 30421401 DOI: 10.1007/978-1-4939-8618-7_5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Many traits, such as height, the response to a given drug, or the susceptibility to certain diseases are presumably co-determined by genetics. Especially in the field of medicine, it is of major interest to identify genetic aberrations that alter an individual's risk to develop a certain phenotypic trait. Addressing this question requires the availability of comprehensive, high-quality genetic datasets. The technological advancements and the decreasing cost of genotyping in the last decade led to an increase in such datasets. Parallel to and in line with this technological progress, an analysis framework under the name of genome-wide association studies was developed to properly collect and analyze these data. Genome-wide association studies aim at finding statistical dependencies, or associations, between a trait of interest and point-mutations in the DNA. The statistical models used to detect such associations are diverse, spanning the whole range from the frequentist to the Bayesian setting. Since genetic datasets are inherently high-dimensional, the search for associations poses not only a statistical but also a computational challenge. As a result, a variety of toolboxes and software packages have been developed, each implementing different statistical methods while using various optimizations and mathematical techniques to enhance the computations. This chapter is devoted to the discussion of widely used methods and tools in genome-wide association studies. We present the different statistical models and the assumptions on which they are based, explain peculiarities of the data that have to be accounted for and, most importantly, introduce commonly used tools and software packages for the different tasks in a genome-wide association study, complemented with examples for their application.
Affiliation(s)
- Anja C Gumpinger
  - Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Damian Roqueiro
  - Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Dominik G Grimm
  - Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Karsten M Borgwardt
  - Machine Learning and Computational Biology Lab, D-BSSE, ETH Zurich, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
33
|
Hellman F, Hoffmann A, Tserkovnyak Y, Beach GSD, Fullerton EE, Leighton C, MacDonald AH, Ralph DC, Arena DA, Dürr HA, Fischer P, Grollier J, Heremans JP, Jungwirth T, Kimel AV, Koopmans B, Krivorotov IN, May SJ, Petford-Long AK, Rondinelli JM, Samarth N, Schuller IK, Slavin AN, Stiles MD, Tchernyshyov O, Thiaville A, Zink BL. Interface-Induced Phenomena in Magnetism. Rev Mod Phys 2017; 89:025006. [PMID: 28890576 PMCID: PMC5587142 DOI: 10.1103/revmodphys.89.025006] [Citation(s) in RCA: 181] [Impact Index Per Article: 25.9] [Indexed: 05/21/2023]
Abstract
This article reviews static and dynamic interfacial effects in magnetism, focusing on interfacially-driven magnetic effects and phenomena associated with spin-orbit coupling and intrinsic symmetry breaking at interfaces. It provides a historical background and literature survey, but focuses on recent progress, identifying the most exciting new scientific results and pointing to promising future research directions. It starts with an introduction and overview of how basic magnetic properties are affected by interfaces, then turns to a discussion of charge and spin transport through and near interfaces and how these can be used to control the properties of the magnetic layer. Important concepts include spin accumulation, spin currents, spin transfer torque, and spin pumping. An overview is provided to the current state of knowledge and existing review literature on interfacial effects such as exchange bias, exchange spring magnets, spin Hall effect, oxide heterostructures, and topological insulators. The article highlights recent discoveries of interface-induced magnetism and non-collinear spin textures, non-linear dynamics including spin torque transfer and magnetization reversal induced by interfaces, and interfacial effects in ultrafast magnetization processes.
Collapse
Affiliation(s)
- Frances Hellman
- Department of Physics, University of California, Berkeley, Berkeley, California 94720, USA; Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
| | - Axel Hoffmann
- Materials Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
| | - Yaroslav Tserkovnyak
- Department of Physics and Astronomy, University of California, Los Angeles, California 90095, USA
| | - Geoffrey S D Beach
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Eric E Fullerton
- Center for Memory and Recording Research, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0401, USA
| | - Chris Leighton
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | - Allan H MacDonald
- Department of Physics, University of Texas at Austin, Austin, Texas 78712-0264, USA
| | - Daniel C Ralph
- Physics Department, Cornell University, Ithaca, New York 14853, USA; Kavli Institute at Cornell, Cornell University, Ithaca, New York 14853, USA
| | - Dario A Arena
- Department of Physics, University of South Florida, Tampa, Florida 33620-7100, USA
| | - Hermann A Dürr
- Stanford Institute for Materials and Energy Sciences, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, California 94025, USA
| | - Peter Fischer
- Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA; Physics Department, University of California, 1156 High Street, Santa Cruz, California 94056, USA
| | - Julie Grollier
- Unité Mixte de Physique CNRS/Thales and Université Paris Sud 11, 1 Avenue Fresnel, 91767 Palaiseau, France
| | - Joseph P Heremans
- Department of Mechanical and Aerospace Engineering, The Ohio State University, Columbus, Ohio 43210, USA; Department of Materials Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA; Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
| | - Tomas Jungwirth
- Institute of Physics, Academy of Sciences of the Czech Republic, Cukrovarnicka 10, 162 53 Praha 6, Czech Republic; School of Physics and Astronomy, University of Nottingham, Nottingham NG7 2RD, United Kingdom
| | - Alexey V Kimel
- Radboud University, Institute for Molecules and Materials, Nijmegen 6525 AJ, The Netherlands
| | - Bert Koopmans
- Department of Applied Physics, Center for NanoMaterials, COBRA Research Institute, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands
| | - Ilya N Krivorotov
- Department of Physics and Astronomy, University of California, Irvine, California 92697, USA
| | - Steven J May
- Department of Materials Science & Engineering, Drexel University, Philadelphia, Pennsylvania 19104, USA
| | - Amanda K Petford-Long
- Materials Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, Illinois 60439, USA; Department of Materials Science and Engineering, Northwestern University, 2220 Campus Drive, Evanston, Illinois 60208, USA
| | - James M Rondinelli
- Department of Materials Science and Engineering, Northwestern University, Evanston, Illinois 60208, USA
| | - Nitin Samarth
- Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
| | - Ivan K Schuller
- Department of Physics and Center for Advanced Nanoscience, University of California, San Diego, La Jolla, California 92093, USA; Materials Science and Engineering Program, University of California, San Diego, La Jolla, California 92093, USA
| | - Andrei N Slavin
- Department of Physics, Oakland University, Rochester, Michigan 48309, USA
| | - Mark D Stiles
- Center for Nanoscale Science and Technology, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-6202, USA
| | - Oleg Tchernyshyov
- Department of Physics and Astronomy, The Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - André Thiaville
- Laboratoire de Physique des Solides, UMR CNRS 8502, Université Paris-Sud, 91405 Orsay, France
| | - Barry L Zink
- Department of Physics and Astronomy, University of Denver, Denver, CO 80208, USA
| |
|
34
|
Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat Genet 2017; 49:568-578. [PMID: 28263315 DOI: 10.1038/ng.3809] [Citation(s) in RCA: 264] [Impact Index Per Article: 37.7] [Received: 06/29/2016] [Accepted: 02/10/2017] [Indexed: 02/07/2023]
Abstract
Genetic factors modifying the blood metabolome have been investigated through genome-wide association studies (GWAS) of common genetic variants and through exome sequencing. We conducted a whole-genome sequencing study of common, low-frequency and rare variants to associate genetic variations with blood metabolite levels using comprehensive metabolite profiling in 1,960 adults. We focused the analysis on 644 metabolites with consistent levels across three longitudinal data collections. Genetic sequence variations at 101 loci were associated with the levels of 246 (38%) metabolites (P ≤ 1.9 × 10(-11)). We identified 113 (10.7%) among 1,054 unrelated individuals in the cohort who carried heterozygous rare variants likely influencing the function of 17 genes. Thirteen of the 17 genes are associated with inborn errors of metabolism or other pediatric genetic conditions. This study extends the map of loci influencing the metabolome and highlights the importance of heterozygous rare variants in determining abnormal blood metabolic phenotypes in adults.
|
35
|
Ram R, Wakil S, Muiya N, Andres E, Mazhar N, Hagos S, Alshahid M, Meyer B, Morahan G, Dzimiri N. A common variant association study in ethnic Saudi Arabs reveals novel susceptibility loci for hypertriglyceridemia. Clin Genet 2017; 91:371-378. [DOI: 10.1111/cge.12859] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/22/2016] [Revised: 08/30/2016] [Accepted: 08/30/2016] [Indexed: 12/15/2022]
Affiliation(s)
- R. Ram
- Centre for Diabetes Research, The Harry Perkins Institute for Medical Research, Perth, WA, Australia
- Centre for Medical Research, University of Western Australia, Perth, WA, Australia
| | - S.M. Wakil
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - N.P. Muiya
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - E. Andres
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - N. Mazhar
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - S. Hagos
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - M. Alshahid
- King Faisal Heart Institute, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - B.F. Meyer
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| | - G. Morahan
- Centre for Diabetes Research, The Harry Perkins Institute for Medical Research, Perth, WA, Australia
- Centre for Medical Research, University of Western Australia, Perth, WA, Australia
| | - N. Dzimiri
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, KSA
| |
|
36
|
Grimm DG, Roqueiro D, Salomé PA, Kleeberger S, Greshake B, Zhu W, Liu C, Lippert C, Stegle O, Schölkopf B, Weigel D, Borgwardt KM. easyGWAS: A Cloud-Based Platform for Comparing the Results of Genome-Wide Association Studies. Plant Cell 2017; 29:5-19. [PMID: 27986896 PMCID: PMC5304348 DOI: 10.1105/tpc.16.00551] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 07/11/2016] [Revised: 11/28/2016] [Accepted: 12/13/2016] [Indexed: 05/22/2023]
Abstract
The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits.
Affiliation(s)
- Dominik G Grimm
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Zentrum für Bioinformatik, Eberhard Karls Universität, 72074 Tübingen, Germany
- Department for Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Damian Roqueiro
- Department for Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| | - Patrice A Salomé
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Stefan Kleeberger
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Bastian Greshake
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Wangsheng Zhu
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Chang Liu
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Christoph Lippert
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Oliver Stegle
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Bernhard Schölkopf
- Department of Empirical Inference, Max Planck Institute for Intelligent Systems, 72076 Tübingen, Germany
| | - Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Karsten M Borgwardt
- Machine Learning and Computational Biology Research Group, Max Planck Institute for Intelligent Systems and Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Zentrum für Bioinformatik, Eberhard Karls Universität, 72074 Tübingen, Germany
- Department for Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland
- Swiss Institute of Bioinformatics, 4058 Basel, Switzerland
| |
|
37
|
Wakil SM, Ram R, Muiya NP, Mehta M, Andres E, Mazhar N, Baz B, Hagos S, Alshahid M, Meyer BF, Morahan G, Dzimiri N. Data on common variants associated with coronary artery disease/myocardial infarction in ethnic Arabs. Data Brief 2016; 7:172-176. [PMID: 27761488 PMCID: PMC5063810 DOI: 10.1016/j.dib.2016.02.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2015] [Revised: 01/21/2016] [Accepted: 02/02/2016] [Indexed: 11/02/2022] Open
Abstract
The data show results acquired in a large cohort of 5668 ethnic Arabs involved in a common variant association study for coronary artery disease (CAD) and myocardial infarction (MI) using the Affymetrix Axiom Genotyping platform ("A genome-wide association study reveals susceptibility loci for myocardial infarction/coronary artery disease in Saudi Arabs", Wakil et al. (2015) [1]). Several loci were described that conferred risk for CAD or MI, some of which were validated in an independent set of samples. Principal component analysis (PCA) suggested that the Saudi cohort was close to the CEU and TSI populations, thus pointing to similarity with European populations.
Affiliation(s)
- Salma M Wakil
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Ramesh Ram
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Nzioka P Muiya
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Munish Mehta
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Editha Andres
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Nejat Mazhar
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Batoul Baz
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Samya Hagos
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Maie Alshahid
- King Faisal Heart Institute, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Brian F Meyer
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Grant Morahan
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Nduna Dzimiri
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| |
|
38
|
|
39
|
Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet 2016; 12:e1005767. [PMID: 26828793 PMCID: PMC4734661 DOI: 10.1371/journal.pgen.1005767] [Citation(s) in RCA: 672] [Impact Index Per Article: 84.0] [Received: 04/19/2015] [Accepted: 12/03/2015] [Indexed: 12/05/2022] Open
Abstract
False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and used them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid the model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include an efficient computing time that is linear in both the number of individuals and the number of markers. Now, a dataset with half a million individuals and half a million markers can be analyzed within three days.
Affiliation(s)
- Xiaolei Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
| | - Meng Huang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
| | - Bin Fan
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Animal Science and Technology, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Edward S. Buckler
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America
- United States Department of Agriculture (USDA)–Agricultural Research Service (ARS), Ithaca, New York, United States of America
| | - Zhiwu Zhang
- Department of Crop and Soil Sciences, Washington State University, Pullman, Washington, United States of America
- Department of Animal Sciences, Northeast Agricultural University, Harbin, Heilongjiang, China
| |
|
40
|
Wakil SM, Ram R, Muiya NP, Mehta M, Andres E, Mazhar N, Baz B, Hagos S, Alshahid M, Meyer BF, Morahan G, Dzimiri N. A genome-wide association study reveals susceptibility loci for myocardial infarction/coronary artery disease in Saudi Arabs. Atherosclerosis 2015; 245:62-70. [PMID: 26708285 DOI: 10.1016/j.atherosclerosis.2015.11.019] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Received: 01/14/2015] [Revised: 11/09/2015] [Accepted: 11/18/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND Multiple loci have been identified for coronary artery disease (CAD) by genome-wide association studies (GWAS), but no such study of CAD incidence has yet been reported for any Middle Eastern population. METHODS In this study, we performed a GWAS for CAD and myocardial infarction (MI) incidence in 5668 Saudis of Arab descent using the Affymetrix Axiom Genotyping platform. RESULTS We describe SNPs at 16 loci that showed significant (P < 5 × 10(-8)) or suggestive (P < 1 × 10(-5)) GWAS association with CAD or MI in the ethnic Saudi Arab population. Among the four variants reaching GWAS significance in the present study, rs10738607_G [0.78(0.71-0.85); p = 2.17E-08] in the CDKN2A/B gene was associated with CAD. Two other SNPs in the same gene, rs10757274_G [0.79(0.73-0.86); p = 2.98E-08] and rs1333045_C [0.79(0.73-0.86); p = 1.15E-08], as well as rs9982601_T [1.38(1.23-1.55); p = 3.49E-08] in KCNE2, were associated with MI. These variants have been previously described in other populations. Several SNPs, including rs7421388 (PLCL1) and rs12541758 (TRPA1), which displayed a suggestive GWAS association (P < 1 × 10(-5)) with CAD, as well as rs41411047 (RNF13), rs32793 (PDZD2) and rs4739066 (YTHDF3), which similarly showed weak association with MI, were confirmed in an independent dataset. Furthermore, our estimates of the heritability of CAD and MI based on observed genome-wide sharing in unrelated Saudi Arabs were approximately 33% and 44%, respectively. CONCLUSIONS Our study has identified susceptibility variants for CAD/MI in ethnic Arabs. These findings provide further insights into pathways contributing to the susceptibility for CAD and will enable more comprehensive genetic studies of these diseases in Middle Eastern populations.
Affiliation(s)
- Salma M Wakil
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Ramesh Ram
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Nzioka P Muiya
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Munish Mehta
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Editha Andres
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Nejat Mazhar
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Batoul Baz
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Samya Hagos
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Maie Alshahid
- King Faisal Heart Institute, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Brian F Meyer
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| | - Grant Morahan
- Harry Perkins Institute of Medical Research, University of Western Australia, Australia
| | - Nduna Dzimiri
- Genetics Department, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
| |
|
41
|
Two-Variance-Component Model Improves Genetic Prediction in Family Datasets. Am J Hum Genet 2015; 97:677-90. [PMID: 26544803 DOI: 10.1016/j.ajhg.2015.10.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/15/2015] [Accepted: 10/03/2015] [Indexed: 12/15/2022] Open
Abstract
Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively with best linear unbiased prediction (BLUP) methods. Such methods were pioneered in plant and animal-breeding literature and have since been applied to predict human traits, with the aim of eventual clinical utility. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two-variance-component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated with genetic markers. In simulations using real genotypes from the Candidate-gene Association Resource (CARe) and Framingham Heart Study (FHS) family cohorts, we demonstrate that the two-variance-component model achieves gains in prediction r(2) over standard BLUP at current sample sizes, and we project, based on simulations, that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two-variance-component model significantly improves prediction r(2) in each case, with up to a 20% relative improvement. We also find that standard mixed-model association tests can produce inflated test statistics in datasets with related individuals, whereas the two-variance-component model corrects for inflation.
|
42
|
Sparse whole-genome sequencing identifies two loci for major depressive disorder. Nature 2015; 523:588-91. [PMID: 26176920 PMCID: PMC4522619 DOI: 10.1038/nature14659] [Citation(s) in RCA: 592] [Impact Index Per Article: 65.8] [Received: 12/15/2014] [Accepted: 06/11/2015] [Indexed: 12/22/2022]
Abstract
Major depressive disorder (MDD), one of the most frequently encountered forms of mental illness and a leading cause of disability worldwide, poses a major challenge to genetic analysis. To date, no robustly replicated genetic loci have been identified, despite analysis of more than 9,000 cases. Here, using low-coverage whole-genome sequencing of 5,303 Chinese women with recurrent MDD selected to reduce phenotypic heterogeneity, and 5,337 controls screened to exclude MDD, we identified, and subsequently replicated in an independent sample, two loci contributing to risk of MDD on chromosome 10: one near the SIRT1 gene (P = 2.53 × 10(-10)), the other in an intron of the LHPP gene (P = 6.45 × 10(-12)). Analysis of 4,509 cases with a severe subtype of MDD, melancholia, yielded an increased genetic signal at the SIRT1 locus. We attribute our success to the recruitment of relatively homogeneous cases with severe illness.
|
43
|
Accurate liability estimation improves power in ascertained case-control studies. Nat Methods 2015; 12:332-4. [PMID: 25664543 DOI: 10.1038/nmeth.3285] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 09/05/2014] [Accepted: 12/18/2014] [Indexed: 12/31/2022]
Abstract
Linear mixed models (LMMs) have emerged as the method of choice for confounded genome-wide association studies. However, the performance of LMMs in nonrandomly ascertained case-control studies deteriorates with increasing sample size. We propose a framework called LEAP (liability estimator as a phenotype; https://github.com/omerwe/LEAP) that tests for association with estimated latent values corresponding to severity of phenotype, and we demonstrate that this can lead to a substantial power increase.
|