351
|
Friesen ML, Cordeiro MA, Penmetsa RV, Badri M, Huguet T, Aouani ME, Cook DR, Nuzhdin SV. Population genomic analysis of Tunisian Medicago truncatula reveals candidates for local adaptation. The Plant Journal: for Cell and Molecular Biology 2010; 63:623-35. [PMID: 20545888 DOI: 10.1111/j.1365-313x.2010.04267.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 05/21/2023]
Abstract
Genome-wide association studies rely upon segregating natural genetic variation, particularly the patterns of polymorphism and correlation between adjacent markers. To facilitate association studies in the model legume Medicago truncatula, we present a genome-scale polymorphism scan using existing Affymetrix microarrays. We develop and validate a method that uses a simple information-criteria algorithm to call polymorphism from microarray data without reliance on a reference genotype. We genotype 12 inbred M. truncatula lines sampled from four wild Tunisian populations and find polymorphisms at approximately 7% of features, comprising 31 419 probes. Only approximately 3% of these markers assort by population, and of these only 10% differentiate between populations from saline and non-saline sites. Fifty-two differentiated probes with unique genome locations correspond to 18 distinct genome regions. Sanger resequencing was used to characterize a subset of marker loci and develop a single nucleotide polymorphism (SNP)-typing assay that confirmed marker assortment by habitat in an independent sample of 33 individuals from the four populations. Genome-wide linkage disequilibrium (LD) extends on average for approximately 10 kb, falling to background levels by approximately 500 kb. A similar range of LD decay was observed in the 18 genome regions that assort by habitat; these LD blocks delimit candidate genes for local adaptation, many of which encode proteins with predicted functions in abiotic stress tolerance and are targets for functional genomic studies. Tunisian M. truncatula populations contain substantial amounts of genetic variation that is structured in relatively small LD blocks, suggesting a history of migration and recombination. These populations provide a strong resource for genome-wide association studies.
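The LD figures quoted in this abstract rest on the correlation between pairs of markers. As a rough illustration only (this is not the authors' pipeline, and the function name `ld_r2` and the 0/1 allele coding are assumptions made for this sketch), the squared allelic correlation r^2 between two biallelic markers scored on inbred lines could be computed as follows:

```python
from typing import Sequence


def ld_r2(marker_a: Sequence[int], marker_b: Sequence[int]) -> float:
    """Squared allelic correlation r^2 between two biallelic markers.

    Each sequence holds one 0/1 allele call per inbred line. Lines are
    assumed to be fully homozygous, so each line contributes one haplotype.
    """
    if len(marker_a) != len(marker_b) or len(marker_a) == 0:
        raise ValueError("markers must be non-empty and of equal length")
    n = len(marker_a)
    p_a = sum(marker_a) / n  # frequency of allele 1 at marker A
    p_b = sum(marker_b) / n  # frequency of allele 1 at marker B
    # Frequency of the haplotype carrying allele 1 at both markers.
    p_ab = sum(1 for a, b in zip(marker_a, marker_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic in the sample")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / denom
```

Plotting such r^2 values against the physical distance between marker pairs gives the kind of LD decay curve the abstract summarizes; with only 12 lines, individual r^2 estimates would be noisy.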
Affiliation(s)
- Maren L Friesen
- Department of Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA.
|
352
|
Sulpice R, Trenkamp S, Steinfath M, Usadel B, Gibon Y, Witucka-Wall H, Pyl ET, Tschoep H, Steinhauser MC, Guenther M, Hoehne M, Rohwer JM, Altmann T, Fernie AR, Stitt M. Network analysis of enzyme activities and metabolite levels and their relationship to biomass in a large panel of Arabidopsis accessions. The Plant Cell 2010; 22:2872-93. [PMID: 20699391 PMCID: PMC2947169 DOI: 10.1105/tpc.110.076653] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 05/13/2010] [Revised: 07/16/2010] [Accepted: 07/25/2010] [Indexed: 05/17/2023]
Abstract
Natural genetic diversity provides a powerful resource to investigate how networks respond to multiple simultaneous changes. In this work, we profile maximum catalytic activities of 37 enzymes from central metabolism and generate a matrix to investigate species-wide connectivity between metabolites, enzymes, and biomass. Most enzyme activities change in a highly coordinated manner, especially those in the Calvin-Benson cycle. Metabolites show coordinated changes in defined sectors of metabolism. Little connectivity was observed between maximum enzyme activities and metabolites, even after applying multivariate analysis methods. Measurements of posttranscriptional regulation will be required to relate these two functional levels. Individual enzyme activities correlate only weakly with biomass. However, when they are used to estimate protein abundances, and the latter are summed and expressed as a fraction of total protein, a significant positive correlation to biomass is observed. The correlation is additive to that obtained between starch and biomass. Thus, biomass is predicted by two independent integrative metabolic biomarkers: preferential investment in photosynthetic machinery and optimization of carbon use.
Affiliation(s)
- Ronan Sulpice
- Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam-Golm, Germany.
|
353
|
Sillanpää MJ. Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses. Heredity (Edinb) 2010; 106:511-9. [PMID: 20628415 DOI: 10.1038/hdy.2010.91] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Indexed: 12/17/2022]
Abstract
Population-based genomic association analyses are more powerful than within-family analyses. However, population stratification (unknown or ignored origin of individuals from multiple source populations) and cryptic relatedness (unknown or ignored covariance between individuals because of their relatedness) are confounding factors in population-based genomic association analyses, which inflate the false-positive rate. As a consequence, false association signals may arise in genomic data association analyses for reasons other than true association between the tested genomic factor (marker genotype, gene or protein expression) and the study phenotype. It is therefore important to correct or account for these confounders in population-based genomic data association analyses. The common correction techniques for population stratification and cryptic relatedness problems are presented here in the phenotype-marker association analysis context, and comments on their suitability for other types of genomic association analyses (for example, phenotype-expression association) are also provided. Even though many of these techniques have originally been developed in the context of human genetics, most of them are also applicable to model organisms and breeding populations.
Affiliation(s)
- M J Sillanpää
- Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland.
|
354
|
Application of association mapping to understanding the genetic diversity of plant germplasm resources. International Journal of Plant Genomics 2010; 2008:574927. [PMID: 18551188 PMCID: PMC2423417 DOI: 10.1155/2008/574927] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.6] [Received: 12/21/2007] [Accepted: 04/18/2008] [Indexed: 02/05/2023]
Abstract
Compared to the conventional linkage mapping, linkage disequilibrium (LD)-mapping, using the nonrandom associations of loci in haplotypes, is a powerful high-resolution mapping tool for complex quantitative traits. The recent advances in the development of unbiased association mapping approaches for plant population with their successful applications in dissecting a number of simple to complex traits in many crop species demonstrate a flourish of the approach as a “powerful gene tagging” tool for crops in the plant genomics era of 21st century. The goal of this review is to provide nonexpert readers of crop breeding community with (1) the basic concept, merits, and simple description of existing methodologies for an association mapping with the recent improvements for plant populations, and (2) the details of some of pioneer and recent studies on association mapping in various crop species to demonstrate the feasibility, success, problems, and future perspectives of the efforts in plants. This should be helpful for interested readers of international plant research community as a guideline for the basic understanding, choosing the appropriate methods, and its application.
|
355
|
Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, Tyagi W, Ali ML, Tung CW, Reynolds A, Bustamante CD, McCouch SR. Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome. PLoS One 2010; 5:e10780. [PMID: 20520727 PMCID: PMC2875394 DOI: 10.1371/journal.pone.0010780] [Citation(s) in RCA: 218] [Impact Index Per Article: 15.6] [Received: 03/02/2010] [Accepted: 04/30/2010] [Indexed: 11/23/2022]
Abstract
Background: The domestication of Asian rice (Oryza sativa) was a complex process punctuated by episodes of introgressive hybridization among and between subpopulations. Deep genetic divergence between the two main varietal groups (Indica and Japonica) suggests domestication from at least two distinct wild populations. However, genetic uniformity surrounding key domestication genes across divergent subpopulations suggests cultural exchange of genetic material among ancient farmers.
Methodology/Principal Findings: In this study, we utilize a novel 1,536 SNP panel genotyped across 395 diverse accessions of O. sativa to study genome-wide patterns of polymorphism, to characterize population structure, and to infer the introgression history of domesticated Asian rice. Our population structure analyses support the existence of five major subpopulations (indica, aus, tropical japonica, temperate japonica and GroupV) consistent with previous analyses. Our introgression analysis shows that most accessions exhibit some degree of admixture, with many individuals within a population sharing the same introgressed segment due to artificial selection. Admixture mapping and association analysis of amylose content and grain length illustrate the potential for dissecting the genetic basis of complex traits in domesticated plant populations.
Conclusions/Significance: Genes in these regions control a myriad of traits including plant stature, blast resistance, and amylose content. These analyses highlight the power of population genomics in agricultural systems to identify functionally important regions of the genome and to decipher the role of human-directed breeding in refashioning the genomes of a domesticated species.
Affiliation(s)
- Keyan Zhao
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
- Mark Wright
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Jennifer Kimball
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Georgia Eizenga
- Dale Bumpers National Rice Research Center, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Stuttgart, Arkansas, United States of America
- Anna McClung
- Dale Bumpers National Rice Research Center, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Stuttgart, Arkansas, United States of America
- Michael Kovach
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Wricha Tyagi
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Md. Liakat Ali
- Rice Research and Extension Center, University of Arkansas, Stuttgart, Arkansas, United States of America
- Chih-Wei Tung
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- Andy Reynolds
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Carlos D. Bustamante
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, United States of America
- Department of Genetics, Stanford University, Stanford, California, United States of America
- * E-mail: (CDB); (SRM)
- Susan R. McCouch
- Department of Plant Breeding and Genetics, Cornell University, Ithaca, New York, United States of America
- * E-mail: (CDB); (SRM)
|
356
|
Allelic variation in cell wall candidate genes affecting solid wood properties in natural populations and land races of Pinus radiata. Genetics 2010; 185:1477-87. [PMID: 20498299 DOI: 10.1534/genetics.110.116582] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Indexed: 11/18/2022]
Abstract
Forest trees are ideally suited to association mapping due to their high levels of diversity and low genomic linkage disequilibrium. Using an association mapping approach, single-nucleotide polymorphism (SNP) markers influencing quantitative variation in wood quality were identified in a natural population of Pinus radiata. Of 149 sites examined, 10 demonstrated significant associations (P < 0.05, q < 0.1) with one or more traits after accounting for population structure and experimentwise error. Without accounting for marker interactions, phenotypic variation attributed to individual SNPs ranged from 2 to 6.5%. Undesirable negative correlations between wood quality and growth were not observed, indicating potential to break negative correlations by selecting for individual SNPs in breeding programs. Markers that yielded significant associations were reexamined in an Australian land race. SNPs from three genes (PAL1, PCBER, and SUSY) yielded significant associations. Importantly, associations with two of these genes validated associations with density previously observed in the discovery population. In both cases, decreased wood density was associated with the minor allele, suggesting that these SNPs may be under weak negative purifying selection for density in the natural populations. These results demonstrate the utility of LD mapping to detect associations, even when the power to detect SNPs with small effect is anticipated to be low.
|
357
|
Genome-wide survey of Arabidopsis natural variation in downy mildew resistance using combined association and linkage mapping. Proc Natl Acad Sci U S A 2010; 107:10302-7. [PMID: 20479233 DOI: 10.1073/pnas.0913160107] [Citation(s) in RCA: 95] [Impact Index Per Article: 6.8] [Indexed: 11/18/2022]
Abstract
The model plant Arabidopsis thaliana exhibits extensive natural variation in resistance to parasites. Immunity is often conferred by resistance (R) genes that permit recognition of specific races of a disease. The number of such R genes and their distribution are poorly understood. In this study, we investigated the basis for resistance to the downy mildew agent Hyaloperonospora arabidopsidis ex parasitica (Hpa) in a global sample of A. thaliana. We implemented a combined genome-wide mapping of resistance using populations of recombinant inbred lines and a collection of wild A. thaliana accessions. We tested the interaction between 96 host genotypes collected worldwide and five strains of Hpa. Then, a fraction of the species-wide resistance was genetically dissected using six recently constructed populations of recombinant inbred lines. We found that resistance is usually governed by single dominant R genes that are concentrated in four genomic regions only. We show that association genetics of resistance to diseases such as downy mildew enables increased mapping resolution from quantitative trait loci interval to candidate gene level. Association patterns in quantitative trait loci intervals indicate that the pool of A. thaliana resistance sources against the tested Hpa isolates may be predominantly confined to six RPP (Resistance to Hpa) loci isolated in previous studies. Our results suggest that combining association and linkage mapping could accelerate resistance gene discovery in plants.
|
358
|
Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M, Bergelson J, Cuguen J, Roux F. Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet 2010; 6:e1000940. [PMID: 20463887 PMCID: PMC2865524 DOI: 10.1371/journal.pgen.1000940] [Citation(s) in RCA: 305] [Impact Index Per Article: 21.8] [Received: 12/18/2009] [Accepted: 04/06/2010] [Indexed: 12/28/2022]
Abstract
Flowering time is a key life-history trait in the plant life cycle. Most studies to unravel the genetics of flowering time in Arabidopsis thaliana have been performed under greenhouse conditions. Here, we describe a study about the genetics of flowering time that differs from previous studies in two important ways: first, we measure flowering time in a more complex and ecologically realistic environment; and, second, we combine the advantages of genome-wide association (GWA) and traditional linkage (QTL) mapping. Our experiments involved phenotyping nearly 20,000 plants over 2 winters under field conditions, including 184 worldwide natural accessions genotyped for 216,509 SNPs and 4,366 RILs derived from 13 independent crosses chosen to maximize genetic and phenotypic diversity. Based on a photothermal time model, the flowering time variation scored in our field experiment was poorly correlated with the flowering time variation previously obtained under greenhouse conditions, reinforcing previous demonstrations of the importance of genotype by environment interactions in A. thaliana and the need to study adaptive variation under natural conditions. The use of 4,366 RILs provides great power for dissecting the genetic architecture of flowering time in A. thaliana under our specific field conditions. We describe more than 60 additive QTLs, all with relatively small to medium effects and organized in 5 major clusters. We show that QTL mapping increases our power to distinguish true from false associations in GWA mapping. QTL mapping also permits the identification of false negatives, that is, causative SNPs that are lost when applying GWA methods that control for population structure. Major genes underpinning flowering time in the greenhouse were not associated with flowering time in this study. Instead, we found a prevalence of genes involved in the regulation of the plant circadian clock. Furthermore, we identified new genomic regions lacking obvious candidate genes.
Affiliation(s)
- Benjamin Brachi
- Laboratoire Génétique et Evolution des Populations Végétales, Unité Mixte de Recherche CNRS 8016, Université des Sciences et Technologies de Lille 1, Villeneuve d'Ascq, France
|
359
|
Abstract
The genetics of phenotypic variation in inbred mice has for nearly a century provided a primary weapon in the medical research arsenal. A catalog of the genetic variation among inbred mouse strains, however, is required to enable powerful positional cloning and association techniques. A recent whole-genome resequencing study of 15 inbred mouse strains captured a significant fraction of the genetic variation among a limited number of strains, yet the common use of hundreds of inbred strains in medical research motivates the need for a high-density variation map of a larger set of strains. Here we report a dense set of genotypes from 94 inbred mouse strains containing 10.77 million genotypes over 121,433 single nucleotide polymorphisms (SNPs), dispersed at 20-kb intervals on average across the genome, with an average concordance of 99.94% with previous SNP sets. Through pairwise comparisons of the strains, we identified an average of 4.70 distinct segments over 73 classical inbred strains in each region of the genome, suggesting limited genetic diversity between the strains. Combining these data with genotypes of 7570 gap-filling SNPs, we further imputed the untyped or missing genotypes of 94 strains over 8.27 million Perlegen SNPs. The imputation accuracy among classical inbred strains is estimated at 99.7% for the genotypes imputed with high confidence. We demonstrated the utility of these data in high-resolution linkage mapping through power simulations and statistical power analysis and provide guidelines for developing such studies. We also provide a resource of in silico association mapping between the complex traits deposited in the Mouse Phenome Database with our genotypes. We expect that these resources will facilitate effective designs of both human and mouse studies for dissecting the genetic basis of complex traits.
|
360
|
Zhang N, Gur A, Gibon Y, Sulpice R, Flint-Garcia S, McMullen MD, Stitt M, Buckler ES. Genetic analysis of central carbon metabolism unveils an amino acid substitution that alters maize NAD-dependent isocitrate dehydrogenase activity. PLoS One 2010; 5:e9991. [PMID: 20376324 PMCID: PMC2848677 DOI: 10.1371/journal.pone.0009991] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 09/21/2009] [Accepted: 02/05/2010] [Indexed: 01/24/2023]
Abstract
Background: Central carbon metabolism (CCM) is a fundamental component of life. The participating genes and enzymes are thought to be structurally and functionally conserved across and within species. Association mapping utilizes a rich history of mutation and recombination to achieve high resolution mapping. Therefore, applying association mapping in maize (Zea mays ssp. mays), the most diverse model crop species, to study the genetics of CCM is a particularly attractive system.
Methodology/Principal Findings: We used a maize diversity panel to test the CCM functional conservation. We found heritable variation in enzyme activity for every enzyme tested. One of these enzymes was the NAD-dependent isocitrate dehydrogenase (IDH, E.C. 1.1.1.41), in which we identified a novel amino-acid substitution in a phylogenetically conserved site. Using candidate gene association mapping, we identified that this non-synonymous polymorphism was associated with IDH activity variation. The proposed mechanism for the IDH activity variation includes additional components regulating protein level. With the comparison of sequences from maize and teosinte (Zea mays ssp. parviglumis), the maize wild ancestor, we found that some CCM genes had also been targeted for selection during maize domestication.
Conclusions/Significance: Our results demonstrate the efficacy of association mapping for dissecting natural variation in primary metabolic pathways. The considerable genetic diversity observed in maize CCM genes underlies heritable phenotypic variation in enzyme activities and can be useful to identify putative functional sites.
Affiliation(s)
- Nengyi Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, United States of America.
|
361
|
Atwell S, Huang YS, Vilhjálmsson BJ, Willems G, Horton M, Li Y, Meng D, Platt A, Tarone AM, Hu TT, Jiang R, Muliyati NW, Zhang X, Amer MA, Baxter I, Brachi B, Chory J, Dean C, Debieu M, de Meaux J, Ecker JR, Faure N, Kniskern JM, Jones JDG, Michael T, Nemri A, Roux F, Salt DE, Tang C, Todesco M, Traw MB, Weigel D, Marjoram P, Borevitz JO, Bergelson J, Nordborg M. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 2010; 465:627-31. [PMID: 20336072 PMCID: PMC3023908 DOI: 10.1038/nature08800] [Citation(s) in RCA: 1168] [Impact Index Per Article: 83.4] [Received: 06/23/2009] [Accepted: 12/30/2009] [Indexed: 11/09/2022]
Abstract
Although pioneered by human geneticists as a potential solution to the challenging problem of finding the genetic basis of common human diseases [1,2], advances in genotyping and sequencing technology have made genome-wide association (GWA) studies an obvious general approach for studying the genetics of natural variation and traits of agricultural importance. They are particularly useful when inbred lines are available because once these lines have been genotyped, they can be phenotyped multiple times, making it possible (as well as extremely cost-effective) to study many different traits in many different environments, while replicating the phenotypic measurements to reduce environmental noise. Here we demonstrate the power of this approach by carrying out a GWA study of 107 phenotypes in Arabidopsis thaliana, a widely distributed, predominantly selfing model plant, known to harbor considerable genetic variation for many adaptively important traits [3]. Our results are dramatically different from those of human GWA studies in that we identify many common alleles with major effect, but they are also, in many cases, harder to interpret because confounding by complex genetics and population structure make it difficult to distinguish true from false associations. However, a priori candidates are significantly overrepresented among these associations as well, making many of them excellent candidates for follow-up experiments by the Arabidopsis community. Our study clearly demonstrates the feasibility of GWA studies in A. thaliana, and suggests that the approach will be appropriate for many other organisms.
Affiliation(s)
- Susanna Atwell
- Molecular and Computational Biology, University of Southern California, Los Angeles, California 90089, USA
|
362
|
Variance component model to account for sample structure in genome-wide association studies. Nat Genet 2010; 42:348-54. [PMID: 20208533 DOI: 10.1038/ng.548] [Citation(s) in RCA: 1794] [Impact Index Per Article: 128.1] [Received: 07/23/2009] [Accepted: 02/09/2010] [Indexed: 02/07/2023]
Abstract
Although genome-wide association studies (GWASs) have identified numerous loci associated with complex traits, imprecise modeling of the genetic relatedness within study samples may cause substantial inflation of test statistics and possibly spurious associations. Variance component approaches, such as efficient mixed-model association (EMMA), can correct for a wide range of sample structures by explicitly accounting for pairwise relatedness between individuals, using high-density markers to model the phenotype distribution; but such approaches are computationally impractical. We report here a variance component approach implemented in publicly available software, EMMA eXpedited (EMMAX), that reduces the computational time for analyzing large GWAS data sets from years to hours. We apply this method to two human GWAS data sets, performing association analysis for ten quantitative traits from the Northern Finland Birth Cohort and seven common diseases from the Wellcome Trust Case Control Consortium. We find that EMMAX outperforms both principal component analysis and genomic control in correcting for sample structure.
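Genomic control, one of the baselines this abstract compares against, can be sketched in a few lines. The snippet below is an illustration of that baseline only, not of EMMAX; the function names are hypothetical, and it assumes 1-degree-of-freedom chi-square association statistics, whose null median is about 0.4549.

```python
from statistics import median
from typing import List, Sequence

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats: Sequence[float]) -> float:
    """Inflation factor lambda: observed median statistic over the null median."""
    if len(chi2_stats) == 0:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats: Sequence[float]) -> List[float]:
    """Divide every statistic by lambda, never inflating (lambda floored at 1)."""
    lam = max(1.0, genomic_inflation(chi2_stats))
    return [x / lam for x in chi2_stats]
```

A lambda well above 1 suggests that sample structure is inflating test statistics across the genome. Genomic control applies one uniform correction to every marker, whereas the variance component approach described above models pairwise relatedness explicitly.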
|
363
|
Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu J, Arnett DK, Ordovas JM, Buckler ES. Mixed linear model approach adapted for genome-wide association studies. Nat Genet 2010; 42:355-60. [PMID: 20208535 DOI: 10.1038/ng.546] [Citation(s) in RCA: 1252] [Impact Index Per Article: 89.4] [Received: 09/22/2009] [Accepted: 02/09/2010] [Indexed: 11/09/2022]
Abstract
Mixed linear model (MLM) methods have proven useful in controlling for population structure and relatedness within genome-wide association studies. However, MLM-based methods can be computationally challenging for large datasets. We report a compression approach, called 'compressed MLM', that decreases the effective sample size of such datasets by clustering individuals into groups. We also present a complementary approach, 'population parameters previously determined' (P3D), that eliminates the need to re-compute variance components. We applied these two methods both independently and combined in selected genetic association datasets from human, dog and maize. The joint implementation of these two methods markedly reduced computing time and either maintained or improved statistical power. We used simulations to demonstrate the usefulness in controlling for substructure in genetic association datasets for a range of species and genetic architectures. We have made these methods available within an implementation of the software program TASSEL.
Affiliation(s)
- Zhiwu Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, USA.
|
364
|
Archetti M. Complementation, Genetic Conflict, and the Evolution of Sex and Recombination. J Hered 2010; 101 Suppl 1:S21-33. [DOI: 10.1093/jhered/esq009] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 01/11/2023]
|
365
|
Abstract
The control of flowering time in plants is critical for plant fitness and for agriculture. The genetic pathways governing this developmental transition are reasonably well understood in Arabidopsis, although substantial new gains are still being made in this system. Much new work is focusing on how the genetic networks governing flowering function in other species.
Affiliation(s)
- Julin N Maloof
- Department of Plant Biology, University of California, Davis 1 Shields Avenue, Davis, CA 95616 USA
|
366
|
Sun G, Zhu C, Kramer MH, Yang SS, Song W, Piepho HP, Yu J. Variation explained in mixed-model association mapping. Heredity (Edinb) 2010; 105:333-40. [PMID: 20145669 DOI: 10.1038/hdy.2010.11] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 12/18/2022]
Abstract
Genomic mapping of complex traits across species demands integrating genetics and statistics. In particular, because it is easily interpreted, the R² statistic is commonly used in quantitative trait locus (QTL) mapping studies to measure the proportion of phenotypic variation explained by molecular markers. Mixed models with random polygenic effects have been used in complex trait dissection in different species. However, unlike fixed linear regression models, linear mixed models have no well-established R² statistic for assessing goodness-of-fit and prediction power. Our objectives were to assess the performance of several R²-like statistics for a linear mixed model in association mapping and to identify any such statistic that measures model-data agreement and provides an intuitive indication of QTL effect. Our results showed that the likelihood-ratio-based R² (R²_LR) satisfies several critical requirements proposed for the R²-like statistic. As R²_LR reduces to the regular R² for fixed models without random effects other than residual, it provides a general measure for the effect of QTL in mixed-model association mapping. Moreover, we found that R²_LR can help explain the overlap between overall population structure modeled as fixed effects and relative kinship modeled through random effects. As both approaches are derived from molecular marker information and are not mutually exclusive, comparing R²_LR values from different models provides a logical bridge between statistical analysis and underlying genetics of complex traits.
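A likelihood-ratio-based R² is commonly written as 1 - exp(-(2/n)(logL_model - logL_null)). The sketch below assumes that this is the form the abstract refers to; the function names are hypothetical, and the helper `gaussian_ml_loglik` exists only to illustrate the reduction to the ordinary R² for a fixed-effects model.

```python
import math


def r2_likelihood_ratio(loglik_model: float, loglik_null: float, n: int) -> float:
    """Likelihood-ratio-based R^2 from two maximised log-likelihoods.

    loglik_model: log-likelihood of the model of interest (e.g. with the QTL).
    loglik_null:  log-likelihood of the reference model (e.g. intercept only).
    n:            number of observations.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - math.exp(-2.0 / n * (loglik_model - loglik_null))


def gaussian_ml_loglik(rss: float, n: int) -> float:
    """Maximised Gaussian log-likelihood of a fixed-effects linear model.

    Uses the maximum-likelihood variance estimate sigma^2 = RSS / n.
    """
    sigma2 = rss / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


# For fixed-effects models the statistic equals 1 - RSS_model / RSS_null,
# which is the ordinary R^2 when the null model is intercept-only.
n_obs = 50
r2 = r2_likelihood_ratio(
    gaussian_ml_loglik(60.0, n_obs),   # residual sum of squares with the marker
    gaussian_ml_loglik(100.0, n_obs),  # residual sum of squares without it
    n_obs,
)
```

For a mixed model, the same function would be fed log-likelihoods from whatever mixed-model fitting routine is in use; that step is outside the scope of this sketch.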
Affiliation(s)
- G Sun
- Department of Agronomy, Kansas State University, Manhattan, KS, USA
|
367
|
Su WL, Sieberts SK, Kleinhanz RR, Lux K, Millstein J, Molony C, Schadt EE. Assessing the prospects of genome-wide association studies performed in inbred mice. Mamm Genome 2010; 21:143-52. [PMID: 20135320 DOI: 10.1007/s00335-010-9249-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 10/15/2009] [Accepted: 01/05/2010] [Indexed: 01/18/2023]
Abstract
The remarkable success in mapping genes linked to a number of disease traits using genome-wide association studies (GWAS) in human cohorts has renewed interest in applying this same technique in model organisms such as inbred laboratory mice. Unlike humans, however, the limited genetic diversity in the ancestry of laboratory mice combined with selection pressure over the past decades have yielded an intricate population genetic structure that can complicate the results obtained from association studies. This problem is further exacerbated by the small number of strains typically used in such studies where multiple spurious associations arise as a result of random chance. We sought to empirically assess the viability of GWAS in inbred mice using hundreds of expression traits for which the true location of the expression quantitative trait locus (eQTL) was known a priori. We then measured transcript abundance levels for these expression traits in 16 classical and 3 wild-derived inbred strains and carried out a genome-wide association scan, demonstrating the low statistical power of such studies and empirically estimating the large extent to which allelic association of transcripts gives rise to spurious associations. We provide evidence illustrating that in a large fraction of cases, the marker with the most significant p value fails to map to the location of the true eQTL. Finally, we provide experimental support, across hundreds of traits, that combining linkage analysis with association mapping provides significant increases in statistical power over a stand-alone GWAS as well as significantly higher mapping resolution than either study alone.
Affiliation(s)
- Wan-Lin Su
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA
|
368
|
Zhang Z, Buckler ES, Casstevens TM, Bradbury PJ. Software engineering the mixed model for genome-wide association studies on large samples. Brief Bioinform 2010; 10:664-75. [PMID: 19933212 DOI: 10.1093/bib/bbp050] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 01/19/2023] Open
Abstract
Mixed models improve the ability to detect phenotype-genotype associations in the presence of population stratification and multiple levels of relatedness in genome-wide association studies (GWAS), but for large data sets the resource consumption becomes impractical. At the same time, the sample size and number of markers used for GWAS is increasing dramatically, resulting in greater statistical power to detect those associations. The use of mixed models with increasingly large data sets depends on the availability of software for analyzing those models. While multiple software packages implement the mixed model method, no single package provides the best combination of fast computation, ability to handle large samples, flexible modeling and ease of use. Key elements of association analysis with mixed models are reviewed, including modeling phenotype-genotype associations using mixed models, population stratification, kinship and its estimation, variance component estimation, use of best linear unbiased predictors or residuals in place of raw phenotype, improving efficiency and software-user interaction. The available software packages are evaluated, and suggestions made for future software development.
Affiliation(s)
- Zhiwu Zhang
- Institute for Genomic Diversity, Cornell University, Ithaca, New York, USA.
|
369
|
Abstract
Gene expression microarrays allow rapid and easy quantification of transcript accumulation for almost all transcripts present in a genome. This technology has been utilized for diverse investigations from studying gene regulation in response to genetic or environmental fluctuation to global expression QTL (eQTL) analyses of natural variation. Typical analysis techniques focus on responses of individual genes in isolation from other genes. However, emerging evidence indicates that genes are organized into regulons, i.e., they respond as groups due to individual transcription factors binding multiple promoters, creating what is commonly called a network. We have developed a set of statistical approaches that allow researchers to test specific network hypotheses using a priori-defined gene networks. When applied to Arabidopsis thaliana this approach has been able to identify natural genetic variation that controls networks. In this chapter we describe approaches to develop and test specific network hypotheses utilizing natural genetic variation. This approach can be expanded to facilitate direct tests of the relationship between phenotypic trait and transcript genetic architecture. Finally, the use of a priori network definitions can be applied to any microarray experiment to directly conduct hypothesis testing at a genomics level.
|
370
|
Hall D, Tegstrom C, Ingvarsson PK. Using association mapping to dissect the genetic basis of complex traits in plants. Brief Funct Genomics 2010; 9:157-65. [DOI: 10.1093/bfgp/elp048] [Citation(s) in RCA: 116] [Impact Index Per Article: 8.3] [Indexed: 11/12/2022] Open
|
371
|
Weber AL, Zhao Q, McMullen MD, Doebley JF. Using association mapping in teosinte to investigate the function of maize selection-candidate genes. PLoS One 2009; 4:e8227. [PMID: 20011044 PMCID: PMC2785427 DOI: 10.1371/journal.pone.0008227] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 09/18/2009] [Accepted: 11/16/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Large-scale screens of the maize genome identified 48 genes that show the putative signature of artificial selection during maize domestication or improvement. These selection-candidate genes may act as quantitative trait loci (QTL) that control the phenotypic differences between maize and its progenitor, teosinte. The selection-candidate genes appear to be located closer in the genome to domestication QTL than expected by chance.
Methods and Findings: As a step toward defining the traits controlled by these genes, we performed phenotype-genotype association mapping in teosinte for 32 of the 48 plus three other selection-candidate genes. Our analyses assayed 32 phenotypic traits, many of which were altered during maize domestication or improvement. We observed several significant associations between SNPs in the selection-candidate genes and trait variation in teosinte. These included two associations that surpassed the Bonferroni correction and five instances where a gene significantly associated with the same trait in both of our association mapping panels. Despite these significant associations, when compared as a group the selection-candidate genes performed no better than randomly chosen genes.
Conclusions: Our results suggest association analyses can be helpful for identifying traits under the control of selection-candidate genes. Indeed, we present evidence for new functions for several selection-candidate genes. However, with the current set of selection-candidate genes and our association mapping strategy, we found very few significant associations overall and no more than we would have found with randomly chosen genes. We discuss possible reasons that a large number of significant genotype-phenotype associations were not discovered.
Affiliation(s)
- Allison L Weber
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America.
|
372
|
Li M, Reilly C, Hanson T. Association Tests for a Censored Quantitative Trait and Candidate Genes in Structured Populations with Multilevel Genetic Relatedness. Biometrics 2009; 66:925-33. [DOI: 10.1111/j.1541-0420.2009.01352.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022]
|
373
|
Sano CM, Bohn MO, Paige KN, Jacobs TW. Heritable variation in the inflorescence replacement program of Arabidopsis thaliana. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2009; 119:1461-1476. [PMID: 19787332 DOI: 10.1007/s00122-009-1148-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/30/2008] [Accepted: 08/30/2009] [Indexed: 05/28/2023]
Abstract
Owing to their sessile habits and trophic position within global ecosystems, higher plants display a sundry assortment of adaptations to the threat of predation. Unlike animals, nearly all higher plants can replace reproductive structures lost to predators by activating reserved growing points called axillary meristems. As the first step in a program aimed at defining the genetic architecture of the inflorescence replacement program (IRP) of Arabidopsis thaliana, we describe the results of a quantitative germplasm survey of developmental responses to loss of the primary reproductive axis. Eighty-five diverse accessions were grown in a replicated common garden and assessed for six life history traits and four IRP traits, including the number and lengths of axillary inflorescences present on the day that the first among them re-flowered after basal clipping of the primary inflorescence. Significant natural variation and high heritabilities were observed for all measured characters. Pairwise correlations among the 10 focal traits revealed a multi-dimensional phenotypic space sculpted by ontogenic and plastic allometries as well as apparent constraints and outliers of genetic interest. Cluster analysis of the IRP traits sorted the 85 accessions into 5 associations, a topology that establishes the boundaries within which the evolving Arabidopsis genome extends and restricts the species' IRP repertoire to that observable worldwide.
Affiliation(s)
- Cecile M Sano
- Department of Plant Biology, University of Illinois, 191 Edward R. Madigan Laboratory, 1201 West Gregory Drive, Urbana, IL, 61801, USA
|
374
|
Astle W, Balding DJ. Population Structure and Cryptic Relatedness in Genetic Association Studies. Stat Sci 2009. [DOI: 10.1214/09-sts307] [Citation(s) in RCA: 310] [Impact Index Per Article: 20.7] [Indexed: 11/19/2022]
|
375
|
Tazib T, Kobayashi Y, Ikka T, Zhao CR, Iuchi S, Kobayashi M, Kimura K, Koyama H. Association mapping of cadmium, copper and hydrogen peroxide tolerance of roots and translocation capacities of cadmium and copper in Arabidopsis thaliana. PHYSIOLOGIA PLANTARUM 2009; 137:235-248. [PMID: 19832939 DOI: 10.1111/j.1399-3054.2009.01286.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/28/2023]
Abstract
Association mapping analysis of Cd, Cu and H₂O₂ tolerance, judged by relative root length (RRL: % of root length in stress condition relative to that in control condition), and of Cd and Cu translocation ratios (amount of metal in the shoot relative to the total) was performed using 90 accessions of Arabidopsis thaliana. Using 140 SNPs that were distributed across the genome, association mapping analysis was performed with a haploid setting by the Q + K method, which minimizes detection of false associations by combining the Q-matrix of the structured association (Q) with kinship (K) to control for the population structure. Six, five and five significant (-log₁₀ P-value ≥ 1.3) linkages were detected between the SNPs and Cd, Cu and H₂O₂ resistant RRLs, respectively. In addition, six significant linkages were identified with translocation capacities of Cd and Cu. Among those detected loci, two each of Cu and Cd tolerance RRLs were collocated with those of H₂O₂ tolerance RRL, while one locus each was detected by Cu and Cd tolerance RRLs that collocated with their translocation ratios. These results suggested that these factors might partly explain the phenotypic variation of tolerance RRLs to Cd and Cu of Arabidopsis thaliana. Finally, using a different approach to analyze interactions between individual phenotypes, namely clustering analysis, we found an expected segregation of resistant SNPs (single-nucleotide polymorphisms) of the multiple RRLs in the typical accession groups carrying multiple traits. Almost none of the loci detected by association mapping analysis were linked to the loci of previously identified critical genes regulating the traits, suggesting that this could be useful to identify complex architecture of genetic factors determining variation among multiple accessions.
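The significance cutoff quoted in this abstract is expressed on the −log10 scale; a value of 1.3 corresponds to a P-value of roughly 0.05. The short Python sketch below only illustrates that conversion and how such a filter could be applied. The SNP names and P-values are hypothetical and are not taken from the study.

```python
import math

# Cutoff on the -log10(P) scale, as quoted in the abstract.
NEG_LOG10_THRESHOLD = 1.3

# Equivalent cutoff on the raw P-value scale (about 0.0501).
p_cutoff = 10 ** (-NEG_LOG10_THRESHOLD)


def neg_log10(p_value):
    """Return -log10 of a P-value in (0, 1]."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P-value must lie in (0, 1]")
    return -math.log10(p_value)


# Hypothetical marker-trait P-values, for illustration only.
p_values = {"snp_a": 0.004, "snp_b": 0.05, "snp_c": 0.051, "snp_d": 0.2}

# Keep markers whose -log10(P) meets or exceeds the threshold.
significant = {
    snp: round(neg_log10(p), 3)
    for snp, p in p_values.items()
    if neg_log10(p) >= NEG_LOG10_THRESHOLD
}

print(f"P-value cutoff: {p_cutoff:.4f}")
print(significant)
```

Note that this is a per-test threshold; the sketch applies no multiple-testing adjustment.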
Affiliation(s)
- Tanveer Tazib
- Laboratory of Plant Cell Technology, Faculty of Applied Biological Sciences, Gifu University, Gifu, Japan
|
376
|
Yan WG, Li Y, Agrama HA, Luo D, Gao F, Lu X, Ren G. Association mapping of stigma and spikelet characteristics in rice (Oryza sativa L.). MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2009; 24:277-292. [PMID: 20234878 PMCID: PMC2837221 DOI: 10.1007/s11032-009-9290-y] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Received: 12/20/2008] [Accepted: 04/27/2009] [Indexed: 05/20/2023]
Abstract
Stigma and spikelet characteristics play an essential role in hybrid seed production. A mini-core of 90 accessions developed from the USDA rice core collection was phenotyped in the field for nine stigma and spikelet traits and genotyped with 109 DNA markers (108 SSRs plus an indel). Three major clusters were built upon Rogers' genetic distance, indicative of indicas, and temperate and tropical japonicas. A mixed linear model combining PC-matrix and K-matrix was adapted for mapping marker-trait associations. Resulting associations were adjusted using a false discovery rate technique. We identified 34 marker-trait associations involving 22 SSR markers for eight traits. Four markers were associated with single stigma exsertion (SStgE), six with dual exsertion (DStgE) and five with total exsertion (TStgE). RM5_Chr1 played a major role, showing high regression with not only DStgE but also SStgE. Four markers were associated with spikelet length, three with width and seven with L/W ratio. Numerous markers were co-associated with multiple traits that were phenotypically correlated, e.g. RM12521_Chr2 was associated with all three correlated spikelet traits. The co-association should improve breeding efficiency because a single marker could be used to assist breeding for multiple traits. Indica entry 1032 (cultivar 50638) and japonica entry 671 (cultivar Linia 84 Icar), with 80.65 and 75.17% of TStgE, respectively, are recommended to breeders for improving stigma exsertion. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-009-9290-y) contains supplementary material, which is available to authorized users.
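The abstract does not say which false discovery rate procedure was used to adjust the associations. As an assumption for illustration, the Python sketch below shows the Benjamini-Hochberg step-up procedure, one widely used choice. The function name, the target rate q and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking tests rejected at FDR level q.

    Benjamini-Hochberg step-up: sort the m P-values, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest P-values.
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose P-value falls under its stepwise threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Reject every test ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical marker-trait P-values, deliberately unsorted.
example_p = [0.041, 0.001, 0.216, 0.008, 0.039, 0.06, 0.205, 0.042, 0.074, 0.212]
decisions = benjamini_hochberg(example_p, q=0.05)

for p, keep in zip(example_p, decisions):
    print(f"p = {p:<6} significant after FDR control: {keep}")
```

With these ten values only the two smallest P-values (0.001 and 0.008) pass at q = 0.05, even though five values fall below an unadjusted 0.05 cutoff.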
Affiliation(s)
- Wen Gui Yan
- Dale Bumpers National Rice Research Center, United States Department of Agriculture, Agricultural Research Service, 2890 Highway 130 East, Stuttgart, AR 72160 USA
- Yong Li
- Sichuan Academy of Agricultural Sciences, No. 20 Jingjusi Road, 610066 Chengdu, Sichuan China
- Hesham A. Agrama
- Rice Research and Extension Center, University of Arkansas, 2890 Highway 130 East, Stuttgart, AR 72160 USA
- Dagang Luo
- Sichuan Academy of Agricultural Sciences, No. 20 Jingjusi Road, 610066 Chengdu, Sichuan China
- Fangyuan Gao
- Sichuan Academy of Agricultural Sciences, No. 20 Jingjusi Road, 610066 Chengdu, Sichuan China
- Xianjun Lu
- Sichuan Academy of Agricultural Sciences, No. 20 Jingjusi Road, 610066 Chengdu, Sichuan China
- Guangjun Ren
- Sichuan Academy of Agricultural Sciences, No. 20 Jingjusi Road, 610066 Chengdu, Sichuan China
|
377
|
Montesinos A, Tonsor SJ, Alonso-Blanco C, Picó FX. Demographic and genetic patterns of variation among populations of Arabidopsis thaliana from contrasting native environments. PLoS One 2009; 4:e7213. [PMID: 19787050 PMCID: PMC2746291 DOI: 10.1371/journal.pone.0007213] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 07/24/2009] [Accepted: 08/24/2009] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND: Understanding the relationship between environment and genetics requires the integration of knowledge on the demographic behavior of natural populations. However, the demographic performance and genetic composition of Arabidopsis thaliana populations in the species' native environments remain largely uncharacterized. This information, in combination with the advances on the study of gene function, will improve our understanding on the genetic mechanisms underlying adaptive evolution in A. thaliana.
METHODOLOGY/PRINCIPAL FINDINGS: We report the extent of environmental, demographic, and genetic variation among 10 A. thaliana populations from Mediterranean (coastal) and Pyrenean (montane) native environments in northeast Spain. Geographic, climatic, landscape, and soil data were compared. Demographic traits, including the dynamics of the soil seed bank and the attributes of aboveground individuals followed over a complete season, were also analyzed. Genetic data based on genome-wide SNP markers were used to describe genetic diversity, differentiation, and structure. Coastal and montane populations significantly differed in terms of environmental, demographic, and genetic characteristics. Montane populations, at higher altitude and farther from the sea, are exposed to colder winters and prolonged spring moisture compared to coastal populations. Montane populations showed stronger secondary seed dormancy, higher seedling/juvenile mortality in winter, and initiated flowering later than coastal populations. Montane and coastal regions were genetically differentiated, montane populations bearing lower genetic diversity than coastal ones. No significant isolation-by-distance pattern and no shared multilocus genotypes among populations were detected.
CONCLUSIONS/SIGNIFICANCE: Between-region variation in climatic patterns can account for differences in demographic traits, such as secondary seed dormancy, plant mortality, and recruitment, between coastal and montane A. thaliana populations. In addition, differences in plant mortality can partly account for differences in the genetic composition of coastal and montane populations. This study shows how the interplay between variation in environmental, demographic, and genetic parameters may operate in natural A. thaliana populations.
Affiliation(s)
- Alicia Montesinos
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (EBD), Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Stephen J. Tonsor
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Carlos Alonso-Blanco
- Departamento de Genética Molecular de Plantas, Centro Nacional de Biotecnología (CNB), Consejo Superior de Investigaciones Científicas (CSIC), Madrid, Spain
- F. Xavier Picó
- Departamento de Ecología Integrativa, Estación Biológica de Doñana (EBD), Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
|
378
|
Liu N, Bucala R, Zhao H. Modeling Informatively Missing Genotypes in Haplotype Analysis. COMMUN STAT-THEOR M 2009; 38:3445-3460. [PMID: 20052310 DOI: 10.1080/03610920802696588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
Abstract
It is common to have missing genotypes in practical genetic studies. The majority of the existing statistical methods, including those on haplotype analysis, assume that genotypes are missing at random; that is, at a given marker, different genotypes and different alleles are missing with the same probability. In our previous work, we have demonstrated that the violation of this assumption may lead to serious bias in haplotype frequency estimates and haplotype association analysis. We have proposed a general missing data model to simultaneously characterize missing data patterns across a set of two or more biallelic markers. We have proved that haplotype frequencies and missing data probabilities are identifiable if and only if there is linkage disequilibrium between these markers under the general missing data model. In this study, we extend our work to multi-allelic markers and observe a similar finding. Simulation studies on the analysis of haplotypes consisting of two markers illustrate that our proposed model can reduce the bias for haplotype frequency estimates due to incorrect assumptions on the missing data mechanism. Finally, we illustrate the utility of our method through its application to a real data set from a study of scleroderma.
Affiliation(s)
- Nianjun Liu
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL
|
379
|
Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics 2009; 185:991-1007. [PMID: 19737743 DOI: 10.1534/genetics.109.108522] [Citation(s) in RCA: 135] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022] Open
Abstract
With the improvement and decline in cost of high-throughput genotyping and phenotyping technologies, genome-wide association (GWA) studies are fast becoming a preferred approach for dissecting complex quantitative traits. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of quantitative traits. GSLs are key defenses against insects in the wild and the relatively large number of cloned quantitative trait loci (QTL) controlling GSL traits allows comparison of GWA to previous QTL analyses. To better understand the species-wide genomic architecture controlling plant-insect interactions and the relative strengths of GWA and QTL studies, we conducted a GWA mapping study using 96 A. thaliana accessions, 43 GSL phenotypes, and approximately 230,000 SNPs. Our GWA analysis identified the two major polymorphic loci controlling GSL variation (AOP and MAM) in natural populations within large blocks of positive associations encompassing dozens of genes. These blocks of positive associations showed extended linkage disequilibrium (LD) that we hypothesize to have arisen from balancing or fluctuating selective sweeps at both the AOP and MAM loci. These potential sweep blocks are likely linked with the formation of new defensive chemistries that alter plant fitness in natural environments. Interestingly, this GWA analysis did not identify the majority of previously identified QTL even though these polymorphisms were present in the GWA population. This may be partly explained by a nonrandom distribution of phenotypic variation across population subgroups that links population structure and GSL variation, suggesting that natural selection can hinder the detection of phenotype-genotype associations in natural populations.
|
380
|
Ehrenreich IM, Gerke JP, Kruglyak L. Genetic dissection of complex traits in yeast: insights from studies of gene expression and other phenotypes in the BYxRM cross. COLD SPRING HARBOR SYMPOSIA ON QUANTITATIVE BIOLOGY 2009; 74:145-53. [PMID: 19734204 DOI: 10.1101/sqb.2009.74.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.1] [Indexed: 11/24/2022]
Abstract
The genetic basis of many phenotypes of biological and medical interest, including susceptibility to common human diseases, is complex, involving multiple genes that interact with one another and the environment. Despite decades of effort, we possess neither a full grasp of the general rules that govern complex trait genetics nor a detailed understanding of the genetic basis of specific complex traits. We have used a cross between two yeast strains, BY and RM, to systematically investigate the genetic complexity underlying differences in global gene expression and other traits. The number and diversity of traits dissected to the locus, gene, and nucleotide levels in the BYxRM cross make it arguably the most extensively characterized system with regard to causal effects of genetic variation on phenotype. We summarize the insights obtained to date into the genetics of complex traits in yeast, with an emphasis on the BYxRM cross. We then highlight the central outstanding questions about the genetics of complex traits and discuss how to answer them using yeast as a model system.
Affiliation(s)
- I M Ehrenreich
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
|
381
|
Quantitative genetic bases of anthocyanin variation in grape (Vitis vinifera L. ssp. sativa) berry: a quantitative trait locus to quantitative trait nucleotide integrated study. Genetics 2009; 183:1127-39. [PMID: 19720862 DOI: 10.1534/genetics.109.103929] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Indexed: 11/18/2022] Open
Abstract
The combination of QTL mapping studies of synthetic lines and association mapping studies of natural diversity represents an opportunity to throw light on the genetically based variation of quantitative traits. With the positional information provided through quantitative trait locus (QTL) mapping, which often leads to wide intervals encompassing numerous genes, it is now feasible to directly target candidate genes that are likely to be responsible for the observed variation in completely sequenced genomes and to test their effects through association genetics. This approach was performed in grape, a newly sequenced genome, to decipher the genetic architecture of anthocyanin content. Grapes may be either white or colored, ranging from the lightest pink to the darkest purple tones according to the amount of anthocyanin accumulated in the berry skin, which is a crucial trait for both wine quality and human nutrition. Although the determinism of the white phenotype has been fully identified, the genetic bases of the quantitative variation of anthocyanin content in berry skin remain unclear. A single QTL responsible for up to 62% of the variation in the anthocyanin content was mapped on a Syrah x Grenache F(1) pseudo-testcross. Among the 68 unigenes identified in the grape genome within the QTL interval, a cluster of four Myb-type genes was selected on the basis of physiological evidence (VvMybA1, VvMybA2, VvMybA3, and VvMybA4). From a core collection of natural resources (141 individuals), 32 polymorphisms revealed significant association, and extended linkage disequilibrium was observed. Using a multivariate regression method, we demonstrated that five polymorphisms in VvMybA genes except VvMybA4 (one retrotransposon, three single nucleotide polymorphisms and one 2-bp insertion/deletion) accounted for 84% of the observed variation. All these polymorphisms led to either structural changes in the MYB proteins or differences in the VvMybAs promoters. 
We concluded that the continuous variation in anthocyanin content in grape was explained mainly by a single gene cluster of three VvMybA genes. The use of natural diversity helped to reduce one QTL to a set of five quantitative trait nucleotides and gave a clear picture of how isogenes combined their effects to shape grape color. Such analysis also illustrates how isogenes combine their effect to shape a complex quantitative trait and enables the definition of markers directly targeted for upcoming breeding programs.
|
382
|
Lefebvre V, Kiani SP, Durand-Tardif M. A focus on natural variation for abiotic constraints response in the model species Arabidopsis thaliana. Int J Mol Sci 2009; 10:3547-82. [PMID: 20111677 PMCID: PMC2812820 DOI: 10.3390/ijms10083547] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 06/16/2009] [Revised: 08/04/2009] [Accepted: 08/11/2009] [Indexed: 11/30/2022] Open
Abstract
Plants are particularly subject to environmental stress, as they cannot move from unfavourable surroundings. As a consequence they have to react in situ. In any case, plants have to sense the stress, then the signal has to be transduced to engage the appropriate response. Stress response is effected by regulating genes, by turning on molecular mechanisms to protect the whole organism and its components and/or to repair damage. Reactions vary depending on the type of stress and its intensity, but some are commonly turned on because some responses to different abiotic stresses are shared. In addition, there are multiple ways for plants to respond to environmental stress, depending on the species and life strategy, but also multiple ways within a species depending on plant variety or ecotype. It is generally accepted that populations of a single species originating from diverse geographic origins and/or that have been subjected to different selective pressure have evolved retaining the best alleles for completing their life cycle. Therefore, the study of natural variation in response to abiotic stress can help unravel key genes and alleles for plants to cope with their unfavourable physical and chemical surroundings. This review focuses on Arabidopsis thaliana, which has been widely adopted by the global scientific community as a model organism. Also, tools and data that facilitate investigation of natural variation and abiotic stress encountered in the wild are set out. Characterization of accessions, QTL detection and cloning of alleles responsible for variation are presented.
Affiliation(s)
- Valérie Lefebvre
- INRA/IJPB, Genetics and Plant Breeding Laboratory, UR 254, Route de St Cyr, F-78000 Versailles, France
- Seifollah Poormohammad Kiani
- INRA/IJPB, Genetics and Plant Breeding Laboratory, UR 254, Route de St Cyr, F-78000 Versailles, France
- Mylène Durand-Tardif
- INRA/IJPB, Genetics and Plant Breeding Laboratory, UR 254, Route de St Cyr, F-78000 Versailles, France
|
383
|
Fogelqvist J, Niittyvuopio A, Agren J, Savolainen O, Lascoux M. Cryptic population genetic structure: the number of inferred clusters depends on sample size. Mol Ecol Resour 2009; 10:314-23. [PMID: 21565026 DOI: 10.1111/j.1755-0998.2009.02756.x] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Indexed: 11/28/2022]
Abstract
Clustering methods have been used extensively to unravel cryptic population genetic structure. We investigated the effect of the number of individuals sampled in each location on the resulting number of clusters. Our study was motivated by recent results in Arabidopsis thaliana: studies in which more than one individual was sampled per location apparently have led to a much higher number of clusters than studies where only one individual was sampled in each location, as is generally done in this species. We show, using computer simulations and microsatellite data in A. thaliana, that the number of sampled individuals indeed has a strong impact on the number of resulting clusters. This effect is smaller if the sampled populations have a hierarchical structure. In most cases, sampling 5-10 individuals per population should be enough. The results argue for abandoning the concept of 'accessions' in partially selfing organisms.
Affiliation(s)
- Johan Fogelqvist
- Evolutionary Functional Genomics, Department of Evolution, Genomics and Systematics, Uppsala University, Norbyvägen 18 D, SE-752 36 Uppsala, Sweden; Department of Biology, University of Oulu, 90014 Oulu, Finland; Plant Ecology, Department of Ecology and Evolution, Uppsala University, Villavägen 14, SE-752 36 Uppsala, Sweden
|
384
|
Akhunov E, Nicolet C, Dvorak J. Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2009; 119:507-17. [PMID: 19449174 PMCID: PMC2715469 DOI: 10.1007/s00122-009-1059-5] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Received: 04/18/2008] [Accepted: 04/24/2009] [Indexed: 05/18/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are indispensable in such applications as association mapping and construction of high-density genetic maps. These applications usually require genotyping of thousands of SNPs in a large number of individuals. Although a number of SNP genotyping assays are available, most of them are designed for SNP genotyping in diploid individuals. Here, we demonstrate that the Illumina GoldenGate assay can be used for SNP genotyping of homozygous tetraploid and hexaploid wheat lines. Genotyping reactions could be carried out directly on genomic DNA without the need for preliminary PCR amplification. A total of 53 tetraploid and 38 hexaploid homozygous wheat lines were genotyped at 96 SNP loci. The genotyping error rate estimated after removal of low-quality data was 0% and 1% for tetraploid and hexaploid wheat, respectively. The developed SNP genotyping assays were shown to be useful for genotyping wheat cultivars. This study demonstrated that the GoldenGate assay is a very efficient tool for high-throughput genotyping of polyploid wheat, opening new possibilities for the analysis of genetic variation in wheat and for dissection of the genetic basis of complex traits using an association mapping approach.
Affiliation(s)
- Eduard Akhunov
- Department of Plant Sciences, University of California, Davis, CA 95616, USA.
|
385
|
Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE, Buckler ES. Association mapping: critical considerations shift from genotyping to experimental design. THE PLANT CELL 2009; 21:2194-202. [PMID: 19654263 PMCID: PMC2751942 DOI: 10.1105/tpc.109.068437] [Citation(s) in RCA: 445] [Impact Index Per Article: 29.7] [Received: 05/01/2009] [Revised: 07/06/2009] [Accepted: 07/13/2009] [Indexed: 05/18/2023]
Abstract
The goal of many plant scientists' research is to explain natural phenotypic variation in terms of simple changes in DNA sequence. Traditionally, linkage mapping has been the most commonly employed method to reach this goal: experimental crosses are made to generate a family with known relatedness, and attempts are made to identify cosegregation of genetic markers and phenotypes within this family. In vertebrate systems, association mapping (also known as linkage disequilibrium mapping) is increasingly being adopted as the mapping method of choice. Association mapping involves searching for genotype-phenotype correlations in unrelated individuals and often is more rapid and cost-effective than traditional linkage mapping. We emphasize here that linkage and association mapping are complementary approaches and are more similar than is often assumed. Unlike in vertebrates, where controlled crosses can be expensive or impossible (e.g., in humans), the plant scientific community can exploit the advantages of both controlled crosses and association mapping to increase statistical power and mapping resolution. While the time and money required for the collection of genotype data were critical considerations in the past, the increasing availability of inexpensive DNA sequencing and genotyping methods should prompt researchers to shift their attention to experimental design. This review provides thoughts on finding the optimal experimental mix of association mapping using unrelated individuals and controlled crosses to identify the genes underlying phenotypic variation.
Affiliation(s)
- Sean Myles
- Institute for Genomic Diversity, Cornell University, Ithaca, New York 14853-2703, USA.
|
386
|
Flowers JM, Hanzawa Y, Hall MC, Moore RC, Purugganan MD. Population Genomics of the Arabidopsis thaliana Flowering Time Gene Network. Mol Biol Evol 2009; 26:2475-86. [DOI: 10.1093/molbev/msp161] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 12/26/2022]
|
387
|
A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet 2009; 5:e1000551. [PMID: 19593375 PMCID: PMC2700969 DOI: 10.1371/journal.pgen.1000551] [Citation(s) in RCA: 367] [Impact Index Per Article: 24.5] [Received: 03/25/2009] [Accepted: 06/08/2009] [Indexed: 12/29/2022]
Abstract
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have some limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
Most traits of economic and evolutionary interest vary quantitatively and have multiple genes affecting their expression. Dissecting the genetic basis of such traits is crucial for the improvement of crops and management of diseases. Here, we develop a new resource to identify genes underlying such quantitative traits in Arabidopsis thaliana, a genetic model organism in plants. We show that using a large population of inbred lines derived from intercrossing 19 parents, we can localize the genes underlying quantitative traits better than with existing methods. Using these lines, we were able to replicate the identification of previously known genes that affect developmental traits in A. thaliana and identify some new ones. This paper also presents all the biological and computational material necessary for the scientific community to use these lines in their own research. Our results suggest that the use of lines derived from a multiparent advanced generation inter-cross (MAGIC lines) should be very useful in other organisms.
|
388
|
Splicing variation at a FLOWERING LOCUS C homeolog is associated with flowering time variation in the tetraploid Capsella bursa-pastoris. Genetics 2009; 183:337-45. [PMID: 19581451 DOI: 10.1534/genetics.109.103705] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 01/17/2023]
Abstract
The long-term fates of duplicate genes are well studied both empirically and theoretically, but how the short-term evolution of duplicate genes contributes to phenotypic variation is less well known. Here, we have studied the genetic basis of flowering time variation in the disomic tetraploid Capsella bursa-pastoris. We sequenced four duplicate candidate genes for flowering time and 10 background loci in samples from western Eurasia and China. Using a mixed-model approach that accounts for population structure, we found that polymorphisms at one homeolog of two candidate genes, FLOWERING LOCUS C (FLC) and CRYPTOCHROME1 (CRY1), were associated with natural flowering time variation. No potentially causative polymorphisms were found in the coding region of CRY1; however, at FLC two splice site polymorphisms were associated with early flowering. Accessions harboring nonconsensus splice sites expressed an alternatively spliced transcript or did not express this FLC homeolog. Our results are consistent with the function of FLC as a major repressor of flowering in Arabidopsis thaliana and imply that nonfunctionalization of duplicate genes could provide an important source of phenotypic variation.
|
389
|
Abstract
The pathways responsible for flowering time in Arabidopsis thaliana comprise one of the best characterized genetic networks in plants. We harness this extensive molecular genetic knowledge to identify potential flowering time quantitative trait genes (QTGs) through candidate gene association mapping using 51 flowering time loci. We genotyped common single nucleotide polymorphisms (SNPs) at these genes in 275 A. thaliana accessions that were also phenotyped for flowering time and rosette leaf number in long and short days. Using structured association techniques, we find that haplotype-tagging SNPs in 27 flowering time genes show significant associations in various trait/environment combinations. After correction for multiple testing, between 2 and 10 genes remain significantly associated with flowering time, with CO arguably possessing the most promising associations. We also genotyped a subset of these flowering time gene SNPs in an independent recombinant inbred line population derived from the intercrossing of 19 accessions. Approximately one-third of significant polymorphisms that were associated with flowering time in the accessions and genotyped in the outbred population were replicated in both mapping populations, including SNPs at the CO, FLC, VIN3, PHYD, and GA1 loci, and coding region deletions at the FRI gene. We conservatively estimate that approximately 4-14% of known flowering time genes may harbor common alleles that contribute to natural variation in this life history trait.
|
390
|
Rossi M, Bitocchi E, Bellucci E, Nanni L, Rau D, Attene G, Papa R. Linkage disequilibrium and population structure in wild and domesticated populations of Phaseolus vulgaris L. Evol Appl 2009; 2:504-22. [PMID: 25567895 PMCID: PMC3352449 DOI: 10.1111/j.1752-4571.2009.00082.x] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Received: 03/04/2009] [Accepted: 05/24/2009] [Indexed: 01/07/2023]
Abstract
Together with the knowledge of the population structure, a critical aspect for the planning of association and/or population genomics studies is the level of linkage disequilibrium (LD) that characterizes the species and the population used for such an analysis. We have analyzed the population structure and LD in wild and domesticated populations of Phaseolus vulgaris L. using amplified fragment length polymorphism markers, most of which were genetically mapped in two recombinant inbred populations. Our results reflect the previous knowledge of the occurrence of two major wild gene pools of P. vulgaris, from which two independent domestication events originated, one in the Andes and one in Mesoamerica. The high level of LD in the whole sample was mostly due to the gene pool structure, with a much higher LD in domesticated compared to wild populations. In relation to association studies, our results also suggest that whole-genome-scan approaches are feasible in the common bean. Interestingly, an excess of inter-chromosomal LD was found in the domesticated populations, which suggests an important role for epistatic selection during domestication. Moreover, our results indicate the occurrence of a strong bottleneck in the Andean wild population before domestication, suggesting a Mesoamerican origin of P. vulgaris. Finally, our data support the occurrence of a single domestication event in Mesoamerica, and the same scenario in the Andes.
Affiliation(s)
- Monica Rossi
- Scienze Ambientali e delle Produzioni Vegetali, Università Politecnica delle Marche, Ancona, Italy
- Elena Bitocchi
- Scienze Ambientali e delle Produzioni Vegetali, Università Politecnica delle Marche, Ancona, Italy
- Elisa Bellucci
- Scienze Ambientali e delle Produzioni Vegetali, Università Politecnica delle Marche, Ancona, Italy
- Laura Nanni
- Scienze Ambientali e delle Produzioni Vegetali, Università Politecnica delle Marche, Ancona, Italy
- Domenico Rau
- Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, Sassari, Italy
- Giovanna Attene
- Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, Sassari, Italy
- Roberto Papa
- Scienze Ambientali e delle Produzioni Vegetali, Università Politecnica delle Marche, Ancona, Italy
|
391
|
Abdurakhmonov IY, Saha S, Jenkins JN, Buriev ZT, Shermatov SE, Scheffler BE, Pepper AE, Yu JZ, Kohel RJ, Abdukarimov A. Linkage disequilibrium based association mapping of fiber quality traits in G. hirsutum L. variety germplasm. Genetica 2009; 136:401-17. [PMID: 19067183 DOI: 10.1007/s10709-008-9337-8] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 08/17/2008] [Accepted: 11/17/2008] [Indexed: 02/08/2023]
Abstract
Cotton is the world's leading cash crop, but it lags behind other major crops for marker-assisted breeding due to limited polymorphisms and a genetic bottleneck through historic domestication. This underlies a need for characterization, tagging, and utilization of existing natural polymorphisms in cotton germplasm collections. Here we report genetic diversity, population characteristics, the extent of linkage disequilibrium (LD), and association mapping of fiber quality traits using 202 microsatellite marker primer pairs in 335 G. hirsutum germplasm grown in two diverse environments, Uzbekistan and Mexico. At the significance threshold (r² ≥ 0.1), a genome-wide average of LD extended up to a genetic distance of 25 cM in assayed cotton variety accessions. Genome-wide LD at r² ≥ 0.2 was reduced to approximately 5-6 cM, providing evidence of the potential for association mapping of agronomically important traits in cotton. Results suggest linkage, selection, inbreeding, population stratification, and genetic drift as the potential LD-generating factors in cotton. In two environments, an average of ~20 SSR markers was associated with each main fiber quality trait using a unified mixed linear model (MLM) incorporating population structure and kinship. These MLM-derived significant associations were confirmed in a general linear model and a structured association test, accounting for population structure and permutation-based multiple testing. Several common markers, showing significant associations in both the Uzbekistan and Mexican environments, were determined. Between 7 and 43% of the MLM-derived significant associations were supported by a minimum Bayes factor at 'moderate to strong' and 'strong to very strong' evidence levels, suggesting their usefulness for marker-assisted breeding programs and the overall effectiveness of association mapping using cotton germplasm resources.
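The r² thresholds quoted in this abstract refer to the squared correlation between alleles at two loci. As a minimal sketch of how such a value could be computed, the Python below assumes biallelic markers scored 0/1 on the same set of haplotypes or inbred lines; the study itself used multi-allelic microsatellites, so this is an illustration of the statistic and not the authors' pipeline. The function names `ld_r2` and `pairs_in_ld` are hypothetical.

```python
from itertools import combinations


def ld_r2(marker_a, marker_b):
    """Return r^2 between two biallelic markers scored 0/1 on the same haplotypes."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("markers must be non-empty and of equal length")
    n = len(marker_a)
    p_a = sum(marker_a) / n                      # frequency of allele 1 at locus A
    p_b = sum(marker_b) / n                      # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(marker_a, marker_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    return d * d / denom


def pairs_in_ld(markers, threshold):
    """Return the marker-name pairs whose r^2 meets or exceeds the threshold."""
    return [
        (name_a, name_b)
        for name_a, name_b in combinations(sorted(markers), 2)
        if ld_r2(markers[name_a], markers[name_b]) >= threshold
    ]


# Illustrative data only: three made-up markers scored on four haplotypes.
example = {"m1": [0, 0, 1, 1], "m2": [0, 0, 1, 1], "m3": [0, 1, 0, 1]}
linked = pairs_in_ld(example, threshold=0.2)
```

Plotting such pairwise values against map distance is one common way to read off the distance at which LD falls below a chosen threshold.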
Affiliation(s)
- Ibrokhim Y Abdurakhmonov
- Center of Genomic Technologies, Institute of Genetics and Plant Experimental Biology, Academy of Sciences of Uzbekistan, Tashkent, Uzbekistan.
|
392
|
Rakovski CS, Stram DO. A kinship-based modification of the armitage trend test to address hidden population structure and small differential genotyping errors. PLoS One 2009; 4:e5825. [PMID: 19503792 PMCID: PMC2688076 DOI: 10.1371/journal.pone.0005825] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 12/02/2008] [Accepted: 05/06/2009] [Indexed: 11/18/2022]
Abstract
BACKGROUND/AIMS: We propose a modification of the well-known Armitage trend test to address the problems associated with hidden population structure and hidden relatedness in genome-wide case-control association studies.
METHODS: The new test adopts beneficial traits from three existing testing strategies: the principal components, mixed model, and genomic control while avoiding some of their disadvantageous characteristics, such as the tendency of the principal components method to over-correct in certain situations or the failure of the genomic control approach to reorder the adjusted tests based on their degree of alignment with the underlying hidden structure. The new procedure is based on Gauss-Markov estimators derived from a straightforward linear model with an imposed variance structure proportional to an empirical relatedness matrix. Lastly, conceptual and analytical similarities to and distinctions from other approaches are emphasized throughout.
RESULTS: Our simulations show that the power performance of the proposed test is quite promising compared to the considered competing strategies. The power gains are especially large when small differential differences between cases and controls are present; a likely scenario when public controls are used in multiple studies.
CONCLUSION: The proposed modified approach attains high power more consistently than that of the existing commonly implemented tests. Its performance improvement is most apparent when small but detectable systematic differences between cases and controls exist.
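For orientation, the unmodified Armitage trend test that this abstract takes as its starting point can be sketched as follows. This is the standard textbook (Cochran-Armitage) statistic for a 2 x 3 case-control genotype table with additive weights 0, 1, 2; it does not include the kinship-based variance structure the authors propose, and the function name `armitage_trend_test` is a hypothetical label for illustration.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Standard Cochran-Armitage trend test; returns (chi-square, p-value) on 1 df."""
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    totals = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    # Trend statistic: weighted difference between case and control counts.
    t_stat = sum(
        w * (r * n_controls - s * n_cases)
        for w, r, s in zip(weights, cases, controls)
    )
    k = len(weights)
    within = sum(w * w * c * (n - c) for w, c in zip(weights, totals))
    cross = sum(
        weights[i] * weights[j] * totals[i] * totals[j]
        for i in range(k)
        for j in range(i + 1, k)
    )
    variance = n_cases * n_controls / n * (within - 2 * cross)
    if variance == 0:
        raise ValueError("variance is zero; the table carries no information")
    chi2 = t_stat ** 2 / variance
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Illustrative counts only (genotypes aa, Aa, AA); not data from the paper.
chi2, p_value = armitage_trend_test(cases=[0, 0, 10], controls=[10, 0, 0])
```

The statistic is algebraically equal to N times the squared correlation between genotype score and case status, which is why hidden structure or relatedness that induces such a correlation inflates it.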
Affiliation(s)
- Cyril S Rakovski
- Department of Mathematics and Computer Science, Chapman University, Orange, California, United States of America.
|
393
|
Comadran J, Thomas WTB, van Eeuwijk FA, Ceccarelli S, Grando S, Stanca AM, Pecchioni N, Akar T, Al-Yassin A, Benbelkacem A, Ouabbou H, Bort J, Romagosa I, Hackett CA, Russell JR. Patterns of genetic diversity and linkage disequilibrium in a highly structured Hordeum vulgare association-mapping population for the Mediterranean basin. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2009; 119:175-87. [PMID: 19415228 DOI: 10.1007/s00122-009-1027-0] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Received: 09/11/2008] [Accepted: 03/25/2009] [Indexed: 05/18/2023]
Abstract
Population structure and genome-wide linkage disequilibrium (LD) were investigated in 192 Hordeum vulgare accessions providing a comprehensive coverage of past and present barley breeding in the Mediterranean basin, using 50 nuclear microsatellite and 1,130 DArT® markers. Both clustering and principal coordinate analyses clearly sub-divided the sample into five distinct groups centred on key ancestors and regions of origin of the germplasm. For given genetic distances, large variation in LD values was observed, ranging from closely linked markers completely at equilibrium to marker pairs at 50 cM separation still showing significant LD. Mean LD values across the whole population sample decayed below r² of 0.15 after 3.2 cM. By assaying 1,130 genome-wide DArT® markers, we demonstrated that, after accounting for population substructure, current genome coverage of 1 marker per 1.5 cM except for chromosome 4H with 1 marker per 3.62 cM is sufficient for whole genome association scans. We show, by identifying associations with powdery mildew that map in genomic regions known to have resistance loci, that associations can be detected in strongly stratified samples provided population structure is effectively controlled in the analysis. The population we describe is, therefore, shown to be a valuable resource, which can be used in basic and applied research in barley.
Affiliation(s)
- Jordi Comadran
- Genetics Programme, Scottish Crop Research Institute (SCRI), Invergowrie, Dundee, Scotland, UK.
|
394
|
Baxter I. Ionomics: studying the social network of mineral nutrients. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:381-6. [PMID: 19481970 PMCID: PMC2701637 DOI: 10.1016/j.pbi.2009.05.002] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Received: 12/19/2008] [Revised: 04/30/2009] [Accepted: 05/04/2009] [Indexed: 05/18/2023]
Abstract
The accumulation of a given element is a complex process controlled by a network of gene products critical for uptake, binding, transportation, and sequestration. Many of these genes and physiological processes affect more than one element. Therefore, to understand how elements are regulated, it is necessary to measure as many of the elements contained in a cell, tissue, or organism (the ionome) as possible. The elements that share components of their network vary depending on the species and genotype of the plants that are studied and the environment they are grown in. Several recent papers describe high-throughput elemental profiling studies of how the ionome responds to the environment or explore the genetics that control the ionome. When combined with new genotyping technologies, ionomics provides a rapid way to identify genes that control elemental accumulation in plants.
Affiliation(s)
- Ivan Baxter
- Bindley Bioscience Center, Purdue University, West Lafayette, IN, USA.
|
395
|
Abstract
Highly recombinant populations derived from inbred lines, such as advanced intercross lines and heterogeneous stocks, can be used to map loci far more accurately than is possible with standard intercrosses. However, the varying degrees of relatedness that exist between individuals complicate analysis, potentially leading to many false positive signals. We describe a method to deal with these problems that does not require pedigree information and accounts for model uncertainty through model averaging. In our method, we select multiple quantitative trait loci (QTL) models using forward selection applied to resampled data sets obtained by nonparametric bootstrapping and subsampling. We provide model-averaged statistics about the probability of loci or of multilocus regions being included in model selection, and this leads to more accurate identification of QTL than by single-locus mapping. The generality of our approach means it can potentially be applied to any population of unknown structure.
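The resampling idea in this abstract can be illustrated with a deliberately simplified sketch. The Python below draws nonparametric bootstrap resamples of individuals, picks the single marker most correlated with the phenotype in each resample, and reports how often each marker is selected. It is an assumption-laden toy: the method described above fits multi-QTL models by forward selection and also uses subsampling, neither of which is reproduced here, and the names `pearson` and `inclusion_probabilities` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def inclusion_probabilities(genotypes, phenotype, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which each marker is the top-ranked one."""
    rng = random.Random(seed)
    n = len(phenotype)
    counts = [0] * len(genotypes)
    for _ in range(n_resamples):
        # Resample individuals with replacement (nonparametric bootstrap).
        idx = [rng.randrange(n) for _ in range(n)]
        y = [phenotype[i] for i in idx]
        scores = [abs(pearson([g[i] for i in idx], y)) for g in genotypes]
        best = max(range(len(scores)), key=scores.__getitem__)
        counts[best] += 1
    return [c / n_resamples for c in counts]


# Made-up example: the phenotype tracks marker 0 exactly, markers 1 and 2 do not.
marker0 = [0, 1] * 20
marker1 = [0, 0, 1, 1] * 10
marker2 = [0] * 20 + [1] * 20
trait = [float(g) for g in marker0]
probs = inclusion_probabilities([marker0, marker1, marker2], trait)
```

Averaging selection over resamples gives each locus a frequency of inclusion instead of a single accept/reject call, which is the property the abstract relies on to account for model uncertainty.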
|
396
|
Correcting for relatedness in Bayesian models for genomic data association analysis. Heredity (Edinb) 2009; 103:223-37. [PMID: 19455182 DOI: 10.1038/hdy.2009.56] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
For small pedigrees, the issue of correcting for known or estimated relatedness structure in population-based Bayesian multilocus association analysis is considered. Two such relatedness corrections, [1] a random term arising from the infinite polygenic model and [2] a fixed covariate following the class D model of Bonney, are compared with the case of no correction using both simulated and real marker and gene-expression data from lymphoblastoid cell lines from four CEPH families. This comparison is performed with clinical quantitative trait locus (cQTL) models: multilocus association models in which marker data and expression levels of gene transcripts, as well as possible genotype x expression interaction terms, are jointly used to explain quantitative trait variation. We found that, regardless of whether the model includes a correction term, the cQTL models fit a few extra small-effect components (similar to finite polygenic models), which itself serves as a relatedness correction. For small data and small heritability one may use the covariate model, which clearly outperforms the infinite polygenic model in small data examples.
|
397
|
Nonmetric multidimensional scaling corrects for population structure in association mapping with different sample types. Genetics 2009; 182:875-88. [PMID: 19414565 DOI: 10.1534/genetics.108.098863] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.1] [Indexed: 11/18/2022]
Abstract
Recent research has developed various promising methods to control for population structure in genomewide association mapping of complex traits, but systematic examination of how well these methods perform under different genetic scenarios is still lacking. Appropriate methods for controlling genetic relationships among individuals need to balance the concern of false positives and statistical power, which can vary for different association sample types. We used a series of simulated samples and empirical data sets from cross- and self-pollinated species to demonstrate the performance of several contemporary methods in correcting for different types of genetic relationships encountered in association analysis. We proposed a two-stage dimension determination approach for both principal component analysis and nonmetric multidimensional scaling (nMDS) to capture the major structure pattern in association mapping samples. Our results showed that by exploiting both genotypic and phenotypic information, this two-stage dimension determination approach balances the trade-off between data fit and model complexity, resulting in an effective reduction in false positive rate with minimum loss in statistical power. Further, the nMDS technique of correcting for genetic relationship proved to be a powerful complement to other existing methods. Our findings highlight the significance of appropriate application of different statistical methods for dealing with complex genetic relationships in various genomewide association studies.
|
398
|
Manicacci D, Camus-Kulandaivelu L, Fourmann M, Arar C, Barrault S, Rousselet A, Feminias N, Consoli L, Francès L, Méchin V, Murigneux A, Prioul JL, Charcosset A, Damerval C. Epistatic interactions between Opaque2 transcriptional activator and its target gene CyPPDK1 control kernel trait variation in maize. PLANT PHYSIOLOGY 2009; 150:506-20. [PMID: 19329568 PMCID: PMC2675748 DOI: 10.1104/pp.108.131888] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 10/30/2008] [Accepted: 03/23/2009] [Indexed: 05/18/2023]
Abstract
Association genetics is a powerful method to track gene polymorphisms responsible for phenotypic variation, since it takes advantage of existing collections and historical recombination to study the correlation between large genetic diversity and phenotypic variation. We used a collection of 375 maize (Zea mays ssp. mays) inbred lines representative of tropical, American, and European diversity, previously characterized for genome-wide neutral markers and population structure, to investigate the roles of two functionally related candidate genes, Opaque2 and CyPPDK1, on kernel quality traits. Opaque2 encodes a basic leucine zipper transcriptional activator specifically expressed during endosperm development that controls the transcription of many target genes, including CyPPDK1, which encodes a cytosolic pyruvate orthophosphate dikinase. Using statistical models that correct for population structure and individual kinship, Opaque2 polymorphism was found to be strongly associated with variation of the essential amino acid lysine. This effect could be due to the direct role of Opaque2 on either zein transcription, zeins being major storage proteins devoid of lysine, or lysine degradation through the activation of lysine ketoglutarate reductase. Moreover, we found that a polymorphism in the Opaque2 coding sequence and several polymorphisms in the CyPPDK1 promoter nonadditively interact to modify both lysine content and the protein-versus-starch balance, thus revealing the role in quantitative variation in plants of epistatic interactions between a transcriptional activator and one of its target genes.
Affiliation(s)
- Domenica Manicacci
- University Paris-Sud, UMR 0320/UMR 8120 Génétique Végétale, F-91190 Gif sur Yvette, France.
|
399
|
Waugh R, Jannink JL, Muehlbauer GJ, Ramsay L. The emergence of whole genome association scans in barley. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:218-22. [PMID: 19185530 DOI: 10.1016/j.pbi.2008.12.007] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 10/01/2008] [Revised: 12/15/2008] [Accepted: 12/15/2008] [Indexed: 05/19/2023]
Abstract
Barley geneticists are currently using association genetics to identify and fine map traits directly in elite plant breeding material. This has been made possible by the development of a highly parallel SNP assay platform that provides sufficient marker density for genome-wide scans and linkage disequilibrium-led gene identification. By leveraging the combined resources of the barley research and breeding sectors, marker-trait associations are being identified and a renewed interest has emerged in novel strategies for barley improvement. New database and visualization tools have been developed and statistical methods adapted from human genetics to account for complexities in the datasets. Exciting early results suggest that association genetics will assume a central role in establishing genotype-to-phenotype relationships.
Affiliation(s)
- Robbie Waugh
- Genetics, SCRI, Invergowrie, Dundee DD2 5DA, Scotland.
|
400
|
Jansen RC, Tesson BM, Fu J, Yang Y, McIntyre LM. Defining gene and QTL networks. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:241-246. [PMID: 19196544 DOI: 10.1016/j.pbi.2009.01.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 11/14/2008] [Revised: 01/06/2009] [Accepted: 01/06/2009] [Indexed: 05/27/2023]
Abstract
Current technologies for high-throughput molecular profiling of large numbers of genetically different individuals offer great potential for elucidating the genotype-to-phenotype relationship. Variation in molecular and phenotypic traits can be correlated to DNA sequence variation using the methods of quantitative trait locus (QTL) mapping. In addition, the correlation structure in the molecular and phenotypic traits can be informative for inferring the underlying molecular networks. For this, new methods are emerging to distinguish among causality, reactivity, or independence of traits based upon logic involving underlying QTL. These methods are becoming increasingly popular in plant genetic studies as well as in studies on many other organisms.
Affiliation(s)
- Ritsert C Jansen
- Groningen Bioinformatics Centre, University of Groningen, The Netherlands.
|