1
|
Genotyping Platforms for Genome-Wide Association Studies: Options and Practical Considerations. Methods Mol Biol 2022; 2481:29-42. [PMID: 35641757 DOI: 10.1007/978-1-0716-2237-7_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Genome-wide association studies (GWAS) in crops require genotyping platforms that are capable of producing accurate, high-density genotyping data on hundreds of plants in a cost-effective manner. Currently there are multiple commercial platforms available that are being used effectively across crops. These platforms include genotyping arrays such as the Illumina Infinium arrays and the Applied Biosystems Axiom arrays, along with a variety of resequencing methods. These methods are being used to genotype from tens of thousands to millions of markers on GWAS panels, in crops ranging from those with simple genomes to those with very complex, large, polyploid genomes. Depending on the crop and the goal of the GWAS, there are several options and practical considerations to take into account when selecting a genotyping technology to ensure that the right coverage, accuracy, and cost for the study are achieved.
|
2
|
A bumper crop of SNPs in soybean through high-density genotyping-by-sequencing (HD-GBS). PLANT BIOTECHNOLOGY JOURNAL 2021; 19:860-862. [PMID: 33476468 PMCID: PMC8131051 DOI: 10.1111/pbi.13551] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2020] [Accepted: 01/14/2021] [Indexed: 05/03/2023]
|
3
|
Comparing a Mixed Model Approach to Traditional Stability Estimators for Mapping Genotype by Environment Interactions and Yield Stability in Soybean [Glycine max (L.) Merr.]. FRONTIERS IN PLANT SCIENCE 2021; 12:630175. [PMID: 33868333 PMCID: PMC8044453 DOI: 10.3389/fpls.2021.630175] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Accepted: 03/11/2021] [Indexed: 06/12/2023]
Abstract
Identifying genetic loci associated with yield stability has helped plant breeders and geneticists begin to understand the role and influence of genotype by environment (GxE) interactions in the productivity of soybean [Glycine max (L.) Merr.] and other crops. Methods for quantifying a genotype's range of performance across testing locations have been developed over decades, and dozens of methodologies are available. These include directly modeling GxE interactions as part of an overall model for yield, as well as methods that generate overall yield "stability" values from multi-environment trial data. Correspondence between these methods as it pertains to the outcomes of genome-wide association studies (GWAS) has not been well defined. In this study, the GWAS results for yield and yield stability were compared in 213 soybean lines across 11 environments to determine their utility and potential intersection. Both univariate and multivariate conventional stability estimates were considered alongside a mixed model for yield that fit marker by environment interactions as a random effect. One hundred and six total QTL were discovered across all mapping results; however, genetic loci that were significant in the mixed model for grain yield that fit marker by environment interactions were completely distinct from those that were significant when mapping using traditional stability measures as a phenotype. Furthermore, 73.21% of QTL discovered in the mixed model were determined to cause a crossover interaction effect, which causes genotype rank changes between environments. Overall, the QTL discovered via explicitly mapping GxE interactions also explained more yield variance than those QTL associated with differences in traditional stability estimates, making their theoretical impact on selection greater. A lack of intersecting results between mapping approaches highlights the importance of examining stability in multiple contexts when attempting to manipulate GxE interactions in soybean.
|
4
|
Generating High Density, Low Cost Genotype Data in Soybean [Glycine max (L.) Merr.]. G3 (BETHESDA, MD.) 2019; 9:2153-2160. [PMID: 31072870 PMCID: PMC6643887 DOI: 10.1534/g3.119.400093] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/13/2019] [Accepted: 05/01/2019] [Indexed: 11/18/2022]
Abstract
Obtaining genome-wide genotype information for millions of SNPs in soybean [Glycine max (L.) Merr.] often involves completely resequencing a line at 5X or greater coverage. Currently, hundreds of soybean lines have been resequenced at high depth levels with their data deposited in the NCBI Short Read Archive. This publicly available dataset may be leveraged as an imputation reference panel in combination with skim (low coverage) sequencing of new soybean genotypes to economically obtain high-density SNP information. Ninety-nine soybean lines resequenced at an average of 17.1X were used to generate a reference panel, with over 10 million SNPs called using GATK's Haplotype Caller tool. Whole genome resequencing at approximately 1X depth was performed on 114 previously ungenotyped experimental soybean lines. Coverages down to 0.1X were analyzed by randomly subsetting raw reads from the original 1X sequence data. SNPs discovered in the reference panel were genotyped in the experimental lines after aligning to the soybean reference genome, and missing markers imputed using Beagle 4.1. Sequencing depth of the experimental lines could be reduced to 0.3X while still retaining an accuracy of 97.8%. Accuracy was inversely related to minor allele frequency, and highly correlated with marker linkage disequilibrium. The high accuracy of skim sequencing combined with imputation provides a low cost method for obtaining dense genotypic information that can be used for various genomics applications in soybean.
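The accuracy figure in this abstract (97.8% at 0.3X) is a comparison of imputed genotypes against trusted calls. The following is a minimal sketch of such a concordance calculation, not the authors' pipeline: the function name `concordance`, the 0/1/2 alternate-allele coding, and the use of `None` for missing calls are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def concordance(truth: Sequence[Optional[int]],
                imputed: Sequence[Optional[int]]) -> float:
    """Fraction of sites where the imputed genotype equals the trusted genotype.

    Genotypes are alternate-allele counts (0, 1, 2). A value of None marks a
    missing call; sites missing in either vector are skipped (an assumption).
    """
    if len(truth) != len(imputed):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    matches = 0
    for t, i in zip(truth, imputed):
        if t is None or i is None:
            continue
        compared += 1
        if t == i:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with both calls present")
    return matches / compared


# Hypothetical toy data: five comparable sites, four of which agree.
toy_truth = [0, 2, 2, 1, None, 0]
toy_imputed = [0, 2, 0, 1, 2, 0]
toy_accuracy = concordance(toy_truth, toy_imputed)
```

Grouping sites by minor allele frequency before calling the function would be one way to examine the relationship between accuracy and allele frequency that the abstract reports.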
|
5
|
Identifying and exploring significant genomic regions associated with soybean yield, seed fatty acids, protein and oil. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s12892-017-0020-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
|
6
|
Genome-wide Association Mapping of Qualitatively Inherited Traits in a Germplasm Collection. THE PLANT GENOME 2017; 10. [PMID: 28724068 DOI: 10.3835/plantgenome2016.06.0054] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/08/2016] [Accepted: 02/15/2017] [Indexed: 06/07/2023]
Abstract
Genome-wide association (GWA) has been used as a tool for dissecting the genetic architecture of quantitatively inherited traits. We demonstrate here that GWA can also be highly useful for detecting many major genes governing categorically defined phenotype variants that exist for qualitatively inherited traits in a germplasm collection. Genome-wide association mapping was applied to categorical phenotypic data available for 10 descriptive traits in a collection of ∼13,000 soybean [Glycine max (L.) Merr.] accessions that had been genotyped with a 50,000 single nucleotide polymorphism (SNP) chip. A GWA on a panel of accessions of this magnitude can offer substantial statistical power and mapping resolution, and we found that GWA mapping resulted in the identification of strong SNP signals for 24 classical genes as well as several heretofore unknown genes controlling the phenotypic variants in those traits. Because some of these genes had been cloned, we were able to show that the narrow GWA mapping SNP signal regions that we detected for the phenotypic variants had chromosomal bp spans that, with just one exception, overlapped the bp region of the cloned genes, despite local variation in SNP number and nonuniform SNP distribution in the chip set.
|
7
|
Molecular Characterization of Resistance to Soybean Rust (Phakopsora pachyrhizi Syd. & Syd.) in Soybean Cultivar DT 2000 (PI 635999). PLoS One 2016; 11:e0164493. [PMID: 27935940 PMCID: PMC5147787 DOI: 10.1371/journal.pone.0164493] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 06/24/2016] [Accepted: 09/26/2016] [Indexed: 11/18/2022]
Abstract
Resistance to soybean rust (SBR), caused by Phakopsora pachyrhizi Syd. & Syd., has been identified in many soybean germplasm accessions and is conferred by either dominant or recessive genes that have been mapped to six independent loci (Rpp1-Rpp6), but no U.S. cultivars are resistant to SBR. The cultivar DT 2000 (PI 635999) has resistance to P. pachyrhizi isolates and field populations from the United States as well as Vietnam. An F6:7 recombinant inbred line (RIL) population derived from Williams 82 × DT 2000 was used to identify genomic regions associated with resistance to SBR in the field in Ha Noi, Vietnam, and in Quincy, Florida, in 2008. Bulked segregant analysis (BSA) was conducted using the soybean single nucleotide polymorphism (SNP) USLP 1.0 panel along with simple sequence repeat (SSR) markers to detect regions of the genome associated with resistance. BSA identified four BARC_SNP markers near the Rpp3 locus on chromosome (Chr.) 6. Genetic analysis identified an additional genomic region around the Rpp4 locus on Chr. 18 that was significantly associated with variation in the area under disease progress curve (AUDPC) values and sporulation in Vietnam. Molecular markers tightly linked to the DT 2000 resistance alleles on Chrs. 6 and 18 will be useful for marker-assisted selection and backcrossing in order to pyramid these genes with other available SBR resistance genes to develop new varieties with enhanced and durable resistance to SBR.
|
8
|
Multi-Population Selective Genotyping to Identify Soybean [Glycine max (L.) Merr.] Seed Protein and Oil QTLs. G3 (BETHESDA, MD.) 2016; 6:1635-48. [PMID: 27172185 PMCID: PMC4889660 DOI: 10.1534/g3.116.027656] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 01/28/2016] [Accepted: 03/25/2016] [Indexed: 02/03/2023]
Abstract
Plant breeders continually generate ever-higher yielding cultivars, but also want to improve seed constituent value, which is mainly protein and oil, in soybean [Glycine max (L.) Merr.]. Identification of genetic loci governing those two traits would facilitate that effort. Though genome-wide association offers one such approach, selective genotyping of multiple biparental populations offers a complementary alternative, and was evaluated here, using 48 F2:3 populations (n = ∼224 plants) created by mating 48 high protein germplasm accessions to cultivars of similar maturity, but with normal seed protein content. All F2:3 progeny were phenotyped for seed protein and oil, but only 22 high and 22 low extreme progeny in each F2:3 phenotypic distribution were genotyped with a 1536-SNP chip (ca. 450 bimorphic SNPs detected per mating). A significant quantitative trait locus (QTL) on one or more chromosomes was detected for protein in 35 (73%), and for oil in 25 (52%), of the 48 matings, and these QTL exhibited additive effects of ≥ 4 g kg⁻¹ and R² values of 0.07 or more. These results demonstrated that a multiple-population selective genotyping strategy, when focused on matings between parental phenotype extremes, can be used successfully to identify germplasm accessions possessing large-effect QTL alleles. Such accessions would be of interest to breeders to serve as parental donors of those alleles in cultivar development programs, though 17 of the 48 accessions were not unique in terms of SNP genotype, indicating that diversity among high protein accessions in the germplasm collection is less than what might ordinarily be assumed.
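The selective genotyping step described in this abstract (genotyping only the 22 highest and 22 lowest progeny per population) amounts to ranking progeny by phenotype and taking both tails. A minimal sketch follows; the function name `select_extremes`, the plant identifiers, and the phenotype values are hypothetical and serve only to illustrate the idea.

```python
from typing import List, Mapping, Tuple


def select_extremes(phenotypes: Mapping[str, float],
                    n_tail: int = 22) -> Tuple[List[str], List[str]]:
    """Return (low_tail, high_tail) progeny IDs ranked by phenotype value.

    n_tail defaults to 22, the tail size reported in the abstract.
    """
    if n_tail <= 0 or 2 * n_tail > len(phenotypes):
        raise ValueError("tails must be non-empty and must not overlap")
    # Sort progeny IDs from lowest to highest phenotype value.
    ranked = sorted(phenotypes, key=lambda plant: phenotypes[plant])
    return ranked[:n_tail], ranked[-n_tail:]


# Hypothetical seed protein values (g per kg) for five progeny.
toy_protein = {"p1": 400.0, "p2": 420.0, "p3": 380.0, "p4": 450.0, "p5": 410.0}
low_tail, high_tail = select_extremes(toy_protein, n_tail=2)
```

Only the plants in the two returned lists would then be sent for SNP genotyping, which is where the cost saving of the strategy comes from.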
|
9
|
Construction of high resolution genetic linkage maps to improve the soybean genome sequence assembly Glyma1.01. BMC Genomics 2016; 17:33. [PMID: 26739042 PMCID: PMC4704267 DOI: 10.1186/s12864-015-2344-0] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 10/13/2015] [Accepted: 12/21/2015] [Indexed: 11/24/2022]
Abstract
BACKGROUND A landmark in soybean research, Glyma1.01, the first whole genome sequence of variety Williams 82 (Glycine max L. Merr.), was completed in 2010 and is widely used. However, because the assembly was primarily built based on linkage maps constructed with a limited number of markers and recombinant inbred lines (RILs), the assembled sequence needs to be improved, especially in genomic regions with sparse anchoring markers. Molecular markers are widely used by researchers in the soybean community; with the updating of the Glyma1.01 build based on the high-resolution linkage maps resulting from this research, the genome positions of these markers need to be remapped. RESULTS Two high density genetic linkage maps were constructed based on 21,478 single nucleotide polymorphism loci mapped in the Williams 82 × G. soja (Sieb. & Zucc.) PI479752 population with 1083 RILs and 11,922 loci mapped in the Essex × Williams 82 population with 922 RILs. There were 37 regions or single markers where marker order in the two populations was in agreement but was not consistent with the physical position in the Glyma1.01 build. In addition, 28 previously unanchored scaffolds were positioned. Map data were used to identify false joins in the Glyma1.01 assembly, and the corresponding scaffolds were broken and reassembled into the new assembly, Wm82.a2.v1. Based upon the plots of genetic on physical distance of the loci, the euchromatic and heterochromatic regions along each chromosome in the new assembly were delimited. Genomic positions of the commonly used markers contained in the BARCSOYSSR_1.0 database and the SoySNP50K BeadChip were updated based upon the Wm82.a2.v1 assembly. CONCLUSIONS The information will facilitate the study of recombination hot spots in the soybean genome, identification of genes or quantitative trait loci controlling yield, seed quality and resistance to biotic or abiotic stresses as well as other genetic or genomic research.
|
10
|
SNP Assay Development for Linkage Map Construction, Anchoring Whole-Genome Sequence, and Other Genetic and Genomic Applications in Common Bean. G3 (BETHESDA, MD.) 2015; 5:2285-90. [PMID: 26318155 PMCID: PMC4632048 DOI: 10.1534/g3.115.020594] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Received: 07/23/2015] [Accepted: 08/27/2015] [Indexed: 11/28/2022]
Abstract
A total of 992,682 single-nucleotide polymorphisms (SNPs) were identified as ideal for Illumina Infinium II BeadChip design after sequencing a diverse set of 17 common bean (Phaseolus vulgaris L.) varieties with the aid of next-generation sequencing technology. From these, two BeadChips each with >5000 SNPs were designed. The BARCBean6K_1 BeadChip was selected for the purpose of optimizing polymorphism among market classes and, when possible, SNPs were targeted to sequence scaffolds in the Phaseolus vulgaris 14× genome assembly with sequence lengths >10 kb. The BARCBean6K_2 BeadChip was designed with the objective of anchoring additional scaffolds and facilitating orientation of large scaffolds. Analysis of 267 F2 plants from a cross of varieties Stampede × Red Hawk with the two BeadChips resulted in linkage maps with a total of 7040 markers including 7015 SNPs. With the linkage map, a total of 432.3 Mb of sequence from 2766 scaffolds was anchored to create the Phaseolus vulgaris v1.0 assembly, which accounted for approximately 89% of the 487 Mb of available sequence scaffolds of the Phaseolus vulgaris v0.9 assembly. A core set of 6000 SNPs (BARCBean6K_3 BeadChip) with high genotyping quality and polymorphism was selected based on the genotyping of 365 dry bean and 134 snap bean accessions with the BARCBean6K_1 and BARCBean6K_2 BeadChips. The BARCBean6K_3 BeadChip is a useful tool for genetics and genomics research and it is widely used by breeders and geneticists in the United States and abroad.
|
11
|
Fingerprinting Soybean Germplasm and Its Utility in Genomic Research. G3 (BETHESDA, MD.) 2015; 5:1999-2006. [PMID: 26224783 PMCID: PMC4592982 DOI: 10.1534/g3.115.019000] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Received: 05/05/2015] [Accepted: 07/23/2015] [Indexed: 12/31/2022]
Abstract
The United States Department of Agriculture, Soybean Germplasm Collection includes 18,480 domesticated soybean and 1168 wild soybean accessions introduced from 84 countries or developed in the United States. This collection was genotyped with the SoySNP50K BeadChip containing greater than 50K single-nucleotide polymorphisms. Redundant accessions were identified in the collection, and distinct genetic backgrounds of soybean from different geographic origins were observed that could be a unique resource for soybean genetic improvement. We detected a dramatic reduction of genetic diversity based on linkage disequilibrium and haplotype structure analyses of the wild, landrace, and North American cultivar populations and identified candidate regions associated with domestication and selection imposed by North American breeding. We constructed the first soybean haplotype block maps in the wild, landrace, and North American cultivar populations and observed that most recombination events occurred in the regions between haplotype blocks. These haplotype maps are crucial for association mapping aimed at the identification of genes controlling traits of economic importance. A case-control association test delimited potential genomic regions along seven chromosomes that most likely contain genes controlling seed weight in domesticated soybean. The resulting dataset will facilitate germplasm utilization, identification of genes controlling important traits, and will accelerate the creation of soybean varieties with improved seed yield and quality.
|
12
|
QTL for seed protein and amino acids in the Benning × Danbaekkong soybean population. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2015; 128:839-50. [PMID: 25673144 DOI: 10.1007/s00122-015-2474-4] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 12/02/2013] [Accepted: 01/31/2015] [Indexed: 05/08/2023]
Abstract
KEY MESSAGE We identified QTL associated with protein and amino acids in a soybean mapping population that was grown in five environments. These QTL could be used in MAS to improve these traits. Soybean, rather than nitrogen-containing forages, is the primary source of quality protein in feed formulations for domestic swine, poultry, and dairy industries. As a sole dietary source of protein, soybean is deficient in the amino acids lysine (Lys), threonine (Thr), methionine (Met), and cysteine (Cys). Increasing these amino acids would benefit the feed industry. The objective of the present study was to identify quantitative trait loci (QTL) associated with crude protein (cp) and amino acids in the 'Benning' × 'Danbaekkong' population. The population was grown in five southern USA environments. Amino acid concentrations as a fraction of cp (Lys/cp, Thr/cp, Met/cp, Cys/cp, and Met + Cys/cp) were determined by near-infrared reflectance spectroscopy. Four QTL associated with the variation in crude protein were detected on chromosomes (Chr) 14, 15, 17, and 20, of which a QTL on Chr 20 explained 55% of the phenotypic variation. In the same chromosomal region, QTL for Lys/cp, Thr/cp, Met/cp, Cys/cp and Met + Cys/cp were detected. At these QTL, the Danbaekkong allele resulted in reduced levels of these amino acids and increased protein concentration. Two additional QTL for Lys/cp were detected on Chr 08 and 20, and three QTL for Thr/cp on Chr 01, 09, and 17. Three QTL were identified on Chr 06, 09 and 10 for Met/cp, and one QTL was found for Cys/cp on Chr 10. The study provides information concerning the relationship between crude protein and levels of essential amino acids and may allow for the improvement of these traits in soybean using marker-assisted selection.
|
13
|
Identification and validation of quantitative trait loci for seed yield, oil and protein contents in two recombinant inbred line populations of soybean. Mol Genet Genomics 2014; 289:935-49. [PMID: 24861102 DOI: 10.1007/s00438-014-0865-x] [Citation(s) in RCA: 64] [Impact Index Per Article: 6.4] [Received: 03/19/2014] [Accepted: 05/06/2014] [Indexed: 11/30/2022]
Abstract
Soybean seeds contain high levels of oil and protein, and are the important sources of vegetable oil and plant protein for human consumption and livestock feed. Increased seed yield, oil and protein contents are the main objectives of soybean breeding. The objectives of this study were to identify and validate quantitative trait loci (QTLs) associated with seed yield, oil and protein contents in two recombinant inbred line populations, and to evaluate the consistency of QTLs across different environments, studies and genetic backgrounds. Both the mapping population (SD02-4-59 × A02-381100) and validation population (SD02-911 × SD00-1501) were phenotyped for the three traits in multiple environments. Genetic analysis indicated that oil and protein contents showed high heritabilities while yield exhibited a lower heritability in both populations. Based on a linkage map constructed previously with the mapping population and using composite interval mapping and/or interval mapping analysis, 12 QTLs for seed yield, 16 QTLs for oil content and 11 QTLs for protein content were consistently detected in multiple environments and/or the average data over all environments. Of the QTLs detected in the mapping population, five QTLs for seed yield, eight QTLs for oil content and five QTLs for protein content were confirmed in the validation population by single marker analysis in at least one environment and the average data and by ANOVA over all environments. Eight of these validated QTLs were newly identified. Compared with the other studies, seven QTLs for seed yield, eight QTLs for oil content and nine QTLs for protein content further verified the previously reported QTLs. These QTLs will be useful for breeding higher yield and better quality cultivars, and help effectively and efficiently improve yield potential and nutritional quality in soybean.
|
14
|
A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 2014; 46:707-13. [PMID: 24908249 PMCID: PMC7048698 DOI: 10.1038/ng.3008] [Citation(s) in RCA: 690] [Impact Index Per Article: 69.0] [Received: 11/08/2013] [Accepted: 05/15/2014] [Indexed: 01/13/2023]
Abstract
Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.
|
15
|
A genome-wide association study of seed protein and oil content in soybean. BMC Genomics 2014; 15:1. [PMID: 24382143 PMCID: PMC3890527 DOI: 10.1186/1471-2164-15-1] [Citation(s) in RCA: 327] [Impact Index Per Article: 32.7] [Received: 08/29/2013] [Accepted: 12/21/2013] [Indexed: 12/04/2022]
Abstract
BACKGROUND Association analysis is an alternative to conventional family-based methods to detect the location of gene(s) or quantitative trait loci (QTL) and provides relatively high resolution in terms of defining the genome position of a gene or QTL. Seed protein and oil concentration are quantitative traits which are determined by the interaction among many genes with small to moderate genetic effects and their interaction with the environment. In this study, a genome-wide association study (GWAS) was performed to identify quantitative trait loci (QTL) controlling seed protein and oil concentration in 298 soybean germplasm accessions exhibiting a wide range of seed protein and oil content. RESULTS A total of 55,159 single nucleotide polymorphisms (SNPs) were genotyped using various methods including Illumina Infinium and GoldenGate assays and 31,954 markers with minor allele frequency >0.10 were used to estimate linkage disequilibrium (LD) in heterochromatic and euchromatic regions. In euchromatic regions, the mean LD (r²) rapidly declined to 0.2 within 360 Kbp, whereas the mean LD declined to 0.2 at 9,600 Kbp in heterochromatic regions. The GWAS results identified 40 SNPs in 17 different genomic regions significantly associated with seed protein. Of these, the five SNPs with the highest associations and seven adjacent SNPs were located in the 27.6-30.0 Mbp region of Gm20. A major seed protein QTL has been previously mapped to the same location and potential candidate genes have recently been identified in this region. The GWAS results also detected 25 SNPs in 13 different genomic regions associated with seed oil. Of these markers, seven SNPs had a significant association with both protein and oil. CONCLUSIONS This research indicated that GWAS not only identified most of the previously reported QTL controlling seed protein and oil, but also resulted in narrower genomic regions than the regions reported as containing these QTL. The narrower GWAS-defined genome regions will allow more precise marker-assisted allele selection and will expedite positional cloning of the causal gene(s).
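The LD statistic r² reported in this abstract can be illustrated for a single pair of biallelic SNPs. The sketch below uses the standard definition r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB. It assumes, for illustration only, that each accession is fully homozygous and so contributes one haplotype coded 0 or 1 at each locus; the function name `ld_r2` and the toy vectors are hypothetical, and the abstract does not state how the authors computed r².

```python
from typing import Sequence


def ld_r2(locus_a: Sequence[int], locus_b: Sequence[int]) -> float:
    """Pairwise LD (r squared) between two biallelic loci.

    Each element is one haplotype's allele, coded 0 or 1, with the same
    accession order in both vectors.
    """
    n = len(locus_a)
    if n == 0 or n != len(locus_b):
        raise ValueError("loci must be non-empty and of equal length")
    p_a = sum(locus_a) / n                                  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n                                  # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r squared is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                                    # disequilibrium coefficient D
    return d * d / denom


# Toy haplotypes: identical loci are in complete LD, independent loci are not.
complete = ld_r2([1, 1, 0, 0], [1, 1, 0, 0])
independent = ld_r2([1, 1, 0, 0], [1, 0, 1, 0])
```

An LD decay curve such as the one summarized in the abstract would come from averaging this quantity over SNP pairs binned by physical distance.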
|
16
|
Genetic Mapping and Confirmation of Quantitative Trait Loci for Seed Protein and Oil Contents and Seed Weight in Soybean. CROP SCIENCE 2013; 53:765-774. [DOI: 10.2135/cropsci2012.03.0153] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 05/29/2023]
|
17
|
Development and evaluation of SoySNP50K, a high-density genotyping array for soybean. PLoS One 2013; 8:e54985. [PMID: 23372807 PMCID: PMC3555945 DOI: 10.1371/journal.pone.0054985] [Citation(s) in RCA: 319] [Impact Index Per Article: 29.0] [Received: 09/10/2012] [Accepted: 12/18/2012] [Indexed: 12/13/2022]
Abstract
The objective of this research was to identify single nucleotide polymorphisms (SNPs) and to develop an Illumina Infinium BeadChip that contained over 50,000 SNPs from soybean (Glycine max L. Merr.). A total of 498,921,777 reads 35-45 bp in length were obtained from DNA sequence analysis of reduced representation libraries from several soybean accessions which included six cultivated and two wild soybean (G. soja Sieb. et Zucc.) genotypes. These reads were mapped to the soybean whole genome sequence and 209,903 SNPs were identified. After applying several filters, a total of 146,161 of the 209,903 SNPs were determined to be ideal candidates for Illumina Infinium II BeadChip design. To equalize the distance between selected SNPs, increase assay success rate, and minimize the number of SNPs with low minor allele frequency, an iteration algorithm based on a selection index was developed and used to select 60,800 SNPs for Infinium BeadChip design. Of the 60,800 SNPs, 50,701 were targeted to euchromatic regions and 10,000 to heterochromatic regions of the 20 soybean chromosomes. In addition, 99 SNPs were targeted to unanchored sequence scaffolds. Of the 60,800 SNPs, a total of 52,041 passed Illumina's manufacturing phase to produce the SoySNP50K iSelect BeadChip. Validation of the SoySNP50K chip with 96 landrace genotypes, 96 elite cultivars and 96 wild soybean accessions showed that 47,337 SNPs were polymorphic and generated successful SNP allele calls. In addition, 40,841 of the 47,337 SNPs (86%) had minor allele frequencies ≥ 10% among the landraces, elite cultivars and the wild soybean accessions. A total of 620 and 42 candidate regions which may be associated with domestication and recent selection were identified, respectively. The SoySNP50K iSelect SNP beadchip will be a powerful tool for characterizing soybean genetic diversity and linkage disequilibrium, and for constructing high resolution linkage maps to improve the soybean whole genome sequence assembly.
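The minor allele frequency (MAF) threshold mentioned in this abstract can be made concrete with a short calculation. This is a sketch under stated assumptions rather than the authors' procedure: genotypes are taken to be diploid calls coded as counts of one allele (0, 1, 2), and the names `minor_allele_frequency`, `filter_by_maf`, and the toy SNP identifiers are hypothetical. The last lines reproduce the reported share of polymorphic SNPs with MAF ≥ 10% from the two counts given in the abstract.

```python
from typing import Dict, List, Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """MAF of one biallelic SNP from diploid calls coded as allele counts (0, 1, 2)."""
    if not genotypes:
        raise ValueError("no genotype calls")
    p = sum(genotypes) / (2 * len(genotypes))  # frequency of the counted allele
    return min(p, 1.0 - p)


def filter_by_maf(snps: Dict[str, Sequence[int]], threshold: float = 0.10) -> List[str]:
    """Names of SNPs whose MAF is at or above the threshold."""
    return [name for name, calls in snps.items()
            if minor_allele_frequency(calls) >= threshold]


# Hypothetical calls for two SNPs across a handful of accessions.
toy_snps = {
    "snp_common": [0, 0, 0, 2],                   # MAF 0.25
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],   # MAF 0.05
}
kept = filter_by_maf(toy_snps)

# Share of polymorphic SNPs with MAF >= 10%, from the counts in the abstract.
share_percent = round(40841 / 47337 * 100)
```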
|
18
|
Identification of positive yield QTL alleles from exotic soybean germplasm in two backcross populations. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012; 125:1353-69. [PMID: 22869284 DOI: 10.1007/s00122-012-1944-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 04/04/2012] [Accepted: 07/15/2012] [Indexed: 05/05/2023]
Abstract
Increasing seed yield is an important breeding goal of soybean [Glycine max (L.) Merr.] improvement efforts. Due to the small number of ancestors and subsequent breeding and selection, the genetic base of current soybean cultivars in North America is narrow. The objective of this study was to map quantitative trait loci (QTL) in two backcross populations developed using soybean plant introductions as donor parents. The first population included 116 BC2F3-derived lines developed using "Elgin" as the recurrent parent and PI 436684 as the donor parent (E population). The second population included 93 BC3F3-derived lines developed with "Williams 82" as the recurrent parent and PI 90566-1 as the donor parent (W population). The two populations were genotyped with 1,536 SNP markers and evaluated over 2 years for seed yield and other agronomic traits. Genotypic and phenotypic data were analyzed using the programs MapQTL and QTLNetwork to identify major QTL and epistatic QTL. In the E population, two yield QTL were identified by both MapQTL and QTLNetwork, and the PI 436684 alleles were associated with yield increases. In the W population, a QTL allele from PI 90566-1 accounted for 30% of the yield variation; however, the PI region was also associated with later maturity and shorter plant height. No epistasis for seed yield was identified in either population. No yield QTL was previously reported at the regions where these QTL map, indicating that exotic germplasm can be a source of new alleles that can improve soybean yield.
|
19
|
Molecular mapping of soybean rust resistance in soybean accession PI 561356 and SNP haplotype analysis of the Rpp1 region in diverse germplasm. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012; 125:1339-52. [PMID: 22837016 DOI: 10.1007/s00122-012-1932-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/23/2012] [Accepted: 06/28/2012] [Indexed: 05/08/2023]
Abstract
Soybean rust (SBR), caused by Phakopsora pachyrhizi Sydow, is one of the most economically important and destructive diseases of soybean [Glycine max (L.) Merr.], and the discovery of novel SBR resistance genes is needed because of virulence diversity in the pathogen. The objectives of this research were to map SBR resistance in plant introduction (PI) 561356 and to identify single nucleotide polymorphism (SNP) haplotypes within the region on soybean chromosome 18 where the SBR resistance gene Rpp1 maps. One hundred F2:3 lines derived from a cross between PI 561356 and the susceptible experimental line LD02-4485 were genotyped with genetic markers and phenotyped for resistance to P. pachyrhizi isolate ZM01-1. The segregation ratio of reddish brown versus tan lesion type in the population supported that resistance was controlled by a single dominant gene. The gene was mapped to a 1-cM region on soybean chromosome 18 corresponding to the same interval as Rpp1. A haplotype analysis of diverse germplasm across a 213-kb interval that included Rpp1 revealed 21 distinct haplotypes, of which 4 were present among 5 SBR resistance sources that have a resistance gene in the Rpp1 region. Four major North American soybean ancestors belong to the same SNP haplotype as PI 561356 and seven belong to the same haplotype as PI 594538A, the Rpp1-b source. There were no North American soybean ancestors belonging to the SNP haplotypes found in PI 200492, the source of Rpp1, or PI 587886 and PI 587880A, additional sources with SBR resistance mapping to the Rpp1 region.
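A single-gene conclusion of this kind is usually supported by a goodness-of-fit comparison between observed and expected segregation classes. The Python sketch below illustrates such a check. The abstract reports neither the counts nor the test used, so the chi-square test, the 1:2:1 classification of F2:3 lines (homozygous resistant : segregating : homozygous susceptible), and the counts are all illustrative assumptions.

```python
def chi_square_stat(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for counts against a ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 100 F2:3 lines (NOT taken from the study):
# homozygous resistant, segregating, homozygous susceptible.
observed = [27, 48, 25]

stat = chi_square_stat(observed, [1, 2, 1])

# Standard critical value of the chi-square distribution, df = 2, alpha = 0.05.
CRITICAL_DF2_ALPHA05 = 5.991
fits = stat < CRITICAL_DF2_ALPHA05
```

A statistic below the critical value means the assumed counts are consistent with single-gene segregation; it does not by itself rule out other models.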
|
20
|
Structural variants in the soybean genome localize to clusters of biotic stress-response genes. PLANT PHYSIOLOGY 2012; 159:1295-308. [PMID: 22696021 PMCID: PMC3425179 DOI: 10.1104/pp.112.194605] [Citation(s) in RCA: 125] [Impact Index Per Article: 10.4] [Received: 01/26/2012] [Accepted: 06/12/2012] [Indexed: 05/19/2023]
Abstract
Genome-wide structural and gene content variations are hypothesized to drive important phenotypic variation within a species. Structural and gene content variations were assessed among four soybean (Glycine max) genotypes using array hybridization and targeted resequencing. Many chromosomes exhibited relatively low rates of structural variation (SV) among genotypes. However, several regions exhibited both copy number and presence-absence variation, the most prominent found on chromosomes 3, 6, 7, 16, and 18. Interestingly, the regions most enriched for SV were specifically localized to gene-rich regions that harbor clustered multigene families. The most abundant classes of gene families associated with these regions were the nucleotide-binding and receptor-like protein classes, both of which are important for plant biotic defense. The colocalization of SV with plant defense response signal transduction pathways provides insight into the mechanisms of soybean resistance gene evolution and may inform the development of new approaches to resistance gene cloning.
|
21
|
Genome-Wide Association Analysis Identifies Candidate Genes Associated with Iron Deficiency Chlorosis in Soybean. THE PLANT GENOME 2011; 4:154-164. [PMID: 0 DOI: 10.3835/plantgenome2011.04.0011] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Indexed: 05/20/2023]
|
22
|
Mutational analysis of the major soybean UreF paralogue involved in urease activation. JOURNAL OF EXPERIMENTAL BOTANY 2011; 62:3599-608. [PMID: 21430294 PMCID: PMC3130180 DOI: 10.1093/jxb/err054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2010] [Revised: 01/29/2011] [Accepted: 01/31/2011] [Indexed: 05/08/2023]
Abstract
The soybean genome duplicated ∼14 and 45 million years ago and has many paralogous genes, including those in urease activation (emplacement of Ni and CO2 in the active site). Activation requires the UreD and UreF proteins, each encoded by two paralogues. UreG, a third essential activation protein, is encoded by the single-copy Eu3, and eu3 mutants lack activity of both urease isozymes. eu2 has the same urease-negative phenotype, consistent with Eu2 being a single-copy gene, possibly encoding a Ni carrier. Unexpectedly, two eu2 alleles co-segregated with missense mutations in the chromosome 2 UreF paralogue (Ch02UreF), suggesting lack of expression/function of Ch14UreF. However, Ch02UreF and Ch14UreF transcripts accumulate at the same level. Further, it had been shown that expression of the Ch14UreF ORF complemented a fungal ureF mutant. A third, nonsense (Q2*) allelic mutant, eu2-c, exhibited 5- to 10-fold more residual urease activity than missense eu2-a or eu2-b, though eu2-c should lack all Ch02UreF protein. It is hypothesized that low-level activation by Ch14UreF is 'spoiled' by the altered missense Ch02UreF proteins ('epistatic dominant-negative'). In agreement with active 'spoiling' by eu2-b-encoded Ch02UreF (G31D), eu2-b/eu2-c heterozygotes had less than half the urease activity of eu2-c/eu2-c siblings. Ch02UreF (G31D) could spoil activation by Ch14UreF because of higher affinity for the activation complex, or because Ch02UreF (G31D) is more abundant than Ch14UreF. Here, the latter is favoured, consistent with a reported in-frame AUG in the 5' leader of the Ch14UreF transcript. Translational inhibition could represent a form of 'functional divergence' of duplicated genes.
|
23
|
Identification of a second Asian soybean rust resistance gene in Hyuuga soybean. PHYTOPATHOLOGY 2011; 101:535-43. [PMID: 21244223 DOI: 10.1094/phyto-09-10-0257] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 05/30/2023]
Abstract
Asian soybean rust (ASR) is an economically significant disease caused by the fungus Phakopsora pachyrhizi. The soybean genes Rpp3 and Rpp?(Hyuuga) confer resistance to specific isolates of the pathogen. Both genes map to chromosome 6 (Gm06) (linkage group [LG] C2). We recently identified 12 additional soybean accessions that harbor ASR resistance mapping to Gm06, within 5 centimorgans of Rpp3 and Rpp?(Hyuuga). To further characterize genotypes with resistance on Gm06, we used a set of eight P. pachyrhizi isolates collected from geographically diverse areas to inoculate plants and evaluate them for differential phenotypic responses. Three isolates elicited different responses from soybean accessions PI 462312 (Ankur) (Rpp3) and PI 506764 (Hyuuga) (Rpp?[Hyuuga]). In all, 11 of the new accessions yielded responses identical to either PI 462312 or Hyuuga and 1 of the new accessions, PI 417089B (Kuro daizu), differed from all others. Additional screening of Hyuuga-derived recombinant inbred lines indicated that Hyuuga carries two resistance genes, one at the Rpp3 locus on Gm06 and a second, unlinked ASR resistance gene mapping to Gm03 (LG-N) near Rpp5. These findings reveal a natural case of gene pyramiding for ASR resistance in Hyuuga and underscore the importance of utilizing multiple isolates of P. pachyrhizi when screening for ASR resistance.
|
24
|
The composition and origins of genomic variation among individuals of the soybean reference cultivar Williams 82. PLANT PHYSIOLOGY 2011; 155:645-55. [PMID: 21115807 PMCID: PMC3032456 DOI: 10.1104/pp.110.166736] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.3] [Received: 09/28/2010] [Accepted: 11/24/2010] [Indexed: 05/18/2023]
Abstract
Soybean (Glycine max) is a self-pollinating species that has relatively low nucleotide polymorphism rates compared with other crop species. Despite the low rate of nucleotide polymorphisms, a wide range of heritable phenotypic variation exists. There is even evidence for heritable phenotypic variation among individuals within some cultivars. Williams 82, the soybean cultivar used to produce the reference genome sequence, was derived from backcrossing a Phytophthora root rot resistance locus from the donor parent Kingwa into the recurrent parent Williams. To explore the genetic basis of intracultivar variation, we investigated the nucleotide, structural, and gene content variation of different Williams 82 individuals. Williams 82 individuals exhibited variation in the number and size of introgressed Kingwa loci. In these regions of genomic heterogeneity, the reference Williams 82 genome sequence consists of a mosaic of Williams and Kingwa haplotypes. Genomic structural variation between Williams and Kingwa was maintained between the Williams 82 individuals within the regions of heterogeneity. Additionally, the regions of heterogeneity exhibited gene content differences between Williams 82 individuals. These findings show that genetic heterogeneity in Williams 82 primarily originated from the differential segregation of polymorphic chromosomal regions following the backcross and single-seed descent generations of the breeding process. We conclude that soybean haplotypes can possess a high rate of structural and gene content variation, and the impact of intracultivar genetic heterogeneity may be significant. This detailed characterization will be useful for interpreting soybean genomic data sets and highlights important considerations for research communities that are developing or utilizing a reference genome sequence.
|
25
|
An integrative approach to genomic introgression mapping. PLANT PHYSIOLOGY 2010; 154:3-12. [PMID: 20656899 PMCID: PMC2938162 DOI: 10.1104/pp.110.158949] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 05/12/2010] [Accepted: 07/21/2010] [Indexed: 05/20/2023]
Abstract
Near-isogenic lines (NILs) are valuable genetic resources for many crop species, including soybean (Glycine max). The development of new molecular platforms promises to accelerate the mapping of genetic introgressions in these materials. Here, we compare some existing and emerging methodologies for genetic introgression mapping: single-feature polymorphism analysis, Illumina GoldenGate single nucleotide polymorphism (SNP) genotyping, and de novo SNP discovery via RNA-Seq analysis of next-generation sequence data. We used these methods to map the introgressed regions in an iron-inefficient soybean NIL and found that the three mapping approaches are complementary when utilized in combination. The comparative RNA-Seq approach offers several additional advantages, including the greatest mapping resolution, marker depth, and de novo marker utility for downstream fine-mapping analysis. We applied the comparative RNA-Seq method to map genetic introgressions in an additional pair of NILs exhibiting differential seed protein content. Furthermore, we attempted to optimize the comparative RNA-Seq approach by assessing the impact of sequence depth, SNP identification methodology, and post hoc analyses on SNP discovery rates. We conclude that the comparative RNA-Seq approach can be optimized with sufficient sampling and by utilizing a post hoc correction accounting for gene density variation that controls for false discoveries.
|
26
|
High-throughput SNP discovery and assay development in common bean. BMC Genomics 2010; 11:475. [PMID: 20712881 PMCID: PMC3091671 DOI: 10.1186/1471-2164-11-475] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Received: 03/23/2010] [Accepted: 08/16/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Next generation sequencing has significantly increased the speed at which single nucleotide polymorphisms (SNPs) can be discovered and subsequently used as molecular markers for research. Unfortunately, for species such as common bean (Phaseolus vulgaris L.) which do not have a whole genome sequence available, the use of next generation sequencing for SNP discovery is much more difficult and costly. To this end we developed a method which couples sequences obtained from the Roche 454-FLX system (454) with the Illumina Genome Analyzer (GA) for high-throughput SNP discovery. RESULTS: Using a multi-tier reduced representation library we discovered a total of 3,487 SNPs of which 2,795 contained sufficient flanking genomic sequence for SNP assay development. Using Sanger sequencing to determine the validation rate of these SNPs, we found that 86% are likely to be true SNPs. Furthermore, we designed a GoldenGate assay which contained 1,050 of the 3,487 predicted SNPs. A total of 827 of the 1,050 SNPs produced a working GoldenGate assay (79%). CONCLUSIONS: Through combining two next generation sequencing techniques we have developed a method that allows high-throughput SNP discovery in any diploid organism without the need of a whole genome sequence or the creation of normalized cDNA libraries. The need to only perform one 454 run and one GA sequencer run allows high-throughput SNP discovery with sufficient sequence for assay development to be performed in organisms, such as common bean, which have limited genomic resources.
|
27
|
Fine mapping of the soybean aphid-resistance gene Rag2 in soybean PI 200538. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2010; 121:599-610. [PMID: 20454773 DOI: 10.1007/s00122-010-1333-6] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 12/04/2009] [Accepted: 03/26/2010] [Indexed: 05/19/2023]
Abstract
The discovery of biotype diversity of soybean aphid (SA: Aphis glycines Matsumura) in North America emphasizes the necessity to identify new aphid-resistance genes. The soybean [Glycine max (L.) Merr.] plant introduction (PI) 200538 is a promising source of SA resistance because it shows a high level of resistance to an SA biotype that can overcome the SA-resistance gene Rag1 from 'Dowling'. The SA-resistance gene Rag2 was previously mapped from PI 200538 to a 10-cM marker interval on soybean chromosome 13 [formerly linkage group (LG) F]. The objective of this study was to fine map Rag2. This fine mapping was carried out using lines derived from 5,783 F2 plants at different levels of backcrossing that were screened with flanking genetic markers for the presence of recombination in the Rag2 interval. Fifteen single nucleotide polymorphism (SNP) markers and two dominant polymerase chain reaction-based markers near Rag2 were developed by re-sequencing target intervals and sequence-tagged sites. These efforts resulted in the mapping of Rag2 to a 54-kb interval on the Williams 82 8x assembly (Glyma1). This Williams 82 interval contains seven predicted genes, including one nucleotide-binding site-leucine-rich repeat gene. SNP marker and candidate gene information identified in this study will be an important resource in marker-assisted selection for aphid resistance and for cloning the gene.
|
28
|
Structural and functional divergence of a 1-Mb duplicated region in the soybean (Glycine max) genome and comparison to an orthologous region from Phaseolus vulgaris. THE PLANT CELL 2010; 22:2545-61. [PMID: 20729383 PMCID: PMC2947175 DOI: 10.1105/tpc.110.074229] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 01/22/2010] [Revised: 07/21/2010] [Accepted: 07/30/2010] [Indexed: 05/03/2023]
Abstract
Soybean (Glycine max) has undergone at least two rounds of polyploidization, resulting in a paleopolyploid genome that is a mosaic of homoeologous regions. To determine the structural and functional impact of these duplications, we sequenced two ~1-Mb homoeologous regions of soybean, Gm8 and Gm15, derived from the most recent duplication event (~13 million years ago), and the orthologous region from common bean (Phaseolus vulgaris), Pv5. We observed inversions leading to major structural variation and a bias between the two chromosome segments: Gm15 experienced more gene movement (gene retention rate of 81% in Gm15 versus 91% in Gm8) and a nearly twofold increase in the deletion of long terminal repeat (LTR) retrotransposons via solo LTR formation. Functional analyses of Gm15 and Gm8 revealed decreases in gene expression and synonymous substitution rates for Gm15; for instance, transcript levels from Gm8 were 38% higher than those from Gm15. Transcriptional divergence of homoeologs was found based on expression patterns among seven tissues and developmental stages. Our results indicate asymmetric evolution between homoeologous regions of soybean as evidenced by structural changes and expression variances of homoeologous genes.
|
29
|
|
30
|
Fine mapping the soybean aphid resistance gene Rag1 in soybean. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2010; 120:1063-71. [PMID: 20035316 DOI: 10.1007/s00122-009-1234-8] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 07/23/2009] [Accepted: 11/30/2009] [Indexed: 05/11/2023]
Abstract
The soybean aphid (Aphis glycines Matsumura) is an important soybean [Glycine max (L.) Merr.] pest in North America. The dominant aphid resistance gene Rag1 was previously mapped from the cultivar 'Dowling' to a 12-cM marker interval on soybean chromosome 7 (formerly linkage group M). The development of additional genetic markers mapping closer to Rag1 was needed to accurately position the gene to improve the effectiveness of marker-assisted selection (MAS) and to eventually clone it. The objectives of this study were to identify single nucleotide polymorphisms (SNPs) near Rag1 and to position these SNPs relative to Rag1. To generate a fine map of the Rag1 interval, 824 BC4F2 and 1,000 BC4F3 plants segregating for the gene were screened with markers flanking Rag1. Plants with recombination events close to the gene were tested with SNPs identified in previous studies along with new SNPs identified from the preliminary Williams 82 draft soybean genome shotgun sequence using direct re-sequencing and gene-scanning melt-curve analysis. Progeny of these recombinant plants were evaluated for aphid resistance. These efforts resulted in the mapping of Rag1 between the two SNP markers 46169.7 and 21A, which corresponds to a physical distance on the Williams 82 8x draft assembly (Glyma1.01) of 115 kilobase pairs (kb). Several candidate genes for Rag1 are present within the 115-kb interval. The markers identified in this study that are closely linked to Rag1 will be a useful resource in MAS for this important aphid resistance gene.
|
31
|
High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics 2010; 11:38. [PMID: 20078886 PMCID: PMC2817691 DOI: 10.1186/1471-2164-11-38] [Citation(s) in RCA: 221] [Impact Index Per Article: 15.8] [Received: 09/04/2009] [Accepted: 01/15/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds. RESULTS: A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%. CONCLUSION: We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8x whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.
|
32
|
Abstract
Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
|
33
|
High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2008; 116:945-52. [PMID: 18278477 DOI: 10.1007/s00122-008-0726-2] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 09/19/2007] [Accepted: 01/28/2008] [Indexed: 05/05/2023]
Abstract
Large numbers of single nucleotide polymorphism (SNP) markers are now available for a number of crop species. However, the high-throughput methods for multiplexing SNP assays are untested in complex genomes, such as soybean, that have a high proportion of paralogous genes. The Illumina GoldenGate assay is capable of multiplexing from 96 to 1,536 SNPs in a single reaction over a 3-day period. We tested the GoldenGate assay in soybean to determine the success rate of converting verified SNPs into working assays. A custom 384-SNP GoldenGate assay was designed using SNPs that had been discovered through the resequencing of five diverse accessions that are the parents of three recombinant inbred line (RIL) mapping populations. The 384 SNPs that were selected for this custom assay were predicted to segregate in one or more of the RIL mapping populations. Allelic data were successfully generated for 89% of the SNP loci (342 of the 384) when it was used in the three RIL mapping populations, indicating that the complex nature of the soybean genome had little impact on conversion of the discovered SNPs into usable assays. In addition, 80% of the 342 mapped SNPs had a minor allele frequency >10% when this assay was used on a diverse sample of Asian landrace germplasm accessions. The high success rate of the GoldenGate assay makes this a useful technique for quickly creating high density genetic maps in species where SNP markers are rapidly becoming available.
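The two summary figures in this abstract, the assay conversion rate and the minor allele frequency (MAF) filter, are simple ratios. The Python sketch below reproduces the conversion rate from the reported counts (342 of 384) and shows one way to compute MAF for a biallelic SNP. The genotype encoding (two-letter calls such as "AA" or "AB", with `None` for a failed call) and the example calls are assumptions for illustration, not the study's data format.

```python
from collections import Counter


def conversion_rate(working_assays, designed_assays):
    """Fraction of designed SNP assays that produced usable allelic data."""
    return working_assays / designed_assays


def minor_allele_frequency(calls):
    """MAF for one biallelic SNP; missing calls (None) are ignored."""
    allele_counts = Counter(
        allele for call in calls if call is not None for allele in call
    )
    if len(allele_counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(allele_counts.values()) / sum(allele_counts.values())


# Counts reported in the abstract: 342 of 384 SNPs gave allelic data.
rate_percent = round(conversion_rate(342, 384) * 100)

# Hypothetical genotype calls for six accessions at one SNP.
example_calls = ["AA", "AA", "BB", "AB", None, "AA"]
maf = minor_allele_frequency(example_calls)
passes_filter = maf > 0.10  # the >10% threshold mentioned in the abstract
```

With these inputs the conversion rate rounds to 89%, matching the abstract, and the hypothetical SNP has a MAF of 0.3.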
|
34
|
A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics 2007; 176:685-96. [PMID: 17339218 PMCID: PMC1893076 DOI: 10.1534/genetics.107.070821] [Citation(s) in RCA: 261] [Impact Index Per Article: 15.4] [Received: 01/12/2007] [Accepted: 02/16/2007] [Indexed: 11/18/2022] Open
Abstract
The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels, for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5-10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.
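The reported nucleotide diversity can be roughly checked from the figures in the abstract. The Python sketch below assumes the value is Watterson's estimator, that the six genotypes count as six sequences (reasonable for inbred lines, but not stated), and that all 5551 polymorphisms, indels included, count as segregating sites. None of these assumptions is confirmed by the abstract.

```python
def watterson_theta(segregating_sites, n_sequences, aligned_bp):
    """Watterson's estimator per site: S / (a_n * L),
    where a_n is the harmonic sum 1 + 1/2 + ... + 1/(n - 1)."""
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * aligned_bp)


# Figures from the abstract: 5551 polymorphisms, six genotypes,
# 2.44 Mbp of aligned sequence.
theta = watterson_theta(segregating_sites=5551, n_sequences=6, aligned_bp=2.44e6)
```

Under these assumptions the estimate is about 0.000996, close to the published 0.000997. The small gap is consistent with the aligned length being rounded to 2.44 Mbp.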
|
35
|
Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics 2007; 175:1937-44. [PMID: 17287533 PMCID: PMC1855121 DOI: 10.1534/genetics.106.069740] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Received: 12/14/2006] [Accepted: 02/01/2007] [Indexed: 11/18/2022] Open
Abstract
Prospects for utilizing whole-genome association analysis in autogamous plant populations appear promising due to the reported high levels of linkage disequilibrium (LD). To determine the optimal strategies for implementing association analysis in soybean [Glycine max (L.) Merr.], we analyzed the structure of LD in three regions of the genome varying in length from 336 to 574 kb. This analysis was conducted in four distinct groups of soybean germplasm: 26 accessions of the wild ancestor of soybean (Glycine soja Sieb. et Zucc.); 52 Asian G. max Landraces, the immediate results of domestication from G. soja; 17 Asian Landrace introductions that became the ancestors of North American (N. Am.) cultivars; and 25 Elite Cultivars from N. Am. In G. soja, LD did not extend past 100 kb; however, in the three cultivated G. max groups, LD extended from 90 to 574 kb, likely due to the impacts of domestication and increased self-fertilization. The three genomic regions were highly variable relative to the extent of LD within the three cultivated soybean populations. G. soja appears to be ideal for fine mapping of genes, but due to the highly variable levels of LD in the Landraces and the Elite Cultivars, whole-genome association analysis in soybean may be more difficult than first anticipated.
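The extent of LD is typically summarized by a pairwise statistic computed between SNPs at increasing physical distance. The Python sketch below computes one common choice, r², for two biallelic loci. The abstract does not say which statistic was used, so r², the 0/1 allele coding, and the example data are assumptions. Treating each accession as one haplotype is a simplification that suits largely homozygous inbred lines.

```python
def ld_r_squared(locus_a, locus_b):
    """r^2 between two biallelic loci, one 0/1 allele per accession."""
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must cover the same accessions")
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one locus is monomorphic
    return d * d / denom


# Hypothetical alleles for eight accessions at two SNPs.
snp_1 = [0, 0, 1, 1, 0, 1, 1, 0]
snp_2 = [0, 0, 1, 1, 0, 1, 0, 0]
r2 = ld_r_squared(snp_1, snp_2)
```

For these hypothetical data r² is 0.6. Plotting such values against inter-SNP distance, separately for each germplasm group, is one way to see where LD decays.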
|
36
|
BARCSoySNP23: a panel of 23 selected SNPs for soybean cultivar identification. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2007; 114:885-99. [PMID: 17219205 DOI: 10.1007/s00122-006-0487-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 12/30/2005] [Accepted: 12/16/2006] [Indexed: 05/11/2023]
Abstract
This report describes a set of 23 informative SNPs (BARCSoySNP23) distributed on 19 of the 20 soybean linkage groups that can be used for soybean cultivar identification. Selection of the SNPs to include in this set was made based upon the information provided by each SNP for distinguishing a diverse set of soybean genotypes as well as the linkage map position of each SNP. The genotypes included the ancestors of North American cultivars, modern North American cultivars and a group of Korean cultivars. The procedure used to identify this subset of highly informative SNP markers resulted in a significant increase in the power of identification versus randomly selected sets of equal size. This conclusion was supported by a simulation which indicated that the 23-SNP panel can uniquely distinguish 2,200 soybean cultivars, whereas sets of randomly selected 23-SNP panels allowed the unique identification of only about 50 cultivars. The 23-SNP panel can efficiently distinguish each of the genotypes within four maturity group sets of additional cultivars/lines that have identical classical pigmentation and morphological traits. Comparatively, the 13 trinucleotide SSR set published earlier (BARCSoySSR13) has more power on a per locus basis because of the multi-allelic nature of SSRs. However, the assay of bi-allelic SNP loci can be multiplexed using non-gel-based techniques, allowing for rapid determination of the SNP alleles present in soybean genotypes, thereby compensating for their relatively low information content. Both BARCSoySNP23 and BARCSoySSR13 were highly congruent relative to identifying genotypes and for estimating population genetic differences.
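The identification power of a marker panel can be summarized as the number of cultivars whose multi-SNP profile is shared with no other cultivar. The Python sketch below counts such cultivars. The cultivar names, the three-SNP profiles, and the one-allele-per-SNP encoding (homozygous lines) are hypothetical; this is not the simulation procedure used in the report.

```python
from collections import Counter


def count_uniquely_identified(profiles):
    """Number of cultivars whose SNP profile is shared with no other cultivar.

    profiles maps cultivar name -> tuple of alleles, one per panel SNP.
    """
    profile_counts = Counter(profiles.values())
    return sum(1 for p in profiles.values() if profile_counts[p] == 1)


# Hypothetical cultivars genotyped at a three-SNP panel.
example_profiles = {
    "cultivar_1": ("A", "C", "T"),
    "cultivar_2": ("A", "C", "T"),  # same profile as cultivar_1
    "cultivar_3": ("G", "C", "T"),
    "cultivar_4": ("A", "T", "T"),
}
n_unique = count_uniquely_identified(example_profiles)

# Upper bound on distinct homozygous profiles for 23 biallelic SNPs.
max_profiles_23_snps = 2 ** 23
```

Here two of the four hypothetical cultivars are uniquely identified. The theoretical ceiling for 23 biallelic SNPs in homozygous lines is 8,388,608 profiles, so how many cultivars a real panel separates depends on allele frequencies and on correlation between the chosen SNPs.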
|
37
|
Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci U S A 2006; 103:16666-16671. [PMID: 17068128 DOI: 10.1073/pnas.060437910] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023] Open
Abstract
Soybean has undergone several genetic bottlenecks. These include domestication in Asia to produce numerous Asian landraces, introduction of relatively few landraces to North America, and then selective breeding over the past 75 years. It is presumed that these three human-mediated events have reduced genetic diversity. We sequenced 111 fragments from 102 genes in four soybean populations representing the populations before and after genetic bottlenecks. We show that soybean has lost many rare sequence variants and has undergone numerous allele frequency changes throughout its history. Although soybean genetic diversity has been eroded by human selection after domestication, it is notable that modern cultivars have retained 72% of the sequence diversity present in the Asian landraces but lost 79% of rare alleles (frequency ≤ 0.10) found in the Asian landraces. Simulations indicated that the diversity lost through the genetic bottlenecks of introduction and plant breeding was mostly due to the small number of Asian introductions and not the artificial selection subsequently imposed by selective breeding. The bottleneck with the most impact was domestication; when the low sequence diversity present in the wild species was halved, 81% of the rare alleles were lost, and 60% of the genes exhibited evidence of significant allele frequency changes.
|
39
|
Abstract
Background: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertions/deletions between or within individuals of a given species. As a result of their abundance and the availability of high-throughput analysis technologies, SNP markers have begun to replace traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs), and simple sequence repeats (SSRs, or microsatellites) for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. Results: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl, and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available as open source.
Conclusion: SNP-PHAGE provides a bioinformatics solution for high-throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and the software can serve as a starting point for groups interested in developing SNP markers.
|
40
|
Abstract
Background: Single nucleotide polymorphisms (SNPs) constitute more than 90% of the genetic variation, and hence can account for most trait differences among individuals in a given species. The polymorphism detection programs PolyBayes and PolyPhred give high rates of false positive SNP predictions even with stringent parameter values. We developed a machine learning (ML) method to augment PolyBayes and improve its prediction accuracy. ML methods have also been successfully applied to other bioinformatics problems, such as predicting genes, promoters, transcription factor binding sites, and protein structures. Results: The ML program C4.5 was applied to a set of features in order to build a SNP classifier from training data based on human expert decisions (True/False). The training data were 27,275 candidate SNPs generated by sequencing 1973 STS (sequence tag sites) (12 Mb) in both directions from 6 diverse homozygous soybean cultivars, followed by PolyBayes analysis. Test data of 18,390 candidate SNPs were generated similarly from 1359 additional STS (8 Mb). SNPs from both sets were classified by experts. After training, the ML classifier agreed with the experts on 97.3% of the test data, compared with 7.8% agreement between PolyBayes and the experts. The PolyBayes positive predictive values (PPV; i.e., the fraction of candidate SNPs that are real) were 7.8% for all predictions and 16.7% for those with 100% posterior probability of being real. Using ML improved the PPV to 84.8%, a 5- to 10-fold increase. While ML and PolyBayes produced a similar number of true positives, the ML program generated only 249 false positives, compared with 16,955 for PolyBayes. The complexity of the soybean genome may have contributed to the high rate of false SNP predictions by PolyBayes, and hence results may differ for other genomes.
Conclusion: A machine learning (ML) method was developed as a supplementary feature to polymorphism detection software for improving prediction accuracy. The results from this study indicate that a trained ML classifier can significantly reduce human intervention, and in this case achieved a 5- to 10-fold gain in productivity. The optimized feature set and ML framework can also be applied to other polymorphism discovery software. The ML support software is written in Perl and can be easily integrated into an existing SNP discovery pipeline.
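The PPV figures quoted in this abstract can be cross-checked with a little arithmetic. The following Python sketch is only a consistency check, not part of the published method. It assumes that all 18,390 test candidates count as PolyBayes positive calls (the abstract implies this but does not state it), and it back-solves the ML true-positive count, which the abstract does not report, from the stated PPV and false-positive count. All variable names are illustrative.

```python
def ppv(true_pos: float, false_pos: float) -> float:
    """Positive predictive value: fraction of positive calls that are real."""
    return true_pos / (true_pos + false_pos)


# Figures taken from the abstract.
TEST_CANDIDATES = 18_390   # candidate SNPs in the test set
POLYBAYES_FP = 16_955      # PolyBayes false positives
ML_FP = 249                # ML classifier false positives
ML_PPV_REPORTED = 0.848    # reported ML PPV
POLYBAYES_PPV_HIGH = 0.167 # reported PPV at 100% posterior probability

# Assumption: every test candidate is a PolyBayes positive call.
polybayes_tp = TEST_CANDIDATES - POLYBAYES_FP
polybayes_ppv = ppv(polybayes_tp, POLYBAYES_FP)

# Derived, not reported: PPV = TP / (TP + FP)  =>  TP = PPV * FP / (1 - PPV).
ml_tp_implied = ML_PPV_REPORTED * ML_FP / (1 - ML_PPV_REPORTED)

print(f"PolyBayes true positives: {polybayes_tp}")
print(f"PolyBayes PPV: {polybayes_ppv:.3f}")
print(f"Implied ML true positives: {ml_tp_implied:.0f}")
print(f"Fold gain vs all predictions: {ML_PPV_REPORTED / polybayes_ppv:.1f}")
print(f"Fold gain vs 100%-posterior calls: {ML_PPV_REPORTED / POLYBAYES_PPV_HIGH:.1f}")
```

Under these assumptions the PolyBayes PPV comes out at about 7.8%, the implied ML true-positive count is close to the PolyBayes count, and the two fold gains bracket the "5- to 10-fold" range, so the reported numbers are mutually consistent.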
|
41
|
Seed quality QTL in a prominent soybean population. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2004; 109:552-61. [PMID: 15221142 DOI: 10.1007/s00122-004-1661-5] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.5] [Received: 12/12/2003] [Accepted: 03/08/2004] [Indexed: 05/24/2023]
Abstract
Soybean [Glycine max (L.) Merr.] is a versatile crop due to its multitude of uses as a high protein meal and vegetable oil. Soybean seed traits such as seed protein and oil concentration and seed size are important quantitative traits. The objective of this study was to identify representative protein, oil, and seed size quantitative trait loci (QTL) in soybean. A recombinant inbred line (RIL) population consisting of 131 F6-derived lines was created from two prominent ancestors of North American soybeans ('Essex' and 'Williams') and the RILs were grown in six environments. One hundred simple sequence repeat (SSR) markers spaced throughout the genome were mapped in this population. There were a total of four protein, six oil, and seven seed size QTL found in this population. The QTL found in this study may assist breeders in marker-assisted selection (MAS) to retain current positive QTL in modern soybeans while simultaneously pyramiding additional QTL from new germplasm.
|
42
|
Abstract
Single-nucleotide polymorphisms (SNPs) provide an abundant source of DNA polymorphisms in a number of eukaryotic species. Information on the frequency, nature, and distribution of SNPs in plant genomes is limited. Thus, our objectives were (1) to determine SNP frequency in coding and noncoding soybean (Glycine max L. Merr.) DNA sequence amplified from genomic DNA using PCR primers designed to complete genes, cDNAs, and random genomic sequence; (2) to characterize haplotype variation in these sequences; and (3) to provide initial estimates of linkage disequilibrium (LD) in soybean. Approximately 28.7 kbp of coding sequence, 37.9 kbp of noncoding perigenic DNA, and 9.7 kbp of random noncoding genomic DNA were sequenced in each of 25 diverse soybean genotypes. Over the >76 kbp, mean nucleotide diversity expressed as Watterson's theta was 0.00097. Nucleotide diversity was 0.00053 in coding DNA and 0.00111 in noncoding perigenic DNA, lower than estimates in the autogamous model species Arabidopsis thaliana. Haplotype analysis of SNP-containing fragments revealed a deficiency of haplotypes relative to the number that would be anticipated at linkage equilibrium. In 49 fragments with three or more SNPs, five haplotypes were present in one fragment while four or fewer were present in the remaining 48, thereby supporting the suggestion of relatively limited genetic variation in cultivated soybean. Squared allele-frequency correlations (r²) among haplotypes at 54 loci with two or more SNPs indicated low genome-wide LD. The low level of LD and the limited haplotype diversity suggested that the genome of any given soybean accession is a mosaic of three or four haplotypes. To facilitate SNP discovery and the development of a transcript map, subsets of four to six diverse genotypes were identified whose sequence analysis would permit the discovery of at least 75% of all SNPs present in the 25 genotypes as well as 90% of the common (frequency >0.10) SNPs.
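Watterson's theta, the diversity measure quoted in this abstract, is the number of segregating sites S divided by the product of the sequence length L and the correction factor a_n = 1 + 1/2 + ... + 1/(n-1) for a sample of n sequences. The Python sketch below shows the calculation and back-solves the number of segregating sites implied by the reported value. It assumes that each of the 25 inbred genotypes contributes a single sequence and that the total length is the sum of the three reported regions (about 76.3 kbp); neither assumption is stated in the abstract, and the function names are illustrative.

```python
def watterson_a(n_sequences: int) -> float:
    """Correction factor a_n = sum of 1/i for i = 1 .. n-1."""
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: float, n_sequences: int, n_sites: float) -> float:
    """Per-site Watterson's theta: S / (a_n * L)."""
    return segregating_sites / (watterson_a(n_sequences) * n_sites)


# Figures from the abstract.
N_GENOTYPES = 25                         # assumed: one sequence per inbred genotype
LENGTH_BP = 28_700 + 37_900 + 9_700      # coding + perigenic + random genomic
THETA_REPORTED = 0.00097

# Back-solve the segregating-site count implied by the reported theta.
implied_sites = THETA_REPORTED * watterson_a(N_GENOTYPES) * LENGTH_BP

print(f"a_n for n={N_GENOTYPES}: {watterson_a(N_GENOTYPES):.4f}")
print(f"Implied segregating sites: {implied_sites:.0f}")  # roughly 280 under these assumptions
print(f"Round trip theta: {watterson_theta(implied_sites, N_GENOTYPES, LENGTH_BP):.5f}")
```

Because a_n grows only logarithmically with n, adding genotypes beyond the first few yields diminishing numbers of new SNPs, which fits the abstract's observation that subsets of four to six genotypes can recover most of the variation.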
|
43
|
Mapping the Fas locus controlling stearic acid content in soybean. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2003; 106:615-9. [PMID: 12595989 DOI: 10.1007/s00122-002-1086-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 01/26/2002] [Accepted: 07/08/2002] [Indexed: 05/24/2023]
Abstract
Increasing the stearic acid content to improve soybean [Glycine max (L.) Merr.] oil quality is a desirable breeding objective for food-processing applications. Although it is a saturated fatty acid, stearic acid has been shown to reduce total levels of blood cholesterol and offers the potential for the production of solid fat products (such as margarine) without hydrogenation. This would reduce the level of trans fat in food products and alleviate some current health concerns. A segregating F2 population was developed from the cross between Dare, a cultivar with normal stearic acid content, and FAM94-41, a line with high stearic acid content. This population was used to assess linkage between the Fas locus and simple sequence repeat (SSR) markers. Three SSR markers, Satt070, Satt474, and Satt556, were found to be associated with stearic acid content (P < 0.0001, r² > 0.61). A linkage map consisting of the three SSR markers and the Fas locus was then constructed in the map order Fas, Satt070, Satt474, and Satt556, with a LOD score of 3.0. Identification of these markers may be useful in molecular marker-assisted breeding programs targeting modifications in soybean fatty acids.
|