1
Identification of candidate genes involved in early iron deficiency chlorosis signaling in soybean (Glycine max) roots and leaves. BMC Genomics 2014; 15:702. [PMID: 25149281 PMCID: PMC4161901 DOI: 10.1186/1471-2164-15-702] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 05/14/2014] [Accepted: 08/12/2014] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND Iron is an essential micronutrient for all living things, required in plants for photosynthesis, respiration and metabolism. A lack of bioavailable iron in soil leads to iron deficiency chlorosis (IDC), causing a reduction in photosynthesis and interveinal yellowing of leaves. Soybeans (Glycine max (L.) Merr.) grown in high pH soils often suffer from IDC, resulting in substantial yield losses. Iron efficient soybean cultivars maintain photosynthesis and have higher yields under IDC-promoting conditions than inefficient cultivars. RESULTS To capture signaling between roots and leaves and identify genes acting early in the iron efficient cultivar Clark, we conducted an RNA-Seq study at one and six hours after replacing iron sufficient hydroponic media (100 μM iron(III) nitrate nonahydrate) with iron deficient media (50 μM iron(III) nitrate nonahydrate). At one hour of iron stress, few genes were differentially expressed in leaves but many were already changing expression in roots. By six hours, more genes were differentially expressed in the leaves, and a massive shift was observed in the direction of gene expression in both roots and leaves. Further, there was little overlap in differentially expressed genes identified in each tissue and time point. CONCLUSIONS Genes involved in hormone signaling, regulation of DNA replication, and iron uptake and utilization are key aspects of the early iron-efficiency response. We observed dynamic gene expression differences between roots and leaves, suggesting the involvement of many transcription factors in eliciting rapid changes in gene expression. In roots, genes involved in iron uptake and development of Casparian strips were induced one hour after iron stress. In leaves, genes involved in DNA replication and sugar signaling responded to iron deficiency. The differentially expressed genes (DEGs) and signaling components identified here represent new targets for soybean improvement.
2
Replication protein A subunit 3 and the iron efficiency response in soybean. PLANT, CELL & ENVIRONMENT 2014; 37:213-34. [PMID: 23742135 DOI: 10.1111/pce.12147] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 02/25/2013] [Revised: 05/09/2013] [Accepted: 05/28/2013] [Indexed: 05/20/2023]
Abstract
In soybean [Glycine max (L.) Merr.], iron deficiency results in interveinal chlorosis and decreased photosynthetic capacity, leading to stunting and yield loss. In this study, gene expression analyses investigated the role of soybean replication protein A (RPA) subunits during iron stress. Nine RPA homologs were significantly differentially expressed in response to iron stress in the near isogenic lines (NILs) Clark (iron efficient) and Isoclark (iron inefficient). RPA homologs exhibited opposing expression patterns in the two NILs, with RPA expression significantly repressed during iron deficiency in Clark but induced in Isoclark. We used virus induced gene silencing (VIGS) to repress GmRPA3 expression in the iron inefficient line Isoclark and mirror expression in Clark. GmRPA3-silenced plants had improved IDC symptoms and chlorophyll content under iron deficient conditions and also displayed stunted growth regardless of iron availability. RNA-Seq comparing gene expression between GmRPA3-silenced and empty vector plants revealed massive transcriptional reprogramming with differential expression of genes associated with defense, immunity, aging, death, protein modification, protein synthesis, photosynthesis and iron uptake and transport genes. Our findings suggest the iron efficient genotype Clark is able to induce energy controlling pathways, possibly regulated by SnRK1/TOR, to promote nutrient recycling and stress responses in iron deficient conditions.
3
Transcriptome analyses and virus induced gene silencing identify genes in the Rpp4-mediated Asian soybean rust resistance pathway. FUNCTIONAL PLANT BIOLOGY : FPB 2013; 40:1029-1047. [PMID: 32481171 DOI: 10.1071/fp12296] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/05/2012] [Accepted: 01/12/2013] [Indexed: 05/24/2023]
Abstract
Rpp4 (Resistance to Phakopsora pachyrhizi 4) confers resistance to Phakopsora pachyrhizi Sydow, the causal agent of Asian soybean rust (ASR). By combining expression profiling and virus induced gene silencing (VIGS), we are developing a genetic framework for Rpp4-mediated resistance. We measured gene expression in mock-inoculated and P. pachyrhizi-infected leaves of resistant soybean accession PI459025B (Rpp4) and the susceptible cultivar (Williams 82) across a 12-day time course. Unexpectedly, two biphasic responses were identified. In the incompatible reaction, genes induced at 12h after infection (hai) were not differentially expressed at 24 hai, but were induced at 72 hai. In contrast, genes repressed at 12 hai were not differentially expressed from 24 to 144 hai, but were repressed 216 hai and later. To differentiate between basal and resistance-gene (R-gene) mediated defence responses, we compared gene expression in Rpp4-silenced and empty vector-treated PI459025B plants 14 days after infection (dai) with P. pachyrhizi. This identified genes, including transcription factors, whose differential expression is dependent upon Rpp4. To identify differentially expressed genes conserved across multiple P. pachyrhizi resistance pathways, Rpp4 expression datasets were compared with microarray data previously generated for Rpp2 and Rpp3-mediated defence responses. Fourteen transcription factors common to all resistant and susceptible responses were identified, as well as fourteen transcription factors unique to R-gene-mediated resistance responses. These genes are targets for future P. pachyrhizi resistance research.
4
Genomic heterogeneity and structural variation in soybean near isogenic lines. FRONTIERS IN PLANT SCIENCE 2013; 4:104. [PMID: 23630538 PMCID: PMC3633938 DOI: 10.3389/fpls.2013.00104] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 03/05/2013] [Accepted: 04/04/2013] [Indexed: 05/29/2023]
Abstract
Near isogenic lines (NILs) are a critical genetic resource for the soybean research community. The ability to identify and characterize the genes driving the phenotypic differences between NILs is limited by the degree to which differential genetic introgressions can be resolved. Furthermore, the genetic heterogeneity extant among NIL sub-lines is an unaddressed research topic that might have implications for how genomic and phenotypic data from NILs are utilized. In this study, a recently developed high-resolution comparative genomic hybridization (CGH) platform was used to investigate the structure and diversity of genetic introgressions in two classical soybean NIL populations, respectively varying in protein content and iron deficiency chlorosis (IDC) susceptibility. There were three objectives: assess the capacity for CGH to resolve genomic introgressions, identify introgressions that are heterogeneous among NIL sub-lines, and associate heterogeneous introgressions with susceptibility to IDC. Using the CGH approach, introgression boundaries were refined and previously unknown introgressions were revealed. Furthermore, heterogeneous introgressions were identified within seven sub-lines of the IDC NIL "IsoClark." This included three distinct introgression haplotypes linked to the major iron susceptible locus on chromosome 03. A phenotypic assessment of the seven sub-lines did not reveal any differences in IDC susceptibility, indicating that the genetic heterogeneity among the lines does not have a significant impact on the primary NIL phenotype.
5
Abstract
A set of 219 DNA clones derived from mungbean (Vigna radiata), cowpea (V. unguiculata), common bean (Phaseolus vulgaris), and soybean (Glycine max) were used to generate comparative linkage maps among mungbean, common bean, and soybean. The maps allowed an assessment of linkage conservation and collinearity among the three genomes. Mungbean and common bean, both of the subtribe Phaseolinae, exhibited a high degree of linkage conservation and preservation of marker order. Most linkage groups of mungbean consisted of only one or two linkage blocks from common bean (and vice versa). The situation was significantly different with soybean, a member of the subtribe Glycininae. Mungbean and common bean linkage groups were generally mosaics of short soybean linkage blocks, each only a few centimorgans in length. These results suggest that it would be fruitful to join maps of mungbean and common bean, while knowledge of conserved genomic blocks would be useful in increasing marker density in specific genomic regions for all three genera. These comparative maps may also contribute to enhanced understanding of legume evolution.
6
Abstract
We constructed a soybean bacterial artificial chromosome (BAC) library suitable for map-based cloning and physical mapping in soybean. This library consists of approximately 40 000 clones (4-5 genome equivalents) stored individually in 384-well microtiter dishes. A random sampling of 224 clones yielded an average insert size of 150 kb, giving a 98% probability of recovering any specific sequence. We screened the library for seven single or very low copy genic or genomic sequences using the polymerase chain reaction (PCR) and found between one and seven BACs for each of the seven sequences. When testing the library with a portion of the soybean psbA chloroplast gene, we found less than 1% chloroplast DNA representation. We also screened the library for eight different classes of disease resistance gene analogs (RGAs) and identified BACs containing all RGAs except class 8. We arranged nine of the class 1 RGA BACs and six of the class 3 RGA BACs into individual contigs based on fingerprint patterns observed after Southern probing of restriction digests of the member BACs with a class-specific sequence. This resulted in the partial localization of the different multigene family sequences without precise definition of their exact positions. Using PCR-based end rescue techniques and RFLP mapping of BAC ends, we mapped individual BACs of each contig onto linkage group J of the soybean public map. The class 1 contig mapped to the region on linkage group J that contains several disease resistance genes. The class 1 contig extended approximately 400 kb. The arrangement of the BACs within this contig has been confirmed using PCR. One end of the class 1 contig core BAC mapped to two positions on linkage group J and cosegregated with two class 1 RGA loci, suggesting that this segment is within an area of regional duplication.
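The 98% recovery figure quoted above can be checked with a short calculation. The sketch below assumes the standard Clarke-Carbon estimate, P = 1 - (1 - I/G)^N; the abstract does not say which formula the authors used, and the genome size here is a hypothetical value chosen only so that 40 000 clones of 150 kb correspond to 4 genome equivalents.

```python
def recovery_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate of the chance that a given single-copy
    sequence is present in at least one clone of the library."""
    return 1.0 - (1.0 - insert_kb / genome_kb) ** n_clones


# Values from the abstract: about 40 000 clones, 150 kb average insert.
N_CLONES = 40_000
INSERT_KB = 150

# Hypothetical genome size (assumption): set so that the library equals
# exactly 4 genome equivalents, the lower bound stated in the abstract.
genome_kb_4x = N_CLONES * INSERT_KB / 4
genome_kb_5x = N_CLONES * INSERT_KB / 5

p4 = recovery_probability(N_CLONES, INSERT_KB, genome_kb_4x)
p5 = recovery_probability(N_CLONES, INSERT_KB, genome_kb_5x)
print(f"4 genome equivalents: {p4:.3f}")
print(f"5 genome equivalents: {p5:.3f}")
```

Under these assumptions, the 4-equivalent case lands close to the reported 98%, which suggests the authors quoted the conservative end of their coverage range.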
7
Large homogeneous genome regions (isochores) in soybean [Glycine max (L.) Merr.]. Front Genet 2012; 3:98. [PMID: 22934101 PMCID: PMC3365285 DOI: 10.3389/fgene.2012.00098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2012] [Accepted: 05/14/2012] [Indexed: 11/13/2022] Open
Abstract
The landscape of plant genomes, while slowly being characterized and defined, is still composed primarily of regions of undefined function. Many eukaryotic genomes contain isochore regions, mosaics of homogeneous GC content that can abruptly change from one neighboring isochore to the next. Isochores are broken into families that are characterized by their GC levels. We identified 4,339 compositionally distinct domains and 331 of these were identified as long homogeneous genome regions (LHGRs). We assigned these to four families based on finite mixture models of GC content. We then characterized each family with respect to exon length, gene content, and transposable elements. The LHGR pattern of soybeans is unique in that while the majority of the genes within LHGRs are found within a single LHGR family with a narrow GC range (Family B), that family is not the highest in GC content as seen in vertebrates and invertebrates. Instead Family B has a mean GC content of 35%. The range of GC content for all LHGRs is 16–59% GC which is a larger range than what is typical of vertebrates. This is the first study in which LHGRs have been identified in soybeans and the functions of the genes within the LHGRs have been analyzed.
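As an illustration of the quantity that isochore analyses are built on, the sketch below computes GC content in fixed, non-overlapping windows. It is not the segmentation or finite mixture modelling used in the study; the function name and window size are hypothetical, chosen only to show how a GC profile along a sequence can be obtained.

```python
def gc_windows(seq: str, window: int) -> list[float]:
    """Return the GC fraction of each full, non-overlapping window.

    Ambiguous bases (e.g. N) count toward the window length but not
    toward GC, so windows rich in N will appear GC-poor.
    """
    seq = seq.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / window)
    return fractions


# Toy sequence with a hypothetical window of 4 bases.
profile = gc_windows("GGCCAATTGCAT", 4)
print(profile)
```

In a real analysis the windows would span kilobases, and neighbouring windows with similar GC levels would then be merged into compositionally homogeneous domains.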
8
Identification of candidate genes underlying an iron efficiency quantitative trait locus in soybean. PLANT PHYSIOLOGY 2012; 158:1745-54. [PMID: 22319075 PMCID: PMC3320182 DOI: 10.1104/pp.111.189860] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 10/24/2011] [Accepted: 01/29/2012] [Indexed: 05/19/2023]
Abstract
Prevalent on calcareous soils in the United States and abroad, iron deficiency is among the most common and severe nutritional stresses in plants. In soybean (Glycine max) commercial plantings, the identification and use of iron-efficient genotypes has proven to be the best form of managing this soil-related plant stress. Previous studies conducted in soybean identified a significant iron efficiency quantitative trait locus (QTL) explaining more than 70% of the phenotypic variation for the trait. In this research, we identified candidate genes underlying this QTL through molecular breeding, mapping, and transcriptome sequencing. Introgression mapping was performed using two related near-isogenic lines in which a region located on soybean chromosome 3 required for iron efficiency was identified. The region corresponds to the previously reported iron efficiency QTL. The location was further confirmed through QTL mapping conducted in this study. Transcriptome sequencing and quantitative real-time-polymerase chain reaction identified two genes encoding transcription factors within the region that were significantly induced in soybean roots under iron stress. The two induced transcription factors were identified as homologs of the subgroup Ib basic helix-loop-helix (bHLH) genes that are known to regulate the strategy I response in Arabidopsis (Arabidopsis thaliana). Resequencing of these differentially expressed genes unveiled a significant deletion within a predicted dimerization domain. We hypothesize that this deletion disrupts the Fe-DEFICIENCY-INDUCED TRANSCRIPTION FACTOR (FIT)/bHLH heterodimer that has been shown to induce known iron acquisition genes.
9
Evolutionary and comparative analyses of the soybean genome. BREEDING SCIENCE 2012; 61:437-44. [PMID: 23136483 PMCID: PMC3406793 DOI: 10.1270/jsbbs.61.437] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/28/2011] [Accepted: 09/19/2011] [Indexed: 05/08/2023]
Abstract
The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods.
10
Abstract
Genomic architecture appears to be a largely unexplored component of gene expression. That architecture can be related to chromatin domains, transposable element neighborhoods, epigenetic modifications of the genome, and more. Although surely not the end of the story, we are learning that when it comes to gene expression, size is also important. We have been surprised to find that certain patterns of expression, tissue specific versus constitutive, or high expression versus low expression, are often associated with physical attributes of the gene and genome. Multiple studies have shown an inverse relationship between gene expression patterns and various physical parameters of the genome such as intron size, exon size, intron number, and size of intergenic regions. An increase in expression level and breadth often correlates with a decrease in the size of physical attributes of the gene. Three models have been proposed to explain these relationships. Contradictory results were found in several organisms when expression level and expression breadth were analyzed independently. However, when both factors were combined in a single study a novel relationship was revealed. At low levels of expression, an increase in expression breadth correlated with an increase in genic, intergenic, and intragenic sizes. Contrastingly, at high levels of expression, an increase in expression breadth inversely correlated with the size of the gene. In this article we explore the several hypotheses regarding genome physical parameters and gene expression.
11
Changes in twelve homoeologous genomic regions in soybean following three rounds of polyploidy. THE PLANT CELL 2011; 23:3129-36. [PMID: 21917551 PMCID: PMC3203428 DOI: 10.1105/tpc.111.089573] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 05/05/2023]
Abstract
With the advent of high-throughput sequencing, the availability of genomic sequence for comparative genomics is increasing exponentially. Numerous completed plant genome sequences enable characterization of patterns of the retention and evolution of genes within gene families due to multiple polyploidy events, gene loss and fractionation, and differential evolutionary pressures over time and across different gene families. In this report, we trace the changes that have occurred in 12 surviving homoeologous genomic regions from three rounds of polyploidy that contributed to the current Glycine max genome: a genome triplication before the origin of the rosids (~130 to 240 million years ago), a genome duplication early in the legumes (~58 million years ago), and a duplication in the Glycine lineage (~13 million years ago). Patterns of gene retention following the genome triplication event generally support predictions of the Gene Balance Hypothesis. Finally, we find that genes in networks with a high level of connectivity are more strongly conserved than those with low connectivity and that the enrichment of these highly connected genes in the 12 highly conserved homoeologous segments may in part explain their retention over more than 100 million years and repeated polyploidy events.
12
Abstract
Studies have indicated that exon and intron size and intergenic distance are correlated with gene expression levels and expression breadth. Previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequence data, which has been shown to be more sensitive than previous expression profiling technologies, were generated and analyzed from 14 tissues. Our results revealed a novel dichotomy. At the low expression level, an increase in expression breadth correlated with an increase in transcript size because of an increase in the number of exons and introns. No significant changes in intron or exon sizes were noted. Conversely, genes expressed at the intermediate to high expression levels displayed a decrease in transcript size as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying the correlations between the physical parameters of individual genes could lead to uncovering the role of regulation owing to nucleotide composition, which might have potential impacts in discerning the role of the noncoding regions.
13
An integrative approach to genomic introgression mapping. PLANT PHYSIOLOGY 2010; 154:3-12. [PMID: 20656899 PMCID: PMC2938162 DOI: 10.1104/pp.110.158949] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 05/12/2010] [Accepted: 07/21/2010] [Indexed: 05/20/2023]
Abstract
Near-isogenic lines (NILs) are valuable genetic resources for many crop species, including soybean (Glycine max). The development of new molecular platforms promises to accelerate the mapping of genetic introgressions in these materials. Here, we compare some existing and emerging methodologies for genetic introgression mapping: single-feature polymorphism analysis, Illumina GoldenGate single nucleotide polymorphism (SNP) genotyping, and de novo SNP discovery via RNA-Seq analysis of next-generation sequence data. We used these methods to map the introgressed regions in an iron-inefficient soybean NIL and found that the three mapping approaches are complementary when utilized in combination. The comparative RNA-Seq approach offers several additional advantages, including the greatest mapping resolution, marker depth, and de novo marker utility for downstream fine-mapping analysis. We applied the comparative RNA-Seq method to map genetic introgressions in an additional pair of NILs exhibiting differential seed protein content. Furthermore, we attempted to optimize the comparative RNA-Seq approach by assessing the impact of sequence depth, SNP identification methodology, and post hoc analyses on SNP discovery rates. We conclude that the comparative RNA-Seq approach can be optimized with sufficient sampling and by utilizing a post hoc correction accounting for gene density variation that controls for false discoveries.
14
Applying small-scale DNA signatures as an aid in assembling soybean chromosome sequences. Adv Bioinformatics 2010; 2010:976792. [PMID: 20827309 PMCID: PMC2933861 DOI: 10.1155/2010/976792] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2009] [Accepted: 06/28/2010] [Indexed: 11/18/2022] Open
Abstract
Previous work has established a genomic signature based on relative counts of the 16 possible dinucleotides. Until now, it has been generally accepted that the dinucleotide signature is characteristic of a genome and is relatively homogeneous across a genome. However, we found some local regions of the soybean genome with a signature differing widely from that of the rest of the genome. Those regions were mostly centromeric and pericentromeric, and enriched for repetitive sequences. We found that DNA binding energy also presented large-scale patterns across soybean chromosomes. These two patterns were helpful during assembly and quality control of soybean whole genome shotgun scaffold sequences into chromosome pseudomolecules.
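A dinucleotide signature of the kind described above can be sketched as follows. The abstract only says the signature is based on relative counts of the 16 dinucleotides; the observed/expected odds-ratio normalization and the mean absolute difference used here are assumptions borrowed from common practice, not necessarily the paper's exact definitions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq: str) -> dict[str, float]:
    """Observed/expected frequency ratio for each of the 16 dinucleotides.

    Expected frequency is the product of the two single-base frequencies.
    Dinucleotides containing non-ACGT characters are ignored.
    """
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    di = Counter(p for p in pairs if p[0] in BASES and p[1] in BASES)
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    signature = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        signature[x + y] = observed / expected if expected else 0.0
    return signature


def signature_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Mean absolute difference between two signatures."""
    return sum(abs(a[k] - b[k]) for k in a) / len(a)


# Toy comparison of a local region against a background sequence.
background = dinucleotide_signature("ACGTACGTACGT")
region = dinucleotide_signature("AAAATTTTAAAATTTT")
print(signature_distance(background, region))
```

Scanning a chromosome in windows and plotting each window's distance from the genome-wide signature is one way the atypical centromeric and pericentromeric regions mentioned above could be flagged.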
15
RNA-Seq Atlas of Glycine max: a guide to the soybean transcriptome. BMC PLANT BIOLOGY 2010; 10:160. [PMID: 20687943 PMCID: PMC3017786 DOI: 10.1186/1471-2229-10-160] [Citation(s) in RCA: 438] [Impact Index Per Article: 31.3] [Received: 02/25/2010] [Accepted: 08/05/2010] [Indexed: 05/18/2023]
Abstract
BACKGROUND Next generation sequencing is transforming our understanding of transcriptomes. It can determine the expression level of transcripts with a dynamic range of over six orders of magnitude from multiple tissues, developmental stages or conditions. Patterns of gene expression provide insight into functions of genes with unknown annotation. RESULTS The RNA Seq-Atlas presented here provides a record of high-resolution gene expression in a set of fourteen diverse tissues. Hierarchical clustering of transcriptional profiles for these tissues suggests three clades with similar profiles: aerial, underground and seed tissues. We also investigate the relationship between gene structure and gene expression and find a correlation between gene length and expression. Additionally, we find dramatic tissue-specific gene expression of both the most highly-expressed genes and the genes specific to legumes in seed development and nodule tissues. Analysis of the gene expression profiles of over 2,000 genes with preferential gene expression in seed suggests there are more than 177 genes with functional roles that are involved in the economically important seed filling process. Finally, the Seq-atlas also provides a means of evaluating existing gene model annotations for the Glycine max genome. CONCLUSIONS This RNA-Seq atlas extends the analyses of previous gene expression atlases performed using Affymetrix GeneChip technology and provides an example of new methods to accommodate the increase in transcriptome data obtained from next generation sequencing. Data contained within this RNA-Seq atlas of Glycine max can be explored at http://www.soybase.org/soyseq.
16
Evolutionary conservation, diversity and specificity of LTR-retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2010; 63:584-98. [PMID: 20525006 DOI: 10.1111/j.1365-313x.2010.04263.x] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Indexed: 05/25/2023]
Abstract
The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 long terminal repeat-retrotransposon (LTR-RT) families comprising 32370 elements in soybean (Glycine max (L.) Merr.). Approximately 87% of these elements were located in recombination-suppressed pericentromeric regions, where the ratio (1.26) of solo LTRs to intact elements (S/I) is significantly lower than that of chromosome arms (1.62). Further analysis revealed a significant positive correlation between S/I and LTR sizes, indicating that larger LTRs facilitate solo LTR formation. Phylogenetic analysis revealed seven Copia and five Gypsy evolutionary lineages that were present before the divergence of eudicot and monocot species, but the scales and timeframes within which they proliferated vary dramatically across families, lineages and species, and notably, a Copia lineage has been lost in soybean. Analysis of the physical association of LTR-RTs with centromere satellite repeats identified two putative centromere retrotransposon (CR) families of soybean, which were grouped into the CR (e.g. CRR and CRM) lineage found in grasses, indicating that the 'functional specification' of CR pre-dates the bifurcation of eudicots and monocots. However, a number of families of the CR lineage are not concentrated in centromeres, suggesting that their CR roles may now be defunct. Our data also suggest that the envelope-like genes in the putative Copia retrovirus-like family are probably derived from the Gypsy retrovirus-like lineage, and thus we propose the hypothesis of a single ancient origin of envelope-like genes in flowering plants.
17
Applications and methods utilizing the Simple Semantic Web Architecture and Protocol (SSWAP) for bioinformatics resource discovery and disparate data and service integration. BioData Min 2010; 3:3. [PMID: 20525377 PMCID: PMC2894815 DOI: 10.1186/1756-0381-3-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 09/30/2009] [Accepted: 06/04/2010] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND Scientific data integration and computational service discovery are challenges for the bioinformatic community. This process is made more difficult by the separate and independent construction of biological databases, which makes the exchange of data between information resources difficult and labor intensive. A recently described semantic web protocol, the Simple Semantic Web Architecture and Protocol (SSWAP; pronounced "swap") offers the ability to describe data and services in a semantically meaningful way. We report how three major information resources (Gramene, SoyBase and the Legume Information System [LIS]) used SSWAP to semantically describe selected data and web services. METHODS We selected high-priority Quantitative Trait Locus (QTL), genomic mapping, trait, phenotypic, and sequence data and associated services such as BLAST for publication, data retrieval, and service invocation via semantic web services. Data and services were mapped to concepts and categories as implemented in legacy and de novo community ontologies. We used SSWAP to express these offerings in OWL Web Ontology Language (OWL), Resource Description Framework (RDF) and eXtensible Markup Language (XML) documents, which are appropriate for their semantic discovery and retrieval. We implemented SSWAP services to respond to web queries and return data. These services are registered with the SSWAP Discovery Server and are available for semantic discovery at http://sswap.info. RESULTS A total of ten services delivering QTL information from Gramene were created. From SoyBase, we created six services delivering information about soybean QTLs, and seven services delivering genetic locus information. For LIS we constructed three services, two of which allow the retrieval of DNA and RNA FASTA sequences with the third service providing nucleic acid sequence comparison capability (BLAST). CONCLUSIONS The need for semantic integration technologies has preceded available solutions. 
We report the feasibility of mapping high priority data from local, independent, idiosyncratic data schemas to common shared concepts as implemented in web-accessible ontologies. These mappings are then amenable for use in semantic web services. Our implementation of approximately two dozen services means that biological data at three large information resources (Gramene, SoyBase, and LIS) is available for programmatic access, semantic searching, and enhanced interaction between the separate missions of these resources.
|
18
|
|
19
|
Complementary genetic and genomic approaches help characterize the linkage group I seed protein QTL in soybean. BMC Plant Biology 2010; 10:41. [PMID: 20199683 PMCID: PMC2848761 DOI: 10.1186/1471-2229-10-41] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 10/15/2009] [Accepted: 03/03/2010] [Indexed: 05/19/2023]
Abstract
BACKGROUND The nutritional and economic value of many crops is effectively a function of seed protein and oil content. Insight into the genetic and molecular control mechanisms involved in the deposition of these constituents in the developing seed is needed to guide crop improvement. A quantitative trait locus (QTL) on Linkage Group I (LG I) of soybean (Glycine max (L.) Merrill) has a striking effect on seed protein content. RESULTS A soybean near-isogenic line (NIL) pair contrasting in seed protein and differing in an introgressed genomic segment containing the LG I protein QTL was used as a resource to demarcate the QTL region and to study variation in transcript abundance in developing seed. The LG I QTL region was delineated to less than 8.4 Mbp of genomic sequence on chromosome 20. Using Affymetrix Soy GeneChip and high-throughput Illumina whole transcriptome sequencing platforms, 13 genes displaying significant seed transcript accumulation differences between NILs were identified that mapped to the 8.4 Mbp LG I protein QTL region. CONCLUSIONS This study identifies gene candidates at the LG I protein QTL for potential involvement in the regulation of protein content in the soybean seed. The results demonstrate the power of complementary approaches to characterize contrasting NILs and provide genome-wide transcriptome insight towards understanding seed biology and the soybean genome.
|
20
|
SoyTEdb: a comprehensive database of transposable elements in the soybean genome. BMC Genomics 2010; 11:113. [PMID: 20163715 PMCID: PMC2830986 DOI: 10.1186/1471-2164-11-113] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.3] [Received: 10/15/2009] [Accepted: 02/17/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization of transposable elements in this economically important legume crop. DESCRIPTION Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study.
CONCLUSION SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually curated transposable element database for any individual plant genome completely sequenced to date. Transposable elements previously identified in legumes, the third largest family of flowering plants, are relatively scarce. Thus this database will facilitate structural, evolutionary, functional, and epigenetic analyses of transposable elements in soybean and other legume species.
|
21
|
High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence. BMC Genomics 2010; 11:38. [PMID: 20078886 PMCID: PMC2817691 DOI: 10.1186/1471-2164-11-38] [Citation(s) in RCA: 221] [Impact Index Per Article: 15.8] [Received: 09/04/2009] [Accepted: 01/15/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds. RESULTS A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%. CONCLUSION We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8x whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism.
|
22
|
Abstract
Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
|
23
|
Bifurcation and enhancement of autonomous-nonautonomous retrotransposon partnership through LTR Swapping in soybean. The Plant Cell 2010; 22:48-61. [PMID: 20081112 PMCID: PMC2828711 DOI: 10.1105/tpc.109.068775] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 05/18/2009] [Revised: 10/11/2009] [Accepted: 10/23/2009] [Indexed: 05/02/2023]
Abstract
Long terminal repeat (LTR) retrotransposons, the most abundant genomic components in flowering plants, are classifiable into autonomous and nonautonomous elements based on their structural completeness and transposition capacity. It has been proposed that selection is the major force for maintaining sequence (e.g., LTR) conservation between nonautonomous elements and their autonomous counterparts. Here, we report the structural, evolutionary, and expression characterization of a giant retrovirus-like soybean (Glycine max) LTR retrotransposon family, SNARE. This family contains two autonomous subfamilies, SARE(A) and SARE(B), that appear to have evolved independently since the soybean genome tetraploidization event approximately 13 million years ago, and a nonautonomous subfamily, SNRE, that originated from SARE(A). Unexpectedly, a subset of the SNRE elements, which amplified from a single founding SNRE element within the last approximately 3 million years, have been dramatically homogenized with either SARE(A) or SARE(B) primarily in the LTR regions and bifurcated into distinct subgroups corresponding to the two autonomous subfamilies. We uncovered evidence of region-specific swapping of nonautonomous elements with autonomous elements that primarily generated various nonautonomous recombinants with LTR sequences from autonomous elements of different evolutionary lineages, thus revealing a molecular mechanism for the enhancement of preexisting partnership and the establishment of new partnership between autonomous and nonautonomous elements.
|
24
|
SoyBase, the USDA-ARS soybean genetics and genomics database. Nucleic Acids Res 2010; 38:D843-6. [PMID: 20008513 PMCID: PMC2808871 DOI: 10.1093/nar/gkp798] [Citation(s) in RCA: 344] [Impact Index Per Article: 24.6] [Received: 08/05/2009] [Revised: 09/04/2009] [Accepted: 09/10/2009] [Indexed: 11/25/2022] Open
Abstract
SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The quantitative trait loci (QTL) represent more than 18 years of QTL mapping of more than 90 unique traits. SoyBase also contains the well-annotated 'Williams 82' genomic sequence and associated data mining tools. The genetic and sequence views of the soybean chromosomes and the extensive data on traits and phenotypes are extensively interlinked. This allows entry to the database using almost any kind of available information, such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the repository for controlled vocabularies for soybean growth, development and trait terms, which are also linked to the more general plant ontologies. SoyBase can be accessed at http://soybase.org.
|
25
|
Integrating microarray analysis and the soybean genome to understand the soybeans iron deficiency response. BMC Genomics 2009; 10:376. [PMID: 19678937 PMCID: PMC2907705 DOI: 10.1186/1471-2164-10-376] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 03/02/2009] [Accepted: 08/13/2009] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Soybeans grown in the upper Midwestern United States often suffer from iron deficiency chlorosis, which results in yield loss at the end of the season. To better understand the effect of iron availability on soybean yield, we identified genes in two near isogenic lines with changes in expression patterns when plants were grown in iron sufficient and iron deficient conditions. RESULTS Transcriptional profiles of soybean (Glycine max, L. Merr) near isogenic lines Clark (PI548553, iron efficient) and IsoClark (PI547430, iron inefficient) grown under Fe-sufficient and Fe-limited conditions were analyzed and compared using the Affymetrix GeneChip Soybean Genome Array. There were 835 candidate genes in the Clark (PI548553) genotype and 200 candidate genes in the IsoClark (PI547430) genotype putatively involved in soybean's iron stress response. Of these candidate genes, fifty-eight genes in the Clark genotype were identified with a genetic location within known iron efficiency QTL and 21 in the IsoClark genotype. The arrays also identified 170 single feature polymorphisms (SFPs) specific to either Clark or IsoClark. A sliding window analysis of the microarray data and the 7X genome assembly coupled with an iterative model of the data showed the candidate genes are clustered in the genome. An analysis of 5' untranslated regions in the promoter of candidate genes identified 11 conserved motifs in 248 differentially expressed genes, all from the Clark genotype, representing 129 clusters identified earlier, confirming the cluster analysis results. CONCLUSION These analyses have identified the first genes with expression patterns that are affected by iron stress and are located within QTL specific to iron deficiency stress. The genetic location and promoter motif analysis results support the hypothesis that the differentially expressed genes are co-regulated. 
The combined results of all analyses lead us to postulate iron inefficiency in soybean is a result of a mutation in a transcription factor(s), which controls the expression of genes required in inducing an iron stress response.
|
26
|
Identification and analyses of candidate genes for rpp4-mediated resistance to Asian soybean rust in soybean. Plant Physiology 2009; 150:295-307. [PMID: 19251904 PMCID: PMC2675740 DOI: 10.1104/pp.108.134551] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 12/17/2008] [Accepted: 02/24/2009] [Indexed: 05/19/2023]
Abstract
Asian soybean rust is a formidable threat to soybean (Glycine max) production in many areas of the world, including the United States. Only five sources of resistance have been identified (Resistance to Phakopsora pachyrhizi1 [Rpp1], Rpp2, Rpp3, Rpp4, and Rpp5). Rpp4 was previously identified in the resistant genotype PI459025B and mapped within 2 centimorgans of Satt288 on soybean chromosome 18 (linkage group G). Using simple sequence repeat markers, we developed a bacterial artificial chromosome contig for the Rpp4 locus in the susceptible cv Williams82 (Wm82). Sequencing within this region identified three Rpp4 candidate disease resistance genes (Rpp4C1-Rpp4C3 [Wm82]) with greatest similarity to the lettuce (Lactuca sativa) RGC2 family of coiled coil-nucleotide binding site-leucine rich repeat disease resistance genes. Constructs containing regions of the Wm82 Rpp4 candidate genes were used for virus-induced gene silencing experiments to silence resistance in PI459025B, confirming that orthologous genes confer resistance. Using primers developed from conserved sequences in the Wm82 Rpp4 candidate genes, we identified five Rpp4 candidate genes (Rpp4C1-Rpp4C5 [PI459025B]) from the resistant genotype. Additional markers developed from the Wm82 Rpp4 bacterial artificial chromosome contig further defined the region containing Rpp4 and eliminated Rpp4C1 (PI459025B) and Rpp4C3 (PI459025B) as candidate genes. Sequencing of reverse transcription-polymerase chain reaction products revealed that Rpp4C4 (PI459025B) was highly expressed in the resistant genotype, while expression of the other candidate genes was nearly undetectable. These data support Rpp4C4 (PI459025B) as the single candidate gene for Rpp4-mediated resistance to Asian soybean rust.
|
27
|
Establishment of a soybean (Glycine max Merr. L) transposon-based mutagenesis repository. Planta 2009; 229:279-89. [PMID: 18855007 DOI: 10.1007/s00425-008-0827-9] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 07/12/2008] [Accepted: 09/14/2008] [Indexed: 05/26/2023]
Abstract
Soybean is a major crop species providing valuable feedstock for food, feed and biofuel. In recent years, considerable progress has been made in developing genomic resources for soybean, including on-going efforts to sequence the genome. These efforts have identified a large number of soybean genes, most with unknown function. Therefore, a major research priority is determining the function of these genes, especially those involved in agronomic performance and seed traits. One means to study gene function is through mutagenesis and the study of the resulting phenotypes. Transposon-tagging has been used successfully in both model and crop plants to support studies of gene function. In this report, we describe efforts to generate a transposon-based mutant collection of soybean. The Ds transposon system was used to create activation-tagging, gene and enhancer trap elements. Currently, the repository houses approximately 900 soybean events, with flanking sequence data derived from 200 of these events. Analysis of the insertions revealed approximately 70% disrupted known genes, with the majority matching sequences derived from either Glycine max or Medicago truncatula sequences. Among the mutants generated, one resulted in male-sterility and was shown to disrupt the strictosidine synthase gene. This example clearly demonstrates that it is possible to disrupt soybean gene function by insertional mutagenesis and to derive useful mutants by this approach in spite of the tetraploid nature of the soybean genome.
|
28
|
Fractionation of synteny in a genomic region containing tandemly duplicated genes across Glycine max, Medicago truncatula, and Arabidopsis thaliana. J Hered 2008; 99:390-5. [PMID: 18316321 DOI: 10.1093/jhered/esn010] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 12/16/2023] Open
Abstract
Extended comparison of gene sequences found on homeologous soybean Bacterial Artificial Chromosomes to Medicago truncatula and Arabidopsis thaliana genomic sequences demonstrated a network of synteny within conserved regions interrupted by gene addition and/or deletions. Consolidation of gene order among all 3 species provides a picture of ancestral gene order. The observation supports a genome history of fractionation resulting from gene loss/addition and rearrangement. In all 3 species, clusters of N-hydroxycinnamoyl/benzoyltransferase genes were identified in tandemly duplicated clusters. Parsimony-based gene trees suggest that the genes within the arrays have independently undergone tandem duplication in each species.
|
29
|
High-throughput genotyping with the GoldenGate assay in the complex genome of soybean. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2008; 116:945-52. [PMID: 18278477 DOI: 10.1007/s00122-008-0726-2] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.4] [Received: 09/19/2007] [Accepted: 01/28/2008] [Indexed: 05/05/2023]
Abstract
Large numbers of single nucleotide polymorphism (SNP) markers are now available for a number of crop species. However, the high-throughput methods for multiplexing SNP assays are untested in complex genomes, such as soybean, that have a high proportion of paralogous genes. The Illumina GoldenGate assay is capable of multiplexing from 96 to 1,536 SNPs in a single reaction over a 3-day period. We tested the GoldenGate assay in soybean to determine the success rate of converting verified SNPs into working assays. A custom 384-SNP GoldenGate assay was designed using SNPs that had been discovered through the resequencing of five diverse accessions that are the parents of three recombinant inbred line (RIL) mapping populations. The 384 SNPs that were selected for this custom assay were predicted to segregate in one or more of the RIL mapping populations. Allelic data were successfully generated for 89% of the SNP loci (342 of the 384) when it was used in the three RIL mapping populations, indicating that the complex nature of the soybean genome had little impact on conversion of the discovered SNPs into usable assays. In addition, 80% of the 342 mapped SNPs had a minor allele frequency >10% when this assay was used on a diverse sample of Asian landrace germplasm accessions. The high success rate of the GoldenGate assay makes this a useful technique for quickly creating high density genetic maps in species where SNP markers are rapidly becoming available.
|
30
|
Sequence level analysis of recently duplicated regions in soybean [Glycine max (L.) Merr.] genome. DNA Res 2008; 15:93-102. [PMID: 18334514 PMCID: PMC2650623 DOI: 10.1093/dnares/dsn001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 11/15/2007] [Accepted: 01/16/2008] [Indexed: 11/18/2022] Open
Abstract
A single recessive gene, rxp, on linkage group (LG) D2 controls bacterial leaf-pustule resistance in soybean. We identified two homoeologous contigs (GmA and GmA') composed of five bacterial artificial chromosomes (BACs) during the selection of BAC clones around the Rxp region. With the recombinant inbred line population from the cross of Pureunkong and Jinpumkong 2, single-nucleotide polymorphism and simple sequence repeat marker genotyping located GmA' on LG A1. On the basis of information in the Soybean Breeders Toolbox and our results, parts of LG A1 and LG D2 share duplicated regions. Alignment and annotation revealed that many homoeologous regions contained kinases and proteins related to signal transduction pathways. Interestingly, inserted sequences from GmA and GmA' had homology with transposase and integrase. Estimation of evolutionary events revealed that speciation of soybean from Medicago and the recent divergence of two soybean homoeologous regions occurred at 60 and 12 million years ago, respectively. The distribution of synonymous substitution patterns, K(s), between soybean homologous regions displayed a first secondary peak (mode K(s) = 0.10-0.15) followed by two smaller bulges. Thus, diploidized paleopolyploidy of the soybean genome was again supported by our study.
|
31
|
Microsatellite discovery from BAC end sequences and genetic mapping to anchor the soybean physical and genetic maps. Genome 2008; 51:294-302. [PMID: 18356965 DOI: 10.1139/g08-010] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Whole-genome sequencing of the soybean (Glycine max (L.) Merr. 'Williams 82') has made it important to integrate its physical and genetic maps. To facilitate this integration of maps, we screened 3290 microsatellites (SSRs) identified from BAC end sequences of clones comprising the 'Williams 82' physical map. SSRs were screened against 3 mapping populations. We found the AAT and ACT motifs produced the greatest frequency of length polymorphisms, ranging from 17.2% to 32.3% and from 11.8% to 33.3%, respectively. Other useful motifs include the dinucleotide repeats AT, AG, and GT, with frequency of length polymorphisms ranging from 11.2% to 18.4% (AT), 12.4% to 20.6% (AG), and 11.3% to 16.4% (GT). Repeat lengths less than 16 bp were generally less useful than repeat lengths of 40-60 bp. Two hundred and sixty-five SSRs were genetically mapped in at least one population. Of the 265 mapped SSRs, 60 came from BAC singletons not yet placed into contigs of the physical map. One hundred and ten originated in BACs located in contigs for which no genetic map location was previously known. Ninety-five SSRs came from BACs within contigs for which one or more other BACs had already been mapped. For these fingerprinted contigs (FPC) a high percentage of the mapped markers showed inconsistent map locations. A strategy is introduced by which physical and genetic map inconsistencies can be resolved using the preliminary 4x assembly of the whole genome sequence of soybean.
|
33
|
Microarray analysis of iron deficiency chlorosis in near-isogenic soybean lines. BMC Genomics 2007; 8:476. [PMID: 18154662 PMCID: PMC2253546 DOI: 10.1186/1471-2164-8-476] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Received: 03/23/2007] [Accepted: 12/21/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Iron is one of fourteen mineral elements required for proper plant growth and development of soybean (Glycine max L. Merr.). Soybeans grown on calcareous soils, which are prevalent in the upper Midwest of the United States, often exhibit symptoms indicative of iron deficiency chlorosis (IDC). Yield loss has a positive linear correlation with increasing severity of chlorotic symptoms. As soybean is an important agronomic crop, it is essential to understand the genetics and physiology of traits affecting plant yield. Soybean cultivars vary greatly in their ability to respond successfully to iron deficiency stress. Microarray analyses permit the identification of genes and physiological processes involved in soybean's response to iron stress. RESULTS RNA isolated from the roots of two near isogenic lines, which differ in iron efficiency, PI 548533 (Clark; iron efficient) and PI 547430 (IsoClark; iron inefficient), were compared on a spotted microarray slide containing 9,728 cDNAs from root specific EST libraries. A comparison of RNA transcripts isolated from plants grown under iron limiting hydroponic conditions for two weeks revealed 43 genes as differentially expressed. A single linkage clustering analysis of these 43 genes showed 57% of them possessed high sequence similarity to known stress induced genes. A control experiment comparing plants grown under adequate iron hydroponic conditions showed no differences in gene expression between the two near isogenic lines. Expression levels of a subset of the differentially expressed genes were also compared by real time reverse transcriptase PCR (RT-PCR). The RT-PCR experiments confirmed differential expression between the iron efficient and iron inefficient plants for 9 of 10 randomly chosen genes examined. 
To gain further insight into the iron physiological status of the plants, the root iron reductase activity was measured in both iron efficient and inefficient genotypes for plants grown under iron sufficient and iron limited conditions. Iron inefficient plants failed to respond to decreased iron availability with increased activity of Fe reductase. CONCLUSION These experiments have identified genes involved in the soybean iron deficiency chlorosis response under iron deficient conditions. Single linkage cluster analysis suggests iron limited soybeans mount a general stress response as well as a specialized iron deficiency stress response. Root membrane-bound reductase capacity is often correlated with iron efficiency. Under iron-limited conditions, the iron efficient plant had high root membrane-bound reductase capacity while the iron inefficient plants' reductase levels remained low, further limiting iron uptake through the root. Many of the genes up-regulated in the iron inefficient NIL are involved in known stress induced pathways. The most striking response of the iron inefficient genotype to iron deficiency stress was the induction of a profusion of signaling and regulatory genes, presumably in an attempt to establish and maintain cellular homeostasis. Genes were up-regulated that point toward an increased transport of molecules through membranes. Genes associated with reactive oxygen species (ROS) and an ROS-defensive enzyme were also induced. The up-regulation of genes involved in DNA repair and RNA stability reflects the inhospitable cellular environment resulting from iron deficiency stress. Other genes were induced that are involved in protein and lipid catabolism, perhaps as an effort to maintain carbon flow and scavenge energy. The under-expression of a key glycolytic gene may result in the iron-inefficient genotype being energetically challenged to maintain a stable cellular environment.
These experiments have identified candidate genes and processes for further experimentation to increase our understanding of soybeans' response to iron deficiency stress.
|
34
|
Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. BMC Genomics 2007; 8:330. [PMID: 17880721 PMCID: PMC2077340 DOI: 10.1186/1471-2164-8-330] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.5] [Received: 02/09/2007] [Accepted: 09/19/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Soybean, Glycine max (L.) Merr., is a well-documented paleopolyploid. What remains relatively undercharacterized is the level of sequence identity in retained homeologous regions of the genome. Recently, the Department of Energy Joint Genome Institute and United States Department of Agriculture jointly announced the sequencing of the soybean genome. One of the initial concerns is what effect sequence identity in homeologous regions would have on whole genome shotgun sequence assembly. RESULTS Seventeen BACs representing approximately 2.03 Mb were sequenced as representative potential homeologous regions from the soybean genome. Genetic mapping of each BAC shows that 11 of the 20 chromosomes are represented. Sequence comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions. Some regions appear to be highly conserved while other regions have diverged significantly. Large-scale "batch" reassembly of all 17 BACs combined showed that even the most homeologous BACs, with upwards of 95% sequence identity, resolve into their respective homeologous sequences. Potential assembly errors were generated by tandemly duplicated pentatricopeptide repeat-containing genes and long simple sequence repeats. Analysis of a whole-genome shotgun assembly of 80,000 randomly chosen JGI-DOE sequence traces reveals some new soybean-specific repeat sequences. CONCLUSION This analysis investigated both the structure of the paleopolyploid soybean genome and the potential effects retained homeology will have on assembling the whole genome shotgun sequence. Based upon these results, homeologous regions similar to those characterized here will not cause major assembly issues.
|
36
|
Recovering from iron deficiency chlorosis in near-isogenic soybeans: a microarray study. Plant Physiology and Biochemistry: PPB 2007; 45:287-92. [PMID: 17466527 DOI: 10.1016/j.plaphy.2007.03.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 05/06/2023]
Abstract
Iron deficiency chlorosis (IDC) in soybeans has proven to be a perennial problem in the calcareous soils of the U.S. upper Midwest. IDC has historically been a difficult trait to study in the field; the use of hydroponics in a controlled greenhouse environment has provided a mechanism to study genetic variation while limiting environmental complications. IDC susceptible plants growing in calcareous soils and in iron-controlled hydroponic experiments often exhibit a characteristic chlorotic phenotype early in the growing season but are able to re-green later in the season. To examine the changes in gene expression of these plants, near-isogenic lines, iron efficient PI548553 (Clark) and iron inefficient PI547430 (IsoClark), developed for their response to iron deficiency stress [USDA, ARS, National Genetic Resources Program, Germplasm Resources Information Network - GRIN. (Online Database) National Germplasm Resources Laboratory, Beltsville, MD, 2004. Available: http://www.ars.grin.gov/cgi-bin/npgs/html/acc_search.pl?accid=PI+547430] [22], were grown in iron-deficient hydroponic conditions for one week, then transferred to iron sufficient conditions for another week. This induced a phenotypic response mimicking the growth of the plants in the field: initial chlorosis followed by re-greening. RNA was isolated from root tissue and transcript profiles were compared between the two near-isogenic lines using publicly available cDNA microarrays. By alleviating the iron deficiency stress, our expectation was that plants would return to baseline expression levels. However, the microarray comparison identified four cDNAs that were under-expressed by two-fold or more in the iron inefficient plant compared to the iron efficient plant. This differential expression was re-examined and confirmed by real-time PCR. Control experiments showed that these genes are not differentially expressed in plants grown continually under iron-rich hydroponic conditions. The expression differences suggest potential residual effects of iron deficiency on plant health.
|
37
|
A soybean transcript map: gene distribution, haplotype and single-nucleotide polymorphism analysis. Genetics 2007; 176:685-96. [PMID: 17339218 PMCID: PMC1893076 DOI: 10.1534/genetics.107.070821] [Citation(s) in RCA: 261] [Impact Index Per Article: 15.4] [Received: 01/12/2007] [Accepted: 02/16/2007] [Indexed: 11/18/2022] Open
Abstract
The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels, for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5-10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.
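The reported nucleotide diversity of 0.000997 can be roughly reproduced from the other figures in this abstract. The sketch below assumes the value is Watterson's estimator per site, S / (a_n × L), with all 5551 polymorphisms counted as segregating sites and the six genotypes treated as six sampled sequences; the abstract does not state the estimator, so this is an illustration rather than the authors' procedure. The function name `watterson_theta` is hypothetical.

```python
def watterson_theta(segregating_sites: int, sample_size: int, aligned_length: float) -> float:
    """Per-site Watterson estimator: S / (a_n * L), where a_n = sum_{i=1}^{n-1} 1/i."""
    if sample_size < 2:
        raise ValueError("need at least two sequences")
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    # Harmonic-number correction for the number of sampled sequences.
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / (a_n * aligned_length)


# Figures taken from the abstract; the estimator choice is an assumption.
theta = watterson_theta(segregating_sites=5551, sample_size=6, aligned_length=2.44e6)
print(f"theta per site = {theta:.6f}")
```

Under these assumptions the result lands within about 0.1% of the published value; the remaining gap is consistent with the aligned length being rounded to 2.44 Mbp.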
|
38
|
Highly variable patterns of linkage disequilibrium in multiple soybean populations. Genetics 2007; 175:1937-44. [PMID: 17287533 PMCID: PMC1855121 DOI: 10.1534/genetics.106.069740] [Citation(s) in RCA: 110] [Impact Index Per Article: 6.5] [Received: 12/14/2006] [Accepted: 02/01/2007] [Indexed: 11/18/2022] Open
Abstract
Prospects for utilizing whole-genome association analysis in autogamous plant populations appear promising due to the reported high levels of linkage disequilibrium (LD). To determine the optimal strategies for implementing association analysis in soybean (Glycine max L. Merr.), we analyzed the structure of LD in three regions of the genome varying in length from 336 to 574 kb. This analysis was conducted in four distinct groups of soybean germplasm: 26 accessions of the wild ancestor of soybean (Glycine soja Sieb. et Zucc.); 52 Asian G. max Landraces, the immediate results of domestication from G. soja; 17 Asian Landrace introductions that became the ancestors of North American (N. Am.) cultivars; and 25 Elite Cultivars from N. Am. In G. soja, LD did not extend past 100 kb; however, in the three cultivated G. max groups, LD extended from 90 to 574 kb, likely due to the impacts of domestication and increased self-fertilization. The extent of LD varied widely among the three genomic regions within the three cultivated soybean populations. G. soja appears to be ideal for fine mapping of genes, but due to the highly variable levels of LD in the Landraces and the Elite Cultivars, whole-genome association analysis in soybean may be more difficult than first anticipated.
|
39
|
Impacts of genetic bottlenecks on soybean genome diversity. Proc Natl Acad Sci U S A 2006; 103:16666-16671. [PMID: 17068128 DOI: 10.1073/pnas.060437910] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/26/2023] Open
Abstract
Soybean has undergone several genetic bottlenecks. These include domestication in Asia to produce numerous Asian landraces, introduction of relatively few landraces to North America, and then selective breeding over the past 75 years. It is presumed that these three human-mediated events have reduced genetic diversity. We sequenced 111 fragments from 102 genes in four soybean populations representing the populations before and after genetic bottlenecks. We show that soybean has lost many rare sequence variants and has undergone numerous allele frequency changes throughout its history. Although soybean genetic diversity has been eroded by human selection after domestication, it is notable that modern cultivars have retained 72% of the sequence diversity present in the Asian landraces but lost 79% of rare alleles (frequency ≤0.10) found in the Asian landraces. Simulations indicated that the diversity lost through the genetic bottlenecks of introduction and plant breeding was mostly due to the small number of Asian introductions and not the artificial selection subsequently imposed by selective breeding. The bottleneck with the most impact was domestication; when the low sequence diversity present in the wild species was halved, 81% of the rare alleles were lost, and 60% of the genes exhibited evidence of significant allele frequency changes.
|
41
|
Sequence conservation of homeologous bacterial artificial chromosomes and transcription of homeologous genes in soybean (Glycine max L. Merr.). Genetics 2006; 174:1017-28. [PMID: 16888343 PMCID: PMC1602103 DOI: 10.1534/genetics.105.055020] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Received: 12/24/2005] [Accepted: 07/19/2006] [Indexed: 11/18/2022] Open
Abstract
The paleopolyploid soybean genome was investigated by sequencing homeologous BAC clones anchored by duplicate N-hydroxycinnamoyl/benzoyltransferase (HCBT) genes. The homeologous BACs were genetically mapped to linkage groups C1 and C2. Annotation of the 173,747- and 98,760-bp BACs showed that gene conservation in both order and orientation is high between homeologous regions, with only a single gene insertion/deletion and local tandem duplications differing between the regions. The nucleotide sequence conservation extends into intergenic regions as well, probably due to conserved regulatory sequences. Most of the homeologs appear to have a role in either transcription/DNA binding or cellular signaling, suggesting a potential preference for retention of duplicate genes with these functions. Reverse transcriptase-PCR analysis of homeologs showed that in the tissues sampled, most homeologs have not diverged greatly in their transcription profiles. However, four cases of changes in transcription were identified, primarily in the HCBT gene cluster. Because a mapped locus corresponds to a soybean cyst nematode (SCN) QTL, the potential role of HCBT genes in response to SCN is discussed. These results are the first sequence-based analysis of homeologous BACs in soybean, a diploidized paleopolyploid.
|
42
|
Genome studies and molecular genetics. Part 1: Model legumes. Exploring the structure, function and evolution of legume genomes. Current Opinion in Plant Biology 2006; 9:95-8. [PMID: 16473039 DOI: 10.1016/j.pbi.2006.01.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/04/2006] [Accepted: 01/25/2006] [Indexed: 05/06/2023]
|
43
|
Paleopolyploidy and gene duplication in soybean and other legumes. Current Opinion in Plant Biology 2006; 9:104-9. [PMID: 16458041 DOI: 10.1016/j.pbi.2006.01.007] [Citation(s) in RCA: 140] [Impact Index Per Article: 7.8] [Received: 11/10/2005] [Accepted: 01/23/2006] [Indexed: 05/06/2023]
Abstract
Two of the most important observations from whole-genome sequences have been the high rate of gene birth and death and the prevalence of large-scale duplication events, including polyploidy. There is also a growing appreciation that polyploidy is more than the sum of the gene duplications it creates, in part because polyploidy duplicates the members of entire regulatory networks. Thus, it may be important to distinguish paralogs that are produced by individual gene duplications from the homoeologous sequences produced by (allo)polyploidy. This is not a simple task, for several reasons, including the chromosomally cryptic nature of many duplications and the variable rates of gene evolution. Recent progress has been made in understanding patterns of gene and genome duplication in the legume family, specifically in soybean.
|
44
|
Pericentromeric regions of soybean (Glycine max L. Merr.) chromosomes consist of retroelements and tandemly repeated DNA and are structurally and evolutionarily labile. Genetics 2005; 170:1221-30. [PMID: 15879505 PMCID: PMC1451161 DOI: 10.1534/genetics.105.041616] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 02/03/2005] [Accepted: 04/01/2005] [Indexed: 11/18/2022] Open
Abstract
Little is known about the physical makeup of heterochromatin in the soybean (Glycine max L. Merr.) genome. Using DNA sequencing and molecular cytogenetics, an initial analysis of the repetitive fraction of the soybean genome is presented. BAC 076J21, derived from linkage group L, has sequences conserved in the pericentromeric heterochromatin of all 20 chromosomes. FISH analysis of this BAC and three subclones on pachytene chromosomes revealed relatively strict partitioning of the heterochromatic and euchromatic regions. Sequence analysis showed that this BAC consists primarily of repetitive sequences such as a 102-bp tandem repeat with sequence identity to a previously characterized approximately 120-bp repeat (STR120). Fragments of Calypso-like retroelements, a recently inserted SIRE1 element, and a SIRE1 solo LTR were present within this BAC. Some of these sequences are methylated and are not conserved outside of G. max and G. soja, a close relative of soybean, except for STR102, which hybridized to a restriction fragment from G. latifolia. These data present a picture of the repetitive fraction of the soybean genome that is highly concentrated in the pericentromeric regions, consisting of rapidly evolving tandem repeats with interspersed retroelements.
|
45
|
Placing Paleopolyploidy in Relation to Taxon Divergence: A Phylogenetic Analysis in Legumes Using 39 Gene Families. Syst Biol 2005; 54:441-54. [PMID: 16012110 DOI: 10.1080/10635150590945359] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.9] [Indexed: 10/25/2022] Open
Abstract
Young polyploid events are easily diagnosed by various methods, but older polyploid events become increasingly difficult to identify as chromosomal rearrangements, tandem gene or partial chromosome duplications, changes in substitution rates among duplicated genes, pseudogenization or locus loss, and interlocus interactions complicate the means of inferring past genetic events. Genomic data have provided valuable information about the polyploid history of numerous species, but on their own fail to show whether related species, each with a polyploid past, share a particular polyploid event. A phylogenetic approach provides a powerful method to determine this but many processes may mislead investigators. These processes can affect individual gene trees, but most likely will not affect all genes, and almost certainly will not affect all genes in the same way. Thus, a multigene approach, which combines the large-scale aspect of genomics with the resolution of phylogenetics, has the power to overcome these difficulties and allow us to infer genomic events further into the past than would otherwise be possible. Previous work using synonymous distances among gene pairs within species has shown evidence for large-scale duplications in the legumes Glycine max and Medicago truncatula. We present a case study using 39 gene families, each with three or four members in G. max and the putative orthologues in M. truncatula, rooted using Arabidopsis thaliana. We tested whether the gene duplications in these legumes occurred separately in each lineage after their divergence (Hypothesis 1), or whether they share a round of gene duplications (Hypothesis 2). Many more gene family topologies supported Hypothesis 2 over Hypothesis 1 (11 and 2, respectively), even after synonymous distance analysis revealed that some topologies were providing misleading results. Only ca. 33% of genes examined support either hypothesis, which strongly suggests that single gene family approaches may be insufficient when studying ancient events with nuclear DNA. Our results suggest that G. max and M. truncatula, along with approximately 7000 other legume species from the same clade, share an ancient round of gene duplications, either due to polyploidy or to some other process.
|
46
|
Legumes as a model plant family. Genomics for food and feed report of the Cross-Legume Advances Through Genomics Conference. Plant Physiology 2005; 137:1228-35. [PMID: 15824285 PMCID: PMC1088316 DOI: 10.1104/pp.105.060871] [Citation(s) in RCA: 120] [Impact Index Per Article: 6.3] [Received: 02/06/2005] [Revised: 02/24/2005] [Accepted: 02/28/2005] [Indexed: 05/18/2023]
|
47
|
Bridging model and crop legumes through comparative genomics. Plant Physiology 2005; 137:1189-96. [PMID: 15824281 PMCID: PMC1088312 DOI: 10.1104/pp.104.058891] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.6] [Received: 12/22/2004] [Revised: 01/18/2005] [Accepted: 01/24/2005] [Indexed: 05/18/2023]
|
48
|
Abstract
ESTminer is a collection of programs that use expressed sequence tag (EST) data from inbred genomes to identify unique genes within gene families. The algorithm uses Cap3 to perform an initial clustering of related EST sequences to produce a consensus sequence for each gene family. These consensus sequences are then used with BLAST to collect all related ESTs in the original EST library. A redundancy-based criterion is applied to each EST to identify reliable unique gene sequences. Using a highly inbred genome as a source of ESTs eliminates the need to compute covariance on each polymorphism to identify alleles of the same gene, making this algorithm more streamlined than alternatives that must computationally attempt to distinguish genes from alleles. AVAILABILITY The programs were written in Perl and are freely available at http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html CONTACT nelsonrt@iastate.edu SUPPLEMENTARY INFORMATION Figures and dataset can be obtained from: http://www.soybase.org/publication_data/Nelson/ESTminer/ESTminer.html.
|
49
|
Abstract
Using plant EST collections, we obtained 1392 potential gene duplicates across 8 plant species: Zea mays, Oryza sativa, Sorghum bicolor, Hordeum vulgare, Solanum tuberosum, Lycopersicon esculentum, Medicago truncatula, and Glycine max. We estimated the synonymous and nonsynonymous distances between each gene pair and identified two to three mixtures of normal distributions corresponding to one to three rounds of genome duplication in each species. Within the Poaceae, we found a conserved duplication event among all four species that occurred approximately 50-60 million years ago (Mya); an event that probably occurred before the major radiation of the grasses. In the Solanaceae, we found evidence for a conserved duplication event approximately 50-52 Mya. A duplication in soybean occurred approximately 44 Mya and a duplication in Medicago about 58 Mya. Comparing synonymous and nonsynonymous distances allowed us to determine that most duplicate gene pairs are under purifying, negative selection. We calculated Pearson's correlation coefficients to provide us with a measure of how gene expression patterns have changed between duplicate pairs, and compared this across evolutionary distances. This analysis showed that some duplicates seemed to retain expression patterns between pairs, whereas others showed uncorrelated expression.
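The comparison of expression patterns between duplicate pairs can be illustrated with a small Pearson correlation calculation. This is a minimal sketch with made-up expression values; the variable names and the five hypothetical libraries are assumptions for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of one gene copy across five libraries.
copy_a = [10.0, 20.0, 30.0, 40.0, 50.0]
# A duplicate that tracks copy_a closely (expression pattern retained).
copy_b_conserved = [12.0, 22.0, 29.0, 43.0, 51.0]
# A duplicate whose pattern has drifted away from copy_a.
copy_b_diverged = [30.0, 10.0, 50.0, 20.0, 40.0]

r_conserved = pearson_r(copy_a, copy_b_conserved)
r_diverged = pearson_r(copy_a, copy_b_diverged)
print(f"conserved pair r = {r_conserved:.3f}, diverged pair r = {r_diverged:.3f}")
```

A coefficient near 1 corresponds to a pair that has retained its expression pattern, while a value closer to 0 corresponds to the weakly correlated or uncorrelated expression described in the abstract.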
|
50
|
Features of a 103-kb gene-rich region in soybean include an inverted perfect repeat cluster of CHS genes comprising the I locus. Genome 2004; 47:819-31. [PMID: 15499396 DOI: 10.1139/g04-049] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.1] [Indexed: 11/22/2022]
Abstract
The I locus in soybean (Glycine max) corresponds to a region of chalcone synthase (CHS) gene duplications affecting seed pigmentation. We sequenced and annotated BAC clone 104J7, which harbors a dominant i(i) allele from Glycine max 'Williams 82', to gain insight into the genetic structure of this multigenic region in addition to examining its flanking regions. The 103-kb BAC encompasses a gene-rich region with 11 putatively expressed genes. In addition to six copies of CHS, these genes include: a geranylgeranyltransferase type II beta subunit (E.C.2.5.1.60), a beta-galactosidase, a putative spermine and (or) spermidine synthase (E.C.2.5.1.16), and an unknown expressed gene. Strikingly, sequencing data revealed that the 10.91-kb CHS1, CHS3, CHS4 cluster is present as a perfect inverted repeat separated by 5.87 kb. Contiguous arrangement of CHS paralogs could lead to folding into multiple secondary structures, hypothesized to induce deletions that have previously been shown to affect CHS expression. BAC104J7 also contains several gene fragments representing a cation/hydrogen exchanger, a 40S ribosomal protein, a CBL-interacting protein kinase, and the amino terminus of a subtilisin. Chimeric ESTs were identified that may represent read-through transcription from a flanking truncated gene into a CHS cluster, generating aberrant CHS RNA molecules that could play a role in CHS gene silencing.
|