26
Renny-Byfield S, Page JT, Udall JA, Sanders WS, Peterson DG, Arick MA, Grover CE, Wendel JF. Independent Domestication of Two Old World Cotton Species. Genome Biol Evol 2016; 8:1940-7. [PMID: 27289095 PMCID: PMC4943200 DOI: 10.1093/gbe/evw129] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Accepted: 05/26/2016] [Indexed: 11/16/2022]
Abstract
Domesticated cotton species provide raw material for the majority of the world's textile industry. Two independent domestication events have been identified in allopolyploid cotton, one in Upland cotton (Gossypium hirsutum L.) and the other in Egyptian cotton (Gossypium barbadense L.). However, two diploid cotton species, Gossypium arboreum L. and Gossypium herbaceum L., have been cultivated for several millennia, but their status as independent domesticates has long been in question. Using genome resequencing data, we estimated the global abundance of various repetitive DNAs. We demonstrate that, despite negligible divergence in genome size, the two domesticated diploid cotton species contain different, but compensatory, repeat content and have thus experienced cryptic alterations in repeat abundance despite equivalence in genome size. Evidence of independent origin is bolstered by estimates of divergence times based on molecular evolutionary analysis of ~7,000 orthologous genes, for which synonymous substitution rates suggest that G. arboreum and G. herbaceum last shared a common ancestor approximately 0.4-2.5 Ma. These data are incompatible with a shared domestication history during the emergence of agriculture and lead to the conclusion that G. arboreum and G. herbaceum were each domesticated independently.
27
Page JT, Liechty ZS, Alexander RH, Clemons K, Hulse-Kemp AM, Ashrafi H, Van Deynze A, Stelly DM, Udall JA. DNA Sequence Evolution and Rare Homoeologous Conversion in Tetraploid Cotton. PLoS Genet 2016; 12:e1006012. [PMID: 27168520 PMCID: PMC4864293 DOI: 10.1371/journal.pgen.1006012] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 01/08/2016] [Accepted: 04/06/2016] [Indexed: 01/08/2023]
Abstract
Allotetraploid cotton species are a vital source of spinnable fiber for textiles. The polyploid nature of the cotton genome raises many evolutionary questions as to the relationships between duplicated genomes. We describe the evolution of the cotton genome (SNPs and structural variants) with the greatly improved resolution of 34 deeply re-sequenced genomes. We also explore the evolution of homoeologous regions in the AT- and DT-genomes and especially the phenomenon of conversion between genomes. We did not find any compelling evidence for homoeologous conversion between genomes. These findings are very different from other recent reports of frequent conversion events between genomes. We also identified several distinct regions of the genome that have been introgressed between G. hirsutum and G. barbadense, which presumably resulted from breeding efforts targeting associated beneficial alleles. Finally, the genotypic data resulting from this study provides access to a wealth of diversity sorely needed in the narrow germplasm of cotton cultivars.
28
Clouse JW, Adhikary D, Page JT, Ramaraj T, Deyholos MK, Udall JA, Fairbanks DJ, Jellen EN, Maughan PJ. The Amaranth Genome: Genome, Transcriptome, and Physical Map Assembly. Plant Genome 2016; 9. [PMID: 27898770 DOI: 10.3835/plantgenome2015.07.0062] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 05/08/2023]
Abstract
Amaranth (Amaranthus hypochondriacus L.) is an emerging pseudocereal native to the New World that has garnered increased attention in recent years because of its nutritional quality, in particular its seed protein and more specifically its high levels of the essential amino acid lysine. It belongs to the Amaranthaceae family, is an ancient paleopolyploid that shows disomic inheritance (2n = 32), and has an estimated genome size of 466 Mb. Here we present a high-quality draft genome sequence of the grain amaranth. The genome assembly consisted of 377 Mb in 3518 scaffolds with an N50 of 371 kb. Repetitive element analysis predicted that 48% of the genome is comprised of repeat sequences, of which Copia-like elements were the most commonly classified retrotransposon. A de novo transcriptome consisting of 66,370 contigs was assembled from eight different amaranth tissue and abiotic stress libraries. Annotation of the genome identified 23,059 protein-coding genes. Seven grain amaranths (A. hypochondriacus, A. caudatus, and A. cruentus) and their putative progenitor (A. hybridus) were resequenced. A single nucleotide polymorphism (SNP) phylogeny supported the classification of A. hybridus as the progenitor species of the grain amaranths. Lastly, we generated a de novo physical map for A. hypochondriacus using the BioNano Genomics' Genome Mapping platform. The physical map spanned 340 Mb and a hybrid assembly using the BioNano physical maps nearly doubled the N50 of the assembly to 697 kb. Moreover, we analyzed synteny between amaranth and sugar beet (Beta vulgaris L.) and estimated, using Ks analysis, the age of the most recent polyploidization event in amaranth.
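Note: the scaffold statistic quoted in this abstract (371 kb, rising to 697 kb after hybrid assembly) is the N50, that is, the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The following is a minimal sketch of that calculation; the function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, not from the amaranth assembly).
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))
```

In this toy set the two longest scaffolds already sum to half of the total length, so the statistic is driven by a few large scaffolds, which is why merging scaffolds with a physical map can raise it sharply.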
29
Huynh MD, Page JT, Richardson BA, Udall JA. Insights into transcriptomes of big and low sagebrush. PLoS One 2015; 10:e0127593. [PMID: 26020526 PMCID: PMC4447352 DOI: 10.1371/journal.pone.0127593] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/23/2014] [Accepted: 04/16/2015] [Indexed: 01/25/2023]
Abstract
We report the sequencing and assembly of three transcriptomes from Big (Artemisia tridentata ssp. wyomingensis and A. tridentata ssp. tridentata) and Low (A. arbuscula ssp. arbuscula) sagebrush. The sequence reads are available in the Sequence Read Archive of NCBI. We demonstrate the utilities of these transcriptomes for gene discovery and phylogenomic analysis. An assembly of 61,883 transcripts followed by transcript identification by the program TRAPID revealed 16 transcripts directly related to terpene synthases, proteins critical to the production of multiple secondary metabolites in sagebrush. A putative terpene synthase was identified in two of our sagebrush samples. Using paralogs with synonymous mutations we reconstructed an evolutionary time line of ancient genome duplications. By applying a constant mutation rate to the data we estimate that these three ancient duplications occurred about 18, 34 and 60 million years ago. These transcriptomes offer a foundation for future studies of sagebrush, including inferences in chemical defense and the identification of species and subspecies of sagebrush for restoration and preservation of the threatened sage-grouse.
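Note: dating a duplication from synonymous divergence under a constant rate is commonly done with T = Ks / (2r), where Ks is the synonymous substitutions per site between two paralogs and r is the substitution rate per site per year. The sketch below illustrates that arithmetic only. The rate and Ks value used are placeholder assumptions, not the values applied in the paper, and the function name is hypothetical.

```python
def duplication_age_mya(ks, rate_per_site_per_year):
    """Convert a synonymous divergence (Ks) into an age in million years.

    Both paralogs accumulate substitutions independently after the
    duplication, hence the factor of two in the denominator.
    """
    if ks < 0:
        raise ValueError("Ks cannot be negative")
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate_per_site_per_year)
    return years / 1e6


# Placeholder inputs for illustration only (assumed, not from the paper).
ASSUMED_RATE = 1.5e-8   # substitutions per synonymous site per year
example_ks_peak = 0.3   # mode of a hypothetical Ks distribution

print(round(duplication_age_mya(example_ks_peak, ASSUMED_RATE), 1))
```

Because the estimate scales inversely with the assumed rate, halving the rate doubles every inferred age, so dates obtained this way are best read together with the rate that produced them.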
30
Page JT, Udall JA. Methods for mapping and categorization of DNA sequence reads from allopolyploid organisms. BMC Genet 2015; 16 Suppl 2:S4. [PMID: 25951770 PMCID: PMC4423573 DOI: 10.1186/1471-2156-16-s2-s4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
Genome read categorization determines the genome of origin for sequence reads from an allopolyploid organism. Different techniques have been used to perform read categorization, mostly based on homoeo-SNPs identified between extant diploid relatives of allopolyploids. We present a novel technique for read categorization implemented by the software PolyDog. We demonstrate its accuracy and improved categorization relative to other methods. We discuss the situations in which one method or another might be most appropriate.
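Note: the homoeo-SNP approach mentioned in this abstract can be pictured as follows. Each read is checked at known positions where the two diploid relatives differ, and the read is assigned to whichever genome its bases match. This is a generic sketch of that idea and not the PolyDog algorithm; the function name, the data layout, and the category labels "chimeric" and "unknown" are assumptions made for illustration.

```python
def categorize_read(read_bases, homoeo_snps):
    """Assign a read to genome "A" or "D" from homoeo-SNP matches.

    read_bases:  dict mapping reference position -> base seen in the read
    homoeo_snps: dict mapping reference position -> (A allele, D allele)
    """
    a_support = 0
    d_support = 0
    for position, base in read_bases.items():
        alleles = homoeo_snps.get(position)
        if alleles is None:
            continue  # position is not a known homoeo-SNP
        a_allele, d_allele = alleles
        if base == a_allele:
            a_support += 1
        elif base == d_allele:
            d_support += 1
    if a_support and not d_support:
        return "A"
    if d_support and not a_support:
        return "D"
    if a_support and d_support:
        return "chimeric"  # conflicting evidence within one read
    return "unknown"       # read overlaps no informative position


# Hypothetical homoeo-SNP table and reads.
snps = {100: ("A", "G"), 250: ("C", "T")}
print(categorize_read({100: "A", 250: "C"}, snps))
print(categorize_read({300: "A"}, snps))
```

Reads that overlap no homoeo-SNP stay uncategorized under this scheme, which is one reason the choice of categorization method depends on the data at hand.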
31
Hulse-Kemp AM, Lemm J, Plieske J, Ashrafi H, Buyyarapu R, Fang DD, Frelichowski J, Giband M, Hague S, Hinze LL, Kochan KJ, Riggs PK, Scheffler JA, Udall JA, Ulloa M, Wang SS, Zhu QH, Bag SK, Bhardwaj A, Burke JJ, Byers RL, Claverie M, Gore MA, Harker DB, Islam MS, Jenkins JN, Jones DC, Lacape JM, Llewellyn DJ, Percy RG, Pepper AE, Poland JA, Mohan Rai K, Sawant SV, Singh SK, Spriggs A, Taylor JM, Wang F, Yourstone SM, Zheng X, Lawley CT, Ganal MW, Van Deynze A, Wilson IW, Stelly DM. Development of a 63K SNP Array for Cotton and High-Density Mapping of Intraspecific and Interspecific Populations of Gossypium spp. G3 (Bethesda) 2015; 5:1187-209. [PMID: 25908569 PMCID: PMC4478548 DOI: 10.1534/g3.115.018416] [Citation(s) in RCA: 121] [Impact Index Per Article: 13.4] [Received: 02/16/2015] [Accepted: 04/11/2015] [Indexed: 11/18/2022]
Abstract
High-throughput genotyping arrays provide a standardized resource for plant breeding communities that are useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), complex trait dissection, and studying patterns of genomic diversity among cultivars and wild accessions. We have developed the CottonSNP63K, an Illumina Infinium array containing assays for 45,104 putative intraspecific single nucleotide polymorphism (SNP) markers for use within the cultivated cotton species Gossypium hirsutum L. and 17,954 putative interspecific SNP markers for use with crosses of other cotton species with G. hirsutum. The SNPs on the array were developed from 13 different discovery sets that represent a diverse range of G. hirsutum germplasm and five other species: G. barbadense L., G. tomentosum Nuttal × Seemann, G. mustelinum Miers × Watt, G. armourianum Kearny, and G. longicalyx J.B. Hutchinson and Lee. The array was validated with 1,156 samples to generate cluster positions to facilitate automated analysis of 38,822 polymorphic markers. Two high-density genetic maps containing a total of 22,829 SNPs were generated for two F2 mapping populations, one intraspecific and one interspecific, and 3,533 SNP markers were co-occurring in both maps. The produced intraspecific genetic map is the first saturated map that associates into 26 linkage groups corresponding to the number of cotton chromosomes for a cross between two G. hirsutum lines. The linkage maps were shown to have high levels of collinearity to the JGI G. raimondii Ulbrich reference genome sequence. The CottonSNP63K array, cluster file and associated marker sequences constitute a major new resource for the global cotton research community.
32
Page JT, Liechty ZS, Huynh MD, Udall JA. BamBam: genome sequence analysis tools for biologists. BMC Res Notes 2014; 7:829. [PMID: 25421351 PMCID: PMC4258253 DOI: 10.1186/1756-0500-7-829] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/28/2014] [Accepted: 10/24/2014] [Indexed: 12/22/2022]
Abstract
Background Massive computational power is needed to analyze the genomic data produced by next-generation sequencing, but extensive computational experience and specific knowledge of algorithms should not be necessary to run genomic analyses or interpret their results. Findings We present BamBam, a package of tools for genome sequence analysis. BamBam contains tools that facilitate summarizing data from BAM alignment files and identifying features such as SNPs, indels, and haplotypes represented in those alignments. Conclusions BamBam provides a powerful and convenient framework to analyze genome sequence data contained in BAM files.
34
Yurchenko OP, Park S, Ilut DC, Inmon JJ, Millhollon JC, Liechty Z, Page JT, Jenks MA, Chapman KD, Udall JA, Gore MA, Dyer JM. Genome-wide analysis of the omega-3 fatty acid desaturase gene family in Gossypium. BMC Plant Biol 2014; 14:312. [PMID: 25403726 PMCID: PMC4245742 DOI: 10.1186/s12870-014-0312-5] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/26/2014] [Accepted: 10/28/2014] [Indexed: 05/20/2023]
Abstract
BACKGROUND The majority of commercial cotton varieties planted worldwide are derived from Gossypium hirsutum, which is a naturally occurring allotetraploid produced by interspecific hybridization of A- and D-genome diploid progenitor species. While most cotton species are adapted to warm, semi-arid tropical and subtropical regions, and thus perform well in these geographical areas, cotton seedlings are sensitive to cold temperature, which can significantly reduce crop yields. One of the common biochemical responses of plants to cold temperatures is an increase in omega-3 fatty acids, which protects cellular function by maintaining membrane integrity. The purpose of our study was to identify and characterize the omega-3 fatty acid desaturase (FAD) gene family in G. hirsutum, with an emphasis on identifying omega-3 FADs involved in cold temperature adaptation. RESULTS Eleven omega-3 FAD genes were identified in G. hirsutum, and characterization of the gene family in extant A and D diploid species (G. herbaceum and G. raimondii, respectively) allowed for unambiguous genome assignment of all homoeologs in tetraploid G. hirsutum. The omega-3 FAD family of cotton includes five distinct genes, two of which encode endoplasmic reticulum-type enzymes (FAD3-1 and FAD3-2) and three that encode chloroplast-type enzymes (FAD7/8-1, FAD7/8-2, and FAD7/8-3). The FAD3-2 gene was duplicated in the A genome progenitor species after the evolutionary split from the D progenitor, but before the interspecific hybridization event that gave rise to modern tetraploid cotton. RNA-seq analysis revealed conserved, gene-specific expression patterns in various organs and cell types and semi-quantitative RT-PCR further revealed that FAD7/8-1 was specifically induced during cold temperature treatment of G. hirsutum seedlings. 
CONCLUSIONS The omega-3 FAD gene family in cotton was characterized at the genome-wide level in three species, showing relatively ancient establishment of the gene family prior to the split of A and D diploid progenitor species. The FAD genes are differentially expressed in various organs and cell types, including fiber, and expression of the FAD7/8-1 gene was induced by cold temperature. Collectively, these data define the genetic and functional genomic properties of this important gene family in cotton and provide a foundation for future efforts to improve cotton abiotic stress tolerance through molecular breeding approaches.
37
Hulse-Kemp AM, Ashrafi H, Zheng X, Wang F, Hoegenauer KA, Maeda ABV, Yang SS, Stoffel K, Matvienko M, Clemons K, Udall JA, Van Deynze A, Jones DC, Stelly DM. Development and bin mapping of gene-associated interspecific SNPs for cotton (Gossypium hirsutum L.) introgression breeding efforts. BMC Genomics 2014; 15:945. [PMID: 25359292 PMCID: PMC4298081 DOI: 10.1186/1471-2164-15-945] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 07/12/2014] [Accepted: 10/03/2014] [Indexed: 11/18/2022]
Abstract
Background Cotton (Gossypium spp.) is the largest producer of natural fibers for textile and is an important crop worldwide. Crop production is comprised primarily of G. hirsutum L., an allotetraploid. However, elite cultivars express very small amounts of variation due to the species' monophyletic origin, domestication and further bottlenecks due to selection. Conversely, wild cotton species harbor extensive genetic diversity of prospective utility to improve many beneficial agronomic traits, fiber characteristics, and resistance to disease and drought. Introgression of traits from wild species can provide a natural way to incorporate advantageous traits through breeding to generate higher-producing cotton cultivars and more sustainable production systems. Interspecific introgression efforts by conventional methods are very time-consuming and costly, but can be expedited using marker-assisted selection. Results Using transcriptome sequencing we have developed the first gene-associated single nucleotide polymorphism (SNP) markers for wild cotton species G. tomentosum, G. mustelinum, G. armourianum and G. longicalyx. Markers were also developed for a secondary cultivated species G. barbadense cv. 3–79. A total of 62,832 non-redundant SNP markers were developed from the five wild species which can be utilized for interspecific germplasm introgression into cultivated G. hirsutum and are directly associated with genes. Over 500 of the G. barbadense markers have been validated by whole-genome radiation hybrid mapping. Overall 1,060 SNPs from the five different species have been screened and shown to produce acceptable genotyping assays. Conclusions This large set of 62,832 SNPs relative to cultivated G. hirsutum will allow for the first high-density mapping of genes from five wild species that affect traits of interest, including beneficial agronomic and fiber characteristics. Upon mapping, the markers can be utilized for marker-assisted introgression of new germplasm into cultivated cotton and in subsequent breeding of agronomically adapted types, including cultivar development.
38
Guan X, Nah G, Song Q, Udall JA, Stelly DM, Chen ZJ. Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton. BMC Res Notes 2014; 7:493. [PMID: 25099166 PMCID: PMC4267057 DOI: 10.1186/1756-0500-7-493] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/29/2014] [Accepted: 07/29/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The most widely cultivated cotton (Gossypium hirsutum L., AD-genome) is derived from tetraploidization between A- and D-genome species. G. arboreum L. (A-genome) and G. raimondii Ulbr. (D-genome) are two closely related extant progenitors. Gene expression studies in allotetraploid cotton are complicated by the homoeologous loci of A- and D-genome origins. To develop genomic resources for gene expression and cotton breeding, we sequenced and assembled expressed sequence tags (ESTs) derived from G. arboreum and G. raimondii. RESULTS Roche/454 FLX sequencing technology was employed to sequence normalized cDNA libraries prepared from leaves, roots, bolls, ovules, and fibers in G. arboreum and G. raimondii, respectively. Sequencing reads from two independent libraries in each species were combined to assemble high-quality EST contigs. The combined sequencing reads included 1,699,776 from A-genome and 1,464,815 from D-genome, which were clustered into 89,588 contigs in the A-genome and 65,542 contigs in the D-genome. These contigs represented ~80% of EST collections in Cotton Gene Index 11 (CGI11, March 2011). Compared to the D-genome transcript database, 27,537 and 10,452 contigs were unique transcripts in A and D genomes, respectively. Further analysis using self-blastn reduced the unigene contig number by 52% in A-genome and 57% in D-genome, suggesting that 50% or more of contigs are paralogs or isoforms within each species. The majority of EST contigs (73-81%) were conserved between A- and D-genomes, whereas 27% and 19% contigs were specific to A- and D-genomes, respectively. Using these ESTs, we generated a total of 75,754 genome-specific single nucleotide polymorphisms (SNPs; gSNPs or GNPs) or homoeologous-specific SNPs (hSNPs) of 10,885 contigs or genes between A and D genomes, indicating a possibility of separating allelic expression for those genes in allotetraploid cotton.
CONCLUSIONS Expressed genes are highly redundant within each diploid progenitor and between A and D progenitor species, suggesting that diploid progenitors in cotton are likely ancient tetraploids. This large set of A- and D-genome ESTs and GNPs will be valuable resources for genome annotation, gene expression, and crop improvement in allotetraploid cotton.
39
Naoumkina M, Thyssen G, Fang DD, Hinchliffe DJ, Florane C, Yeater KM, Page JT, Udall JA. The Li2 mutation results in reduced subgenome expression bias in elongating fibers of allotetraploid cotton (Gossypium hirsutum L.). PLoS One 2014; 9:e90830. [PMID: 24598808 PMCID: PMC3944810 DOI: 10.1371/journal.pone.0090830] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 09/30/2013] [Accepted: 02/04/2014] [Indexed: 12/18/2022]
Abstract
Next generation sequencing (RNA-seq) technology was used to evaluate the effects of the Ligon lintless-2 (Li2) short fiber mutation on transcriptomes of both subgenomes of allotetraploid cotton (Gossypium hirsutum L.) as compared to its near-isogenic wild type. Sequencing was performed on 4 libraries from developing fibers of Li2 mutant and wild type near-isogenic lines at the peak of elongation followed by mapping and PolyCat categorization of RNA-seq data to the reference D5 genome (G. raimondii) for homeologous gene expression analysis. The majority of homeologous genes, 83.6% according to the reference genome, were expressed during fiber elongation. Our results revealed: 1) approximately two times more genes were induced in the AT subgenome compared to the DT subgenome in wild type and mutant fiber; 2) the subgenome expression bias was significantly reduced in the Li2 fiber transcriptome; 3) Li2 had a significantly greater effect on the DT than on the AT subgenome. Transcriptional regulators and cell wall homeologous genes significantly affected by the Li2 mutation were reviewed in detail. This is the first report to explore the effects of a single mutation on homeologous gene expression in allotetraploid cotton. These results provide deeper insights into the evolution of allotetraploid cotton gene expression and cotton fiber development.
40
Renny-Byfield S, Gallagher JP, Grover CE, Szadkowski E, Page JT, Udall JA, Wang X, Paterson AH, Wendel JF. Ancient gene duplicates in Gossypium (cotton) exhibit near-complete expression divergence. Genome Biol Evol 2014; 6:559-71. [PMID: 24558256 PMCID: PMC3971588 DOI: 10.1093/gbe/evu037] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Accepted: 02/17/2014] [Indexed: 12/25/2022]
Abstract
Whole genome duplication (WGD) is widespread in flowering plants and is a driving force in angiosperm diversification. The redundancy introduced by WGD allows the evolution of novel gene interactions and functions, although the patterns and processes of diversification are poorly understood. We identified ∼ 2,000 pairs of paralogous genes in Gossypium raimondii (cotton) resulting from an approximately 60 My old 5- to 6-fold ploidy increase. Gene expression analyses revealed that, in G. raimondii, 99.4% of the gene pairs exhibit differential expression in at least one of the three tissues (petal, leaf, and seed), with 93% to 94% exhibiting differential expression on a per-tissue basis. For 1,666 (85%) pairs, differential expression was observed in all tissues. These observations were mirrored in a time series of G. raimondii seed, and separately in leaf, petal, and seed of G. arboreum, indicating expression level diversification before species divergence. A generalized linear model revealed 92.4% of the paralog pairs exhibited expression divergence, with most exhibiting significant gene and tissue interactions indicating complementary expression patterns in different tissues. These data indicate massive, near-complete expression level neo- and/or subfunctionalization among ancient gene duplicates, suggesting these processes are essential in their maintenance over ∼ 60 Ma.
41
Tyagi P, Gore MA, Bowman DT, Campbell BT, Udall JA, Kuraparthy V. Genetic diversity and population structure in the US Upland cotton (Gossypium hirsutum L.). Theor Appl Genet 2014; 127:283-95. [PMID: 24170350 DOI: 10.1007/s00122-013-2217-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 05/11/2013] [Accepted: 10/14/2013] [Indexed: 05/09/2023]
Abstract
Genetic diversity and population structure in the US Upland cotton were established and core sets of allelic richness were identified for developing association mapping populations in cotton. Elite plant breeding programs could likely benefit from the unexploited standing genetic variation of obsolete cultivars without the yield drag typically associated with wild accessions. A set of 381 accessions comprising 378 Upland (Gossypium hirsutum L.) and 3 G. barbadense L. accessions of the United States cotton belt were genotyped using 120 genome-wide SSR markers to establish the genetic diversity and population structure in tetraploid cotton. These accessions represent more than 100 years of Upland cotton breeding in the United States. Genetic diversity analysis identified a total of 546 alleles across 141 marker loci. Twenty-two percent of the alleles in Upland accessions were unique, specific to a single accession. Population structure analysis revealed extensive admixture and identified five subgroups corresponding to Southeastern, Midsouth, Southwest, and Western zones of cotton growing areas in the United States, with the three accessions of G. barbadense forming a separate cluster. Phylogenetic analysis supported the subgroups identified by STRUCTURE. Average genetic distance between G. hirsutum accessions was 0.195, indicating low levels of genetic diversity in the Upland cotton germplasm pool. The results from both population structure and phylogenetic analysis were in agreement with pedigree information, although there were a few exceptions. Further, core sets of different sizes representing different levels of allelic richness in Upland cotton were identified. Establishment of genetic diversity, population structure, and identification of core sets from this study could be useful for genetic and genomic analysis and systematic utilization of the standing genetic variation in Upland cotton.
|
42
|
Soliai MM, Meyer SE, Udall JA, Elzinga DE, Hermansen RA, Bodily PM, Hart AA, Coleman CE. De novo genome assembly of the fungal plant pathogen Pyrenophora semeniperda. PLoS One 2014; 9:e87045. [PMID: 24475219 PMCID: PMC3903604 DOI: 10.1371/journal.pone.0087045] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/30/2013] [Accepted: 12/18/2013] [Indexed: 12/31/2022] Open
Abstract
Pyrenophora semeniperda (anamorph Drechslera campulata) is a necrotrophic fungal seed pathogen that has a wide host range within the Poaceae. One of its hosts is cheatgrass (Bromus tectorum), a species exotic to the United States that has invaded natural ecosystems of the Intermountain West. As a natural pathogen of cheatgrass, P. semeniperda has potential as a biocontrol agent due to its effectiveness at killing seeds within the seed bank; however, few genetic resources exist for the fungus. Here, we present the genome of a P. semeniperda isolate, assembled from 454 pyrosequencing reads. The total assembly is 32.5 Mb and includes 11,453 gene models encoding putative proteins larger than 24 amino acids. The models represent a variety of putative genes that are involved in pathogenic pathways typically found in necrotrophic fungi. In addition, extensive rearrangements, including inter- and intrachromosomal rearrangements, were found when the P. semeniperda genome was compared to P. tritici-repentis, a related fungal species.
|
43
|
Rambani A, Page JT, Udall JA. Polyploidy and the petal transcriptome of Gossypium. BMC PLANT BIOLOGY 2014; 14:3. [PMID: 24393201 PMCID: PMC3890615 DOI: 10.1186/1471-2229-14-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/31/2013] [Accepted: 10/08/2013] [Indexed: 05/02/2023]
Abstract
Background Genes duplicated by polyploidy (homoeologs) may be differentially expressed in plant tissues. Recent research using DNA microarrays and RNAseq data has described a cacophony of complex expression patterns during development of cotton fibers, petals, and leaves. Because of its highly canalized development, petal tissue has been used as a model tissue for gene expression in cotton. Recent advances in cotton genome annotation and assembly now permit an enhanced analysis of duplicate gene deployment in petals from allopolyploid cotton. Results Homoeologous gene expression levels were quantified in diploid and tetraploid flower petals of Gossypium using the Gossypium raimondii genome sequence as a reference. In the polyploid, most homoeologous genes were expressed at equal levels, though a subset had an expression bias of AT and DT copies. The direction of gene expression bias was conserved in natural and recent polyploids of cotton. Conservation of direction of bias and additional comparisons between the diploids and tetraploids suggested different regulation mechanisms of gene expression. We described three phases in the evolution of cotton genomes that contribute to gene expression in the polyploid nucleus. Conclusions Compared to previous studies, a surprising level of expression homeostasis was observed in the expression patterns of polyploid genomes. Conserved expression bias in polyploid petals may have resulted from cis-acting modifications that occurred prior to polyploidization. Some duplicated genes were intriguing exceptions to general trends. Mechanisms of gene regulation for these and other genes in the cotton genome warrant further investigation.
|
44
|
Raney JA, Reynolds DJ, Elzinga DB, Page J, Udall JA, Jellen EN, Bonfacio A, Fairbanks DJ, Maughan PJ. Transcriptome Analysis of Drought Induced Stress in Chenopodium quinoa. American Journal of Plant Sciences 2014. [DOI: 10.4236/ajps.2014.53047] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/20/2022]
|
45
|
Bowman MJ, Park W, Bauer PJ, Udall JA, Page JT, Raney J, Scheffler BE, Jones DC, Campbell BT. RNA-Seq transcriptome profiling of upland cotton (Gossypium hirsutum L.) root tissue under water-deficit stress. PLoS One 2013; 8:e82634. [PMID: 24324815 PMCID: PMC3855774 DOI: 10.1371/journal.pone.0082634] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 08/03/2013] [Accepted: 11/04/2013] [Indexed: 11/19/2022] Open
Abstract
An RNA-Seq experiment was performed using field-grown well-watered and naturally rain-fed cotton plants to identify differentially expressed transcripts under water-deficit stress. Our work constitutes the first application of the newly published diploid D5 Gossypium raimondii sequence in the study of tetraploid AD1 upland cotton RNA-seq transcriptome analysis. A total of 1,530 transcripts were differentially expressed between well-watered and water-deficit stressed root tissues, in patterns that confirm the accuracy of this technique for future studies in cotton genomics. Additionally, putative sequence-based genome localization of differentially expressed transcripts detected A2 genome-specific gene expression under water-deficit stress. These data will facilitate efforts to understand the complex responses governing transcriptomic regulatory mechanisms and to identify candidate genes that may benefit applied plant breeding programs.
|
46
|
Flagel LE, Wendel JF, Udall JA. Duplicate gene evolution, homoeologous recombination, and transcriptome characterization in allopolyploid cotton. BMC Genomics 2012. [PMID: 22768919 DOI: 10.1186/1471-2164-13-302] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Modern allotetraploid cotton contains an "A" and "D" genome from an ancestral polyploidy event that occurred approximately 1-2 million years ago. Diploid A- and D-genome species can be compared to the A- and D-genomes found within these allotetraploids to make evolutionary inferences about polyploidy. In this paper we present a comprehensive EST assembly derived from diploid and model allotetraploid cottons and demonstrate several evolutionary inferences regarding genic evolution that can be drawn from these data. RESULTS We generated a set of cotton expressed sequence tags (ESTs), comprising approximately 4.4 million Sanger and next-generation (454) transcripts supplemented by approximately 152 million Illumina reads from diploid and allotetraploid cottons. From the EST alignments we inferred 259,192 genome-specific single nucleotide polymorphisms (SNPs). Molecular evolutionary analyses of protein-coding regions demonstrate that the rate of nucleotide substitution has increased among both allotetraploid genomes relative to the diploids, and that the ratio of nonsynonymous to synonymous substitutions has increased in one of the two polyploid lineages we sampled. We also use these SNPs to show that a surprisingly high percentage of duplicate genes (~7 %) show a signature of non-independent evolution in the allotetraploid nucleus, having experienced one or more episodes of nonreciprocal homoeologous recombination (NRHR). CONCLUSIONS In this study we characterize the functional and mutational properties of the cotton transcriptome, produce a large genome-specific SNP database, and detect illegitimate genetic exchanges between duplicate genomes sharing a common allotetraploid nucleus. Our findings have important implications for our understanding of the consequences of polyploidy and duplicate gene evolution. 
We demonstrate that cotton genes have experienced an increased rate of molecular evolution following duplication by polyploidy, and that polyploidy has enabled considerable levels of nonreciprocal exchange between homoeologous genes.
|
47
|
Flagel LE, Wendel JF, Udall JA. Duplicate gene evolution, homoeologous recombination, and transcriptome characterization in allopolyploid cotton. BMC Genomics 2012; 13:302. [PMID: 22768919 PMCID: PMC3427041 DOI: 10.1186/1471-2164-13-302] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Received: 06/09/2011] [Accepted: 07/06/2012] [Indexed: 11/10/2022] Open
|
48
|
Bushakra JM, Stephens MJ, Atmadjaja AN, Lewers KS, Symonds VV, Udall JA, Chagné D, Buck EJ, Gardiner SE. Construction of black (Rubus occidentalis) and red (R. idaeus) raspberry linkage maps and their comparison to the genomes of strawberry, apple, and peach. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012; 125:311-27. [PMID: 22398438 DOI: 10.1007/s00122-012-1835-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 11/28/2011] [Accepted: 02/17/2012] [Indexed: 05/23/2023]
Abstract
The genus Rubus belongs to the Rosaceae and is comprised of 600-800 species distributed world-wide. To date, genetic maps of the genus consist largely of non-transferable markers such as amplified fragment length polymorphisms. An F(1) population developed from a cross between an advanced breeding selection of Rubus occidentalis (96395S1) and R. idaeus 'Latham' was used to construct a new genetic map consisting of DNA sequence-based markers. The genetic linkage maps presented here are constructed of 131 markers on at least one of the two parental maps. The majority of the markers are orthologous, including 14 Rosaceae conserved orthologous set markers, and 60 new gene-based markers developed for raspberry. Thirty-four published raspberry simple sequence repeat markers were used to align the new maps to published raspberry maps. The 96395S1 genetic map consists of six linkage groups (LG) and covers 309 cM with an average of 10 cM between markers; the 'Latham' genetic map consists of seven LG and covers 561 cM with an average of 5 cM between markers. We used BLAST analysis to align the orthologous sequences used to design primer pairs for Rubus genetic mapping with the genome sequences of Fragaria vesca 'Hawaii 4', Malus × domestica 'Golden Delicious', and Prunus 'Lovell'. The alignment of the orthologous markers designed here suggests that the genomes of Rubus and Fragaria have a high degree of synteny and that synteny decreases with phylogenetic distance. Our results give unprecedented insights into the genome evolution of raspberry from the putative ancestral genome of the single ancestor common to Rosaceae.
|
49
|
Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA. Development and mapping of SNP assays in allotetraploid cotton. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012; 124:1201-14. [PMID: 22252442 PMCID: PMC3324690 DOI: 10.1007/s00122-011-1780-8] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Received: 08/03/2011] [Accepted: 12/22/2011] [Indexed: 05/06/2023]
Abstract
A narrow germplasm base and a complex allotetraploid genome have made the discovery of single nucleotide polymorphism (SNP) markers difficult in cotton (Gossypium hirsutum). To generate sequence for SNP discovery, we conducted a genome reduction experiment (EcoRI, BafI double digest, followed by adapter ligation, biotin-streptavidin purification, and agarose gel separation) on two accessions of G. hirsutum and two accessions of G. barbadense. From the genome reduction experiment, a total of 2.04 million genomic sequence reads were assembled into contigs with an N(50) of 508 bp and analyzed for SNPs. A previously generated assembly of expressed sequence tags (ESTs) provided an additional source for SNP discovery. Using highly conservative parameters (minimum coverage of 8× at each SNP and 20% minor allele frequency), a total of 11,834 and 1,679 non-genic SNPs were identified between accessions of G. hirsutum and G. barbadense in genome reduction assemblies, respectively. An additional 4,327 genic SNPs were also identified between accessions of G. hirsutum in the EST assembly. KBioscience KASPar assays were designed for a portion of the intra-specific G. hirsutum SNPs. From 704 non-genic and 348 genic markers developed, a total of 367 (267 non-genic, 100 genic) mapped in a segregating F(2) population (Acala Maxxa × TX2094) using the Fluidigm EP1 system. A G. hirsutum genetic linkage map of 1,688 cM was constructed based entirely on these new SNP markers. Of the genic-based SNPs, we were able to identify within which genome ('A' or 'D') each SNP resided using diploid species sequence data. Genetic maps generated by these newly identified markers are being used to locate quantitative, economically important regions within the cotton genome.
|
50
|
Byers RL, Harker DB, Yourstone SM, Maughan PJ, Udall JA. Development and mapping of SNP assays in allotetraploid cotton. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2012. [PMID: 22252442 DOI: 10.1007/s00122-011-1780-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]
|