26
|
Wang X, Freeling M. The Brassica genome. Front Plant Sci 2013; 4:148. [PMID: 23755053 PMCID: PMC3667235 DOI: 10.3389/fpls.2013.00148] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/30/2013] [Accepted: 05/01/2013] [Indexed: 06/02/2023]
|
27
|
Turco G, Schnable JC, Pedersen B, Freeling M. Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses. Front Plant Sci 2013; 4:170. [PMID: 23874343 PMCID: PMC3708275 DOI: 10.3389/fpls.2013.00170] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/04/2013] [Accepted: 05/13/2013] [Indexed: 05/07/2023]
Abstract
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.
|
28
|
Freeling M, Woodhouse MR, Subramaniam S, Turco G, Lisch D, Schnable JC. Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Curr Opin Plant Biol 2012; 15:131-9. [PMID: 22341793 DOI: 10.1016/j.pbi.2012.01.015] [Citation(s) in RCA: 123] [Impact Index Per Article: 10.3] [Received: 10/25/2011] [Revised: 12/07/2011] [Accepted: 01/21/2012] [Indexed: 05/06/2023]
Abstract
Unlike in mammals, plants rapidly delete functionless, nonrepetitive DNA from their genomes. Following paleopolyploidies, duplicate genes are deleted by intrachromosomal recombination. This may explain how flowering plants have survived multiple whole genome duplications. Genes are disproportionately lost from one parental subgenome, the subgenome that is less expressed in the polyploid. The origin of this unbalanced expression between genomes remains unknown. The consequences of the tradeoffs between transposon repression and gene expression represent one potential explanation of genome dominance. If so, the same mechanisms may act in heterosis: genome dominance is like inbreeding depression. Regulatory DNA deletion following polyploidy, combined with abundant RNA-seq expression datasets, is being used to generate testable hypotheses regarding the function of specific cis-regulatory sequences.
|
29
|
Schnable JC, Freeling M, Lyons E. Genome-wide analysis of syntenic gene deletion in the grasses. Genome Biol Evol 2012; 4:265-77. [PMID: 22275519 PMCID: PMC3318446 DOI: 10.1093/gbe/evs009] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
The grasses, Poaceae, are one of the largest and most successful angiosperm families. Like many radiations of flowering plants, the divergence of the major grass lineages was preceded by a whole-genome duplication (WGD), although these events are not rare for flowering plants. By combining identification of syntenic gene blocks with measures of gene pair divergence and different frequencies of ancient gene loss, we have separated the two subgenomes present in modern grasses. Reciprocal loss of duplicated genes or genomic regions has been hypothesized to reproductively isolate populations and thus to drive speciation. However, in contrast to previous studies in yeast and teleost fishes, we found very little evidence of reciprocal loss of homeologous genes between the grasses, suggesting that post-WGD gene loss may not be the cause of the grass radiation. The sets of homeologous and orthologous genes and predicted locations of deleted genes identified in this study, as well as links to the CoGe comparative genomics web platform for analyzing pan-grass syntenic regions, are provided along with this paper as a resource for the grass genetics community.
|
30
|
Spangler JB, Subramaniam S, Freeling M, Feltus FA. Evidence of function for conserved noncoding sequences in Arabidopsis thaliana. New Phytol 2012; 193:241-252. [PMID: 21955124 DOI: 10.1111/j.1469-8137.2011.03916.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 05/09/2023]
Abstract
• Whole genome duplication events provide a lineage with a large reservoir of genes that can be molded by evolutionary forces into phenotypes that fit alternative environments. A well-studied whole genome duplication, the α-event, occurred in an ancestor of the model plant Arabidopsis thaliana. Retained segments of the α-event have been defined in recent years in the form of duplicate protein coding sequences (α-pairs) and associated conserved noncoding DNA sequences (CNSs). Our aim was to identify any association between CNSs and α-pair co-functionality at the gene expression level.
• Here, we tested for correlation between CNS counts and α-pair co-expression and expression intensity across nine expression datasets: aerial tissue, flowers, leaves, roots, rosettes, seedlings, seeds, shoots and whole plants.
• We provide evidence for a putative regulatory role of the CNSs. The association of CNSs with α-pair co-expression and expression intensity varied by gene function, subgene position and the presence of transcription factor binding motifs. A range of possible CNS regulatory mechanisms, including intron-mediated enhancement, messenger RNA fold stability and transcriptional regulation, are discussed.
• This study provides a framework to understand how CNS motifs are involved in the maintenance of gene expression after a whole genome duplication event.
|
31
|
Schnable JC, Wang X, Pires JC, Freeling M. Escape from preferential retention following repeated whole genome duplications in plants. Front Plant Sci 2012; 3:94. [PMID: 22639677 PMCID: PMC3355610 DOI: 10.3389/fpls.2012.00094] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Received: 01/06/2012] [Accepted: 04/24/2012] [Indexed: 05/21/2023]
Abstract
The well supported gene dosage hypothesis predicts that genes encoding proteins engaged in dose-sensitive interactions cannot be reduced back to single copies once all interacting partners are simultaneously duplicated in a whole genome duplication. The genomes of extant flowering plants are the result of many sequential rounds of whole genome duplication, yet the fraction of genomes devoted to encoding complex molecular machines does not increase as fast as expected through multiple rounds of whole genome duplications. Using parallel interspecies genomic comparisons in the grasses and crucifers, we demonstrate that genes retained as duplicates following a whole genome duplication have only a 50% chance of being retained as duplicates in a second whole genome duplication. Genes which fractionated to a single copy following a second whole genome duplication tend to be the member of a gene pair with less complex promoters, lower levels of expression, and to be under lower levels of purifying selection. We suggest the copy with lower levels of expression and less purifying selection contributes less to effective gene-product dosage and therefore is under less dosage constraint in future whole genome duplications, providing an explanation for why flowering plant genomes are not overrun with subunits of large dose-sensitive protein complexes.
|
32
|
Woodhouse MR, Tang H, Freeling M. Different gene families in Arabidopsis thaliana transposed in different epochs and at different frequencies throughout the rosids. Plant Cell 2011; 23:4241-53. [PMID: 22180627 PMCID: PMC3269863 DOI: 10.1105/tpc.111.093567] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 05/05/2023]
Abstract
Certain types of gene families, such as those encoding most families of transcription factors, maintain their chromosomal syntenic positions throughout angiosperm evolutionary time. Other nonsyntenic gene families are prone to deletion, tandem duplication, and transposition. Here, we describe the chromosomal positional history of all genes in Arabidopsis thaliana throughout the rosid superorder. We introduce a public database where researchers can look up the positional history of their favorite A. thaliana gene or gene family. Finally, we show that specific gene families transposed at specific points in evolutionary time, particularly after whole-genome duplication events in the Brassicales, and suggest that genes in mobile gene families are under different selection pressure than syntenic genes.
|
33
|
Zhang W, Wu Y, Schnable JC, Zeng Z, Freeling M, Crawford GE, Jiang J. High-resolution mapping of open chromatin in the rice genome. Genome Res 2011; 22:151-62. [PMID: 22110044 DOI: 10.1101/gr.131342.111] [Citation(s) in RCA: 175] [Impact Index Per Article: 13.5] [Indexed: 11/24/2022]
Abstract
Gene expression is controlled by the complex interaction of transcription factors binding to promoters and other regulatory DNA elements. One common characteristic of the genomic regions associated with regulatory proteins is a pronounced sensitivity to DNase I digestion. We generated genome-wide high-resolution maps of DNase I hypersensitive (DH) sites from both seedling and callus tissues of rice (Oryza sativa). Approximately 25% of the DH sites from both tissues were found in putative promoters, indicating that the vast majority of the gene regulatory elements in rice are not located in promoter regions. We found 58% more DH sites in the callus than in the seedling. For DH sites detected in both the seedling and callus, 31% displayed significantly different levels of DNase I sensitivity within the two tissues. Genes that are differentially expressed in the seedling and callus were frequently associated with DH sites in both tissues. The DNA sequences contained within the DH sites were hypomethylated, consistent with what is known about active gene regulatory elements. Interestingly, tissue-specific DH sites located in the promoters showed a higher level of DNA methylation than the average DNA methylation level of all the DH sites located in the promoters. A distinct elevation of H3K27me3 was associated with intergenic DH sites. These results suggest that epigenetic modifications play a role in the dynamic changes of the numbers and DNase I sensitivity of DH sites during development.
|
34
|
Eichten SR, Swanson-Wagner RA, Schnable JC, Waters AJ, Hermanson PJ, Liu S, Yeh CT, Jia Y, Gendler K, Freeling M, Schnable PS, Vaughn MW, Springer NM. Heritable epigenetic variation among maize inbreds. PLoS Genet 2011; 7:e1002372. [PMID: 22125494 PMCID: PMC3219600 DOI: 10.1371/journal.pgen.1002372] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.8] [Received: 05/16/2011] [Accepted: 09/20/2011] [Indexed: 11/26/2022]
Abstract
Epigenetic variation describes heritable differences that are not attributable to changes in DNA sequence. There is the potential for pure epigenetic variation that occurs in the absence of any genetic change or for more complex situations that involve both genetic and epigenetic differences. Methylation of cytosine residues provides one mechanism for the inheritance of epigenetic information. A genome-wide profiling of DNA methylation in two different genotypes of Zea mays (ssp. mays), an organism with a complex genome of interspersed genes and repetitive elements, allowed the identification and characterization of examples of natural epigenetic variation. The distribution of DNA methylation was profiled using immunoprecipitation of methylated DNA followed by hybridization to a high-density tiling microarray. The comparison of the DNA methylation levels in the two genotypes, B73 and Mo17, allowed for the identification of approximately 700 differentially methylated regions (DMRs). Several of these DMRs occur in genomic regions that are apparently identical by descent in B73 and Mo17 suggesting that they may be examples of pure epigenetic variation. The methylation levels of the DMRs were further studied in a panel of near-isogenic lines to evaluate the stable inheritance of the methylation levels and to assess the contribution of cis- and trans- acting information to natural epigenetic variation. The majority of DMRs that occur in genomic regions without genetic variation are controlled by cis-acting differences and exhibit relatively stable inheritance. This study provides evidence for naturally occurring epigenetic variation in maize, including examples of pure epigenetic variation that is not conditioned by genetic differences. The epigenetic differences are variable within maize populations and exhibit relatively stable trans-generational inheritance. The detected examples of epigenetic variation, including some without tightly linked genetic variation, may contribute to complex trait variation.
Author Summary
Heritable variation within a species provides the basis for natural and artificial selection. A substantial portion of heritable variation is based on alterations in DNA sequence among individuals and is termed genetic variation. There is also evidence for epigenetic variation, which refers to heritable differences that are not caused by DNA sequence changes. Methylation of cytosine residues provides one molecular mechanism for epigenetic variation in many eukaryotic species. The genome-wide distribution of DNA methylation was assessed in two different inbred genotypes of maize to identify differentially methylated regions that may contribute to epigenetic variation. There are hundreds of genomic regions that have differences in DNA methylation levels in these two different genotypes, including methylation differences in regions without genetic variation. By studying the inheritance of the differential methylation in near-isogenic progeny of the two inbred lines, it is possible to demonstrate relatively stable inheritance of epigenetic variation, even in the absence of DNA sequence changes. The epigenetic variation among individuals of the same species may provide important contributions to phenotypic variation within a species even in the absence of genetic differences.
|
35
|
Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun JH, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires JC, Paterson AH, Chalhoub B, Wang B, Hayward A, Sharpe AG, Park BS, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, Peng C, Geng C, Koh C, Lin C, Edwards D, Mu D, Shen D, Soumpourou E, Li F, Fraser F, Conant G, Lassalle G, King GJ, Bonnema G, Tang H, Wang H, Belcram H, Zhou H, Hirakawa H, Abe H, Guo H, Wang H, Jin H, Parkin IAP, Batley J, Kim JS, Just J, Li J, Xu J, Deng J, Kim JA, Li J, Yu J, Meng J, Wang J, Min J, Poulain J, Wang J, Hatakeyama K, Wu K, Wang L, Fang L, Trick M, Links MG, Zhao M, Jin M, Ramchiary N, Drou N, Berkman PJ, Cai Q, Huang Q, Li R, Tabata S, Cheng S, Zhang S, Zhang S, Huang S, Sato S, Sun S, Kwon SJ, Choi SR, Lee TH, Fan W, Zhao X, Tan X, Xu X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, Wang Z, Xiong Z, Zhang Z. The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 2011; 43:1035-9. [PMID: 21873998 DOI: 10.1038/ng.919] [Citation(s) in RCA: 1262] [Impact Index Per Article: 97.1] [Received: 03/07/2011] [Accepted: 08/03/2011] [Indexed: 11/09/2022]
Abstract
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.
|
36
|
Tang H, Lyons E, Pedersen B, Schnable JC, Paterson AH, Freeling M. Screening synteny blocks in pairwise genome comparisons through integer programming. BMC Bioinformatics 2011; 12:102. [PMID: 21501495 PMCID: PMC3088904 DOI: 10.1186/1471-2105-12-102] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.2] [Received: 12/08/2010] [Accepted: 04/18/2011] [Indexed: 12/01/2022]
Abstract
Background: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events.
Results: We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that the quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons).
Conclusions: The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user-specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST to the screened synteny blocks with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering easier access to thousands of genomes for non-programmers.
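The quota-based screening described in this abstract can be illustrated with a small sketch. It is not the QUOTA-ALIGN code and does not use a linear programming solver: it enumerates every subset of a handful of made-up blocks, which is only feasible for toy inputs. The block names, coordinates, scores, and the function names `max_depth` and `screen_blocks` are all hypothetical, and the depth limits are one plausible reading of the "constraints on overlaps and depths" mentioned above.

```python
from itertools import combinations

# Hypothetical synteny blocks: (name, interval on genome A, interval on genome B, score).
BLOCKS = [
    ("blk1", (0, 100), (0, 100), 90),
    ("blk2", (0, 100), (200, 300), 80),
    ("blk3", (0, 100), (400, 500), 30),
    ("blk4", (150, 250), (0, 100), 20),
]


def max_depth(intervals):
    """Return the largest number of half-open intervals covering any one point."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Tuples sort ends (-1) before starts (+1) at the same coordinate,
    # so intervals that merely touch are not counted as overlapping.
    events.sort()
    depth = best = 0
    for _, delta in events:
        depth += delta
        best = max(best, depth)
    return best


def screen_blocks(blocks, max_depth_a, max_depth_b):
    """Pick the subset of blocks with the highest total score such that no
    point on genome A is covered more than max_depth_a times and no point
    on genome B more than max_depth_b times (exhaustive search)."""
    best_score, best_subset = 0, ()
    for size in range(1, len(blocks) + 1):
        for subset in combinations(blocks, size):
            if max_depth([b[1] for b in subset]) > max_depth_a:
                continue
            if max_depth([b[2] for b in subset]) > max_depth_b:
                continue
            score = sum(b[3] for b in subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, [b[0] for b in best_subset]


if __name__ == "__main__":
    # Each region of A may match up to two regions of B, each region of B one of A.
    print(screen_blocks(BLOCKS, max_depth_a=2, max_depth_b=1))
```

Each block corresponds to one binary decision variable (kept or discarded), the total score is the objective, and the depth limits are the constraints. A real instance would hand the same formulation to an integer programming solver rather than enumerate subsets.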
|
37
|
Schnable JC, Freeling M. Genes identified by visible mutant phenotypes show increased bias toward one of two subgenomes of maize. PLoS One 2011; 6:e17855. [PMID: 21423772 PMCID: PMC3053395 DOI: 10.1371/journal.pone.0017855] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Received: 01/03/2011] [Accepted: 02/10/2011] [Indexed: 12/31/2022]
Abstract
Not all genes are created equal. Despite being supported by sequence conservation and expression data, knockout homozygotes of many genes show no visible effects, at least under laboratory conditions. We have identified a set of maize (Zea mays L.) genes which have been the subject of a disproportionate share of publications recorded at MaizeGDB. We manually anchored these "classical" maize genes to gene models in the B73 reference genome, and identified syntenic orthologs in other grass genomes. In addition to proofing the most recent version 2 maize gene models, we show that a subset of these genes, those that were identified by morphological phenotype prior to cloning, are retained at syntenic locations throughout the grasses at much higher levels than the average expressed maize gene, and are preferentially found on the maize1 subgenome even when a duplicate copy is still retained on the opposite subgenome. Maize1 is the subgenome that experienced less gene loss following the whole genome duplication in the maize lineage 5-12 million years ago, and genes located on this subgenome tend to be expressed at higher levels in modern maize. Links to the web-based software that supported our syntenic analyses in the grasses should empower further research and support teaching involving the history of maize genetic research. Our findings exemplify the concept of "grasses as a single genetic system," where what is learned in one grass may be applied to another.
|
38
|
Lyons E, Freeling M, Kustu S, Inwood W. Using genomic sequencing for classical genetics in E. coli K12. PLoS One 2011; 6:e16717. [PMID: 21364914 PMCID: PMC3045373 DOI: 10.1371/journal.pone.0016717] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 09/30/2010] [Accepted: 12/23/2010] [Indexed: 02/07/2023]
Abstract
We here develop computational methods to facilitate use of 454 whole genome shotgun sequencing to identify mutations in Escherichia coli K12. We had Roche sequence eight related strains derived as spontaneous mutants in a background without a whole genome sequence. They provided difference tables based on assembling each genome to reference strain E. coli MG1655 (NC_000913). Due to the evolutionary distance to MG1655, these contained a large number of both false negatives and positives. By manual analysis of the dataset, we detected all the known mutations (24 at nine locations) and identified and genetically confirmed new mutations necessary and sufficient for the phenotypes we had selected in four strains. We then had Roche assemble contigs de novo, which we further assembled to full-length pseudomolecules based on synteny with MG1655. This hybrid method facilitated detection of insertion mutations and allowed annotation from MG1655. After removing one genome with less than the optimal 20- to 30-fold sequence coverage, we identified 544 putative polymorphisms that included all of the known and selected mutations apart from insertions. Finally, we detected seven new mutations in a total of only 41 candidates by comparing single genomes to composite data for the remaining six and using a ranking system to penalize homopolymer sequencing and misassembly errors. An additional benefit of the analysis is a table of differences between MG1655 and a physiologically robust E. coli wild-type strain NCM3722. Both projects were greatly facilitated by use of comparative genomics tools in the CoGe software package (http://genomevolution.org/).
|
39
|
Pedersen BS, Tang H, Freeling M. Gobe: an interactive, web-based tool for comparative genomic visualization. Bioinformatics 2011; 27:1015-6. [DOI: 10.1093/bioinformatics/btr056] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
|
40
|
Schnable JC, Pedersen BS, Subramaniam S, Freeling M. Dose-sensitivity, conserved non-coding sequences, and duplicate gene retention through multiple tetraploidies in the grasses. Front Plant Sci 2011; 2:2. [PMID: 22645525 PMCID: PMC3355796 DOI: 10.3389/fpls.2011.00002] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 02/09/2011] [Accepted: 02/19/2011] [Indexed: 05/08/2023]
Abstract
Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein-protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein-protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein-DNA interactions between the regulatory regions of CNS-rich genes - nicknamed bigfoot genes - and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
|
41
|
Woodward JB, Abeydeera ND, Paul D, Phillips K, Rapala-Kozik M, Freeling M, Begley TP, Ealick SE, McSteen P, Scanlon MJ. A maize thiamine auxotroph is defective in shoot meristem maintenance. Plant Cell 2010; 22:3305-17. [PMID: 20971897 PMCID: PMC2990124 DOI: 10.1105/tpc.110.077776] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 06/30/2010] [Revised: 08/27/2010] [Accepted: 09/25/2010] [Indexed: 05/18/2023]
Abstract
Plant shoots undergo organogenesis throughout their life cycle via the perpetuation of stem cell pools called shoot apical meristems (SAMs). SAM maintenance requires the coordinated equilibrium between stem cell division and differentiation and is regulated by integrated networks of gene expression, hormonal signaling, and metabolite sensing. Here, we show that the maize (Zea mays) mutant bladekiller1-R (blk1-R) is defective in leaf blade development and meristem maintenance and exhibits a progressive reduction in SAM size that results in premature shoot abortion. Molecular markers for stem cell maintenance and organ initiation reveal that both of these meristematic functions are progressively compromised in blk1-R mutants, especially in the inflorescence and floral meristems. Positional cloning of blk1-R identified a predicted missense mutation in a highly conserved amino acid encoded by thiamine biosynthesis2 (thi2). Consistent with chromosome dosage studies suggesting that blk1-R is a null mutation, biochemical analyses confirm that the wild-type THI2 enzyme copurifies with a thiazole precursor to thiamine, whereas the mutant enzyme does not. Heterologous expression studies confirm that THI2 is targeted to chloroplasts. All blk1-R mutant phenotypes are rescued by exogenous thiamine supplementation, suggesting that blk1-R is a thiamine auxotroph. These results provide insight into the role of metabolic cofactors, such as thiamine, during the proliferation of stem and initial cell populations.
|
42
|
Woodhouse MR, Schnable JC, Pedersen BS, Lyons E, Lisch D, Subramaniam S, Freeling M. Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homologs. PLoS Biol 2010; 8:e1000409. [PMID: 20613864 PMCID: PMC2893956 DOI: 10.1371/journal.pbio.1000409] [Citation(s) in RCA: 195] [Impact Index Per Article: 13.9] [Received: 12/01/2009] [Accepted: 05/20/2010] [Indexed: 12/02/2022]
Abstract
Following genome duplication and selfish DNA expansion, maize used a heretofore unknown mechanism to shed redundant genes and functionless DNA with bias toward one of the parental genomes. Previous work in Arabidopsis showed that after an ancient tetraploidy event, genes were preferentially removed from one of the two homeologs, a process known as fractionation. The mechanism of fractionation is unknown. We sought to determine whether such preferential, or biased, fractionation exists in maize and, if so, whether a specific mechanism could be implicated in this process. We studied the process of fractionation using two recently sequenced grass species: sorghum and maize. The maize lineage has experienced a tetraploidy since its divergence from sorghum approximately 12 million years ago, and fragments of many knocked-out genes retain enough sequence similarity to be easily identifiable. Using sorghum exons as the query sequence, we studied the fate of both orthologous genes in maize following the maize tetraploidy. We show that genes are predominantly lost, not relocated, and that single-gene loss by deletion is the rule. Based on comparisons with orthologous sorghum and rice genes, we also infer that the sequences present before the deletion events were flanked by short direct repeats, a signature of intra-chromosomal recombination. Evidence of this deletion mechanism is found 2.3 times more frequently on one of the maize homeologs, consistent with earlier observations of biased fractionation. The over-fractionated homeolog is also a greater than 3-fold better target for transposon removal, but does not have an observably higher synonymous base substitution rate, nor could we find differentially placed methylation domains. We conclude that fractionation is indeed biased in maize and that intra-chromosomal or possibly a similar illegitimate recombination is the primary mechanism by which fractionation occurs. The mechanism of intra-chromosomal recombination explains the observed bias in both gene and transposon loss in the maize lineage. The existence of fractionation bias demonstrates that the frequency of deletion is modulated. Among the evolutionary benefits of this deletion/fractionation mechanism is bulk DNA removal and the generation of novel combinations of regulatory sequences and coding regions.
Author Summary
All genomes can accumulate dispensable DNA in the form of duplications of individual genes or even partial or whole genome duplications. Genomes also can accumulate selfish DNA elements. Duplication events specifically are often followed by extensive gene loss. The maize genome is particularly extreme, having become tetraploid 10 million years ago and played host to massive transposon amplifications. We compared the genome of sorghum (which is homologous to the pre-tetraploid maize genome) with the two identifiable parental genomes retained in maize. The two maize genomes differ greatly: one of the parental genomes has lost 2.3 times more genes than the other, and the selfish DNA regions between genes were even more frequently lost, suggesting maize can distinguish between the parental genomes present in the original tetraploid. We show that genes are actually lost, not simply relocated. Deletions were rarely longer than a single gene, and occurred between repeated DNA sequences, suggesting mis-recombination as a mechanism of gene removal. We hypothesize an epigenetic mechanism of genome distinction to account for the selective loss. To the extent that the rate of base substitutions tracks time, we neither support nor refute claims of maize allotetraploidy. Finally, we explain why it makes sense that purifying selection in mammals does not operate at all like the gene and genome deletion program we describe here.
|
43
|
Freeling M. Intragenic recombination in maize: pollen analysis methods and the effect of parental adh1 isoalleles. Genetics 2010; 83:701-17. [PMID: 17248728 PMCID: PMC1213545 DOI: 10.1093/genetics/83.4.701] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.7] [Indexed: 11/12/2022]
Abstract
The ability to stain mature pollen grains for the presence of alcohol dehydrogenase (ADH) activity permits the quantitation of ADH(+) gametophytes at frequencies below 10⁻⁶. This resolution allows reversion and genetic fine structure analyses. The rationale of pollen analysis follows Nelson's prototype studies with waxy. As with the waxy gene, revertant frequencies for seven Adh1-deficient (Adh1(-)) alleles appear to be in excess of microbially derived expectations. Each of the seven Adh1(-) alleles was derived from one of three naturally occurring isoalleles. Based on Schwartz's protein level characterizations of the mutants' products, it was anticipated that the seven Adh1(-) alleles should recombine to yield ADH(+) cistrons in certain pairwise combinations. This expectation was not met. The parental "wild-type" isoalleles from which the mutants were derived appear to be structurally divergent. The discussion interprets these data in view of understanding naturally occurring cistronic variation.
|
44
|
Woodman JC, Freeling M. Identification of a genetic element that controls the organ-specific expression of adh1 in maize. Genetics 2010; 98:357-78. [PMID: 17249088 PMCID: PMC1214445 DOI: 10.1093/genetics/98.2.357] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
Allozyme balances serve as markers of quantitative behavior of electrophoretically distinguishable alleles. By the use of ADH Set I allozyme balances, it is demonstrated that all Adh1-S/Adh1-F individuals from more than 20 diverse S/F families exhibit a reciprocal correlation between Adh1 quantitative behavior in two maize organs: the scutellum and primary root. Within an electrophoretic mobility class, the Adh1 allele that is relatively underexpressed in the scutellum is relatively overexpressed in the primary root, and vice versa. Segregation tests prove that this "reciprocal effect" is the property of a cis-acting site that is closely linked to or within the Adh1 structural gene, and it is not affected by diverse genetic backgrounds. Immunological and [³H]-leucine incorporation experiments establish that Adh1 quantitative variants differ in ADH1·ADH1 synthetic rates in the anaerobic primary root. The reciprocal-effect phenomenon suggests that the cis-acting loci controlling Adh1 quantitative expression in each respective organ are at least in close proximity, or may share common DNA sequences. We discuss the possibility that the reciprocal-effect locus is a regulatory component of the Adh1 cistron.
|
45
|
Kane J, Freeling M, Lyons E. The evolution of a high copy gene array in Arabidopsis. J Mol Evol 2010; 70:531-44. [PMID: 20495794 PMCID: PMC2886086 DOI: 10.1007/s00239-010-9350-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 12/08/2009] [Accepted: 05/03/2010] [Indexed: 11/29/2022]
Abstract
Local gene duplication is a prominent mechanism of gene copy number expansion. Elucidating the mechanisms by which local duplicates arise is necessary for understanding the evolution of genomes and their host organisms. Chromosome one of Arabidopsis thaliana contains an 81-gene array subdivided into 27 triplet units (t-units), with each t-unit containing three pre-transfer RNA genes. We utilized phylogenetic tree reconstructions and comparative genomics to order the events leading to the array's formation, and propose a model using unequal crossing-over as the primary mechanism of array formation. The model is supported by additional phylogenetic information from intergenic spacer sequences separating each t-unit, comparative analysis to an orthologous array of 12 t-units in the sister taxon Arabidopsis lyrata, and additional modeling using a stochastic simulation of orthologous array divergence. Lastly, comparative phylogenetic analysis demonstrates that the two orthologous t-unit arrays undergo concerted evolution within each taxon and are likely fluctuating in copy number under neutral evolutionary drift. These findings hold larger implications for future research concerning gene and genome evolution.
|
46
|
Paterson AH, Freeling M, Tang H, Wang X. Insights from the comparison of plant genome sequences. ANNUAL REVIEW OF PLANT BIOLOGY 2010; 61:349-72. [PMID: 20441528 DOI: 10.1146/annurev-arplant-042809-112235] [Citation(s) in RCA: 117] [Impact Index Per Article: 8.4] [Indexed: 05/18/2023]
Abstract
The next decade will see essentially completed sequences for multiple branches of virtually all angiosperm clades that include major crops and/or botanical models. These sequences will provide a powerful framework for relating genome-level events to aspects of morphological and physiological variation that have contributed to the colonization of much of the planet by angiosperms. Clarification of the fundamental angiosperm gene set, its arrangement, lineage-specific variations in gene repertoire and arrangement, and the fates of duplicated gene pairs will advance knowledge of functional and regulatory diversity and perhaps shed light on adaptation by lineages to whole-genome duplication, which is a distinguishing feature of angiosperm evolution. Better understanding of the relationships among angiosperm genomes promises to provide a firm foundation upon which to base translational genomics: the leveraging of hard-won structural and functional genomic information from crown botanical models to dissect novel and, in some cases, economically important features in many additional organisms.
|
47
|
Freeling M, Subramaniam S. Conserved noncoding sequences (CNSs) in higher plants. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:126-32. [PMID: 19249238 DOI: 10.1016/j.pbi.2009.01.005] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Received: 10/17/2008] [Revised: 01/22/2009] [Accepted: 01/22/2009] [Indexed: 05/09/2023]
Abstract
Plant conserved noncoding sequences (CNSs), a specific category of phylogenetic footprint, have been shown experimentally to function. No plant CNS is conserved to the extent that ultraconserved noncoding sequences are conserved in vertebrates. Plant CNSs are enriched in known transcription factor or other cis-acting binding sites, and are usually clustered around genes. Genes that encode transcription factors and/or those that respond to stimuli are particularly CNS-rich. Only rarely could this function involve small RNA binding. Some transcribed CNSs encode short translation products as a form of negative control. Approximately 4% of Arabidopsis gene content is estimated both to be CNS-rich and to occupy a relatively long stretch of chromosome: Bigfoot genes (long phylogenetic footprints). We discuss a 'DNA-templated protein assembly' idea that might help explain Bigfoot gene CNSs.
|
48
|
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob-ur-Rahman, Ware D, Westhoff P, Mayer KFX, Messing J, Rokhsar DS. The Sorghum bicolor genome and the diversification of grasses. Nature 2009; 457:551-6. [PMID: 19189423 DOI: 10.1038/nature07723] [Citation(s) in RCA: 1638] [Impact Index Per Article: 109.2] [Indexed: 12/25/2022]
Abstract
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
|
49
|
Freeling M. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. ANNUAL REVIEW OF PLANT BIOLOGY 2009; 60:433-53. [PMID: 19575588 DOI: 10.1146/annurev.arplant.043008.092122] [Citation(s) in RCA: 588] [Impact Index Per Article: 39.2] [Indexed: 05/19/2023]
Abstract
Each mode of gene duplication (tandem, tetraploid, segmental, transpositional) retains genes in a biased manner. A reciprocal relationship exists between plant genes retained postpaleotetraploidy versus genes retained after an ancient tandem duplication. Among the models (C, neofunctionalization, balanced gene drive) and ideas that might explain this relationship, only balanced gene drive predicts reciprocity. The gene balance hypothesis explains that more "connected" genes (by protein-protein interactions in a heteromer, for example) are less likely to be retained as a tandem or transposed duplicate and are more likely to be retained postpaleotetraploidy; otherwise, selectively negative dosage effects are created. Biased duplicate retention is an instant and neutral by-product, a spandrel, of purifying selection. Balanced gene drive expanded plant gene families, including those encoding proteasomal proteins, protein kinases, motors, and transcription factors, with each paleotetraploidy, which could explain trends involving complexity. Balanced gene drive is a saltation mechanism in the mutationist tradition.
|
50
|
Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M. Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. PLANT PHYSIOLOGY 2008; 148:1772-81. [PMID: 18952863 PMCID: PMC2593677 DOI: 10.1104/pp.108.124867] [Citation(s) in RCA: 273] [Impact Index Per Article: 17.1] [Received: 06/16/2008] [Accepted: 10/19/2008] [Indexed: 05/18/2023]
Abstract
In addition to the genomes of Arabidopsis (Arabidopsis thaliana) and poplar (Populus trichocarpa), two near-complete rosid genome sequences, grape (Vitis vinifera) and papaya (Carica papaya), have been recently released. The phylogenetic relationship among these four genomes and the placement of their three independent, fractionated tetraploidies sum to a powerful comparative genomic system. CoGe, a platform of multiple whole or near-complete genome sequences, provides an integrative Web-based system to find and align syntenic chromosomal regions and visualize the output in an intuitive and interactive manner. CoGe has been customized to specifically support comparisons among the rosids. Crucial facts and definitions are presented to clearly describe the sorts of biological questions that might be answered in part using CoGe, including patterns of DNA conservation, accuracy of annotation, transposability of individual genes, subfunctionalization and/or fractionation of syntenic gene sets, and conserved noncoding sequence content. This précis of an online tutorial, CoGe with Rosids (http://tinyurl.com/4a23pk), presents sample results graphically.
|