101
Yamasaki K, Kigawa T, Seki M, Shinozaki K, Yokoyama S. DNA-binding domains of plant-specific transcription factors: structure, function, and evolution. Trends Plant Sci 2013; 18:267-76. [PMID: 23040085 DOI: 10.1016/j.tplants.2012.09.001] [Citation(s) in RCA: 150] [Impact Index Per Article: 13.6] [Received: 04/10/2012] [Revised: 08/10/2012] [Accepted: 09/04/2012] [Indexed: 05/02/2023]
Abstract
Families of plant-specific transcription factors (TFs) are defined by their characteristic DNA-binding domains (DBDs), such as AP2/ERF, B3, NAC, SBP, and WRKY. Recently, the three-dimensional structures of these DBDs, including some in complex with DNA, were determined by NMR spectroscopy and X-ray crystallography. In this review we summarize the functional and evolutionary implications arising from these structural analyses. The unexpected structural similarity between B3 and the noncatalytic DBD of the restriction endonuclease EcoRII allowed us to build structural models of the B3/DNA complex. Most of the DBDs of plant-specific TFs are likely to have originated from endonucleases associated with transposable elements. After the DBDs were established in unicellular eukaryotes, they underwent extensive plant-specific expansion by acquiring new functions.
Affiliation(s)
- Kazuhiko Yamasaki
- Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology-AIST, 1-1-1 Higashi, Tsukuba 305-8566, Japan.
102
Extensive natural epigenetic variation at a de novo originated gene. PLoS Genet 2013; 9:e1003437. [PMID: 23593031 PMCID: PMC3623765 DOI: 10.1371/journal.pgen.1003437] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Received: 12/14/2012] [Accepted: 02/21/2013] [Indexed: 11/20/2022] [Open access]
Abstract
Epigenetic variation, such as heritable changes of DNA methylation, can affect gene expression and thus phenotypes, but examples of natural epimutations are few and little is known about their stability and frequency in nature. Here, we report that the gene Qua-Quine Starch (QQS) of Arabidopsis thaliana, which is involved in starch metabolism and originated de novo recently, is subject to frequent epigenetic variation in nature. Specifically, we show that expression of this gene varies considerably among natural accessions as well as within populations sampled directly from the wild, and we demonstrate that this variation correlates negatively with the DNA methylation level of repeated sequences located within the 5′ end of the gene. Furthermore, we provide extensive evidence that DNA methylation and expression variants can be inherited for several generations and are not linked to DNA sequence changes. Taken together, these observations provide a first indication that de novo originated genes might be particularly prone to epigenetic variation in the initial stages of their formation.

Epigenetics is defined as the study of heritable changes in gene expression that are not linked to changes in the DNA sequence. In plants, these heritable variations are often associated with differences in DNA methylation. So far, very little is known about the extent and stability of epigenetic variation in nature. In this study, we report a case of extensive epigenetic variation in natural populations of the flowering plant Arabidopsis thaliana, which concerns a gene involved in starch metabolism, named Qua-Quine Starch (QQS). We show that in the wild QQS expression varies extensively, in concert with DNA methylation of the gene promoter. We also demonstrate that these variations are independent of DNA sequence changes and are stably inherited for several generations. In view of the recent evolutionary origin of QQS, we speculate that genes that emerge from scratch could be particularly prone to epigenetic variation. This would in turn endow epigenetic variation with a unique adaptive role in enabling de novo originated genes to adjust their expression pattern.
103
Bornberg-Bauer E, Albà MM. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol 2013; 23:459-66. [PMID: 23562500 DOI: 10.1016/j.sbi.2013.02.012] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/08/2013] [Revised: 02/15/2013] [Accepted: 02/15/2013] [Indexed: 11/29/2022]
Abstract
During protein evolution, novel domain arrangements are continuously formed. Rearrangements are important for the creation of molecular biodiversity and for the functional molecular changes that underlie developmental shifts in the Bauplan (body plan) of organisms. Here we review the mechanisms by which new arrangements arise and the potential benefits of rearrangements. We concentrate on how new domains emerge and why they rapidly spread across genomes, gaining higher copy numbers than older, more established domains. This spread is most likely a consequence of their high adaptive potential, but on its own it is unlikely to compensate for the drastic loss of domains observed across different taxa. We show that a significant portion of the recently emerged domains, especially those in multidomain families, are highly disordered, and we speculate about the significance of these findings for the evolvability of novel genetic material.
Affiliation(s)
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Hüfferstrasse 1, D-48149 Münster, Germany.
104
Neme R, Tautz D. Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. BMC Genomics 2013; 14:117. [PMID: 23433480 PMCID: PMC3616865 DOI: 10.1186/1471-2164-14-117] [Citation(s) in RCA: 158] [Impact Index Per Article: 14.4] [Received: 11/06/2012] [Accepted: 02/15/2013] [Indexed: 02/04/2023] [Open access]
Abstract
Background New gene emergence has so far been assumed to be driven mostly by duplication and divergence of existing genes. The possibility that entirely new genes could emerge out of the non-coding genomic background was long thought to be almost negligible. With the increasing availability of fully sequenced genomes across broad scales of phylogeny, it has become possible to systematically study the origin of new genes over time and thus revisit this question. Results We have used phylostratigraphy to assess trends of gene evolution across successive phylogenetic phases, using mostly the well-annotated mouse genome as a reference. We find several significant general trends and confirm them for three other vertebrate genomes (human, zebrafish and stickleback). Younger genes are shorter, with respect to both gene length and open reading frame length. They also contain fewer exons and have fewer recognizable domains. Average exon length, on the other hand, does not change much over time. Only the most recently evolved genes have longer exons, and they are often associated with active promoter regions, i.e. they are part of bidirectional promoters. We have also revisited the possibility that de novo evolution of genes could occur even within existing genes, by making use of an alternative reading frame (overprinting). We find several cases among the annotated Ensembl ORFs where the new reading frame has emerged at a higher phylostratigraphic level than the original one. We discuss some of these overprinted genes, which also include the Hoxa9 gene, where an alternative reading frame covering the homeobox has emerged within the lineage leading to rodents and primates (Euarchontoglires). Conclusions We suggest that the overall trends of gene emergence are more compatible with a de novo evolution model for orphan genes than with a general duplication-divergence model. Hence de novo evolution of genes appears to have occurred continuously throughout evolutionary time and should therefore be considered a general mechanism for the emergence of new gene functions.
Affiliation(s)
- Rafik Neme
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstrasse 2, Plön, 24306, Germany
105
Abstract
For decades, transposable elements have been known to produce a wide variety of changes in plant gene expression and function. This has led to the idea that transposable element activity has played a key part in adaptive plant evolution. This Review describes the kinds of changes that transposable elements can cause, discusses evidence that those changes have contributed to plant evolution and suggests future strategies for determining the extent to which these changes have in fact contributed to plant adaptation and evolution. Recent advances in genomics and phenomics for a range of plant species, particularly crops, have begun to allow the systematic assessment of these questions.
Affiliation(s)
- Damon Lisch
- Department of Plant and Microbial Biology, UC Berkeley, Berkeley, California 94720, USA.
106
Yang L, Zou M, Fu B, He S. Genome-wide identification, characterization, and expression analysis of lineage-specific genes within zebrafish. BMC Genomics 2013; 14:65. [PMID: 23368736 PMCID: PMC3599513 DOI: 10.1186/1471-2164-14-65] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 09/22/2012] [Accepted: 01/29/2013] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND The genomic basis of teleost phenotypic complexity remains obscure, despite the increasing availability of genome and transcriptome sequence data. The fish-specific genome duplication cannot sufficiently explain the morphological complexity of teleosts, considering the relatively large number of extinct basal ray-finned fishes. RESULTS In this study, we performed comparative genomic analysis to identify the Conserved Teleost-Specific Genes (CTSGs) and orphan genes within zebrafish and found that these two sets of lineage-specific genes may have played important roles during zebrafish embryogenesis. Lineage-specific genes within zebrafish share many of the characteristics of their counterparts in other species: shorter length, fewer exons, higher GC content, and less frequent transcript support. Chromosomal location analysis indicated that neither the CTSGs nor the orphan genes were distributed evenly across the zebrafish chromosomes. The significant enrichment of immunity proteins among CTSGs, annotated by gene ontology (GO) or predicted ab initio, may indicate that defense against pathogens was an important driver of teleost diversification. The evolutionary origin of the lineage-specific genes was determined, and a very high percentage of them were found to have arisen via gene duplication. The temporal and spatial expression profiles of lineage-specific genes, obtained from expressed sequence tag (EST) and RNA-seq data, revealed two novel properties: in addition to showing highly tissue-preferred expression, lineage-specific genes are also highly temporally restricted; that is, they are expressed in narrower time windows than evolutionarily conserved genes and are specifically enriched in later-stage embryos and early larval stages.
CONCLUSIONS Our study provides the first systematic identification of two different sets of lineage-specific genes within zebrafish and offers valuable information for future studies of the genomic basis of teleost phenotypic complexity.
Affiliation(s)
- Liandong Yang
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei 430072, People's Republic of China
107
Abstract
Our understanding of gene expression has changed dramatically over the past decade, largely catalysed by technological developments. High-throughput experiments (microarrays and next-generation sequencing) have generated large amounts of genome-wide gene expression data that are collected in public archives. Added-value databases process, analyse and annotate these data further to make them accessible to every biologist. In this Review, we discuss the utility of the gene expression data that are in the public domain and how researchers are making use of these data. Reuse of public data can be very powerful, but there are many obstacles in data preparation and analysis and in the interpretation of the results. We discuss these challenges and provide recommendations that we believe can improve the utility of such data.
Affiliation(s)
- Johan Rung
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
108
Oh DH, Dassanayake M, Bohnert HJ, Cheeseman JM. Life at the extreme: lessons from the genome. Genome Biol 2012; 13:241. [PMID: 22390828 DOI: 10.1186/gb-2012-13-3-241] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Indexed: 11/10/2022] [Open access]
Abstract
Extremophile plants thrive in places where most plant species cannot survive. Recent developments in high-throughput technologies and comparative genomics are shedding light on the evolutionary mechanisms leading to their adaptation.
Affiliation(s)
- Dong-Ha Oh
- Department of Plant Biology, University of Illinois at Urbana-Champaign, 61801, USA
109
Katju V. In with the old, in with the new: the promiscuity of the duplication process engenders diverse pathways for novel gene creation. Int J Evol Biol 2012; 2012:341932. [PMID: 23008799 PMCID: PMC3449122 DOI: 10.1155/2012/341932] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 05/18/2012] [Accepted: 06/03/2012] [Indexed: 01/26/2023]
Abstract
The gene duplication process has exhibited far greater promiscuity in the creation of paralogs with novel exon-intron structures than even Ohno anticipated. In this paper I explore the history of the field, from the neo-Darwinian synthesis through Ohno's formulation of the canonical model for the evolution of gene duplicates, culminating in the present genomic era. I delineate the major tenets of Ohno's model and discuss its failure to capture the full complexity of the duplication process as revealed in the era of genomics. I discuss the diverse classes of paralogs originating from both DNA- and RNA-mediated duplication events and their evolutionary potential for assuming radically altered functions, as well as the degree to which they can function free from the constraint of gene conversion. Lastly, I explore theoretical population-genetic considerations of how the effective population size (Ne) of a species may influence the probability that genes with radically altered functions emerge.
Affiliation(s)
- Vaishali Katju
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
110
Rutter MT, Cross KV, Van Woert PA. Birth, death and subfunctionalization in the Arabidopsis genome. Trends Plant Sci 2012; 17:204-12. [PMID: 22326563 DOI: 10.1016/j.tplants.2012.01.006] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 10/25/2011] [Revised: 01/12/2012] [Accepted: 01/16/2012] [Indexed: 05/08/2023]
Abstract
Arabidopsis thaliana is now a model system not just for plant biology but also for comparative genomics. The completed genome sequences of two closely related species, Arabidopsis lyrata and Brassica rapa, are complemented by genomic comparisons among A. thaliana accessions and mutation accumulation lines. Together these genomic data document the birth of new genes via gene duplication, transposon exaptation and de novo formation from noncoding sequence. Most novel loci exhibit low expression and are undergoing pseudogenization or subfunctionalization. Compared with its relatives, A. thaliana has lost large amounts of sequence through deletion, particularly of transposable elements. Intraspecific genomic variation indicates high rates of deletion mutations and deletion polymorphisms across accessions, shedding light on the history of Arabidopsis genome architecture.
Affiliation(s)
- Matthew T Rutter
- Department of Biology, College of Charleston, Charleston, SC 29401, USA.
111
Oh DH, Dassanayake M, Bohnert HJ, Cheeseman JM. Life at the extreme: lessons from the genome. Genome Biol 2012. [DOI: 10.1186/gb4003] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 12/19/2022] [Open access]
112
Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MTA, Azam S, Fan G, Whaley AM, Farmer AD, Sheridan J, Iwata A, Tuteja R, Penmetsa RV, Wu W, Upadhyaya HD, Yang SP, Shah T, Saxena KB, Michael T, McCombie WR, Yang B, Zhang G, Yang H, Wang J, Spillane C, Cook DR, May GD, Xu X, Jackson SA. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 2011; 30:83-9. [PMID: 22057054 DOI: 10.1038/nbt.2022] [Citation(s) in RCA: 432] [Impact Index Per Article: 33.2] [Received: 07/19/2011] [Accepted: 10/03/2011] [Indexed: 11/08/2022]
Abstract
Pigeonpea is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. We used the Illumina next-generation sequencing platform to generate 237.2 Gb of sequence, which, together with Sanger-based bacterial artificial chromosome end sequences and a genetic map, we assembled into scaffolds representing 72.7% (605.78 Mb) of the 833.07 Mb pigeonpea genome. Genome analysis predicted 48,680 genes for pigeonpea and also showed the potential role that certain gene families, for example drought tolerance-related genes, have played throughout the domestication of pigeonpea and the evolution of its ancestors. Although we found a few segmental duplication events, we did not detect the recent genome-wide duplication events observed in soybean. This reference genome sequence will facilitate the identification of the genetic basis of agronomically important traits and accelerate the development of improved pigeonpea varieties that could improve food security in many developing countries.
Affiliation(s)
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India.
113
Wilson BA, Masel J. Putatively noncoding transcripts show extensive association with ribosomes. Genome Biol Evol 2011; 3:1245-52. [PMID: 21948395 PMCID: PMC3209793 DOI: 10.1093/gbe/evr099] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.8] [Indexed: 11/14/2022] [Open access]
Abstract
There have been recent surprising reports that whole genes can evolve de novo from noncoding sequences. This would be extraordinary if the noncoding sequences were random with respect to amino acid identity. However, if the noncoding sequences were previously translated at low rates, with the most strongly deleterious cryptic polypeptides purged by selection, then de novo gene origination would be more plausible. Here we analyze Saccharomyces cerevisiae data on noncoding transcripts found in association with ribosomes. We find many such transcripts. Although their average ribosomal densities are lower than those of protein-coding genes, a significant proportion of noncoding transcripts nevertheless have ribosomal densities comparable to those of coding genes. Most show increased ribosomal association in response to starvation, as has been previously reported for other noncoding sequences such as untranslated regions and introns. In rich media, ribosomal association is correlated with start codons but is not usually consistent and contiguous beyond them, suggesting that translation occurs only at low rates. One transcript contains a 28-codon open reading frame, which we name RDT1, that shows evidence of translation and may be a new protein-coding gene that originated de novo from noncoding sequence. However, the bulk of the ribosomal association cannot be attributed to unannotated protein-coding genes. Our primary finding of extensive ribosome association shows that a necessary precondition for selective purging is met, making de novo gene evolution more plausible. Our analysis is also a proof of principle of the utility of ribosomal profiling data for gene annotation.
Affiliation(s)
- Benjamin A Wilson
- Department of Ecology and Evolutionary Biology, University of Arizona, USA