201
|
Kubicek CP, Steindorff AS, Chenthamara K, Manganiello G, Henrissat B, Zhang J, Cai F, Kopchinskiy AG, Kubicek EM, Kuo A, Baroncelli R, Sarrocco S, Noronha EF, Vannacci G, Shen Q, Grigoriev IV, Druzhinina IS. Evolution and comparative genomics of the most common Trichoderma species. BMC Genomics 2019; 20:485. [PMID: 31189469 PMCID: PMC6560777 DOI: 10.1186/s12864-019-5680-7] [Citation(s) in RCA: 124] [Impact Index Per Article: 24.8] [Received: 01/05/2019] [Accepted: 04/09/2019] [Indexed: 12/31/2022]
Abstract
BACKGROUND: The growing importance of the ubiquitous fungal genus Trichoderma (Hypocreales, Ascomycota) requires understanding of its biology and evolution. Many Trichoderma species are used as biofertilizers and biofungicides, and T. reesei is the model organism for industrial production of cellulolytic enzymes. In addition, some highly opportunistic species devastate mushroom farms and can become pathogens of humans. A comparative analysis of the first three whole genomes revealed mycoparasitism as the innate feature of Trichoderma. However, the evolution of these traits is not yet understood.
RESULTS: We selected the 12 most commonly occurring Trichoderma species and studied the evolution of their genome sequences. Trichoderma evolved at the time of the Cretaceous-Palaeogene extinction event 66 (±15) mya, but the formation of extant sections (Longibrachiatum, Trichoderma) or clades (Harzianum/Virens) happened in the Oligocene. The evolution of the Harzianum clade and section Trichoderma was accompanied by significant gene gain, but the ancestor of section Longibrachiatum experienced rapid gene loss. The highest number of genes gained encoded ankyrins, HET domain proteins and transcription factors. We also identified the Trichoderma core genome, completely curated its annotation, investigated several gene families in detail and compared the results to those of other fungi. Eighty percent of those genes for which a function could be predicted were also found in other fungi, but only 67% of those without a predictable function.
CONCLUSIONS: Our study presents a time-scaled pattern of genome evolution in 12 Trichoderma species from three phylogenetically distant clades/sections and a comprehensive analysis of their genes. The data offer insights into the evolution of a mycoparasite towards a generalist.
Affiliation(s)
- Christian P Kubicek
  - Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
- Andrei S Steindorff
  - Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, Brazil
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Komal Chenthamara
  - Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
- Gelsomina Manganiello
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
  - Dipartimento di Agraria, Università degli Studi di Napoli "Federico II", Naples, Portici, Italy
- Bernard Henrissat
  - CNRS, Aix-Marseille Université, Marseille, France
  - INRA, Marseille, France
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Jian Zhang
  - Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
- Feng Cai
  - Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
- Alexey G Kopchinskiy
  - Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
- Alan Kuo
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Riccardo Baroncelli
  - Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Campus de Villamayor, Calle Del Duero, Villamayor, España
- Sabrina Sarrocco
  - Department of Agriculture, Food and Environment, University of Pisa, Pisa, Italy
- Giovanni Vannacci
  - Centro Hispano-Luso de Investigaciones Agrarias (CIALE), Departamento de Microbiología y Genética, Universidad de Salamanca, Campus de Villamayor, Calle Del Duero, Villamayor, España
- Qirong Shen
  - Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
- Igor V Grigoriev
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA
- Irina S Druzhinina
  - Microbiology and Applied Genomics Group, Research Area Biochemical Technology, Institute of Chemical, Environmental & Bioscience Engineering (ICEBE), TU Wien, Vienna, Austria
  - Jiangsu Provincial Key Lab of Organic Solid Waste Utilization, Nanjing Agricultural University, Nanjing, China
|
202
|
Chekulaeva M, Rajewsky N. Roles of Long Noncoding RNAs and Circular RNAs in Translation. Cold Spring Harb Perspect Biol 2019; 11:cshperspect.a032680. [PMID: 30082465 DOI: 10.1101/cshperspect.a032680] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 12/17/2022]
Abstract
Most of the eukaryotic genome is pervasively transcribed, yielding hundreds to thousands of long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs), some of which are well conserved during evolution. Functions have been described for a few lncRNAs and circRNAs but remain elusive for most. Both classes of RNAs play regulatory roles in translation by interacting with messenger RNAs (mRNAs), microRNAs (miRNAs), or mRNA-binding proteins (RBPs), thereby modulating translation in trans. Moreover, although initially defined as noncoding, a number of lncRNAs and circRNAs have recently been reported to contain functional open reading frames (ORFs). Here, we review current understanding of the roles played by lncRNAs and circRNAs in protein synthesis and discuss challenges and open questions in the field.
Affiliation(s)
- Marina Chekulaeva
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
- Nikolaus Rajewsky
  - Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, 13125 Berlin, Germany
|
203
|
Abstract
Genetic, transcriptional, and post-transcriptional variations shape the transcriptome of individual cells, which makes establishing an exhaustive set of reference RNAs a complicated matter. Current reference transcriptomes, which are based on carefully curated transcripts, are lagging behind the extensive RNA variation revealed by massively parallel sequencing. Much may be missed by ignoring this unreferenced RNA diversity. There is plentiful evidence for non-reference transcripts with important phenotypic effects. Although reference transcriptomes are invaluable for gene expression analysis, they may become limiting in important medical applications. We discuss computational strategies for retrieving hidden transcript diversity.
Affiliation(s)
- Antonin Morillon
  - ncRNA, Epigenetic and Genome Fluidity, CNRS UMR 3244, Sorbonne Université, PSL University, Institut Curie, Centre de Recherche, 26 rue d'Ulm, 75248, Paris, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Sud, Université Paris Saclay, Gif sur Yvette, France
|
204
|
Durand É, Gagnon-Arsenault I, Hallin J, Hatin I, Dubé AK, Nielly-Thibault L, Namy O, Landry CR. Turnover of ribosome-associated transcripts from de novo ORFs produces gene-like characteristics available for de novo gene emergence in wild yeast populations. Genome Res 2019; 29:932-943. [PMID: 31152050 PMCID: PMC6581059 DOI: 10.1101/gr.239822.118] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 05/25/2018] [Accepted: 05/13/2019] [Indexed: 12/17/2022]
Abstract
Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.
Affiliation(s)
- Éléonore Durand
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Gagnon-Arsenault
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
  - Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Johan Hallin
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
  - Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Isabelle Hatin
  - Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Alexandre K Dubé
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
  - Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
- Lou Nielly-Thibault
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
- Olivier Namy
  - Institut de Biologie Intégrative de la Cellule (I2BC), CEA, CNRS, Université Paris-Sud, Université Paris-Saclay, 91190 Gif sur Yvette, France
- Christian R Landry
  - Institut de Biologie Intégrative et des Systèmes, Département de Biologie, PROTEO, Centre de Recherche en Données Massives de l'Université Laval, Pavillon Charles-Eugène-Marchand, Université Laval, G1V 0A6 Québec, Québec, Canada
  - Département de Biochimie, Microbiologie et Bio-informatique, Université Laval, G1V 0A6 Québec, Québec, Canada
|
205
|
Wang R, Wang Y, Zhang X, Zhang Y, Du X, Fang Y, Li G. Hierarchical cooperation of transcription factors from integration analysis of DNA sequences, ChIP-Seq and ChIA-PET data. BMC Genomics 2019; 20:296. [PMID: 32039697 PMCID: PMC7226942 DOI: 10.1186/s12864-019-5535-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/05/2023]
Abstract
Background: Chromosomal architecture, which is constituted by chromatin loops, plays an important role in cellular functions. Gene expression and cell identity can be regulated by the chromatin loop, which is formed by proximal or distal enhancers and promoters in linear DNA (1D). Enhancers and promoters are fundamental non-coding elements enriched with transcription factors (TFs) to form chromatin loops. However, the specific cooperation of TFs involved in forming chromatin loops is not fully understood.
Results: Here, we proposed a method for investigating the cooperation of TFs in four cell lines by the integrative analysis of DNA sequences, ChIP-Seq and ChIA-PET data. Results demonstrate that the interaction of enhancers and promoters is a hierarchical and dynamic complex process with cooperative interactions of different TFs synergistically regulating gene expression and chromatin structure. The TF cooperation involved in maintaining and regulating the chromatin loop of cells can be regulated by epigenetic factors, such as other TFs and DNA methylation.
Conclusions: Such cooperation among TFs provides the potential features that can affect chromatin’s 3D architecture in cells. The regulation of chromatin 3D organization and gene expression is a complex process associated with the hierarchical and dynamic properties of TFs.
Affiliation(s)
- Ruimin Wang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Yunlong Wang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Xueying Zhang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Yaliang Zhang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Xiaoyong Du
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
  - Huazhong Agricultural University, Wuhan, 430070, China
- Yaping Fang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
  - Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Guoliang Li
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
  - Huazhong Agricultural University, Wuhan, 430070, China
  - College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
|
206
|
Xu YC, Niu XM, Li XX, He W, Chen JF, Zou YP, Wu Q, Zhang YE, Busch W, Guo YL. Adaptation and Phenotypic Diversification in Arabidopsis through Loss-of-Function Mutations in Protein-Coding Genes. THE PLANT CELL 2019; 31:1012-1025. [PMID: 30886128 PMCID: PMC6533021 DOI: 10.1105/tpc.18.00791] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 10/18/2018] [Revised: 02/25/2019] [Accepted: 03/17/2019] [Indexed: 05/07/2023]
Abstract
According to the less-is-more hypothesis, gene loss is an engine for evolutionary change. Loss-of-function (LoF) mutations resulting in the natural knockout of protein-coding genes not only provide information about gene function but also play important roles in adaptation and phenotypic diversification. Although the less-is-more hypothesis was proposed two decades ago, it remains to be explored on a large scale. In this study, we identified 60,819 LoF variants in 1071 Arabidopsis (Arabidopsis thaliana) genomes and found that 34% of Arabidopsis protein-coding genes annotated in the Columbia-0 genome do not have any LoF variants. We found that nucleotide diversity, transposable element density, and gene family size are strongly correlated with the presence of LoF variants. Intriguingly, 0.9% of LoF variants with minor allele frequency larger than 0.5% are associated with climate change. In addition, in the Yangtze River basin population, 1% of genes with LoF mutations were under positive selection, providing important insights into the contribution of LoF mutations to adaptation. In particular, our results demonstrate that LoF mutations shape diverse phenotypic traits. Overall, our results highlight the importance of the LoF variants for the adaptation and phenotypic diversification of plants.
Affiliation(s)
- Yong-Chao Xu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xiao-Min Niu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Xin-Xin Li
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Wenrong He
  - Salk Institute for Biological Studies, Plant Molecular and Cellular Biology Laboratory, La Jolla, California 92037
- Jia-Fu Chen
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Yu-Pan Zou
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Qiong Wu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Yong E Zhang
  - University of Chinese Academy of Sciences, Beijing 100049, China
  - State Key Laboratory of Integrated Management of Pest Insects and Rodents & Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Wolfgang Busch
  - Salk Institute for Biological Studies, Plant Molecular and Cellular Biology Laboratory, La Jolla, California 92037
- Ya-Long Guo
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
|
207
|
Affiliation(s)
- Stephen Branden Van Oss
  - Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
- Anne-Ruxandra Carvunis
  - Department of Computational and Systems Biology, Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA, United States of America
|
208
|
|
209
|
Mohd-Assaad N, McDonald BA, Croll D. The emergence of the multi-species NIP1 effector in Rhynchosporium was accompanied by high rates of gene duplications and losses. Environ Microbiol 2019; 21:2677-2695. [PMID: 30838748 DOI: 10.1111/1462-2920.14583] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 10/08/2018] [Revised: 02/23/2019] [Accepted: 03/04/2019] [Indexed: 01/28/2023]
Abstract
Plant pathogens secrete effector proteins to manipulate the host and facilitate infection. Cognate hosts trigger strong defence responses upon detection of these effectors. Consequently, pathogens and hosts undergo rapid coevolutionary arms races driven by adaptive evolution of effectors and receptors. Because of their high rate of turnover, most effectors are thought to be species-specific and the evolutionary trajectories are poorly understood. Here, we investigate the necrosis-inducing protein 1 (NIP1) effector in the multihost pathogen genus Rhynchosporium. We retraced the evolutionary history of the NIP1 locus using whole-genome assemblies of 146 strains covering four closely related species. NIP1 orthologues were present in all species but the locus consistently segregated presence-absence polymorphisms suggesting long-term balancing selection. We also identified previously unknown paralogues of NIP1 that were shared among multiple species and showed substantial copy-number variation within R. commune. The NIP1A paralogue was under significant positive selection suggesting that NIP1A is the dominant effector variant coevolving with host immune receptors. Consistent with this prediction, we found that copy number variation at NIP1A had a stronger effect on virulence than NIP1B. Our analyses unravelled the origins and diversification mechanisms of a pathogen effector family shedding light on how pathogens gain adaptive genetic variation.
Affiliation(s)
- Norfarhan Mohd-Assaad
  - Plant Pathology, Institute of Integrative Biology, ETH, Zurich, 8092 Zurich, Switzerland
  - School of Biosciences and Biotechnology, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
- Bruce A McDonald
  - Plant Pathology, Institute of Integrative Biology, ETH, Zurich, 8092 Zurich, Switzerland
- Daniel Croll
  - Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, 2000 Neuchâtel, Switzerland
|
210
|
Rapid evolution of protein diversity by de novo origination in Oryza. Nat Ecol Evol 2019; 3:679-690. [PMID: 30858588 DOI: 10.1038/s41559-019-0822-5] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Received: 05/08/2018] [Accepted: 01/23/2019] [Indexed: 12/22/2022]
Abstract
New protein-coding genes that arise de novo from non-coding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral non-coding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral non-coding sequences and evidence of translation. Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favoured in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.
|
211
|
Khitun A, Ness TJ, Slavoff SA. Small open reading frames and cellular stress responses. Mol Omics 2019; 15:108-116. [PMID: 30810554 DOI: 10.1039/c8mo00283e] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Indexed: 12/11/2022]
Abstract
Small open reading frames (smORFs) encoding polypeptides of less than 100 amino acids in eukaryotes (50 amino acids in prokaryotes) were historically excluded from genome annotation. However, recent advances in genomics, ribosome footprinting, and proteomics have revealed thousands of translated smORFs in genomes spanning evolutionary space. These smORFs can encode functional polypeptides, or act as cis-translational regulators. Herein we review evidence that some smORF-encoded polypeptides (SEPs) participate in stress responses in both prokaryotes and eukaryotes, and that some upstream ORFs (uORFs) regulate stress-responsive translation of downstream cistrons in eukaryotic cells. These studies provide insight into a regulated subclass of smORFs and suggest that at least some SEPs may participate in maintenance of cellular homeostasis under stress.
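As a rough illustration of the length cutoff described in this abstract, the sketch below scans the forward strand of a DNA sequence for ATG-initiated ORFs that encode fewer than 100 amino acids. It is a toy under stated assumptions, not the method of any study reviewed here: the function name `find_smorfs`, the ATG-only start rule, the forward-strand-only scan and the first-ATG convention are all choices made for this example.

```python
# Hypothetical helper for illustration only; not taken from the cited review.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_smorfs(seq, max_aa=100, min_aa=1):
    """Return (start, end, n_codons) for forward-strand ORFs that start
    with ATG, end at an in-frame stop codon, and encode fewer than
    `max_aa` amino acids. `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                n_codons = 1  # count the start codon (Met)
                # Extend codon by codon until a stop or the sequence end.
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                    n_codons += 1
                if j + 3 > len(seq):
                    break  # no in-frame stop downstream in this frame
                if min_aa <= n_codons < max_aa:
                    found.append((i, j + 3, n_codons))
                i = j + 3  # resume scanning after the stop codon
                continue
            i += 3
    return found


if __name__ == "__main__":
    print(find_smorfs("CCATGAAATAGGG"))
```

Real smORF annotation also relies on ribosome footprinting or proteomic evidence, as the abstract notes; a sequence scan like this one only enumerates candidates.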
Affiliation(s)
- Alexandra Khitun
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Travis J Ness
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Sarah A Slavoff
  - Chemical Biology Institute, Yale University, West Haven, CT 06516, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
|
212
|
Molecular mechanism and history of non-sense to sense evolution of antifreeze glycoprotein gene in northern gadids. Proc Natl Acad Sci U S A 2019; 116:4400-4405. [PMID: 30765531 PMCID: PMC6410882 DOI: 10.1073/pnas.1817138116] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Indexed: 11/18/2022]
Abstract
The diverse antifreeze proteins enabling the survival of different polar fishes in freezing seas offer unparalleled vistas into the breadth of genetic sources and mechanisms that produce crucial new functions. Although most new genes evolved from preexisting genic ancestors, some are deemed to have arisen from noncoding DNA. However, the pertinent mechanisms, functions, and selective forces remain uncertain. Our paper presents clear evidence that the antifreeze glycoprotein gene of the northern codfish originated from a noncoding region. We further describe the detailed mechanism of its evolutionary transformation into a full-fledged crucial life-saving gene. This paper is a concrete dissection of the process of a de novo gene birth that has conferred a vital adaptive function directly linked to natural selection.
A fundamental question in evolutionary biology is how genetic novelty arises. De novo gene birth is a recently recognized mechanism, but the evolutionary process and function of putative de novo genes remain largely obscure. With a clear life-saving function, the diverse antifreeze proteins of polar fishes are exemplary adaptive innovations and models for investigating new gene evolution. Here, we report clear evidence and a detailed molecular mechanism for the de novo formation of the northern gadid (codfish) antifreeze glycoprotein (AFGP) gene from a minimal noncoding sequence. We constructed genomic DNA libraries for AFGP-bearing and AFGP-lacking species across the gadid phylogeny and performed fine-scale comparative analyses of the AFGP genomic loci and homologs. We identified the noncoding founder region and a nine-nucleotide (9-nt) element therein that supplied the codons for one Thr-Ala-Ala unit from which the extant repetitive AFGP-coding sequence (cds) arose through tandem duplications. The latent signal peptide (SP)-coding exons were fortuitous noncoding DNA sequences immediately upstream of the 9-nt element, which, when spliced, supplied a typical secretory signal. Through a 1-nt frameshift mutation, these two parts formed a single read-through open reading frame (ORF). It became functionalized when a putative translocation event conferred the essential cis promoter for transcriptional initiation. We experimentally proved that all genic components of the extant gadid AFGP originated from entirely nongenic DNA. The gadid AFGP evolutionary process also represents a rare example of the proto-ORF model of de novo gene birth where a fully formed ORF existed before the regulatory element to activate transcription was acquired.
|
213
|
Wang Y, Zeng Z, Liu TL, Sun L, Yao Q, Chen KP. TA, GT and AC are significantly under-represented in open reading frames of prokaryotic and eukaryotic protein-coding genes. Mol Genet Genomics 2019; 294:637-647. [PMID: 30758669 DOI: 10.1007/s00438-019-01535-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/29/2018] [Accepted: 01/31/2019] [Indexed: 01/01/2023]
Abstract
Genomes can be considered a combination of 16 dinucleotides. Analysing the relative abundance of different dinucleotides may reveal important features of genome evolution. In the present study, we conducted extensive surveys on the relative abundances of dinucleotides in various genomic components of 28 bacterial, 20 archaean, 19 fungal, 24 plant and 29 animal species. We found that TA, GT and AC are significantly under-represented in open reading frames of all organisms and in intergenic regions and introns of most organisms. Specific dinucleotides are of greatly varied usage at different codon positions. The significantly low representations of TA, GT and AC are considered the evolutionary consequences of preventing formation of premature stop codons and of reducing intron-splicing options in candidate primary mRNA sequences. These data suggest that a reduction of TA and GT occurred on both strands of the DNA sequence at an early stage of de novo gene birth. Interestingly, GT and AC are also significantly under-represented in current prokaryotic genomes, suggesting that ancient prokaryotic protein-coding genes might have contained introns. The greatly varied usages of specific dinucleotides at different codon positions are considered evolutionary accommodations to compensate for the unavailability of specific codons and to avoid formation of premature stop codons. This is the first report presenting data of dinucleotide relative abundance to indicate the possible existence of spliceosomal introns in ancient prokaryotic genes and to hypothesize early steps of de novo gene birth.
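The abstract does not state how relative abundance was computed. A commonly used measure is the odds ratio ρ_xy = f_xy / (f_x · f_y), where f_xy is the observed dinucleotide frequency and f_x, f_y are the mononucleotide frequencies; values well below 1 indicate under-representation. The sketch below computes this ratio for all 16 dinucleotides of a single strand. Treating this as the measure used in the study is an assumption, and the function name `dinucleotide_relative_abundance` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_relative_abundance(seq):
    """Return {dinucleotide: rho} with rho = f_xy / (f_x * f_y).

    Assumption for this sketch: single-strand odds ratio; positions with
    ambiguous bases (anything outside ACGT) are skipped. Dinucleotides
    whose expected frequency is zero get NaN."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    rho = {}
    for x, y in product(BASES, repeat=2):
        # Expected frequency if the two positions were independent.
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = di[x + y] / n_di if n_di else 0.0
        rho[x + y] = observed / expected if expected > 0 else float("nan")
    return rho


if __name__ == "__main__":
    for pair, value in sorted(dinucleotide_relative_abundance("AATT").items()):
        print(pair, value)
```

Applied separately to ORFs, introns and intergenic regions, a table of such ratios would allow the kind of comparison across genomic components that the abstract describes.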
Affiliation(s)
- Yong Wang
  - School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Zhen Zeng
  - Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Tian-Lei Liu
  - School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Ling Sun
  - School of Food and Biological Engineering, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Qin Yao
  - Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
- Ke-Ping Chen
  - Institute of Life Sciences, Jiangsu University, 301 Xuefu Road, Zhenjiang, 212013, China
|
214
|
Vakirlis N, Hebert AS, Opulente DA, Achaz G, Hittinger CT, Fischer G, Coon JJ, Lafontaine I. A Molecular Portrait of De Novo Genes in Yeasts. Mol Biol Evol 2019; 35:631-645. [PMID: 29220506 DOI: 10.1093/molbev/msx315] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/13/2022]
Abstract
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Affiliation(s)
- Nikolaos Vakirlis
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Alex S Hebert
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
- Dana A Opulente
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Guillaume Achaz
  - Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France
  - SMILE Group, CIRB UMR7241, Collège de France, Paris, France
- Chris Todd Hittinger
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
  - Laboratory of Genetics, Genome Center of Wisconsin, J. F. Crow Institute for the Study of Evolution, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI
- Gilles Fischer
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative UMR7238, 75005 Paris, France
- Joshua J Coon
  - Genome Center of Wisconsin, University of Wisconsin-Madison, Madison, WI
  - DOE Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, WI
  - Department of Biomolecular Chemistry, University of Wisconsin-Madison, Madison, WI
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI
  - Morgridge Institute for Research, Madison, WI
- Ingrid Lafontaine
  - Atelier de BioInformatique, ISyEB UMR7205 Muséum National d'Histoire Naturelle, Paris, France
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, Institut de Biologie Physico-Chimique, Physiologie Membranaire et Moléculaire du Chloroplaste UMR7141, 75005 Paris, France
|
215
|
Claverie JM, Abergel C, Legendre M. [Giant viruses that create their own genes]. Med Sci (Paris) 2019; 34:1087-1091. [PMID: 30623766 DOI: 10.1051/medsci/2018300] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
Since the discovery of Mimivirus in 2003, the saga of giant viruses has continued with the isolation of new amoeba viruses, which are now divided into seven distinct families, the origin(s) of which are still mysterious and controversial. Thanks to the isolation of 3 new members of the Pandoraviridae family, whose micrometric particles and genomes of more than 2 megabases encroach on the cellular world, we carried out a stringent re-analysis of their gene contents, using a combination of transcriptomic, proteomic and bioinformatic approaches. We concluded that the only scenario capable of accounting for the distribution and the huge proportion of orphan genes ("ORFans") that characterize Pandoraviruses is that they were created de novo within the intergenic regions. This process, perhaps shared among other large DNA viruses, challenges the central paradigm of molecular evolution according to which all genes/proteins have an ancestral history.
Affiliation(s)
- Jean-Michel Claverie
  - Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
- Chantal Abergel
  - Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
- Matthieu Legendre
  - Aix-Marseille université et CNRS, Information génomique et structurale (IGS), UMR7256, Institut de microbiologie de la Méditerranée-IMM-FR 3479, parc scientifique de Luminy, 163, avenue de Luminy, case 934, 13288 Marseille Cedex 09, France
|
216
|
McGowan J, Byrne KP, Fitzpatrick DA. Comparative Analysis of Oomycete Genome Evolution Using the Oomycete Gene Order Browser (OGOB). Genome Biol Evol 2019; 11:189-206. [PMID: 30535146 PMCID: PMC6330052 DOI: 10.1093/gbe/evy267] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 12/10/2018] [Indexed: 01/01/2023] Open
Abstract
The oomycetes are a class of microscopic, filamentous eukaryotes within the stramenopiles–alveolates–rhizaria eukaryotic supergroup. They include some of the most destructive pathogens of animals and plants, such as Phytophthora infestans, the causative agent of late potato blight. Despite the threat they pose to worldwide food security and natural ecosystems, there is a lack of tools and databases available to study oomycete genetics and evolution. To this end, we have developed the Oomycete Gene Order Browser (OGOB), a curated database that facilitates comparative genomic and syntenic analyses of oomycete species. OGOB incorporates genomic data for 20 oomycete species including functional annotations and a number of bioinformatics tools. OGOB hosts a robust set of orthologous oomycete genes for evolutionary analyses. Here, we present the structure and function of OGOB as well as a number of comparative genomic analyses we have performed to better understand oomycete genome evolution. We analyze the extent of oomycete gene duplication and identify tandem gene duplication as a driving force of the expansion of secreted oomycete genes. We identify core genes that are present and microsyntenically conserved (termed syntenologs) in oomycete lineages and identify the degree of microsynteny between each pair of the 20 species housed in OGOB. Consistent with previous comparative synteny analyses between a small number of oomycete species, our results reveal an extensive degree of microsyntenic conservation amongst genes with housekeeping functions within the oomycetes. OGOB is available at https://ogob.ie.
Affiliation(s)
- Jamie McGowan
  - Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland
  - Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
- Kevin P Byrne
  - School of Medicine, UCD Conway Institute, University College Dublin, Ireland
- David A Fitzpatrick
  - Genome Evolution Laboratory, Department of Biology, Maynooth University, Co. Kildare, Ireland
  - Human Health Research Institute, Maynooth University, Co. Kildare, Ireland
|
217
|
Qi M, Zheng W, Zhao X, Hohenstein JD, Kandel Y, O'Conner S, Wang Y, Du C, Nettleton D, MacIntosh GC, Tylka GL, Wurtele ES, Whitham SA, Li L. QQS orphan gene and its interactor NF-YC4 reduce susceptibility to pathogens and pests. Plant Biotechnology Journal 2019; 17:252-263. [PMID: 29878511 PMCID: PMC6330549 DOI: 10.1111/pbi.12961] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 03/22/2018] [Accepted: 06/04/2018] [Indexed: 05/19/2023]
Abstract
Enhancing the nutritional quality and disease resistance of crops without sacrificing productivity is a key issue for developing varieties that are valuable to farmers and for simultaneously improving food security and sustainability. Expression of the Arabidopsis thaliana species-specific AtQQS (Qua-Quine Starch) orphan gene or its interactor, NF-YC4 (Nuclear Factor Y, subunit C4), has been shown to increase levels of leaf/seed protein without affecting the growth and yield of agronomic species. Here, we demonstrate that overexpression of AtQQS and NF-YC4 in Arabidopsis and soybean enhances resistance/reduces susceptibility to viruses, bacteria, fungi, aphids and soybean cyst nematodes. A series of Arabidopsis mutants in starch metabolism were used to explore the relationships between QQS expression, carbon and nitrogen partitioning, and defense. The enhanced basal defenses mediated by QQS were independent of changes in protein/carbohydrate composition of the plants. We demonstrate that either AtQQS or NF-YC4 overexpression in Arabidopsis and in soybean reduces susceptibility of these plants to pathogens/pests. Transgenic soybean lines overexpressing NF-YC4 produce seeds with increased protein while maintaining healthy growth. Pull-down studies reveal that QQS interacts with human NF-YC, as well as with Arabidopsis NF-YC4, and indicate two QQS binding sites near the NF-YC-histone-binding domain. A new model for QQS interaction with NF-YC is speculated. Our findings illustrate the potential of QQS and NF-YC4 to increase protein and improve defensive traits in crops, overcoming the normal growth-defense trade-offs.
Affiliation(s)
- Mingsheng Qi
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Wenguang Zheng
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Xuefeng Zhao
  - Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA, USA
- Jessica D. Hohenstein
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Yuba Kandel
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Seth O'Conner
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
- Yifan Wang
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Chuanlong Du
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Dan Nettleton
  - Department of Statistics, Iowa State University, Ames, IA, USA
- Gustavo C. MacIntosh
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Gregory L. Tylka
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Eve S. Wurtele
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, USA
- Steven A. Whitham
  - Department of Plant Pathology and Microbiology, Iowa State University, Ames, IA, USA
- Ling Li
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
  - Department of Biological Sciences, Mississippi State University, Starkville, MS, USA
  - Center for Metabolic Biology, Iowa State University, Ames, IA, USA
|
218
|
Abstract
De novo genes, that is, protein-coding genes originating from previously noncoding sequence, have gone from being considered impossibly unlikely to being recognized as an important source of genetic novelty in eukaryotic genomes. It is clear that de novo gene evolution is a rare but consistent feature of eukaryotic genomes, being detected in every genome studied. However, different studies often use different computational methods, and the numbers and identities of the detected genes vary greatly. Here we present a coherent protocol for the computational identification of de novo genes by comparative genomics. The method described uses homology searches, identification of syntenic regions, and ancestral sequence reconstruction to produce high-confidence candidates with robust evidence of de novo emergence. It is designed to be easily applicable given the basic knowledge of bioinformatic tools and scalable so that it can be applied on large and small datasets.
Affiliation(s)
- Nikolaos Vakirlis
  - Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland
- Aoife McLysaght
  - Department of Genetics, Trinity College Dublin, Smurfit Institute of Genetics, University of Dublin, Dublin, Ireland
|
219
|
Translation of Small Open Reading Frames: Roles in Regulation and Evolutionary Innovation. Trends Genet 2018; 35:186-198. [PMID: 30606460 DOI: 10.1016/j.tig.2018.12.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 10/12/2018] [Accepted: 12/07/2018] [Indexed: 01/01/2023]
Abstract
The translatome can be defined as the sum of the RNA sequences that are translated into proteins in the cell by the ribosomal machinery. Until recently, it was generally assumed that the translatome was essentially restricted to evolutionary conserved proteins encoded by the set of annotated protein-coding genes. However, it has become increasingly clear that it also includes small regulatory open reading frames (ORFs), functional micropeptides, de novo proteins, and the pervasive translation of likely nonfunctional proteins. Many of these ORFs have been discovered thanks to the development of ribosome profiling, a technique to sequence ribosome-protected RNA fragments. To fully capture the diversity of translated ORFs, we propose a comprehensive classification that includes the new types of translated ORFs in addition to standard proteins.
|
220
|
Exaptation at the molecular genetic level. Science China Life Sciences 2018; 62:437-452. [PMID: 30798493 DOI: 10.1007/s11427-018-9447-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/10/2018] [Accepted: 12/01/2018] [Indexed: 12/22/2022]
Abstract
The realization that body parts of animals and plants can be recruited or coopted for novel functions dates back to, or even predates the observations of Darwin. S.J. Gould and E.S. Vrba recognized a mode of evolution of characters that differs from adaptation. The umbrella term aptation was supplemented with the concept of exaptation. Unlike adaptations, which are restricted to features built by selection for their current role, exaptations are features that currently enhance fitness, even though their present role was not a result of natural selection. Exaptations can also arise from nonaptations; these are characters which had previously been evolving neutrally. All nonaptations are potential exaptations. The concept of exaptation was expanded to the molecular genetic level which aided greatly in understanding the enormous potential of neutrally evolving repetitive DNA (including transposed elements, formerly considered junk DNA) for the evolution of genes and genomes. The distinction between adaptations and exaptations is outlined in this review and examples are given. Also elaborated on is the fact that such distinctions are sometimes more difficult to determine; this is a widespread phenomenon in biology, where continua abound and clear borders between states and definitions are rare.
|
221
|
Wirthlin M, Lima NCB, Guedes RLM, Soares AER, Almeida LGP, Cavaleiro NP, Loss de Morais G, Chaves AV, Howard JT, Teixeira MDM, Schneider PN, Santos FR, Schatz MC, Felipe MS, Miyaki CY, Aleixo A, Schneider MPC, Jarvis ED, Vasconcelos ATR, Prosdocimi F, Mello CV. Parrot Genomes and the Evolution of Heightened Longevity and Cognition. Curr Biol 2018; 28:4001-4008.e7. [PMID: 30528582 DOI: 10.1016/j.cub.2018.10.050] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 02/11/2016] [Revised: 08/14/2018] [Accepted: 10/23/2018] [Indexed: 10/27/2022]
Abstract
Parrots are one of the most distinct and intriguing groups of birds, with highly expanded brains [1], highly developed cognitive [2] and vocal communication [3] skills, and a long lifespan compared to other similar-sized birds [4]. Yet the genetic basis of these traits remains largely unidentified. To address this question, we have generated a high-coverage, annotated assembly of the genome of the blue-fronted Amazon (Amazona aestiva) and carried out extensive comparative analyses with 30 other avian species, including 4 additional parrots. We identified several genomic features unique to parrots, including parrot-specific novel genes and parrot-specific modifications to coding and regulatory sequences of existing genes. We also discovered genomic features under strong selection in parrots and other long-lived birds, including genes previously associated with lifespan determination as well as several hundred new candidate genes. These genes support a range of cellular functions, including telomerase activity; DNA damage repair; control of cell proliferation, cancer, and immunity; and anti-oxidative mechanisms. We also identified brain-expressed, parrot-specific paralogs with known functions in neural development or vocal-learning brain circuits. Intriguingly, parrot-specific changes in conserved regulatory sequences were overwhelmingly associated with genes that are linked to cognitive abilities and have undergone similar selection in the human lineage, suggesting convergent evolution. These findings bring novel insights into the genetics and evolution of longevity and cognition, as well as provide novel targets for exploring the mechanistic basis of these traits.
Affiliation(s)
- Morgan Wirthlin
  - Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA
- Nicholas C B Lima
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ 21941-902, Brazil
- Rafael Lucas Muniz Guedes
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- André E R Soares
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Luiz Gonzaga P Almeida
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Nathalia P Cavaleiro
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Guilherme Loss de Morais
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Anderson V Chaves
  - Programa de Pós-graduação em Manejo e Conservação de Ecossistemas Naturais e Agrários, Instituto de Ciências Biológicas e da Saúde, Universidade Federal de Viçosa, Florestal, Minas Gerais, Brazil
- Jason T Howard
  - Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY 10065, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Marcus de Melo Teixeira
  - Núcleo de Medicina Tropical, Faculdade de Medicina, Universidade de Brasília, Brasília, DF 70910-900, Brazil
- Patricia N Schneider
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
- Fabrício R Santos
  - Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Michael C Schatz
  - Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Maria Sueli Felipe
  - Programa de Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília e Depto. de Biologia Celular, Universidade de Brasilia, Brasilia, DF, Brazil
- Cristina Y Miyaki
  - Instituto de Biociências, Universidade de São Paulo, R. do Matão, 277, São Paulo, SP 05508-090, Brazil
- Alexandre Aleixo
  - Coordenação de Zoologia, Museu Paraense Emilio Goeldi, Belém, PA 66040-170, Brazil
- Maria P C Schneider
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, PA, Brazil
- Erich D Jarvis
  - Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY 10065, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Ana Tereza R Vasconcelos
  - Laboratório Nacional de Computação Científica, Rua Getúlio Vargas 333, Quitandinha, Petrópolis, RJ 25651-070, Brazil
- Francisco Prosdocimi
  - Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ 21941-902, Brazil
- Claudio V Mello
  - Department of Behavioral Neuroscience, Oregon Health & Science University, Portland, OR 97239, USA
|
222
|
Casola C. From De Novo to "De Nono": The Majority of Novel Protein-Coding Genes Identified with Phylostratigraphy Are Old Genes or Recent Duplicates. Genome Biol Evol 2018; 10:2906-2918. [PMID: 30346517 PMCID: PMC6239577 DOI: 10.1093/gbe/evy231] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Accepted: 10/10/2018] [Indexed: 12/11/2022] Open
Abstract
The evolution of novel protein-coding genes from noncoding regions of the genome is one of the most compelling pieces of evidence for genetic innovations in nature. One popular approach to identify de novo genes is phylostratigraphy, which consists of determining the approximate time of origin (age) of a gene based on its distribution along a species phylogeny. Several studies have revealed significant flaws in determining the age of genes, including de novo genes, using phylostratigraphy alone. However, the rate of false positives in de novo gene surveys, based on phylostratigraphy, remains unknown. Here, I reanalyze the findings from three studies, two of which identified tens to hundreds of rodent-specific de novo genes adopting a phylostratigraphy-centered approach. Most putative de novo genes discovered in these investigations are no longer included in recently updated mouse gene sets. Using a combination of synteny information and sequence similarity searches, I show that ∼60% of the remaining 381 putative de novo genes share homology with genes from other vertebrates, originated through gene duplication, and/or share no synteny information with nonrodent mammals. These results led to an estimated rate of ∼12 de novo genes per million years in mouse. Contrary to a previous study (Wilson BA, Foy SG, Neme R, Masel J. 2017. Young genes are highly disordered as predicted by the preadaptation hypothesis of de novo gene birth. Nat Ecol Evol. 1:0146), I found no evidence supporting the preadaptation hypothesis of de novo gene formation. Nearly half of the de novo genes confirmed in this study are within older genes, indicating that co-option of preexisting regulatory regions and a higher GC content may facilitate the origin of novel genes.
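The core idea of phylostratigraphy described above, dating a gene by how widely its homologs are distributed along a species phylogeny, can be sketched in a few lines. The following is a minimal illustration under stated assumptions: the function name `assign_phylostratum`, the stratum names and the taxa in `EXAMPLE_STRATA` are hypothetical, and the homology search that would produce the set of hit taxa is not shown.

```python
def assign_phylostratum(strata, hit_taxa, focal_label):
    """Return the oldest stratum that contains a taxon with a homolog.

    strata: list of (stratum_name, set_of_taxa) ordered from oldest to
        youngest. Each set holds the outgroup taxa whose common ancestor
        with the focal species defines that stratum.
    hit_taxa: set of taxa in which a homolog of the gene was detected.
    focal_label: label returned when no outgroup taxon has a homolog,
        i.e. the gene looks specific to the focal species.
    """
    for name, taxa in strata:
        if taxa & hit_taxa:
            return name
    return focal_label


# Hypothetical strata for a mouse-centred analysis (illustration only).
EXAMPLE_STRATA = [
    ("Vertebrata", {"zebrafish", "chicken"}),
    ("Mammalia", {"human", "dog"}),
    ("Rodentia", {"rat"}),
]

if __name__ == "__main__":
    print(assign_phylostratum(EXAMPLE_STRATA, {"rat", "human"}, "mouse-specific"))
    print(assign_phylostratum(EXAMPLE_STRATA, set(), "mouse-specific"))
```

A gene with no detected homolog outside the focal species is only a candidate de novo gene at this point. As the abstract stresses, age assignments from this step alone are error-prone, which is why the reanalysis combines them with synteny information and sequence similarity searches.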
Affiliation(s)
- Claudio Casola
  - Department of Ecosystem Science and Management, Texas A&M University
|
223
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follow narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
  - Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
  - Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
|
224
|
Silar P, Dauget JM, Gautier V, Grognet P, Chablat M, Hermann-Le Denmat S, Couloux A, Wincker P, Debuchy R. A gene graveyard in the genome of the fungus Podospora comata. Mol Genet Genomics 2018; 294:177-190. [PMID: 30288581 DOI: 10.1007/s00438-018-1497-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 04/05/2018] [Accepted: 09/28/2018] [Indexed: 02/07/2023]
Abstract
Mechanisms involved in fine adaptation of fungi to their environment include differential gene regulation associated with single nucleotide polymorphisms and indels (including transposons), horizontal gene transfer, gene copy amplification, as well as pseudogenization and gene loss. The two Podospora genome sequences examined here emphasize the role of pseudogenization and gene loss, which have rarely been documented in fungi. Podospora comata is a species closely related to Podospora anserina, a fungus used as a model in several laboratories. Comparison of the genome of P. comata with that of P. anserina, whose genome has been available for over 10 years, should yield interesting data related to the modalities of genome evolution between these two closely related fungal species that thrive in the same types of biotopes, i.e., herbivore dung. Here, we present the genome sequence of the mat+ isolate of the P. comata reference strain T. Comparison with the genome of the mat+ isolate of P. anserina strain S confirms that P. anserina and P. comata are likely two different species that rarely interbreed in nature. Despite having 94-99% nucleotide identity in the syntenic regions of their genomes, the two species differ by nearly 10% of their gene contents. Comparison of the species-specific gene sets uncovered genes that could be responsible for the known physiological differences between the two species. Finally, we identified 428 and 811 pseudogenes (3.8 and 7.2% of the genes) in P. anserina and P. comata, respectively. The presence of high numbers of pseudogenes supports the notion that the difference in gene contents is due to gene loss rather than horizontal gene transfers. We propose that the high frequency of pseudogenization leading to gene loss in P. anserina and P. comata accompanies specialization of these two fungi. Gene loss may be more prevalent during the evolution of other fungi than usually thought.
Affiliation(s)
- Philippe Silar
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Jean-Marc Dauget
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Valérie Gautier
  - Univ Paris Diderot, Sorbonne Paris Cité, Laboratoire Interdisciplinaire des Energies de Demain, 75205, Paris Cedex 13, France
- Pierre Grognet
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Michelle Chablat
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Sylvie Hermann-Le Denmat
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
  - Ecole Normale Supérieure, 75005, Paris, France
- Arnaud Couloux
  - CEA, Genoscope, Institut de biologie François Jacob, CP 5706, Evry, France
- Patrick Wincker
  - CEA, Genoscope, Institut de biologie François Jacob, CP 5706, Evry, France
  - CNRS UMR 8030, Evry, France
  - Univ. Evry, Université Paris-Saclay, Evry, France
- Robert Debuchy
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
|
225
|
Hellen CUT. Translation Termination and Ribosome Recycling in Eukaryotes. Cold Spring Harb Perspect Biol 2018; 10:cshperspect.a032656. [PMID: 29735640 DOI: 10.1101/cshperspect.a032656] [Citation(s) in RCA: 121] [Impact Index Per Article: 20.2] [Indexed: 12/13/2022]
Abstract
Termination of mRNA translation occurs when a stop codon enters the A site of the ribosome, and in eukaryotes is mediated by release factors eRF1 and eRF3, which form a ternary eRF1/eRF3-guanosine triphosphate (GTP) complex. eRF1 recognizes the stop codon, and after hydrolysis of GTP by eRF3, mediates release of the nascent peptide. The post-termination complex is then disassembled, enabling its constituents to participate in further rounds of translation. Ribosome recycling involves splitting of the 80S ribosome by the ATP-binding cassette protein ABCE1 to release the 60S subunit. Subsequent dissociation of deacylated transfer RNA (tRNA) and messenger RNA (mRNA) from the 40S subunit may be mediated by initiation factors (priming the 40S subunit for initiation), by ligatin (eIF2D) or by density-regulated protein (DENR) and multiple copies in T-cell lymphoma-1 (MCT1). These events may be subverted by suppression of termination (yielding carboxy-terminally extended read-through polypeptides) or by interruption of recycling, leading to reinitiation of translation near the stop codon.
Affiliation(s)
- Christopher U T Hellen
- Department of Cell Biology, State University of New York, Downstate Medical Center, New York, New York 11203
| |
|
226
|
Werner MS, Sieriebriennikov B, Prabh N, Loschko T, Lanz C, Sommer RJ. Young genes have distinct gene structure, epigenetic profiles, and transcriptional regulation. Genome Res 2018; 28:1675-1687. [PMID: 30232198 PMCID: PMC6211652 DOI: 10.1101/gr.234872.118] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 01/24/2018] [Accepted: 09/05/2018] [Indexed: 12/22/2022]
Abstract
Species-specific, new, or "orphan" genes account for 10%-30% of eukaryotic genomes. Although initially considered to have limited function, an increasing number of orphan genes have been shown to provide important phenotypic innovation. How new genes acquire regulatory sequences for proper temporal and spatial expression is unknown. Orphan gene regulation may rely in part on origination in open chromatin adjacent to preexisting promoters, although this has not yet been assessed by genome-wide analysis of chromatin states. Here, we combine taxon-rich nematode phylogenies with Iso-Seq, RNA-seq, ChIP-seq, and ATAC-seq to identify the gene structure and epigenetic signature of orphan genes in the satellite model nematode Pristionchus pacificus. Consistent with previous findings, we find young genes are shorter, contain fewer exons, and are on average less strongly expressed than older genes. However, the subset of orphan genes that are expressed exhibit distinct chromatin states from similarly expressed conserved genes. Orphan gene transcription is determined by a lack of repressive histone modifications, confirming long-held hypotheses that open chromatin is important for new gene formation. Yet orphan gene start sites more closely resemble enhancers defined by H3K4me1, H3K27ac, and ATAC-seq peaks, in contrast to conserved genes that exhibit traditional promoters defined by H3K4me3 and H3K27ac. Although the majority of orphan genes are located on chromosome arms that contain high recombination rates and repressive histone marks, strongly expressed orphan genes are more randomly distributed. Our results support a model of new gene origination by rare integration into open chromatin near enhancers.
Affiliation(s)
- Michael S Werner
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Bogdan Sieriebriennikov
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Neel Prabh
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Tobias Loschko
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Christa Lanz
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| | - Ralf J Sommer
- Department of Evolutionary Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
| |
|
227
|
Incipient de novo genes can evolve from frozen accidents that escaped rapid transcript turnover. Nat Ecol Evol 2018; 2:1626-1632. [DOI: 10.1038/s41559-018-0639-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/24/2017] [Accepted: 07/09/2018] [Indexed: 11/08/2022]
|
228
|
Willis S, Masel J. Gene Birth Contributes to Structural Disorder Encoded by Overlapping Genes. Genetics 2018; 210:303-313. [PMID: 30026186 PMCID: PMC6116962 DOI: 10.1534/genetics.118.301249] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/11/2018] [Accepted: 07/18/2018] [Indexed: 11/18/2022] Open
Abstract
The same nucleotide sequence can encode two protein products in different reading frames. Overlapping gene regions encode higher levels of intrinsic structural disorder (ISD) than nonoverlapping genes (39% vs. 25% in our viral dataset). This might be because of the intrinsic properties of the genetic code, because one member per pair was recently born de novo in a process that favors high ISD, or because high ISD relieves increased evolutionary constraint imposed by dual-coding. Here, we quantify the relative contributions of these three alternative hypotheses. We estimate that the recency of de novo gene birth explains [Formula: see text] or more of the elevation in ISD in overlapping regions of viral genes. While the two reading frames within a same-strand overlapping gene pair have markedly different ISD tendencies that must be controlled for, their effects cancel out to make no net contribution to ISD. The remaining elevation of ISD in the older members of overlapping gene pairs, presumed due to the need to alleviate evolutionary constraint, was already present prior to the origin of the overlap. Same-strand overlapping gene birth events can occur in two different frames, favoring high ISD either in the ancestral gene or in the novel gene; surprisingly, most de novo gene birth events contained completely within the body of an ancestral gene favor high ISD in the ancestral gene (23 phylogenetically independent events vs. 1). This can be explained by mutation bias favoring the frame with more start codons and fewer stop codons.
Affiliation(s)
- Sara Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
| | - Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona 85721
| |
|
229
|
Bekpen C, Xie C, Tautz D. Dealing with the adaptive immune system during de novo evolution of genes from intergenic sequences. BMC Evol Biol 2018; 18:121. [PMID: 30075701 PMCID: PMC6091031 DOI: 10.1186/s12862-018-1232-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/24/2018] [Accepted: 07/16/2018] [Indexed: 12/26/2022] Open
Abstract
Background The adaptive immune system of vertebrates has an extraordinary potential to sense and neutralize foreign antigens entering the body. De novo evolution of genes implies that the genome itself expresses novel antigens from intergenic sequences which could cause a problem with this immune system. Peptides from these novel proteins could be presented by the major histocompatibility complex (MHC) receptors to the cell surface and would be recognized as foreign. The respective cells would then be attacked and destroyed, or would cause inflammatory responses. Hence, de novo expressed peptides have to be introduced to the immune system as being self-peptides to avoid such autoimmune reactions. The regulation of the distinction between self and non-self starts during embryonic development, but continues late into adulthood. It is mostly mediated by specialized cells in the thymus, but can also be conveyed in peripheral tissues, such as the lymph nodes and the spleen. The self-antigens need to be exposed to the reactive T-cells, which requires the expression of the genes in the respective tissues. Since the initial activation of a promoter for new intergenic transcription of a de novo gene could occur in any tissue, we should expect that the evolutionary establishment of a de novo gene in animals with an adaptive immune system should also involve expression in at least one of the tissues that confer self-recognition. Results We have studied this question by analyzing the transcriptomes of multiple tissues from young mice in three closely related natural populations of the house mouse (M. m. domesticus). We find that new intergenic transcription occurs indeed mostly in only a single tissue. When a second tissue becomes involved, thymus and spleen are significantly overrepresented.
Conclusions We conclude that the inclusion of de novo transcripts in the processes for the induction of self-tolerance is indeed an important step in the evolution of functional de novo genes in vertebrates.
Affiliation(s)
- Cemalettin Bekpen
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
| | - Chen Xie
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany
| | - Diethard Tautz
- Max-Planck Institute for Evolutionary Biology, August-Thienemannstr. 2, 24306, Plön, Germany.
| |
|
230
|
Liao Y, Zhang X, Li B, Liu T, Chen J, Bai Z, Wang M, Shi J, Walling JG, Wing RA, Jiang J, Chen M. Comparison of Oryza sativa and Oryza brachyantha Genomes Reveals Selection-Driven Gene Escape from the Centromeric Regions. Plant Cell 2018; 30:1729-1744. [PMID: 29967288 PMCID: PMC6139686 DOI: 10.1105/tpc.18.00163] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/23/2018] [Revised: 05/23/2018] [Accepted: 06/28/2018] [Indexed: 05/03/2023]
Abstract
Centromeres are dynamic chromosomal regions, and the genetic and epigenetic environment of the centromere is often regarded as oppressive to protein-coding genes. Here, we used comparative genomic and phylogenomic approaches to study the evolution of centromeres and centromere-linked genes in the genus Oryza. We report a 12.4-Mb high-quality BAC-based pericentromeric assembly for Oryza brachyantha, which diverged from cultivated rice (Oryza sativa) ∼15 million years ago. The synteny analyses reveal seven medium (>50 kb) pericentric inversions in O. sativa and 10 in O. brachyantha. Of these inversions, three resulted in centromere movement (Chr1, Chr7, and Chr9). Additionally, we identified a potential centromere-repositioning event, in which the ancestral centromere on chromosome 12 in O. brachyantha jumped ∼400 kb away, possibly mediated by a duplicated transposition event (>28 kb). More strikingly, we observed an excess of syntenic gene loss at and near the centromeric regions (P < 2.2 × 10⁻¹⁶). Most (33/47) of the missing genes moved to other genomic regions; therefore such excess could be explained by the selective loss of the copy in or near centromeric regions after gene duplication. The pattern of gene loss immediately adjacent to centromeric regions suggests centromere chromatin dynamics (e.g., spreading or microrepositioning) may drive such gene loss.
Affiliation(s)
- Yi Liao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xuemei Zhang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Bo Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Tieyan Liu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Jinfeng Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Zetao Bai
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Meijiao Wang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jinfeng Shi
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
| | - Jason G Walling
- USDA-ARS-MWA-Cereal Crops Research Unit, Madison, Wisconsin 53726
| | - Rod A Wing
- Arizona Genomics Institute, School of Plant Sciences, BIO5 Institute, University of Arizona, Tucson, Arizona 85721
| | - Jiming Jiang
- Department of Horticulture, University of Wisconsin-Madison, Madison, Wisconsin 53706
- Department of Plant Biology, Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
| | - Mingsheng Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing, China
| |
|
231
|
Moyers BA, Zhang J. Toward Reducing Phylostratigraphic Errors and Biases. Genome Biol Evol 2018; 10:2037-2048. [PMID: 30060201 PMCID: PMC6105108 DOI: 10.1093/gbe/evy161] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Accepted: 07/28/2018] [Indexed: 01/03/2023] Open
Abstract
Phylostratigraphy is a method for estimating gene age, usually applied to large numbers of genes in order to detect nonrandom age-distributions of gene properties that could shed light on mechanisms of gene origination and evolution. However, phylostratigraphy underestimates gene age with a nonnegligible probability. The underestimation is more severe for genes with certain properties, creating spurious age distributions of these properties and those correlated with these properties. Here we explore three strategies to reduce phylostratigraphic error/bias. First, we test several alternative homology detection methods (PSIBLAST, HMMER, PHMMER, OMA, and GLAM2Scan) in phylostratigraphy, but fail to find any that noticeably outperforms the commonly used BLASTP. Second, using machine learning, we look for predictors of error-prone genes to exclude from phylostratigraphy, but cannot identify reliable predictors. Finally, we remove from phylostratigraphic analysis genes exhibiting errors in simulation, which by definition minimizes error/bias if the simulation is sufficiently realistic. Using this last approach, we show that some previously reported phylostratigraphic trends (e.g., younger proteins tend to evolve more rapidly and be shorter) disappear or even reverse, reconfirming the necessity of controlling phylostratigraphic error/bias. Taken together, our analyses demonstrate that phylostratigraphic errors/biases are refractory to several potential solutions but can be controlled at least partially by the exclusion of error-prone genes identified via realistic simulations. These results are expected to stimulate the judicious use of error-aware phylostratigraphy and reevaluation of previous phylostratigraphic findings.
Affiliation(s)
- Bryan A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
| | - Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan
| |
|
232
|
Abstract
De novo genes are very important for evolutionary innovation. However, how these genes originate and spread remains largely unknown. To better understand this, we rigorously searched for de novo genes in Saccharomyces cerevisiae S288C and examined their spread and fixation in the population. Here, we identified 84 de novo genes in S. cerevisiae S288C since the divergence with their sister groups. Transcriptome and ribosome profiling data revealed at least 8 (10%) and 28 (33%) de novo genes being expressed and translated only under specific conditions, respectively. DNA microarray data, based on 2-fold change, showed that 87% of the de novo genes are regulated during various biological processes, such as nutrient utilization and sporulation. Our comparative and evolutionary analyses further revealed that some factors, including single nucleotide polymorphism (SNP)/indel mutation, high GC content, and DNA shuffling, contribute to the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we also provide evidence suggesting the possible parallel origin of a de novo gene between S. cerevisiae and Saccharomyces paradoxus. Together, our study provides several new insights into the origin and spread of de novo genes.
Emergence of de novo genes has occurred in many lineages during evolution, but the birth, spread, and function of these genes remain unresolved. Here we have searched for de novo genes from Saccharomyces cerevisiae S288C using rigorous methods, which reduced the effects of bad annotation and genomic gaps on the identification of de novo genes. Through this analysis, we have found 84 new genes originating de novo from previously noncoding regions, 87% of which are very likely involved in various biological processes. We noticed that 10% and 33% of de novo genes were only expressed and translated under specific conditions; therefore, verification of de novo genes through transcriptome and ribosome profiling, especially from limited expression data, may underestimate the number of bona fide new genes. We further show that SNP/indel mutation, high GC content, and DNA shuffling could be involved in the birth of de novo genes, while domestication and natural selection drive the spread and fixation of these genes. Finally, we provide evidence suggesting the possible parallel origin of a new gene.
|
233
|
Li Z, Wan X. Long-term evolutionary DNA methylation dynamic of protein-coding genes and its underlying mechanism. Gene 2018; 677:96-104. [PMID: 30031907 DOI: 10.1016/j.gene.2018.07.051] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/18/2018] [Revised: 07/05/2018] [Accepted: 07/18/2018] [Indexed: 10/28/2022]
Abstract
DNA methylation is an important type of epigenetic modification for the maintenance of genome functionality and stability. Although there are many studies on DNA methylation patterns, mechanisms, and functions, no study has focused on the evolutionary dynamic of DNA methylation. Here, we present the first genome-wide pattern of evolutionary DNA methylation dynamic in protein-coding genes, by grouping the Arabidopsis thaliana protein-coding genes into several conservation levels representing different evolutionary ages, and by investigating their DNA methylation features for three methylation contexts in both genic and flanking regions. The main results include: in a long-term evolutionary period, (1) genic CHG and CHH methylation levels tend to decrease over time, which is mainly due to the reductions in the number of siRNA target sites in genes; (2) genic CG methylation levels are first reduced and then increased on average over evolutionary time, which is the combined result of an increased proportion and a decreased CG methylation level of CG methylated genes; and (3) increased gene length and the stochastic methylation mechanism in CG context may further account for the genic CG methylation trend in evolution. The diverse DNA methylation mechanisms in different contexts, together with altered gene length in evolution, could explain the methylation dynamic of protein-coding genes over evolutionary time. This evolutionary perspective provides a dynamic understanding of the intrinsic relationship between DNA methylation and its functional and evolutionary effects on the genomes.
Affiliation(s)
- Ziwen Li
- Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China
| | - Xiangyuan Wan
- Biology and Agriculture Research Center, School of Chemistry and Biological Engineering, University of Science and Technology Beijing, Beijing 100024, China; Beijing Engineering Laboratory of Main Crop Bio-Tech Breeding, Beijing International Science and Technology Cooperation Base of Biotechnology Breeding, Beijing Solidwill Sci-Tech Co. Ltd., Beijing 100192, China.
| |
|
234
|
Jiang M, Dong X, Lang H, Pang W, Zhan Z, Li X, Piao Z. Mining of Brassica-Specific Genes (BSGs) and Their Induction in Different Developmental Stages and under Plasmodiophora brassicae Stress in Brassica rapa. Int J Mol Sci 2018; 19:ijms19072064. [PMID: 30012965 PMCID: PMC6073354 DOI: 10.3390/ijms19072064] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 05/30/2018] [Revised: 06/29/2018] [Accepted: 07/13/2018] [Indexed: 11/16/2022] Open
Abstract
Orphan genes, also called lineage-specific genes (LSGs), are important for responses to biotic and abiotic stresses, and are associated with lineage-specific structures and biological functions. To date, there have been no studies investigating gene number, gene features, or gene expression patterns of orphan genes in Brassica rapa. In this study, 1540 Brassica-specific genes (BSGs) and 1824 Cruciferae-specific genes (CSGs) were identified based on the genome of Brassica rapa. The genic features analysis indicated that BSGs and CSGs possessed a lower percentage of multi-exon genes, higher GC content, and shorter gene length than evolutionary-conserved genes (ECGs). In addition, five types of BSGs were obtained and 145 out of 529 real A subgenome-specific BSGs were verified by PCR in 51 species. In silico and semi-qPCR gene expression analysis of BSGs suggested that BSGs are expressed in various tissues and can be induced by Plasmodiophora brassicae. Moreover, an A/C subgenome-specific BSG, BSGs1, was specifically expressed during the heading stage, indicating that the gene might be associated with leafy head formation. Our results provide valuable biological information for studying the molecular function of BSGs for Brassica-specific phenotypes and biotic stress in B. rapa.
Affiliation(s)
- Mingliang Jiang
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
| | - Xiangshu Dong
- School of Agriculture, Yunnan University, Kunming 650504, China.
| | - Hong Lang
- Key Laboratory of Northeast Rice Biology and Breeding, Ministry of Agriculture, Rice Research Institute, Shenyang Agricultural University, Shenyang 110866, China.
| | - Wenxing Pang
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
| | - Zongxiang Zhan
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
| | - Xiaonan Li
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
| | - Zhongyun Piao
- College of Horticulture, Shenyang Agricultural University, #120 Dongling Road, Shenyang 110866, China.
| |
|
235
|
Klasberg S, Bitard-Feildel T, Callebaut I, Bornberg-Bauer E. Origins and structural properties of novel and de novo protein domains during insect evolution. FEBS J 2018; 285:2605-2625. [PMID: 29802682 DOI: 10.1111/febs.14504] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 06/25/2017] [Revised: 04/12/2018] [Accepted: 05/11/2018] [Indexed: 12/11/2022]
Abstract
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~ 2 to ~ 390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder-to-order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
Affiliation(s)
- Steffen Klasberg
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
| | - Tristan Bitard-Feildel
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
| | - Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, Westfalian Wilhelms University Muenster, Germany
| |
|
236
|
Diversity and evolution of the emerging Pandoraviridae family. Nat Commun 2018; 9:2285. [PMID: 29891839 PMCID: PMC5995976 DOI: 10.1038/s41467-018-04698-4] [Citation(s) in RCA: 84] [Impact Index Per Article: 14.0] [Received: 01/18/2018] [Accepted: 05/17/2018] [Indexed: 02/02/2023] Open
Abstract
With DNA genomes reaching 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting pandoraviruses remained up to now the most complex viruses since their discovery in 2013. Our isolation of three new strains from distant locations and environments is now used to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses reveals many non-coding transcripts and significantly reduces the former set of predicted protein-coding genes. Here we show that the pandoraviruses exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes. Giant viruses are visible by light microscopy and have unusually long genomes. Here, the authors report three new members of the Pandoraviridae family and investigate their evolution and diversity.
|
237
|
|
238
|
Jerlström Hultqvist J, Warsi O, Söderholm A, Knopp M, Eckhard U, Vorontsov E, Selmer M, Andersson DI. A bacteriophage enzyme induces bacterial metabolic perturbation that confers a novel promiscuous function. Nat Ecol Evol 2018; 2:1321-1330. [PMID: 29807996 DOI: 10.1038/s41559-018-0568-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/25/2017] [Accepted: 05/02/2018] [Indexed: 11/09/2022]
Abstract
One key concept in the evolution of new functions is the ability of enzymes to perform promiscuous side-reactions that serve as a source of novelty that may become beneficial under certain conditions. Here, we identify a mechanism where a bacteriophage-encoded enzyme introduces novelty by inducing expression of a promiscuous bacterial enzyme. By screening for bacteriophage DNA that rescued an auxotrophic Escherichia coli mutant carrying a deletion of the ilvA gene, we show that bacteriophage-encoded S-adenosylmethionine (SAM) hydrolases reduce SAM levels. Through this perturbation of bacterial metabolism, expression of the promiscuous bacterial enzyme MetB is increased, which in turn complements the absence of IlvA. These results demonstrate how foreign DNA can increase the metabolic capacity of bacteria, not only by transfer of bona fide new genes, but also by bringing cryptic bacterial functions to light via perturbations of cellular physiology.
Affiliation(s)
- Jon Jerlström Hultqvist
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
| | - Omar Warsi
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Annika Söderholm
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - Michael Knopp
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Ulrich Eckhard
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
| | - Egor Vorontsov
- Proteomics Core Facility at Sahlgrenska Academy, Gothenburg University, Gothenburg, Sweden
| | - Maria Selmer
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.
| | - Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.
| |
|
239
|
Banerjee S, Chakraborty S. Protein intrinsic disorder negatively associates with gene age in different eukaryotic lineages. Mol Biosyst 2018; 13:2044-2055. [PMID: 28783193 DOI: 10.1039/c7mb00230k] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
The emergence of new protein-coding genes in a specific lineage or species provides raw materials for evolutionary adaptations. Until recently, the biology of new genes emerging particularly from non-genic sequences remained unexplored. Although the new genes are subjected to variable selection pressure and face rapid deletion, some of them become functional and are retained in the gene pool. To acquire functional novelties, new genes often get integrated into the pre-existing ancestral networks. However, the mechanism by which young proteins acquire novel interactions remains unanswered till date. Since structural orientation contributes hugely to the mode of proteins' physical interactions, in this regard, we put forward an interesting question - Do new genes encode proteins with stable folds? Addressing the question, we demonstrated that the intrinsic disorder inversely correlates with the evolutionary gene ages - i.e. young proteins are richer in intrinsic disorder than the ancient ones. We further noted that young proteins, which are initially poorly connected hubs, prefer to be structurally more disordered than well-connected ancient proteins. The phenomenon strikingly defies the usual trend of well-connected proteins being highly disordered in structure. We justified that structural disorder might help poorly connected young proteins to undergo promiscuous interactions, which provides the foundation for novel protein interactions. The study focuses on the evolutionary perspectives of young proteins in the light of structural adaptations.
Affiliation(s)
- Sanghita Banerjee
- Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India.

240
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Scherer S, Neuhaus K. The Novel Anaerobiosis-Responsive Overlapping Gene ano Is Overlapping Antisense to the Annotated Gene ECs2385 of Escherichia coli O157:H7 Sakai. Front Microbiol 2018; 9:931. [PMID: 29867840 PMCID: PMC5960689 DOI: 10.3389/fmicb.2018.00931] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/01/2018] [Accepted: 04/23/2018] [Indexed: 12/26/2022]
Abstract
Current notion presumes that only one protein is encoded at a given bacterial genetic locus. However, transcription and translation of an overlapping open reading frame (ORF) of 186 bp length were discovered by RNAseq and RIBOseq experiments. This ORF is almost completely embedded in the annotated L,D-transpeptidase gene ECs2385 of Escherichia coli O157:H7 Sakai in the antisense reading frame -3. The ORF is transcribed as part of a bicistronic mRNA, which includes the annotated upstream gene ECs2384, encoding a murein lipoprotein. The transcriptional start site of the operon resides 38 bp upstream of the ECs2384 start codon and is driven by a predicted σ70 promoter, which is constitutively active under different growth conditions. The bicistronic operon contains a ρ-independent terminator just upstream of the novel gene, significantly decreasing its transcription. The novel gene can be stably expressed as an EGFP-fusion protein and a translationally arrested mutant of ano, unable to produce the protein, shows a growth advantage in competitive growth experiments compared to the wild type under anaerobiosis. Therefore, the novel antisense overlapping gene is named ano (anaerobiosis responsive overlapping gene). A phylostratigraphic analysis indicates that ano originated very recently de novo by overprinting after the Escherichia/Shigella clade separated from other enterobacteria. Therefore, ano is one of the very rare cases of overlapping genes known in the genus Escherichia.
Affiliation(s)
- Sarah M Hücker
- Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Sonja Vanderhaeghen
- Chair for Microbial Ecology, Technical University of Munich, Freising, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, Technical University of Munich, Freising, Germany; Institute for Food & Health, Technical University of Munich, Freising, Germany
- Klaus Neuhaus
- Chair for Microbial Ecology, Technical University of Munich, Freising, Germany; Core Facility Microbiome/NGS, Institute for Food & Health, Technical University of Munich, Freising, Germany

241
Abstract
Phylostratigraphy, originally designed for gene age estimation by BLAST-based protein homology searches of sequenced genomes, has been widely used for studying patterns and inferring mechanisms of gene origination and evolution. We previously showed by computer simulation that phylostratigraphy underestimates gene age for a nonnegligible fraction of genes and that the underestimation is severer for genes with certain properties such as fast evolution and short protein sequences. Consequently, many previously reported age distributions of gene properties may have been methodological artifacts rather than biological realities. Domazet-Lošo and colleagues recently argued that our simulations were flawed and that phylostratigraphic bias does not impact inferences about gene emergence and evolution. Here we discuss conceptual difficulties of phylostratigraphy, identify numerous problems in Domazet-Lošo et al.’s argument, reconfirm phylostratigraphic error using simulations suggested by Domazet-Lošo and colleagues, and demonstrate that a phylostratigraphic trend claimed to be robust to error disappears when genes likely to be error-resistant are analyzed. We conclude that extreme caution is needed in interpreting phylostratigraphic results because of the inherent biases of the method and that reanalysis using genes exhibiting no error in realistic simulations may help reduce spurious findings.
Affiliation(s)
- Bryan A Moyers
- HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan

242
Yeasmin F, Yada T, Akimitsu N. Micropeptides Encoded in Transcripts Previously Identified as Long Noncoding RNAs: A New Chapter in Transcriptomics and Proteomics. Front Genet 2018; 9:144. [PMID: 29922328 PMCID: PMC5996887 DOI: 10.3389/fgene.2018.00144] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 01/23/2018] [Accepted: 04/09/2018] [Indexed: 11/13/2022]
Abstract
Integrative analysis using omics-based technologies results in the identification of a large number of putative short open reading frames (sORFs) with protein-coding capacity within transcripts previously identified as long noncoding RNAs (lncRNAs) or transcripts of unknown function (TUFs). sORFs were previously overlooked because of their diminutive size and the difficulty of identification by bioinformatics analyses. There is now growing evidence of the existence of potentially functional micropeptides produced from sORFs within cells of diverse species. Recent characterization of a few of these revealed their significant divergent roles in many fundamental biological processes, where some also show important relationships with pathogenesis. Recent works therefore provide new insights for exploring the wealth of information that may lie within sORF-encoded short proteins. Here, we summarize the current progress and view of micropeptides encoded in sORFs of protein-coding genes.
Affiliation(s)
- Fouzia Yeasmin
- Isotope Science Centre, The University of Tokyo, Tokyo, Japan
- Tetsushi Yada
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan

243
Forterre P. Viruses in the 21st Century: From the Curiosity-Driven Discovery of Giant Viruses to New Concepts and Definition of Life. Clin Infect Dis 2018; 65:S74-S79. [PMID: 28859344 DOI: 10.1093/cid/cix349] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]
Abstract
The curiosity-driven discovery of giant DNA viruses infecting amoebas has triggered an intense debate about the origin, nature, and definition of viruses. This discovery was delayed by the current paradigm confusing viruses with small virions. Several new definitions and concepts have been proposed either to reconcile the unique features of giant viruses with previous paradigms or to propose a completely new vision of the living world. I briefly review here how several other lines of research in virology converged during the last 2 decades with the discovery of giant viruses to change our traditional perception of the viral world. This story emphasizes the power of multidisciplinary curiosity-driven research, from the hospital to the field and the laboratory. Notably, some philosophers have now also joined biologists in their quest to make sense of the abundance and diversity of viruses and related capsidless mobile elements in the biosphere.
Affiliation(s)
- Patrick Forterre
- Institut Pasteur, Département de Microbiologie, Paris; and Institut Intégré de Biologie Cellulaire, Département de Microbiologie, Centre National de la Recherche Scientifique, Université Paris-Saclay, France

244
Hartmann FE, Rodríguez de la Vega RC, Brandenburg JT, Carpentier F, Giraud T. Gene Presence-Absence Polymorphism in Castrating Anther-Smut Fungi: Recent Gene Gains and Phylogeographic Structure. Genome Biol Evol 2018; 10:1298-1314. [PMID: 29722826 PMCID: PMC5967549 DOI: 10.1093/gbe/evy089] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Accepted: 04/30/2018] [Indexed: 12/14/2022]
Abstract
Gene presence-absence polymorphisms segregating within species are a significant source of genetic variation but have been little investigated to date in natural populations. In plant pathogens, the gain or loss of genes encoding proteins interacting directly with the host, such as secreted proteins, probably plays an important role in coevolution and local adaptation. We investigated gene presence-absence polymorphism in populations of two closely related species of castrating anther-smut fungi, Microbotryum lychnidis-dioicae (MvSl) and M. silenes-dioicae (MvSd), from across Europe, on the basis of Illumina genome sequencing data and high-quality genome references. We observed presence-absence polymorphism for 186 autosomal genes (2% of all genes) in MvSl, and only 51 autosomal genes in MvSd. Distinct genes displayed presence-absence polymorphism in the two species. Genes displaying presence-absence polymorphism were frequently located in subtelomeric and centromeric regions and close to repetitive elements, and comparison with outgroups indicated that most were present in a single species, being recently acquired through duplications in multiple-gene families. Gene presence-absence polymorphism in MvSl showed a phylogeographic structure corresponding to clusters detected based on SNPs. In addition, gene absence alleles were rare within species and skewed toward low-frequency variants. These findings are consistent with a deleterious or neutral effect for most gene presence-absence polymorphism. Some of the observed gene loss and gain events may however be adaptive, as suggested by the putative functions of the corresponding encoded proteins (e.g., secreted proteins) or their localization within previously identified selective sweeps. The adaptive roles in plant and anther-smut fungi interactions of candidate genes however need to be experimentally tested in future studies.
Affiliation(s)
- Fanny E Hartmann
- Department Génétique et Ecologie Evolutives, Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Ricardo C Rodríguez de la Vega
- Department Génétique et Ecologie Evolutives, Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Jean-Tristan Brandenburg
- Department Génétique et Ecologie Evolutives, Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Fantin Carpentier
- Department Génétique et Ecologie Evolutives, Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France
- Tatiana Giraud
- Department Génétique et Ecologie Evolutives, Ecologie Systématique Evolution, Bâtiment 360, Univ. Paris-Sud, AgroParisTech, CNRS, Université Paris-Saclay, Orsay, France

245
Hartmann FE, Croll D. Distinct Trajectories of Massive Recent Gene Gains and Losses in Populations of a Microbial Eukaryotic Pathogen. Mol Biol Evol 2018; 34:2808-2822. [PMID: 28981698 PMCID: PMC5850472 DOI: 10.1093/molbev/msx208] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Indexed: 12/15/2022]
Abstract
Differences in gene content are a significant source of variability within species and have an impact on phenotypic traits. However, little is known about the mechanisms responsible for the most recent gene gains and losses. We screened the genomes of 123 worldwide isolates of the major pathogen of wheat Zymoseptoria tritici for robust evidence of gene copy number variation. Based on orthology relationships in three closely related fungi, we identified 599 gene gains and 1,024 gene losses that have not yet reached fixation within the focal species. Our analyses of gene gains and losses segregating in populations showed that gene copy number variation arose preferentially in subtelomeres and in proximity to transposable elements. Recently lost genes were enriched in virulence factors and secondary metabolite gene clusters. In contrast, recently gained genes encoded mostly secreted proteins lacking a conserved domain. We analyzed the frequency spectrum at loci segregating a gene presence–absence polymorphism in four worldwide populations. Recent gene losses showed a significant excess of low-frequency variants compared with genome-wide single nucleotide polymorphisms, which is indicative of strong negative selection against gene losses. Recent gene gains were either under weak negative selection or neutral. We found evidence for strong divergent selection among populations at individual loci segregating a gene presence–absence polymorphism. Hence, gene gains and losses likely contributed to local adaptation. Our study shows that microbial eukaryotes harbor extensive copy number variation within populations and that functional differences among recently gained and lost genes led to distinct evolutionary trajectories.
Affiliation(s)
- Fanny E Hartmann
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland

246
Lu TC, Leu JY, Lin WC. A Comprehensive Analysis of Transcript-Supported De Novo Genes in Saccharomyces sensu stricto Yeasts. Mol Biol Evol 2018; 34:2823-2838. [PMID: 28981695 PMCID: PMC5850716 DOI: 10.1093/molbev/msx210] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/16/2022]
Abstract
Novel genes arising from random DNA sequences (de novo genes) have been suggested to be widespread in the genomes of different organisms. However, our knowledge about the origin and evolution of de novo genes is still limited. To systematically understand the general features of de novo genes, we established a robust pipeline to analyze >20,000 transcript-supported coding sequences (CDSs) from the budding yeast Saccharomyces cerevisiae. Our analysis pipeline combined phylogeny, synteny, and sequence alignment information to identify possible orthologs across 20 Saccharomycetaceae yeasts and discovered 4,340 S. cerevisiae-specific de novo genes and 8,871 S. sensu stricto-specific de novo genes. We further combined information on CDS positions and transcript structures to show that >65% of de novo genes arose from transcript isoforms of ancient genes, especially in the upstream and internal regions of ancient genes. Fourteen identified de novo genes with high transcript levels were chosen to verify their protein expression. Ten of them, including eight transcript isoform-associated CDSs, showed translation signals, and five proteins exhibited specific cytosolic localizations. Our results suggest that de novo genes frequently arise in the S. sensu stricto complex and have the potential to be quickly integrated into ancient cellular networks.
Affiliation(s)
- Tzu-Chiao Lu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Jun-Yi Leu
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan
- Wen-Chang Lin
- Graduate Institute of Life Sciences, National Defense Medical Center, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

247
Translation of neutrally evolving peptides provides a basis for de novo gene evolution. Nat Ecol Evol 2018; 2:890-896. [DOI: 10.1038/s41559-018-0506-6] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 06/19/2017] [Accepted: 02/16/2018] [Indexed: 01/29/2023]

248
Baalsrud HT, Tørresen OK, Solbakken MH, Salzburger W, Hanel R, Jakobsen KS, Jentoft S. De Novo Gene Evolution of Antifreeze Glycoproteins in Codfishes Revealed by Whole Genome Sequence Data. Mol Biol Evol 2018; 35:593-606. [PMID: 29216381 PMCID: PMC5850335 DOI: 10.1093/molbev/msx311] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/25/2022]
Abstract
New genes can arise through duplication of a pre-existing gene or de novo from non-coding DNA, providing raw material for evolution of new functions in response to a changing environment. A prime example is the independent evolution of antifreeze glycoprotein genes (afgps) in the Arctic codfishes and Antarctic notothenioids to prevent freezing. However, the highly repetitive nature of these genes complicates studies of their organization. In notothenioids, afgps evolved from an extant gene, yet the evolutionary origin of afgps in codfishes is unknown. Here, we demonstrate that afgps in codfishes have evolved de novo from non-coding DNA 13-18 Ma, coinciding with the cooling of the Northern Hemisphere. Using whole-genome sequence data from several codfishes and notothenioids, we find higher copy number of afgp in species exposed to more severe freezing suggesting a gene dosage effect. Notably, antifreeze function is lost in one lineage of codfishes analogous to the afgp losses in non-Antarctic notothenioids. This indicates that selection can eliminate the antifreeze function when freezing is no longer imminent. In addition, we show that evolution of afgp-assisting antifreeze potentiating protein genes (afpps) in notothenioids coincides with origin and lineage-specific losses of afgp. The origin of afgps in codfishes is one of the first examples of an essential gene born from non-coding DNA in a non-model species. Our study underlines the power of comparative genomics to uncover past molecular signatures of genome evolution, and further highlights the impact of de novo gene origin in response to a changing selection regime.
Affiliation(s)
- Helle Tessand Baalsrud
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Ole Kristian Tørresen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Monica Hongrø Solbakken
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Walter Salzburger
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Zoological Institute, University of Basel, Basel, Switzerland
- Reinhold Hanel
- Institute of Fisheries Ecology, Johann Heinrich von Thünen Institute, Federal Research Institute for Rural Areas, Forestry and Fisheries, Hamburg, Germany
- Kjetill S Jakobsen
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway
- Sissel Jentoft
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, Oslo, Norway

249
Fouché S, Plissonneau C, Croll D. The birth and death of effectors in rapidly evolving filamentous pathogen genomes. Curr Opin Microbiol 2018; 46:34-42. [PMID: 29455143 DOI: 10.1016/j.mib.2018.01.020] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 11/01/2017] [Revised: 01/15/2018] [Accepted: 01/31/2018] [Indexed: 11/19/2022]
Abstract
Plant pathogenic fungi and oomycetes are major risks to food security due to their evolutionary success in overcoming plant defences. Pathogens produce effectors to interfere with host defences and metabolism. These effectors are often encoded in rapidly evolving compartments of the genome. We review how effector genes emerged and were lost in pathogen genomes, drawing on the links between effector evolution and chromosomal rearrangements. Some new effectors entered pathogen genomes via horizontal transfer or introgression. However, new effector functions also arose through gene duplication or from previously non-coding sequences. The evolutionary success of an effector is tightly linked to its transcriptional regulation during host colonization. Some effectors converged on an epigenetic control of expression imposed by genomic defences against transposable elements. Transposable elements were also drivers of effector diversification and loss that led to mosaics in effector presence-absence variation. Such effector mosaics within species were the foundation for rapid pathogen adaptation.
Affiliation(s)
- Simone Fouché
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland
- Clémence Plissonneau
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland; UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Avenue Lucien Bretignières, BP 01, Thiverval-Grignon F-78850, France
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, CH-2000 Neuchâtel, Switzerland.

250
Hücker SM, Vanderhaeghen S, Abellan-Schneyder I, Wecko R, Simon S, Scherer S, Neuhaus K. A novel short L-arginine responsive protein-coding gene (laoB) antiparallel overlapping to a CadC-like transcriptional regulator in Escherichia coli O157:H7 Sakai originated by overprinting. BMC Evol Biol 2018; 18:21. [PMID: 29433444 PMCID: PMC5810103 DOI: 10.1186/s12862-018-1134-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 06/22/2017] [Accepted: 01/31/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Due to the DNA triplet code, it is possible that the sequences of two or more protein-coding genes overlap to a large degree. However, such non-trivial overlaps are usually excluded by genome annotation pipelines and, thus, only a few overlapping gene pairs have been described in bacteria. In contrast, transcriptome and translatome sequencing reveals many signals originating from the antisense strand of annotated genes, of which we analyzed an example gene pair in more detail. RESULTS A small open reading frame of Escherichia coli O157:H7 strain Sakai (EHEC), designated laoB (L-arginine responsive overlapping gene), is embedded in reading frame −2 in the antisense strand of ECs5115, encoding a CadC-like transcriptional regulator. This overlapping gene shows evidence of transcription and translation in Luria-Bertani (LB) and brain-heart infusion (BHI) medium based on RNA sequencing (RNAseq) and ribosomal-footprint sequencing (RIBOseq). The transcriptional start site is 289 base pairs (bp) upstream of the start codon and transcription termination is 155 bp downstream of the stop codon. Overexpression of LaoB fused to an enhanced green fluorescent protein (EGFP) reporter was possible. The sequence upstream of the transcriptional start site displayed strong promoter activity under different conditions, whereas promoter activity was significantly decreased in the presence of L-arginine. A strand-specific translationally arrested mutant of laoB provided a significant growth advantage in competitive growth experiments in the presence of L-arginine compared to the wild type, which returned to wild type level after complementation of laoB in trans. A phylostratigraphic analysis indicated that the novel gene is restricted to the Escherichia/Shigella clade and might have originated recently by overprinting leading to the expression of part of the antisense strand of ECs5115. CONCLUSIONS Here, we present evidence of a novel small protein-coding gene laoB encoded in the antisense frame −2 of the annotated gene ECs5115. Clearly, laoB is evolutionarily young and it originated in the Escherichia/Shigella clade by overprinting, a process which may cause the de novo evolution of bacterial genes like laoB.
Affiliation(s)
- Sarah M Hücker
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Fraunhofer ITEM-R, Am Biopark 9, 93053, Regensburg, Germany
- Sonja Vanderhaeghen
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Isabel Abellan-Schneyder
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Romy Wecko
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Svenja Simon
- Department of Computer and Information Science, University of Konstanz, Box 78, 78457, Konstanz, Germany
- Siegfried Scherer
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany
- Klaus Neuhaus
- Chair for Microbial Ecology, Wissenschaftszentrum Weihenstephan, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany; Core Facility Microbiome/NGS, ZIEL - Institute for Food & Health, Technische Universität München, Weihenstephaner Berg 3, 85354, Freising, Germany.